sam {siggenes} | R Documentation |

Performs a Significance Analysis of Microarrays (SAM). It is possible to perform one and two class analyses using either a modified t-statistic or a (standardized) Wilcoxon rank statistic, and a multiclass analysis using a modified F-statistic. Moreover, this function provides a SAM procedure for categorical data such as SNP data.

sam(data, cl, method = "d.stat", delta = NULL, n.delta = 10, p0 = NA, lambda = seq(0, 0.95, 0.05), ncs.value = "max", ncs.weights = NULL, gene.names = dimnames(data)[[1]], q.version = 1, ...)

`data` |
a matrix, data frame or exprSet object. Each row of `data`
(or `exprs(data)`, respectively) must correspond to a gene, and
each column to a sample |

`cl` |
a numeric vector of length `ncol(data)` containing the class
labels of the samples. In the two class paired case, `cl` can also
be a matrix with `ncol(data)` rows and 2 columns. If `data` is
an exprSet object, `cl` can also be a character string naming the column
of `pData(data)` that contains the class labels of the samples.
In the one-class case, `cl` should be a vector of 1's.
In the two class unpaired case, `cl` should be a vector containing 0's
(specifying the samples of, e.g., the control group) and 1's (specifying,
e.g., the case group).
In the two class paired case, `cl` can be either a vector or a matrix.
If it is a vector, then `cl` has to consist of the integers between -1 and
-n/2 (e.g., before treatment group) and between 1 and n/2 (e.g.,
after treatment group), where n is the length of `cl` and k
is paired with -k, k=1,...,n/2. If `cl` is a matrix, one
column should contain -1's and 1's specifying, e.g., the before and the after
treatment samples, respectively, and the other column should contain integers
between 1 and n/2 specifying the n/2 pairs of observations.
In the multiclass case, and also if `method="cat.stat"`, `cl` should be a vector containing integers
between 1 and g, where g is the number of groups.
For examples of how `cl` can be specified, see the manual of siggenes |

`method` |
a character string specifying the method that should be used
in the computation of the expression scores d. If `method="d.stat"`,
a modified t-statistic or F-statistic, respectively, will be computed
as proposed by Tusher et al. (2001). If `method="wilc.stat"`, a
Wilcoxon rank sum statistic or Wilcoxon signed rank statistic will be used
as expression score. For an analysis of categorical data such as SNP data,
`method` can be set to `"cat.stat"`. In this case Pearson's
Chi-squared statistic is computed for each row. It is also possible to use
a user-written function to compute the expression scores.
For details, see `Details` |

`delta` |
a numeric vector specifying a set of values for the threshold
Delta that should be used. If `NULL`, `n.delta`
Delta values will be computed automatically |

`n.delta` |
a numeric value specifying the number of Delta values
that will be computed over the range of all possible values for Delta
if `delta` is not specified |

`p0` |
a numeric value specifying the prior probability pi0
that a gene is not differentially expressed. If `NA`, `p0` will
be computed by the function `pi0.est` |

`lambda` |
a numeric vector or value specifying the lambda
values used in the estimation of the prior probability. For details, see
`?pi0.est` |

`ncs.value` |
a character string. Only used if `lambda` is a
vector. Either `"max"` or `"paper"`. For details, see `?pi0.est` |

`ncs.weights` |
a numerical vector of the same length as `lambda`
containing the weights used in the estimation of pi0. By default
no weights are used. For details, see `?pi0.est` |

`gene.names` |
a character vector of length `nrow(data)` containing the
names of the genes. By default the row names of `data` are used |

`q.version` |
a numeric value indicating which version of the q-value should
be computed. If `q.version=2`, the original version of the q-value, i.e.
min{pFDR}, will be computed. If `q.version=1`, min{FDR} will be used
in the calculation of the q-value. Otherwise, the q-value is not computed.
For details, see `?qvalue.cal` |

`...` |
further arguments of the specific SAM methods. If `method="d.stat"`,
see `?sam.dstat`; if `method="wilc.stat"`, see `?sam.wilc`; and if
`method="cat.stat"`, see `?sam.snp` for these arguments |
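The class label formats described for `cl` can be sketched in base R as follows. This is only an illustration: the sample count `n`, the group count `g`, and all object names are assumptions made for the example and do not come from siggenes.

```r
n <- 10  # hypothetical number of samples, i.e. ncol(data)

# One-class case: a vector of 1's
cl.one <- rep(1, n)

# Two class unpaired case: 0 = e.g. control group, 1 = e.g. case group
cl.unpaired <- rep(c(0, 1), each = n / 2)

# Two class paired case, vector form: sample -k is paired with sample k
cl.paired <- c(-(1:(n / 2)), 1:(n / 2))

# Two class paired case, matrix form: one column of -1's and 1's
# (before/after), one column with the pair index 1, ..., n/2
cl.paired.mat <- cbind(rep(c(-1, 1), each = n / 2), rep(1:(n / 2), 2))

# Multiclass case (or method = "cat.stat"): integers between 1 and g
g <- 3  # hypothetical number of groups
cl.multi <- rep(1:g, length.out = n)

# Sanity check: every pair index occurs once with each sign
stopifnot(all(sort(abs(cl.paired)) == rep(1:(n / 2), each = 2)))
```

Each of these objects has length (or number of rows) `n`, so it matches the requirement that `cl` describes one label per column of `data`.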

`sam` provides SAM procedures for several types of analysis (one and two class analyses
with either a modified t-statistic or a Wilcoxon rank statistic, a multiclass analysis
with a modified F-statistic, and an analysis of categorical data). It is, however, also
possible to write your own function for another type of analysis. The required arguments
of this function must be `data` and `cl`. This function can also have other
arguments. The output of this function must be a list containing

`d`: a numeric vector consisting of the expression scores of the genes.

`d.bar`: a numeric vector of the same length as `na.exclude(d)` specifying the expected expression scores under the null hypothesis.

`p.value`: a numeric vector of the same length as `d` containing the raw, unadjusted p-values of the genes.

`vec.false`: a numeric vector of the same length as `d` consisting of the one-sided numbers of falsely called genes, i.e. if *d* > 0, the number of genes expected to be larger than *d* under the null hypothesis, and if *d* < 0, the number of genes expected to be smaller than *d* under the null hypothesis.

`s`: a numeric vector of the same length as `d` containing the standard deviations of the genes. If no standard deviation can be calculated, set `s=numeric(0)`.

`s0`: a numeric value specifying the fudge factor. If no fudge factor is calculated, set `s0=numeric(0)`.

`mat.samp`: a matrix with B rows and `ncol(data)` columns, where B is the number of permutations, containing the permutations used in the computation of the permuted d-values. If such a matrix is not computed, set `mat.samp=matrix(numeric(0))`.

`msg`: a character string or vector containing information about, e.g., which type of analysis has been performed. `msg` is printed when the function `print` or `summary`, respectively, is called. If no such message should be printed, set `msg=""`.

`fold`: a numeric vector of the same length as `d` consisting of the fold changes of the genes. If no fold change has been computed, set `fold=numeric(0)`.

If this function is, e.g., called `foo`, it can be used by setting `method="foo"`
in `sam`. More detailed information and an example will be contained in the siggenes
manual.
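A minimal sketch of such a user-written function is given here, using only base R. It is an illustration under stated assumptions, not the siggenes implementation: the name `foo`, the extra arguments `B` and `seed`, the use of an ordinary Welch t-statistic without a fudge factor, and the pooled permutation p-values are all choices made for this example. It also assumes 0/1 class labels and data without missing values.

```r
# Hypothetical user-written score function (illustrative only).
foo <- function(data, cl, B = 100, seed = 123) {
  data <- as.matrix(data)
  set.seed(seed)
  # Ordinary Welch t-statistic per row for 0/1 class labels (no fudge factor)
  score <- function(x, labels) {
    x0 <- x[, labels == 0, drop = FALSE]
    x1 <- x[, labels == 1, drop = FALSE]
    se <- sqrt(apply(x0, 1, var) / ncol(x0) + apply(x1, 1, var) / ncol(x1))
    list(d = (rowMeans(x1) - rowMeans(x0)) / se, s = se)
  }
  obs <- score(data, cl)
  d <- obs$d
  # B random permutations of the class labels, one permutation per row
  mat.samp <- t(replicate(B, sample(cl)))
  # Permuted scores: one column per permutation (genes x B)
  d.perm <- apply(mat.samp, 1, function(p) score(data, p)$d)
  # Expected ordered scores under the null: mean of the sorted permuted scores
  d.bar <- rowMeans(apply(d.perm, 2, sort))
  # One-sided expected numbers of falsely called genes for each observed score
  vec.false <- sapply(d, function(di) {
    if (di > 0) sum(d.perm >= di) / B else sum(d.perm <= di) / B
  })
  # Two-sided permutation p-values, pooled over all genes
  p.value <- sapply(d, function(di) mean(abs(d.perm) >= abs(di)))
  list(d = d, d.bar = d.bar, p.value = p.value, vec.false = vec.false,
       s = obs$s, s0 = numeric(0), mat.samp = mat.samp,
       msg = "Custom Welch t-statistic (illustrative sketch)\n",
       fold = numeric(0))
}

# Small simulated data set (50 genes, 4 + 4 samples) to inspect the output
set.seed(1)
x <- matrix(rnorm(50 * 8), nrow = 50)
x.cl <- rep(c(0, 1), each = 4)
out <- foo(x, x.cl, B = 20)
str(out)
```

The returned list has the nine components named in the list above; whether a given siggenes version accepts this exact function through `method="foo"` should be checked against the siggenes manual.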

an object of class SAM

SAM was developed by Tusher et al. (2001).

!!! There is a patent pending for the SAM technology at Stanford University. !!!

Holger Schwender, holger.schw@gmx.de

Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of
the Empirical Bayes and the Significance Analysis of Microarrays.
*Technical Report*, SFB 475, University of Dortmund, Germany.
http://www.sfb475.uni-dortmund.de/berichte/tr44-03.pdf.

Schwender, H. (2004). Modifying Microarray Analysis Methods for
Categorical Data – SAM and PAM for SNPs. To appear in: *Proceedings
of the 28th Annual Conference of the GfKl*.

Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays
applied to the ionizing radiation response. *PNAS*, 98, 5116-5121.

`SAM-class`, `sam.dstat`, `sam.wilc`, `sam.snp`, `sam.plot2`, `delta.plot`

## Not run:
# Load the package multtest and the data of Golub et al. (1999)
# contained in multtest.
library(multtest)
data(golub)

# golub.cl contains the class labels.
golub.cl

# Perform a SAM analysis for the two class unpaired case assuming
# unequal variances.
sam.out <- sam(golub, golub.cl, B = 100, rand = 123)
sam.out

# Obtain the Delta plots for the default set of Deltas
plot(sam.out)

# Generate the Delta plots for Delta = 0.2, 0.4, 0.6, ..., 2
plot(sam.out, seq(0.2, 2, 0.2))

# Obtain the SAM plot for Delta = 2
plot(sam.out, 2)

# Get information about the genes called significant using
# Delta = 3 (since neither the gene names nor the chip type
# has been specified, ll is set to FALSE to avoid a warning)
sam.sum3 <- summary(sam.out, 3, ll = FALSE)

# Obtain the rows of golub containing the genes called
# differentially expressed
sam.sum3@row.sig.genes

# and their names
golub.gnames[sam.sum3@row.sig.genes, 3]

# The matrix containing the d-values, q-values etc. of the
# differentially expressed genes can be obtained by
sam.sum3@mat.sig

# Perform a SAM analysis using Wilcoxon rank sums
sam(golub, golub.cl, method = "wilc.stat", rand = 123)

# Now consider only the first ten columns of the Golub et al. (1999)
# data set. For now, let's assume the first five columns were
# before treatment measurements and the next five columns were
# after treatment measurements, where column 1 and 6, column 2
# and 7, ..., build a pair. In this case, the class labels
# would be
new.cl <- c(-(1:5), 1:5)
new.cl

# and the corresponding SAM analysis for the two-class paired
# case would be
sam(golub[, 1:10], new.cl, B = 100, rand = 123)

# Another way of specifying the class labels for the above paired
# analysis is
mat.cl <- matrix(c(rep(c(-1, 1), each = 5), rep(1:5, 2)), 10)
mat.cl

# and the above SAM analysis can also be done by
sam(golub[, 1:10], mat.cl, B = 100, rand = 123)
## End(Not run)

[Package *siggenes* version 1.4.0 Index]