sam.dstat {siggenes}    R Documentation

SAM Analysis Using a Modified t-statistic

Description

Performs a SAM (Significance Analysis of Microarrays) analysis as proposed by Tusher et al. (2001). It is possible to perform one and two class analyses using a modified t-statistic and a multiclass analysis using a modified F-statistic. In the two class case, either Welch's t-statistic (unequal variance) or the t-statistic assuming equal variances can be computed.

Usage

  sam.dstat(data, cl, var.equal = FALSE, B = 100, med = FALSE, s0 = NA, 
      s.alpha = seq(0, 1, 0.05), include.zero = TRUE, p0 = NA, n.subset = 10, 
      mat.samp = NULL, B.more = 0.1, B.max = 30000, lambda = seq(0, 0.95, 0.05), 
      ncs.value = "max", ncs.weights = NULL, delta = NULL, n.delta = 10, 
      gene.names = dimnames(data)[[1]], q.version = 1, R.fold = 1, R.unlog = TRUE, 
      na.replace = TRUE, na.method = "mean", rand = NA)

Arguments

data a matrix, data frame or exprSet object. Each row of data (or exprs(data), respectively) must correspond to a gene, and each column to a sample
cl a numeric vector of length ncol(data) containing the class labels of the samples. In the two class paired case, cl can also be a matrix with ncol(data) rows and 2 columns. If data is an exprSet object, cl can also be a character string. For details on how cl should be specified, see ?sam
var.equal if FALSE (default), Welch's t-statistic will be computed. If TRUE, the pooled variance will be used in the computation of the t-statistic
B numeric value indicating how many permutations should be used in the estimation of the null distribution
med if FALSE (default), the mean number of falsely called genes will be computed. Otherwise, the median number is calculated
s0 a numeric value specifying the fudge factor. If NA (default), s0 will be computed automatically
s.alpha a numeric vector or value specifying the quantiles of the standard deviations of the genes used in the computation of s0. If s.alpha is a vector, the fudge factor is computed as proposed by Tusher et al. (2001). Otherwise, the quantile of the standard deviations specified by s.alpha is used as fudge factor
include.zero if TRUE, s0=0 will also be a possible choice for the fudge factor. Hence, the usual t-statistic or F-statistic, respectively, can also be a possible choice for the expression score d. If FALSE, s0=0 will not be a possible choice for the fudge factor. The latter follows the definition of the fudge factor by Tusher et al. (2001), in which only strictly positive values are considered
p0 a numeric value specifying the prior probability pi0 that a gene is not differentially expressed. If NA, p0 will be automatically computed by the function pi0.est
n.subset a numeric value indicating how many permutations are considered simultaneously when computing the p-value and the number of falsely called genes. If med=TRUE, n.subset will be set to 1
mat.samp a matrix having ncol(data) columns, except in the two class paired case, in which mat.samp has ncol(data)/2 columns. Each row specifies one permutation of the group labels used in the computation of the expected expression scores d.bar. If not specified (mat.samp=NULL), a matrix having B rows and ncol(data) columns is generated automatically and used in the computation of d.bar. In the two class unpaired case and the multiclass case, each row of mat.samp must contain the same group labels as cl. In the one class and the two class paired case, each row must contain -1's and 1's. In the one class case, the expression values are multiplied by these -1's and 1's. In the two class paired case, each column corresponds to one observation pair whose difference is multiplied by either -1 or 1. For more details and examples, see the manual of siggenes
B.more a numeric value. If the number of all possible permutations is smaller than or equal to (1+B.more)*B, full permutation will be done. Otherwise, B permutations are used. This prevents a random subset of B permutations from being used, instead of all permutations, when the number of all possible permutations is only slightly larger than B
B.max a numeric value. If the number of all possible permutations is smaller than or equal to B.max, B randomly selected permutations will be used in the computation of the null distribution. Otherwise, B random draws of the group labels are used. In the latter way of permuting it is possible that some of the permutations are used more than once
lambda a numeric vector or value specifying the lambda values used in the estimation of the prior probability. For details, see ?pi0.est
ncs.value a character string. Only used if lambda is a vector. Either "max" or "paper". For details, see ?pi0.est
ncs.weights a numerical vector of the same length as lambda containing the weights used in the estimation of pi0. By default no weights are used. For details, see ?pi0.est
delta a numeric vector specifying a set of values for the threshold Delta that should be used. If NULL, n.delta Delta values will be computed automatically
n.delta a numeric value specifying the number of Delta values that will be computed over the range of possible values of Delta if delta is not specified
gene.names a character vector of length nrow(data) containing the names of the genes. By default the row names of data are used
q.version a numeric value indicating which version of the q-value should be computed. If q.version=2, the original version of the q-value, i.e. min{pFDR}, will be computed. If q.version=1, min{FDR} will be used in the calculation of the q-value. Otherwise, the q-value is not computed. For details, see ?qvalue.cal
R.fold a numeric value. If the fold change of a gene is smaller than or equal to R.fold, or larger than or equal to 1/R.fold, respectively, then this gene will be excluded from the SAM analysis. The expression score d of excluded genes is set to NA. By default, R.fold is set to 1 such that all genes are included in the SAM analysis. Setting R.fold to 0 or a negative value will avoid the computation of the fold change. The fold change is only computed in the two class cases
R.unlog if TRUE, the anti-log of data will be used in the computation of the fold change. Otherwise, data is used. This transformation should be done when data is log2-transformed (in a SAM analysis it is highly recommended to use log2-transformed expression data)
na.replace if TRUE, missing values will be replaced by the genewise/rowwise statistic specified by na.method. If a gene has fewer than 2 non-missing values, this gene will be excluded from further analysis. If na.replace=FALSE, all genes with one or more missing values will be excluded from further analysis. The expression score d of excluded genes is set to NA
na.method a character string naming the statistic with which missing values will be replaced if na.replace=TRUE. Must be either "mean" (default) or "median"
rand numeric value. If specified, i.e. not NA, the random number generator will be set into a reproducible state
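
The interplay of B, B.more and B.max described above can be summarized in a few lines of R. This is only a sketch of the documented rules, not the code used inside siggenes: the helper perm_scheme and its return labels are hypothetical, and counting the permutations of the two class unpaired case with choose() is an assumption made for illustration.

```r
# Hypothetical helper (not part of siggenes) that mirrors the documented
# rules for B, B.more and B.max.
perm_scheme <- function(n.perm, B = 100, B.more = 0.1, B.max = 30000) {
  if (n.perm <= (1 + B.more) * B) {
    "full"      # few permutations overall: use all of them
  } else if (n.perm <= B.max) {
    "selected"  # B randomly selected permutations
  } else {
    "draws"     # B random draws of the labels; repeats are possible
  }
}

# Assumption: in the two class unpaired case with n samples, n1 of them in
# the first class, there are choose(n, n1) possible label assignments.
small  <- perm_scheme(choose(8, 4))    # 70 assignments
medium <- perm_scheme(choose(12, 6))   # 924 assignments
large  <- perm_scheme(choose(38, 11))  # far more than B.max
print(c(small = small, medium = medium, large = large))
```

With the default settings, the three calls fall into the three different branches, which shows why B.more only matters for small designs and B.max only for large ones.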

Value

an object of class SAM

Note

SAM was developed by Tusher et al. (2001).

!!! There is a patent pending for the SAM technology at Stanford University. !!!

Author(s)

Holger Schwender, holger.schw@gmx.de

References

Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany. http://www.sfb475.uni-dortmund.de/berichte/tr44-03.pdf.

Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.

See Also

SAM-class, sam, sam.wilc, sam.snp

Examples

## Not run: 
  # Load the package multtest and the data of Golub et al. (1999)
  # contained in multtest.
  library(multtest)
  data(golub)
  
  # Perform a SAM analysis for the two class unpaired case assuming
  # unequal variances.
  sam.dstat(golub, golub.cl, B = 100, rand = 123)
  
  # Alternative way of performing the same SAM analysis
  sam(golub, golub.cl, B = 100, rand = 123)
## End(Not run)
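
The layout that mat.samp is described as requiring can be illustrated with base R alone. The following is a minimal sketch under stated assumptions: the class labels, the sample sizes and the object names (mat.unpaired, mat.oneclass, mat.paired) are hypothetical and not taken from siggenes, and the rows are simple random draws, so duplicate rows are possible.

```r
set.seed(123)  # base R seeding, only to make this sketch reproducible

# Hypothetical design: 10 samples, 5 in class 0 and 5 in class 1.
cl <- rep(c(0, 1), each = 5)
B <- 100

# Two class unpaired (or multiclass) case: each row is a shuffled copy of
# cl, so every row contains the same group labels as cl.
mat.unpaired <- t(replicate(B, sample(cl)))

# One class case: one sign (-1 or 1) per sample, i.e. ncol(data) columns.
n.samples <- length(cl)
mat.oneclass <- matrix(sample(c(-1, 1), B * n.samples, replace = TRUE),
                       nrow = B)

# Two class paired case: one sign per observation pair, i.e.
# ncol(data)/2 columns.
n.pairs <- n.samples / 2
mat.paired <- matrix(sample(c(-1, 1), B * n.pairs, replace = TRUE),
                     nrow = B)

print(dim(mat.unpaired))
print(dim(mat.paired))
```

A matrix built this way would be passed through the mat.samp argument, for example sam.dstat(data, cl, mat.samp = mat.unpaired), where data is assumed to have one column per entry of cl.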

[Package siggenes version 1.4.0 Index]