sam.wilc {siggenes}    R Documentation

SAM Analysis Using Wilcoxon Rank Statistics


Performs a SAM (Significance Analysis of Microarrays) analysis using standardized Wilcoxon rank statistics. In the two class unpaired analysis, the standardized Wilcoxon rank sum statistic is computed, while in the one class analysis and in the two class paired analysis, the standardized Wilcoxon signed rank statistic is used as the expression score.


   sam.wilc(data, cl, delta = NULL, = 10, p0 = NA,
       lambda = seq(0, 0.95, 0.05), ncs.value = "max", ncs.weights = NULL,
       gene.names = dimnames(data)[[1]], q.version = 1, R.fold = 1,
       R.unlog = TRUE, na.replace = TRUE, na.method = "mean", approx50 = TRUE,
       check.ties = FALSE, rand = NA)


data a matrix, data frame or exprSet object. Each row of data (or exprs(data), respectively) must correspond to a gene, and each column to a sample
cl a numeric vector of length ncol(data) containing the class labels of the samples. In the two class paired case, cl can also be a matrix with ncol(data) rows and 2 columns. If data is an exprSet object, cl can also be a character string. For details on how cl should be specified, see ?sam
delta a numeric vector specifying a set of values for the threshold Delta that should be used. If NULL, Delta values will be computed automatically
a numeric value specifying the number of Delta values that will be computed over the range of possible values of Delta if delta is not specified
p0 a numeric value specifying the prior probability pi0 that a gene is not differentially expressed. If NA, p0 will be computed by the function pi0.est
lambda a numeric vector or value specifying the lambda values used in the estimation of the prior probability. For details, see ?pi0.est
ncs.value a character string. Only used if lambda is a vector. Either "max" or "paper". For details, see ?pi0.est
ncs.weights a numerical vector of the same length as lambda containing the weights used in the estimation of pi0. By default no weights are used. For details, see ?pi0.est
gene.names a character vector of length nrow(data) containing the names of the genes. By default the row names of data are used
q.version a numeric value indicating which version of the q-value should be computed. If q.version=2, the original version of the q-value, i.e. min{pFDR}, will be computed. If q.version=1, min{FDR} will be used in the calculation of the q-value. Otherwise, the q-value is not computed. For details, see ?
R.fold a numeric value. If the fold change of a gene is smaller than or equal to R.fold, or larger than or equal to 1/R.fold, respectively, then this gene will be excluded from the SAM analysis. The expression score d of excluded genes is set to NA. By default, R.fold is set to 1 such that all genes are included in the SAM analysis. Setting R.fold to 0 or a negative value will avoid the computation of the fold change. The fold change is only computed in the two-class cases
R.unlog if TRUE, the anti-log of data will be used in the computation of the fold change. Otherwise, data is used. This transformation should be done if data is log2-transformed (in a SAM analysis it is highly recommended to use log2-transformed expression data)
na.replace if TRUE, missing values will be replaced by the genewise/rowwise statistic specified by na.method. If a gene has fewer than 2 non-missing values, this gene will be excluded from further analysis. If na.replace=FALSE, all genes with one or more missing values will be excluded from further analysis. The expression score d of excluded genes is set to NA
na.method a character string naming the statistic with which missing values will be replaced if na.replace=TRUE. Must be either "mean" (default) or "median"
approx50 if TRUE, the null distribution will be approximated by the standard normal distribution. Otherwise, the exact null distribution is computed. This argument will automatically be set to FALSE if there are fewer than 50 samples in each of the groups
check.ties if TRUE, a warning will be generated if there are ties or zeros. This warning contains information about how many genes have ties or zeros. Otherwise, this warning is not generated. Default is FALSE since checking for ties can take some time
rand numeric value. If specified, i.e. not NA, the random number generator will be set into a reproducible state
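
The behaviour described for na.replace and na.method can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper name replace.na and the toy matrix are hypothetical and are not part of siggenes, and the code is not the implementation used inside sam.wilc.

```r
# Hypothetical helper (not part of siggenes): replace missing values in each
# row (gene) by the rowwise mean or median of the observed values.
replace.na <- function(data, na.method = c("mean", "median")) {
  na.method <- match.arg(na.method)
  stat <- if (na.method == "mean") mean else median
  for (i in seq_len(nrow(data))) {
    miss <- is.na(data[i, ])
    # Rows with fewer than 2 observed values are left untouched in this sketch;
    # sam.wilc is documented to exclude such genes from the analysis.
    if (any(miss) && sum(!miss) >= 2)
      data[i, miss] <- stat(data[i, !miss])
  }
  data
}

# Toy data: three genes (rows) measured on four samples (columns)
m <- rbind(g1 = c(1, NA, 3, 8), g2 = c(NA, NA, NA, 2), g3 = c(4, 5, 6, 7))
m.mean <- replace.na(m, "mean")
m.med <- replace.na(m, "median")
```

In this toy matrix, gene g2 has only one observed value, so its missing entries are not filled in.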


Standardized versions of the Wilcoxon rank statistics are computed. This means that W*=(W-mean(W))/sd(W) is used as expression score d, where W is the usual Wilcoxon rank sum statistic or Wilcoxon signed rank statistic, respectively.
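
To make the standardization concrete, the following base R sketch computes W* for a single gene in the two class unpaired case. The function name std.ranksum and the example values are hypothetical, the class labels are assumed to be coded as 0 and 1, and the null mean and standard deviation are the textbook values for a rank sum without ties. It is an illustration, not the code used by sam.wilc.

```r
# Hypothetical helper (not part of siggenes): standardized Wilcoxon rank sum
# for one gene. x holds the expression values, cl the 0/1 class labels.
std.ranksum <- function(x, cl) {
  n1 <- sum(cl == 1)                           # size of the group whose ranks are summed
  n0 <- sum(cl == 0)
  r <- rank(x)                                 # ranks over all samples (no ties assumed)
  W <- sum(r[cl == 1])                         # Wilcoxon rank sum statistic
  W.mean <- n1 * (n1 + n0 + 1) / 2             # null mean of W
  W.sd <- sqrt(n1 * n0 * (n1 + n0 + 1) / 12)   # null standard deviation of W
  (W - W.mean) / W.sd
}

x <- c(1.2, 0.7, 2.9, 3.4, 2.1, 3.8)
cl <- c(0, 0, 0, 1, 1, 1)
d <- std.ranksum(x, cl)
```

Swapping the two class labels changes only the sign of the resulting score.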

In the computation of these statistics, the ranks of ties are randomly assigned. In the computation of the Wilcoxon signed rank statistic, zeros are randomly set to either a very small positive or a very small negative value.
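
The random treatment of ties and zeros can be illustrated with base R. This is a sketch only: the example vectors and the size of the small value eps are arbitrary assumptions, and the code does not reproduce the internals of sam.wilc.

```r
set.seed(123)                            # make the random choices reproducible

# Ties: tied values receive distinct ranks in random order
x <- c(0.5, 1.3, 1.3, 2.0, 1.3)
r <- rank(x, ties.method = "random")

# Signed rank case: replace exact zeros by a tiny value with a random sign
# (eps is an arbitrary illustrative choice)
z <- c(-0.4, 0, 1.1, 0, 0.8)
eps <- 1e-10
zero <- z == 0
z[zero] <- sample(c(-eps, eps), sum(zero), replace = TRUE)
```

Because these steps involve random draws, fixing the seed, as the rand argument is documented to do, makes repeated analyses give the same result.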

If there are fewer than 50 observations in each of the groups, the exact null distribution will be used. If there are more than 50 observations in at least one group, the null distribution will by default be approximated by the standard normal distribution. It is, however, still possible to compute the exact null distribution by setting approx50 to FALSE.
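
The difference between the exact and the approximated null distribution can be explored with base R's dwilcox and pnorm. The group sizes and the cut-off of 2 below are arbitrary assumptions chosen for illustration, and the sketch is independent of how sam.wilc computes its null distribution.

```r
n1 <- 5; n0 <- 6                         # small groups, so the exact null matters

# dwilcox() uses the Mann-Whitney form U = W - n1 * (n1 + 1) / 2
u <- 0:(n1 * n0)
exact <- dwilcox(u, n1, n0)              # exact null probabilities of U
w <- u + n1 * (n1 + 1) / 2               # corresponding rank sums W
w.std <- (w - n1 * (n1 + n0 + 1) / 2) / sqrt(n1 * n0 * (n1 + n0 + 1) / 12)

# Two-sided tail probability of |W*| >= 2 under both null distributions
p.exact <- sum(exact[abs(w.std) >= 2])
p.normal <- 2 * pnorm(-2)
```

Comparing p.exact with p.normal for different group sizes shows how the two null distributions differ when the groups are small.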


an object of class SAM


SAM was developed by Tusher et al. (2001).

!!! There is a patent pending for the SAM technology at Stanford University. !!!


Holger Schwender


Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany.

Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.

See Also

SAM-class, sam, sam.dstat, sam.snp


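## Not run: 
  # Load the package multtest and the data of Golub et al. (1999)
  # contained in multtest.
  library(multtest)
  data(golub)
  # Perform a SAM analysis using Wilcoxon rank sum statistics.
  sam.wilc(golub, golub.cl, rand = 123)
  # Alternative way of performing the same analysis
  # (assumes that sam accepts method = "wilc.stat" in this package version)
  sam(golub, golub.cl, method = "wilc.stat", rand = 123)
## End(Not run)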

[Package siggenes version 1.4.0 Index]