sam.snp {siggenes}    R Documentation

SAM Analysis for Categorical Data


Performs a SAM (Significance Analysis of Microarrays) analysis for categorical data such as SNP data.


  sam.snp(data, cl, B = 1000, med = FALSE, delta = NULL, = 10, 
     p0 = NA, lambda = seq(0, 0.95, 0.05), ncs.value = "max", ncs.weights = NULL,
     gene.names = dimnames(data)[[1]], q.version = 1, na.replace = TRUE, 
     check.levels = TRUE, rand = NA)


data a matrix or data frame. Each row must correspond to a SNP, and each column to a sample
cl a numeric vector of length ncol(data) indicating to which class each sample belongs. The recommended way of specifying cl is to use the integers between 1 and g, where g is the number of different groups, or, in the two-class case, 0's and 1's
B the number of permutations used in the estimation of the null distribution
med if FALSE (default), the mean number of falsely called SNPs will be computed. Otherwise, the median number is calculated
delta a numeric vector specifying a set of values for the threshold Delta that should be used. If NULL, Delta values will be computed automatically
a numeric value specifying the number of Delta values that will be computed over the range of possible values of Delta if delta is not specified
p0 a numeric value specifying the prior probability pi0 that a SNP is not differentially expressed. If NA, p0 will be computed by the function pi0.est
lambda a numeric vector or value specifying the lambda values used in the estimation of the prior probability. For details, see ?pi0.est
ncs.value a character string. Only used if lambda is a vector. Either "max" or "paper". For details, see ?pi0.est
ncs.weights a numerical vector of the same length as lambda containing the weights used in the estimation of pi0. By default no weights are used. For details, see ?pi0.est
gene.names a character vector of length nrow(data) containing the names of the SNPs. By default the row names of data are used
q.version a numeric value indicating which version of the q-value should be computed. If q.version=2, the original version of the q-value, i.e. min{pFDR}, will be computed. If q.version=1, min{FDR} will be used in the calculation of the q-value. Otherwise, the q-value is not computed. For details, see ?
na.replace if TRUE, the missing values of a SNP will be replaced by random draws from the empirical distribution of that SNP
check.levels if TRUE, it will be checked if all variables/SNPs have the same number of levels/categories
rand numeric value. If specified, i.e. not NA, the random number generator will be set into a reproducible state
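
As an illustration of how the arguments above fit together, the following is a minimal sketch of a call on simulated genotypes. The object names (snp_data, sam_out), the coding of genotypes as 1, 2, 3, and the sample sizes are assumptions made for this example only. The call is guarded because it assumes the installed siggenes version still exports sam.snp with the signature documented on this page.

```r
# Simulated data: 100 SNPs (rows) x 30 samples (columns), three genotype
# categories coded 1, 2, 3. All of this is hypothetical example data.
set.seed(123)
n_snps <- 100
n_samples <- 30
snp_data <- matrix(
  sample(1:3, n_snps * n_samples, replace = TRUE),
  nrow = n_snps,
  dimnames = list(paste0("SNP", seq_len(n_snps)), NULL)
)

# Two-class case: 0's and 1's, one entry per column of snp_data.
cl <- rep(c(0, 1), each = n_samples / 2)

# Run the analysis only if siggenes is installed and exports sam.snp.
if (requireNamespace("siggenes", quietly = TRUE) &&
    "sam.snp" %in% getNamespaceExports("siggenes")) {
  # Fewer permutations than the default to keep the example fast;
  # rand makes the permutations reproducible.
  sam_out <- try(siggenes::sam.snp(snp_data, cl, B = 100, rand = 123))
  print(sam_out)
}
```

Each row of snp_data has all three categories with near certainty here, which matters because the procedure expects every SNP to have the same number of levels.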


For each SNP, Pearson's Chi-square statistic is computed to test whether the distribution of the SNP differs between several groups. Since it is very likely that the assumptions for the Chi-square approximation are not fulfilled, a permutation-based method is used to estimate the null distribution. Since only one null distribution is estimated for all SNPs, as proposed in the original SAM procedure of Tusher et al. (2001), all SNPs must have the same number of levels/categories.
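
The idea described above can be sketched in base R. This is an illustration under stated assumptions, not the code used inside siggenes: the helper chisq_stat, the simulated genotypes, and the pooled permutation p-values are all hypothetical and only show how one null distribution can be shared by all SNPs.

```r
set.seed(1)
n_snps <- 50
n_samples <- 40
# Hypothetical genotypes coded 1, 2, 3; rows are SNPs, columns are samples.
geno <- matrix(sample(1:3, n_snps * n_samples, replace = TRUE), nrow = n_snps)
cl <- rep(c(0, 1), each = n_samples / 2)

# Pearson's Chi-square statistic for one SNP against the class labels.
chisq_stat <- function(x, cl, levels = 1:3) {
  obs <- unclass(table(factor(x, levels = levels), cl))
  expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)
  keep <- expd > 0                      # skip cells with zero expectation
  sum((obs[keep] - expd[keep])^2 / expd[keep])
}

# Observed statistic for every SNP.
observed <- apply(geno, 1, chisq_stat, cl = cl)

# Permute the class labels B times and recompute all statistics each time.
B <- 100
null_stats <- replicate(B, {
  perm <- sample(cl)
  apply(geno, 1, chisq_stat, cl = perm)
})

# One pooled null distribution for all SNPs: the p-value of a SNP is the
# fraction of all permuted statistics at least as large as its observed one.
p_perm <- sapply(observed, function(s) mean(null_stats >= s))
```

Pooling the permuted statistics across SNPs is only sensible when every SNP has the same number of categories, since otherwise the statistics would follow different null distributions.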


an object of class SAM


This procedure will only work correctly if all SNPs/variables have the same number of levels/categories.
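
Before running the analysis, it can help to verify this requirement by hand. The following is a minimal base R sketch with a small hypothetical genotype matrix; it is not the check that check.levels performs internally, only one way to count observed categories per SNP.

```r
# Hypothetical genotypes: three SNPs (rows), six samples (columns).
geno <- rbind(
  snp1 = c(1, 2, 3, 1, 2, 3),
  snp2 = c(1, 1, 2, 2, 1, 2),  # only two categories observed
  snp3 = c(3, 2, 1, 1, 3, 2)
)

# Number of distinct non-missing categories per SNP.
n_levels <- apply(geno, 1, function(x) length(unique(x[!is.na(x)])))

# TRUE only if every SNP shows the same number of categories.
same_levels <- length(unique(n_levels)) == 1

# SNPs with fewer categories than the maximum observed.
offending <- names(n_levels)[n_levels != max(n_levels)]
```

In this example snp2 would be flagged, and such SNPs would need to be removed or analysed separately.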


SAM was developed by Tusher et al. (2001).

!!! There is a patent pending for the SAM technology at Stanford University. !!!


Holger Schwender


Schwender, H. (2004). Modifying Microarray Analysis Methods for Categorical Data – SAM and PAM for SNPs. To appear in: Proceedings of the 28th Annual Conference of the GfKl.

Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.

See Also

SAM-class, sam, sam.dstat, sam.wilc

[Package siggenes version 1.4.0 Index]