sam.snp {siggenes} R Documentation

## SAM Analysis for Categorical Data

### Description

Performs a SAM (Significance Analysis of Microarrays) analysis for categorical data such as SNP data.

### Usage

```
sam.snp(data, cl, B = 1000, med = FALSE, delta = NULL, n.delta = 10,
    p0 = NA, lambda = seq(0, 0.95, 0.05), ncs.value = "max", ncs.weights = NULL,
    gene.names = dimnames(data)[[1]], q.version = 1, na.replace = TRUE,
    check.levels = TRUE, rand = NA)
```
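A minimal sketch of how the two required inputs might be prepared. The genotype matrix `geno`, its coding as 1, 2, 3, and the sample sizes are simulated assumptions for illustration only. The call itself is guarded, because it is an assumption that the installed version of siggenes exports `sam.snp`.

```r
# Simulated inputs (hypothetical data, for illustration only)
set.seed(123)
n.snp <- 50      # number of SNPs (rows)
n.sample <- 40   # number of samples (columns)

# Genotypes coded 1, 2, 3; each row is a SNP, each column a sample
geno <- matrix(sample(1:3, n.snp * n.sample, replace = TRUE),
               nrow = n.snp, ncol = n.sample,
               dimnames = list(paste0("SNP", seq_len(n.snp)), NULL))

# Two-class labels coded as 0's and 1's, one entry per column of geno
cl <- rep(c(0, 1), each = n.sample / 2)

# Run the analysis only if siggenes is installed and exports sam.snp
if (requireNamespace("siggenes", quietly = TRUE) &&
    "sam.snp" %in% getNamespaceExports("siggenes")) {
  out <- siggenes::sam.snp(geno, cl, B = 100, rand = 123)
  print(out)
}
```

Setting `rand` makes the permutations reproducible, and a small `B` keeps the sketch fast; the default of 1000 permutations is the documented setting.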

### Arguments

| Argument | Description |
| --- | --- |
| `data` | a matrix or data frame. Each row must correspond to a SNP, and each column to a sample. |
| `cl` | a numeric vector of length `ncol(data)` indicating to which class a sample belongs. Recommended way of specifying `cl` is the use of the integers between 1 and g, where g is the number of different groups, or in the two-class case the use of 0's and 1's. |
| `B` | the number of permutations used in the estimation of the null distribution. |
| `med` | if `FALSE` (default), the mean number of falsely called SNPs will be computed. Otherwise, the median number is calculated. |
| `delta` | a numeric vector specifying a set of values for the threshold Delta that should be used. If `NULL`, `n.delta` Delta values will be computed automatically. |
| `n.delta` | a numeric value specifying the number of Delta values that will be computed over the range of possible values of Delta if `delta` is not specified. |
| `p0` | a numeric value specifying the prior probability pi0 that a SNP is not differentially expressed. If `NA`, `p0` will be computed by the function `pi0.est`. |
| `lambda` | a numeric vector or value specifying the lambda values used in the estimation of the prior probability. For details, see `?pi0.est`. |
| `ncs.value` | a character string. Only used if `lambda` is a vector. Either `"max"` or `"paper"`. For details, see `?pi0.est`. |
| `ncs.weights` | a numerical vector of the same length as `lambda` containing the weights used in the estimation of pi0. By default no weights are used. For details, see `?pi0.est`. |
| `gene.names` | a character vector of length `nrow(data)` containing the names of the SNPs. By default the row names of `data` are used. |
| `q.version` | a numeric value indicating which version of the q-value should be computed. If `q.version=2`, the original version of the q-value, i.e. min{pFDR}, will be computed. If `q.version=1`, min{FDR} will be used in the calculation of the q-value. Otherwise, the q-value is not computed. For details, see `?qvalue.cal`. |
| `na.replace` | if `TRUE`, the missing values of a SNP will be replaced by random draws from the empirical distribution of that SNP. |
| `check.levels` | if `TRUE`, it will be checked if all variables/SNPs have the same number of levels/categories. |
| `rand` | numeric value. If specified, i.e. not `NA`, the random number generator will be set into a reproducible state. |
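The description of `na.replace` can be illustrated in base R. This is a sketch of one plausible reading (missing genotypes filled with draws from the observed values of the same SNP); `replace_na_by_draws` is a hypothetical helper and not part of siggenes, whose internal implementation may differ.

```r
# Hypothetical helper, not part of siggenes: fill the missing values of one
# SNP with random draws from that SNP's observed (empirical) distribution.
replace_na_by_draws <- function(x) {
  miss <- is.na(x)
  if (any(miss) && !all(miss)) {
    obs <- x[!miss]
    # Sample indices rather than values, so a single observed value
    # is not misinterpreted by sample() as the range 1:value.
    x[miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
  }
  x
}

set.seed(1)
# Two SNPs (rows), six samples (columns), genotypes coded 1, 2, 3
geno <- matrix(c(1, 2, NA, 3, 1, 2,
                 2, 2, 3, NA, NA, 1),
               nrow = 2, byrow = TRUE)

# apply() returns the rows as columns, so transpose back to SNPs x samples
filled <- t(apply(geno, 1, replace_na_by_draws))
```

Because the draws are random, results depend on the state of the random number generator; this is one reason the `rand` argument exists.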

### Details

For each SNP, Pearson's chi-square statistic is computed to test whether the distribution of the SNP differs between the groups. Since the assumptions for the chi-square approximation are very likely not fulfilled, a permutation-based method is used to estimate the null distribution. Because only one null distribution is estimated for all SNPs, as proposed in the original SAM procedure of Tusher et al. (2001), all SNPs must have the same number of levels/categories.
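The two ingredients described above, a per-SNP Pearson chi-square statistic and a single permutation null pooled over all SNPs, can be sketched in base R. This is not the full SAM procedure (it omits the Delta thresholds and the FDR estimation) and is not the package's code. The helper `chisq_stat`, the simulated data, and the pooled raw p-values are assumptions made for illustration.

```r
# Hypothetical helper: Pearson's chi-square statistic for one SNP
chisq_stat <- function(x, cl, levels) {
  obs <- table(factor(cl), factor(x, levels = levels))    # classes x categories
  expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)    # expected counts
  sum(((obs - expd)^2 / expd)[expd > 0])                  # skip empty cells
}

# Simulated data (assumption): 30 SNPs, 40 samples, genotypes coded 1, 2, 3
set.seed(42)
n.snp <- 30
n.sample <- 40
B <- 200
geno <- matrix(sample(1:3, n.snp * n.sample, replace = TRUE), nrow = n.snp)
cl <- rep(c(0, 1), each = n.sample / 2)

# Observed statistic for every SNP
obs_stat <- apply(geno, 1, chisq_stat, cl = cl, levels = 1:3)

# Permute the class labels B times; each column holds one permutation
null_stat <- replicate(B, {
  cl_perm <- sample(cl)
  apply(geno, 1, chisq_stat, cl = cl_perm, levels = 1:3)
})

# One null distribution for all SNPs: pool the permuted statistics
p_raw <- sapply(obs_stat, function(s) mean(null_stat >= s))
```

Pooling only makes sense when the statistics of different SNPs are comparable, which is why every SNP needs the same number of categories.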

### Value

An object of class `SAM`.

### Warning

This procedure will only work correctly if all SNPs/variables have the same number of levels/categories.
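Before running the analysis, the number of categories per SNP can be checked by hand. The sketch below is an assumption about what such a check might look like. `count_levels` is a hypothetical helper, and it is not the code behind `check.levels`.

```r
# Hypothetical helper: number of distinct non-missing categories per SNP (row)
count_levels <- function(data) {
  apply(data, 1, function(x) length(unique(x[!is.na(x)])))
}

# Toy data (assumption): SNP3 shows only two of the three genotypes
geno <- rbind(SNP1 = c(1, 2, 3, 1, 2, 3),
              SNP2 = c(1, 1, 2, 2, 3, 3),
              SNP3 = c(1, 2, 1, 2, 1, 2))

n_levels <- count_levels(geno)

# SNPs with fewer categories than the maximum observed across all SNPs
bad <- names(n_levels)[n_levels != max(n_levels)]
if (length(bad) > 0) {
  message("SNPs with a deviating number of categories: ",
          paste(bad, collapse = ", "))
}
```

SNPs flagged this way could be removed or analysed separately, so that the remaining SNPs share one null distribution.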

### Note

SAM was developed by Tusher et al. (2001).

!!! There is a patent pending for the SAM technology at Stanford University. !!!

### Author(s)

Holger Schwender, holger.schw@gmx.de

### References

Schwender, H. (2004). Modifying Microarray Analysis Methods for Categorical Data – SAM and PAM for SNPs. To appear in: Proceedings of the 28th Annual Conference of the GfKl.

Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.

### See Also

`SAM-class`, `sam`, `sam.dstat`, `sam.wilc`