Performs a SAM (Significance Analysis of Microarrays) analysis as proposed by Tusher et al. (2001). It is possible to perform one and two class analyses using a modified t-statistic and a multiclass analysis using a modified F-statistic. In the two class case, either Welch's t-statistic (unequal variance) or the t-statistic assuming equal variances can be computed.
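The two class modified t-statistic can be sketched in base R as follows. This is a minimal illustration, not the siggenes implementation. It assumes class labels coded 0 and 1 (a hypothetical coding), takes the mean of class 1 minus the mean of class 0 as the numerator r, and uses Welch's standard error as s, so that d = r / (s + s0). The helper name `modified_t_sketch` is hypothetical.

```r
# Minimal sketch (not siggenes code): SAM-type modified Welch t-statistic
# d_i = r_i / (s_i + s0) for a genes x samples matrix x.
modified_t_sketch <- function(x, cl, s0 = 0) {
  x0 <- x[, cl == 0, drop = FALSE]   # samples labelled 0 (assumed coding)
  x1 <- x[, cl == 1, drop = FALSE]   # samples labelled 1
  r <- rowMeans(x1) - rowMeans(x0)   # difference of the class means per gene
  # Welch standard error, gene by gene: sqrt(var0 / n0 + var1 / n1)
  s <- sqrt(apply(x0, 1, var) / ncol(x0) + apply(x1, 1, var) / ncol(x1))
  r / (s + s0)                       # the fudge factor s0 enters the denominator
}

x <- rbind(c(1, 2, 3, 5, 6, 7),      # gene 1: clearly shifted
           c(2, 2.5, 3, 2, 2.5, 3))  # gene 2: no shift
cl <- c(0, 0, 0, 1, 1, 1)
modified_t_sketch(x, cl)
```

With `s0 = 0` this is Welch's t-statistic; a positive `s0` damps the scores of genes whose standard error is very small.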

```r
sam.dstat(data, cl, var.equal = FALSE, B = 100, med = FALSE, s0 = NA,
          s.alpha = seq(0, 1, 0.05), include.zero = TRUE, p0 = NA,
          n.subset = 10, mat.samp = NULL, B.more = 0.1, B.max = 30000,
          lambda = seq(0, 0.95, 0.05), ncs.value = "max", ncs.weights = NULL,
          delta = NULL, n.delta = 10, gene.names = dimnames(data)[[1]],
          q.version = 1, R.fold = 1, R.unlog = TRUE, na.replace = TRUE,
          na.method = "mean", rand = NA)
```

`data` |
a matrix, data frame or exprSet object. Each row of `data`
(or `exprs(data)`, respectively) must correspond to a gene, and
each column to a sample |

`cl` |
a numeric vector of length `ncol(data)` containing the class
labels of the samples. In the two class paired case, `cl` can also
be a matrix with `ncol(data)` rows and 2 columns. If `data` is
an exprSet object, `cl` can also be a character string. For details
on how `cl` should be specified, see `?sam` |

`var.equal` |
if `FALSE` (default), Welch's t-statistic will be computed.
If `TRUE`, the pooled variance will be used in the computation of
the t-statistic |
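The two denominators selected by `var.equal` can be sketched for a single gene as below. This is an illustration of the textbook formulas, not siggenes code, and it leaves out the fudge factor `s0`; `se_sketch` is a hypothetical name. The pooled variance weights each class variance by its degrees of freedom, whereas Welch's version keeps the class variances separate.

```r
# Sketch: standard error of the difference of two class means for one gene.
# x1, x2: expression values of the gene in class 1 and class 2.
se_sketch <- function(x1, x2, var.equal = FALSE) {
  n1 <- length(x1)
  n2 <- length(x2)
  if (var.equal) {
    # pooled variance, weighted by the degrees of freedom of each class
    sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
    sqrt(sp2 * (1 / n1 + 1 / n2))
  } else {
    # Welch: each class keeps its own variance
    sqrt(var(x1) / n1 + var(x2) / n2)
  }
}

se_sketch(c(1, 2, 3), c(4, 6, 8, 10))                    # Welch
se_sketch(c(1, 2, 3), c(4, 6, 8, 10), var.equal = TRUE)  # pooled
```

For equal class sizes both versions give the same standard error; they differ only when the classes are unbalanced.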

`B` |
a numeric value indicating how many permutations should be used in the estimation of the null distribution |

`med` |
if `FALSE` (default), the mean number of falsely called genes
will be computed. Otherwise, the median number is calculated |
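The role of `med` can be illustrated with a small sketch: for fixed lower and upper cutoffs, count in each permutation how many permuted scores would be called significant, then summarise these counts by their mean (`med = FALSE`) or median (`med = TRUE`). This assumes a genes x permutations matrix of permuted scores and omits any further adjustment siggenes may apply; `n_false_sketch` is a hypothetical name.

```r
# Sketch: number of falsely called genes from permuted scores.
# d_perm: genes x permutations matrix of scores computed under the null.
n_false_sketch <- function(d_perm, cut_low, cut_up, med = FALSE) {
  # per permutation: how many null scores fall beyond the cutoffs
  counts <- colSums(d_perm <= cut_low | d_perm >= cut_up)
  if (med) median(counts) else mean(counts)
}

d_perm <- cbind(c(-3, 0, 1, 4), c(0, 0, 0, 0), c(-5, -4, 2, 5))
n_false_sketch(d_perm, cut_low = -2, cut_up = 3)              # mean of 2, 0, 3
n_false_sketch(d_perm, cut_low = -2, cut_up = 3, med = TRUE)  # median of 2, 0, 3
```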

`s0` |
a numeric value specifying the fudge factor. If `NA` (default),
`s0` will be computed automatically |

`s.alpha` |
a numeric vector or value specifying the quantiles of the
standard deviations of the genes used in the computation of `s0`. If
`s.alpha` is a vector, the fudge factor is computed as proposed by
Tusher et al. (2001). Otherwise, the quantile of the standard deviations
specified by `s.alpha` is used as the fudge factor |
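For a single value of `s.alpha`, the fudge factor is simply a quantile of the gene-wise standard deviations. The sketch below assumes these are the denominators s of the expression scores and uses R's default `quantile()` method, which may differ from the rule siggenes applies internally; `s0_from_quantile_sketch` is a hypothetical name.

```r
# Sketch: scalar s.alpha -> s0 = s.alpha-quantile of the standard deviations s.
s0_from_quantile_sketch <- function(s, s.alpha) {
  unname(quantile(s, probs = s.alpha))   # default quantile type (type 7)
}

s <- c(0.1, 0.2, 0.3, 0.4, 0.5)
s0_from_quantile_sketch(s, 0.5)   # median of the standard deviations
```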

`include.zero` |
if `TRUE`, `s0 = 0` is also a candidate value
for the fudge factor, so the usual t-statistic or F-statistic, respectively,
can also be chosen as the expression score d. If `FALSE`,
`s0 = 0` is not a candidate value for the fudge factor. The latter
follows the definition of the fudge factor in Tusher et al. (2001), in which only strictly
positive values are considered |
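When `s.alpha` is a vector, a set of candidate fudge factors is formed first, and `include.zero` decides whether 0 joins that set. The sketch below only builds the candidate set under the assumption that the candidates are the `s.alpha` quantiles of the standard deviations; the criterion used to pick one candidate is not shown. `s0_candidates_sketch` is a hypothetical name.

```r
# Sketch: candidate fudge factors for a vector-valued s.alpha.
s0_candidates_sketch <- function(s, s.alpha = seq(0, 1, 0.05),
                                 include.zero = TRUE) {
  cand <- unname(quantile(s, probs = s.alpha))
  if (include.zero) cand <- c(0, cand)   # allow the plain t- or F-statistic
  cand
}

s <- c(0.1, 0.2, 0.3, 0.4, 0.5)
length(s0_candidates_sketch(s))                        # 21 quantiles plus 0
length(s0_candidates_sketch(s, include.zero = FALSE))  # 21 quantiles only
```

Note that the 0-quantile is the smallest standard deviation, not 0, so `include.zero = TRUE` genuinely adds a new candidate when all standard deviations are positive.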

`p0` |
a numeric value specifying the prior probability pi0
that a gene is not differentially expressed. If `NA`, `p0` will
be automatically computed by the function `pi0.est` |

`n.subset` |
a numeric value indicating how many permutations are considered
simultaneously when computing the p-value and the number of falsely called
genes. If `med=TRUE`, `n.subset` will be set to 1 |

`mat.samp` |
a matrix with `ncol(data)` columns, except in the two class
paired case, in which `mat.samp` has `ncol(data)/2` columns.
Each row specifies one permutation of the group labels used in the computation
of the expected expression scores d.bar. If not specified
(`mat.samp=NULL`), a matrix with `B` rows and `ncol(data)` columns is
generated automatically and used in the computation of d.bar. In
the two class unpaired case and the multiclass case, each row of `mat.samp`
must contain the same group labels as `cl`. In the one class and the two
class paired case, each row must contain -1's and 1's. In the one class case,
the expression values are multiplied by these -1's and 1's. In the two class paired
case, each column corresponds to one observation pair whose difference is multiplied
by either -1 or 1. For more details and examples, see the manual of siggenes |
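A `mat.samp`-like matrix and the resulting expected scores d.bar can be sketched in base R. This is an illustration under stated assumptions, not how siggenes builds them: labels are coded 0 and 1, the score is a Welch-type modified t-statistic, and d.bar is taken as the average over permutations of the sorted permuted scores. All helper names are hypothetical.

```r
# Hypothetical helpers for illustration only; siggenes builds mat.samp itself.
# Two class unpaired case: each row is a random permutation of the labels.
make_mat_samp_sketch <- function(cl, B) {
  t(replicate(B, sample(cl)))
}

# One class case: each row holds random signs (-1 or 1), one per sample.
make_sign_mat_sketch <- function(n, B) {
  matrix(sample(c(-1, 1), B * n, replace = TRUE), nrow = B)
}

# Welch-type modified t-statistic for labels coded 0 and 1.
welch_d_sketch <- function(x, cl, s0 = 0) {
  x0 <- x[, cl == 0, drop = FALSE]
  x1 <- x[, cl == 1, drop = FALSE]
  s <- sqrt(apply(x0, 1, var) / ncol(x0) + apply(x1, 1, var) / ncol(x1))
  (rowMeans(x1) - rowMeans(x0)) / (s + s0)
}

# d.bar: sort the scores obtained under each permutation, then average the
# i-th smallest score over all permutations.
d_bar_sketch <- function(x, mat.samp, s0 = 0) {
  d_sorted <- apply(mat.samp, 1, function(lab) sort(welch_d_sketch(x, lab, s0)))
  rowMeans(d_sorted)   # genes x permutations -> one expected score per rank
}

set.seed(1)
x <- matrix(rnorm(60), nrow = 10)   # 10 genes, 6 samples
cl <- rep(c(0, 1), each = 3)
ms <- make_mat_samp_sketch(cl, B = 20)
d.bar <- d_bar_sketch(x, ms)
```

Because every column of sorted scores is nondecreasing, the averaged d.bar is nondecreasing as well, which is what allows it to be compared rank by rank with the sorted observed scores.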

`B.more` |
a numeric value. If the number of all possible permutations is smaller
than or equal to `(1 + B.more) * B`, full permutation will be done.
Otherwise, `B` permutations are used. This prevents using only `B` permutations,
rather than all of them, when the number of all possible permutations
is just slightly larger than `B` |

`B.max` |
a numeric value. If the number of all possible permutations is smaller
than or equal to `B.max`, `B` randomly selected permutations will be used
in the computation of the null distribution. Otherwise, `B` random draws
of the group labels are used. With the latter approach, some permutations
may be used more than once |
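The decision described for `B.more` and `B.max` can be sketched for the two class unpaired case, where the number of distinct label assignments is `choose(n1 + n2, n1)` (in the one class case it would be `2^n` sign patterns). This assumes the full-permutation check is applied before the `B.max` check; `perm_strategy_sketch` is a hypothetical name and does not reflect siggenes internals.

```r
# Sketch of the permutation strategy for the two class unpaired case.
perm_strategy_sketch <- function(n1, n2, B = 100, B.more = 0.1, B.max = 30000) {
  n_perm <- choose(n1 + n2, n1)   # number of distinct label assignments
  if (n_perm <= (1 + B.more) * B) {
    "full"              # enumerate every permutation
  } else if (n_perm <= B.max) {
    "sample_distinct"   # B randomly selected, distinct permutations
  } else {
    "random_draws"      # B independent draws; repeats are possible
  }
}

perm_strategy_sketch(4, 4)    # choose(8, 4) = 70 <= 110
perm_strategy_sketch(5, 5)    # choose(10, 5) = 252
perm_strategy_sketch(10, 10)  # choose(20, 10) = 184756 > 30000
```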

`lambda` |
a numeric vector or value specifying the lambda
values used in the estimation of the prior probability. For details, see
`?pi0.est` |
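For a single value of `lambda`, a common estimator of pi0 is Storey's: the share of p-values above `lambda`, divided by `1 - lambda`, capped at 1. The sketch below assumes that `pi0.est` reduces to this form when `lambda` is a scalar; the spline-based handling of a vector `lambda` is not shown. `pi0_single_lambda_sketch` is a hypothetical name.

```r
# Sketch: pi0 = #{p > lambda} / ((1 - lambda) * m), capped at 1.
pi0_single_lambda_sketch <- function(p, lambda = 0.5) {
  min(1, mean(p > lambda) / (1 - lambda))
}

p <- c(0.01, 0.02, 0.03, 0.04, 0.6, 0.9)
pi0_single_lambda_sketch(p, lambda = 0.5)   # 2 of 6 p-values above 0.5
```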

`ncs.value` |
a character string. Only used if `lambda` is a
vector. Either `"max"` or `"paper"`. For details, see `?pi0.est` |

`ncs.weights` |
a numerical vector of the same length as `lambda`
containing the weights used in the estimation of pi0. By default
no weights are used. For details, see `?pi0.est` |

`delta` |
a numeric vector specifying a set of values for the threshold
Delta that should be used. If `NULL`, `n.delta`
Delta values will be computed automatically |
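How a threshold Delta turns into cutoffs can be sketched with a simplified version of the rule in Tusher et al. (2001): compare the sorted observed scores with the expected scores d.bar, take as upper cutoff the first positive score whose difference to d.bar is at least Delta, and as lower cutoff the last negative score whose difference is at most -Delta. Genes with scores at or beyond the cutoffs are called significant. This is a simplified illustration; siggenes may handle the region around the origin differently. `sam_cutoffs_sketch` is a hypothetical name.

```r
# Simplified sketch of the SAM cutoffs for one value of Delta.
# d_sorted: sorted observed scores; d_bar: expected scores of the same length.
sam_cutoffs_sketch <- function(d_sorted, d_bar, delta) {
  diff <- d_sorted - d_bar
  up  <- which(diff >= delta & d_sorted > 0)    # candidates above the origin
  low <- which(diff <= -delta & d_sorted < 0)   # candidates below the origin
  c(low = if (length(low)) d_sorted[max(low)] else -Inf,
    up  = if (length(up))  d_sorted[min(up)]  else  Inf)
}

d_sorted <- c(-4, -1, -0.5, 0.2, 0.5, 3, 5)
d_bar    <- c(-2, -1, -0.4, 0.1, 0.4, 1, 2)
sam_cutoffs_sketch(d_sorted, d_bar, delta = 1)   # genes with d <= low or d >= up
```

Larger values of Delta move the cutoffs outward, so fewer genes are called; with a Delta larger than every difference, no gene is called.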

`n.delta` |
a numeric value specifying the number of Delta values
that will be computed over the range of possible values of Delta
if `delta` is not specified |

`gene.names` |
a character vector of length `nrow(data)` containing the
names of the genes. By default the row names of `data` are used |

`q.version` |
a numeric value indicating which version of the q-value should
be computed. If `q.version=2`, the original version of the q-value, i.e.
min{pFDR}, will be computed. If `q.version=1`, min{FDR} will be used
in the calculation of the q-value. Otherwise, the q-value is not computed.
For details, see `?qvalue.cal` |

`R.fold` |
a numeric value. If the fold change of a gene is smaller than or
equal to `R.fold`, or larger than or equal to 1/`R.fold`, respectively,
then this gene will be excluded from the SAM analysis. The expression score
d of excluded genes is set to `NA`. By default, `R.fold`
is set to 1 such that all genes are included in the SAM analysis. Setting
`R.fold` to 0 or a negative value will avoid the computation of the fold
change. The fold change is only computed in the two class cases |

`R.unlog` |
if `TRUE`, the anti-log of `data` will be used in the computation of the
fold change. Otherwise, `data` is used. This transformation should be done
when `data` is log2-transformed (in a SAM analysis, it is highly recommended
to use log2-transformed expression data) |
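The effect of `R.unlog` on the fold change can be sketched as follows, assuming the fold change is the ratio of the class means on the unlogged scale and that the data are log2-transformed, so the anti-log is `2^x`. The direction of the ratio (class 1 over class 0, with labels coded 0 and 1) is an assumption made only for this illustration; `fold_change_sketch` is a hypothetical name.

```r
# Sketch: fold change as the ratio of class means (class 1 over class 0).
fold_change_sketch <- function(x, cl, R.unlog = TRUE) {
  if (R.unlog) x <- 2^x   # undo a log2 transformation
  rowMeans(x[, cl == 1, drop = FALSE]) / rowMeans(x[, cl == 0, drop = FALSE])
}

x <- rbind(c(1, 1, 3, 3),   # log2 values 1 vs 3: unlogged 2 vs 8
           c(2, 2, 2, 2))   # no change
cl <- c(0, 0, 1, 1)
fold_change_sketch(x, cl)
fold_change_sketch(x, cl, R.unlog = FALSE)
```

Taking the ratio of log2 values directly (`R.unlog = FALSE`) gives a different and usually less meaningful number, which is why the anti-log is recommended for log2-transformed data.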

`na.replace` |
if `TRUE`, missing values will be replaced by the genewise (rowwise)
statistic specified by `na.method`. If a gene has fewer than 2 non-missing
values, this gene will be excluded from further analysis. If `na.replace=FALSE`,
all genes with one or more missing values will be excluded from further analysis.
The expression score d of excluded genes is set to `NA` |

`na.method` |
a character string naming the statistic with which missing values
will be replaced if `na.replace=TRUE`. Must be either `"mean"` (default)
or `"median"` |
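The missing-value handling described by `na.replace` and `na.method` can be sketched in base R. This follows the rules stated above (replace by the row mean or median, exclude genes with fewer than 2 observed values, or exclude every gene with a missing value when `na.replace = FALSE`); it is an illustration, not the siggenes code, and `na_handle_sketch` is a hypothetical name.

```r
# Sketch of the described missing-value handling for a genes x samples matrix.
na_handle_sketch <- function(x, na.replace = TRUE,
                             na.method = c("mean", "median")) {
  na.method <- match.arg(na.method)
  fun <- match.fun(na.method)
  n_na <- rowSums(is.na(x))
  # genes kept: at least 2 observed values (replace) or no NAs (no replace)
  keep <- if (na.replace) rowSums(!is.na(x)) >= 2 else n_na == 0
  for (i in which(keep & n_na > 0)) {
    miss <- is.na(x[i, ])
    x[i, miss] <- fun(x[i, !miss])   # genewise mean or median of observed values
  }
  list(x = x, excluded = which(!keep))
}

x <- rbind(c(1, NA, 3),    # one NA: replaced by the row mean, 2
           c(NA, NA, 5),   # only 1 observed value: excluded
           c(2, 4, 6))     # complete
na_handle_sketch(x)
```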

`rand` |
a numeric value. If specified (i.e. not `NA`), the random number generator
will be set to a reproducible state |

Returns an object of class `SAM`.

SAM was developed by Tusher et al. (2001).

!!! There is a patent pending for the SAM technology at Stanford University. !!!

Holger Schwender, holger.schw@gmx.de

Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of
the Empirical Bayes and the Significance Analysis of Microarrays.
*Technical Report*, SFB 475, University of Dortmund, Germany.
http://www.sfb475.uni-dortmund.de/berichte/tr44-03.pdf.

Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays
applied to the ionizing radiation response. *PNAS*, 98, 5116-5121.

```r
## Not run:
library(siggenes)  # provides sam.dstat() and sam()

# Load the package multtest and the data of Golub et al. (1999)
# contained in multtest.
library(multtest)
data(golub)

# Perform a SAM analysis for the two class unpaired case assuming
# unequal variances.
sam.dstat(golub, golub.cl, B = 100, rand = 123)

# Alternative way of performing the same SAM analysis
sam(golub, golub.cl, B = 100, rand = 123)
## End(Not run)
```

