geneSetTest {limma}    R Documentation

Gene Set Test


Description

Test whether a given subset of genes tends to be more highly ranked on a given statistic than the remaining genes.




Arguments

selected vector specifying the elements of statistics in the test group. This can be a vector of indices, a logical vector of the same length as statistics, or any vector such that statistics[selected] contains the statistic values for the selected group.
statistics numeric vector giving the values of the test statistic for every gene or probe in the reference set, usually every probe on the microarray.
alternative character string specifying the alternative hypothesis. Must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
nsim number of random samples to take in computing the p-value. Not used if ranks.only=TRUE.
ranks.only logical. Should the values of statistics be used only to rank the genes (TRUE), or does it make sense to average statistics over the selected set (FALSE)?
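As an illustration of the forms that selected can take, the following base R sketch builds the same gene set as an index vector, a logical vector and a vector of names. The gene names and values are made up for the example, and selecting by name assumes that statistics is a named vector.

```r
# Made-up statistics for six genes (names and values are hypothetical)
statistics <- c(g1 = 2.1, g2 = -0.4, g3 = 1.3, g4 = -2.2, g5 = 0.7, g6 = 0.1)

by_index   <- c(2, 4, 5)                            # positions in statistics
by_logical <- seq_along(statistics) %in% by_index   # same length as statistics
by_name    <- names(statistics)[by_index]           # works because statistics is named

# All three forms pick out the same statistic values
same_logical <- identical(statistics[by_index], statistics[by_logical])
same_name    <- identical(statistics[by_index], statistics[by_name])
```

In each case statistics[selected] returns the values for genes g2, g4 and g5.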


Details

This function computes a p-value to test the hypothesis that the selected genes tend to be more highly ranked on the given statistic. If it makes sense to average values of the statistic, which would be so for example if the statistic were a t-statistic, then a permutation test is conducted. In that case the function returns the proportion of nsim randomly selected groups, each drawn from the set of all statistics, whose mean statistic is equal to or more extreme than that of the test group.
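The permutation approach can be sketched in base R as follows. This is a minimal illustration for the "greater" direction only, not the code used inside limma. The helper name perm_mean_test is hypothetical, and the sketch assumes the random groups have the same size as the test group.

```r
# Hypothetical helper, not part of limma: one-sided ("greater") permutation test
perm_mean_test <- function(selected, statistics, nsim = 1000) {
  observed <- mean(statistics[selected])
  k <- length(statistics[selected])      # size of the test group
  # Mean statistic of each of nsim randomly selected groups of size k
  null_means <- replicate(nsim, mean(sample(statistics, k)))
  # Proportion of random groups with a mean equal to or larger than the observed mean
  mean(null_means >= observed)
}

set.seed(1)
stat <- 1:100
p_top    <- perm_mean_test(96:100, stat)  # set holds the five largest statistics
p_bottom <- perm_mean_test(1:5, stat)     # set holds the five smallest statistics
```

A set made up of the largest statistics gives a proportion near zero, while a set made up of the smallest statistics gives a proportion of one, because every random group has a mean at least as large.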

If it doesn't make sense to average the values of the statistic for any reason, then only the ranks of the statistics are used and a Wilcoxon two-sample test, also known as a Mann-Whitney test, is performed.
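The rank-based alternative can be sketched with wilcox.test from the stats package, comparing the selected statistics with all remaining ones. The helper name rank_set_test is hypothetical, and this is an assumption about how such a test could be set up rather than limma's own implementation.

```r
# Hypothetical helper, not part of limma: Wilcoxon two-sample test of set vs. rest
rank_set_test <- function(selected, statistics, alternative = "two.sided") {
  idx    <- seq_along(statistics)
  in_set <- idx[selected]                # handles index or logical vectors
  rest   <- setdiff(idx, in_set)         # all genes outside the set
  wilcox.test(statistics[in_set], statistics[rest],
              alternative = alternative)$p.value
}

sel  <- c(2, 4, 5)
stat <- -9:9
p_less    <- rank_set_test(sel, stat, alternative = "less")
p_greater <- rank_set_test(sel, stat, alternative = "greater")
```

Here the selected values are -8, -6 and -5, which sit near the bottom of the ranking, so the "less" p-value is small and the "greater" p-value is close to one.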

This is essentially a streamlined version of the Gene Set Enrichment Analysis approach introduced by Mootha et al. (2003).

Usually, statistics is intended to hold t-like statistics, meaning that the genewise null hypotheses would be rejected for large positive or large negative values. Then alternative="greater" can be used to test whether genes in the set tend to be up-regulated, alternative="less" can be used to test whether the gene set is down-regulated, while alternative="two.sided" tests whether the gene set holds highly ranked genes without regard to direction of change. Important note: if statistics is an F-like statistic for which only large values are relevant for rejecting the null hypothesis, then you must use alternative="greater" to get meaningful results.
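The note about F-like statistics can be illustrated with a small base R sketch. It uses wilcox.test directly rather than geneSetTest, so it only shows the general principle. The data are made up, and squaring the t-like values is simply an assumed stand-in for an F-like statistic.

```r
# Made-up t-like statistics; the set holds the two most negative and two most positive genes
t_like <- -9:9
in_set <- c(1, 2, 18, 19)

# Squaring discards the sign, giving an F-like statistic where only large values matter
f_like <- t_like^2

# Directional test on the t-like scale: up and down changes cancel out
p_t <- wilcox.test(t_like[in_set], t_like[-in_set],
                   alternative = "greater", exact = FALSE)$p.value

# Upper-tail test on the F-like scale: the set clearly holds the largest values
p_f <- wilcox.test(f_like[in_set], f_like[-in_set],
                   alternative = "greater", exact = FALSE)$p.value
```

On the F-like scale every member of the set outranks every other gene, so only the "greater" alternative reflects the enrichment. Testing "less" on the same values would ask whether the set holds unusually small F-like statistics, which is rarely the question of interest.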


Value

Numeric value giving the estimated p-value.


Author(s)

Gordon Smyth


References

Mootha, V. K., Lindgren, C. M., Eriksson, K. F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstrale, M., Laurila, E., Houstis, N., Daly, M. J., Patterson, N., Mesirov, J. P., Golub, T. R., Tamayo, P., Spiegelman, B., Lander, E. S., Hirschhorn, J. N., Altshuler, D., Groop, L. C. (2003). PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics 34, 267-273.

See Also



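Examples

library(limma)
sel <- c(2,4,5)
stat <- -9:9
geneSetTest(selected=sel, statistics=stat, alternative="less")  # do the selected genes tend to have low values of stat?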

[Package limma version 2.4.7 Index]