MTP {multtest}  R Documentation 
A user-level function to perform multiple testing procedures (MTP). A variety of t- and F-tests, including robust versions of each test, are implemented. Single-step and step-down minP and maxT methods are used to control the chosen type I error rate (FWER, gFWER, TPPFP, or FDR). Bootstrap and permutation null distributions are available. Arguments are provided for user control of output. Gene selection in microarray experiments is one application.
MTP(X, W = NULL, Y = NULL, Z = NULL, Z.incl = NULL, Z.test = NULL, na.rm = TRUE, test = "t.twosamp.unequalvar", robust = FALSE, standardize = TRUE, alternative = "two.sided", psi0 = 0, typeone = "fwer", k = 0, q = 0.1, fdr.method = "conservative", alpha = 0.05, smooth.null = FALSE, nulldist = "boot", B = 1000, method = "ss.maxT", get.cr = FALSE, get.cutoff = FALSE, get.adjp = TRUE, keep.nulldist = TRUE, seed = NULL)
X 
A matrix, data.frame or exprSet containing the raw data. In the case of an exprSet, exprs(X) is the data of interest and pData(X) may contain outcomes and covariates of interest. For currently implemented tests, one hypothesis is tested for each row of the data. 
W 
A vector or matrix containing non-negative weights to be used in computing the test statistics. If a matrix, W must have the same dimensions as X, with one weight for each value in X. If a vector, W may contain one weight for each observation (i.e. column) of X or one weight for each variable (i.e. row) of X. In either case, the weights are duplicated appropriately. Weighted F-tests are not available. Default is 'NULL'.
Y 
A vector, factor, or Surv object containing the outcome of interest. This may be class labels (F-tests and two-sample t-tests), a continuous or polychotomous dependent variable (linear regression based t-tests), or survival data (Cox proportional hazards based t-tests). If X is an exprSet, Y can be a character string referring to the column of pData(X) to use as outcome. Default is 'NULL'.
Z 
A vector, factor, or matrix containing covariate data to be used in the regression (linear and Cox) models. Each variable should be in one column, so that nrow(Z)=ncol(X). If X is an exprSet, Z can be a character string referring to the column of pData(X) to use as covariates. The arguments Z.incl and Z.test allow one to specify which covariates to use in a particular test without modifying the input Z. Default is 'NULL'.
Z.incl 
The indices of the columns of Z (i.e. which variables) to include in the model. These can be numbers or column names (if the columns are named). Default is 'NULL'.
Z.test 
The index or name of the column of Z (i.e. which variable) to use to test for association with each row of X in a linear model. Only used for test="lm.XvsZ", where it is necessary to specify which covariate's regression parameter is of interest. Default is 'NULL'.
na.rm 
Logical indicating whether to remove observations with an NA. Default is 'TRUE'. 
test 
Character string specifying the test statistics to use, by default 't.twosamp.unequalvar'. See details (below) for a list of tests. 
robust 
Logical indicating whether to use the robust version of the chosen test, e.g. the Wilcoxon signed-rank test for a robust one-sample t-test, or rlm instead of lm in linear models. Default is 'FALSE'.
standardize 
Logical indicating whether to use the standardized version of the test statistics (usual t-statistics are standardized). Default is 'TRUE'.
alternative 
Character string indicating the alternative hypotheses, by default 'two.sided'. For one-sided tests, use 'less' for null hypotheses of 'greater than or equal' and 'greater' for null hypotheses of 'less than or equal'.
psi0 
The hypothesized null value, typically zero (default). Currently, this should be a single value, which is used for all hypotheses. 
typeone 
Character string indicating which type I error rate to control, by default the family-wise error rate ('fwer'). Other options include the generalized family-wise error rate ('gfwer'), with parameter k giving the allowed number of false positives, and the tail probability of the proportion of false positives ('tppfp'), with parameter q giving the allowed proportion of false positives. The false discovery rate ('fdr') can also be controlled.
k 
The allowed number of false positives for gFWER control. Default is 0 (FWER). 
q 
The allowed proportion of false positives for TPPFP control. Default is 0.1. 
fdr.method 
Character string indicating which FDR controlling method should be used when typeone="fdr" . The options are "conservative" (default) for the more conservative, general FDR controlling procedure and "restricted" for the method which requires more assumptions. 
alpha 
The target nominal type I error rate, which may be a vector of error rates. Default is 0.05. 
smooth.null 
Indicator of whether to use a kernel density estimate for the tail of the null distribution for computing raw p-values close to zero. Only used if 'rawp' would be zero without smoothing. Default is 'FALSE'.
nulldist 
Character string indicating which resampling method to use for estimating the joint test statistics null distribution, by default nonparametric bootstrap ('boot'). 
B 
The number of bootstrap iterations (i.e. how many resampled data sets) or the number of permutations (if nulldist is 'perm'). Can be reduced to increase the speed of computation, at a cost to precision. Default is 1000. 
method 
The multiple testing procedure to use. Options are single-step maxT ('ss.maxT', default), single-step minP ('ss.minP'), step-down maxT ('sd.maxT'), and step-down minP ('sd.minP').
get.cr 
Logical indicating whether to compute confidence intervals for the estimates. Not available for F-tests. Default is 'FALSE'.
get.cutoff 
Logical indicating whether to compute thresholds for the test statistics. Default is 'FALSE'. 
get.adjp 
Logical indicating whether to compute adjusted p-values. Default is 'TRUE'.
keep.nulldist 
Logical indicating whether to return the computed null distribution, by default 'TRUE'. Note that this matrix can be quite large. 
seed 
Integer to be used as argument to set.seed to set the seed for the random number generator for bootstrap resampling. This argument can be used to repeat exactly a test performed with a given seed. If the seed is specified via this argument, the same seed will be returned in the seed slot of the MTP object created. Else a random seed will be generated, used and returned. 
A multiple testing procedure (MTP) is defined by choices of test statistics, type I error rate, null distribution and method for error rate control. Each component is described here. See references for more detail.
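Test statistics are determined by the value of test. In addition to the t- and F-tests (such as t.onesamp and the default t.twosamp.unequalvar), the regression-based options are:
lm.XvsZ: t-statistic for tests of the regression coefficient of variable Z.test in linear models, each with a row of X as outcome, possibly adjusted by covariates Z.incl from the matrix Z (in the case of no covariates, one recovers the one-sample t-statistic, t.onesamp);
lm.YvsXZ: t-statistic for tests of regression coefficients in linear models with outcome Y and each row of X as covariate of interest, possibly with other covariates Z.incl from the matrix Z;
coxph.YvsXZ: t-statistic for tests of regression coefficients in Cox proportional hazards models with outcome Y and each row of X as covariate of interest, possibly with other covariates Z.incl from the matrix Z.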
When robust=TRUE, nonparametric versions of each test are performed. For the linear models, this means rlm is used instead of lm. There is currently no robust version of test="coxph.YvsXZ". For the t- and F-tests, data values are simply replaced by their ranks. This is equivalent to performing the following familiar named rank-based tests. The conversion after each test is the formula to convert from the MTP test to the statistic reported by the listed R function (where num is the numerator of the MTP test statistics, n is total sample size, nk is group k sample size, K is total number of groups or treatments, and rk are the ranks in group k).
Wilcoxon signed-rank test: wilcox.test with y=NULL or paired=TRUE;
Wilcoxon rank-sum (Mann-Whitney) test: wilcox.test;
Kruskal-Wallis rank-sum test: kruskal.test;
Friedman rank-sum test: friedman.test.
The implemented MTPs are based on control of the family-wise error rate (FWER), defined as the probability of any false positives. Let Vn denote the (unobserved) number of false positives. Then, control of FWER at level alpha means that Pr(Vn>0)<=alpha. The set of rejected hypotheses under an FWER-controlling procedure can be augmented to increase the number of rejections, while controlling other error rates. The generalized family-wise error rate (gFWER) is defined as Pr(Vn>k), so control at level alpha means Pr(Vn>k)<=alpha; one can simply take an FWER-controlling procedure, reject k more hypotheses, and have control of gFWER at level alpha. The tail probability of the proportion of false positives (TPPFP) depends on both the number of false positives (Vn) and the number of rejections (Rn). Control of TPPFP at level alpha means Pr(Vn/Rn>q)<=alpha, for some proportion q. The false discovery rate (FDR) is the expected proportion of false positives (rather than a tail probability), so control of FDR at level alpha means E(Vn/Rn)<=alpha.
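The augmentation step for gFWER described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper augment_fwer and the adjusted p-values are hypothetical, and the sketch assumes the k extra rejections are the most significant hypotheses not already rejected.

```r
# Hypothetical helper (not part of multtest): start from the rejection set of
# an FWER-controlling procedure and reject the k next most significant hypotheses.
augment_fwer <- function(adjp, alpha = 0.05, k = 0) {
  reject <- adjp <= alpha                       # FWER rejections at level alpha
  remaining <- which(!reject)                   # hypotheses not yet rejected
  remaining <- remaining[order(adjp[remaining])]  # most significant first
  reject[head(remaining, k)] <- TRUE            # add up to k more rejections
  reject
}

# Made-up FWER-adjusted p-values for six hypotheses
adjp <- c(0.01, 0.20, 0.03, 0.60, 0.08, 0.90)

fwer_reject  <- augment_fwer(adjp, alpha = 0.05, k = 0)
gfwer_reject <- augment_fwer(adjp, alpha = 0.05, k = 2)

which(fwer_reject)
which(gfwer_reject)
```

With k = 0 the function returns the original FWER rejection set, which matches the default k = 0 of MTP.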
In practice, one must choose a method for estimating the test statistics null distribution. We have implemented an ordinary nonparametric bootstrap estimator and a permutation estimator (which makes sense in certain settings, see references). The nonparametric bootstrap estimator (default) provides asymptotic control of the type I error rate for any data generating distribution, whereas the permutation estimator requires the subset pivotality assumption. One drawback of both methods is the discreteness of the estimated null distribution when the sample size is small. Furthermore, when the sample size is small enough, it is possible that ties will lead to a very small variance estimate. Using standardize=FALSE allows one to avoid these unusually small test statistic denominators. Parametric bootstrap estimators are another option (not yet implemented).
Given observed test statistics, a type I error rate (with nominal level), and a test statistics null distribution, MTPs provide adjusted p-values, cutoffs for test statistics, and possibly confidence regions for estimates. Four methods are implemented, based on minima of p-values and maxima of test statistics. Only the step-down methods are currently available with the permutation null distribution.
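As a rough illustration of the single-step maxT idea, the following base R sketch computes raw and adjusted p-values from a bootstrap null distribution for one-sample t-statistics. It does not call MTP and is not the package's implementation: the helper tstat, the toy data, and the use of centering only (no scaling) to build the null distribution are simplifying assumptions.

```r
set.seed(1)
m <- 5    # number of hypotheses (rows)
n <- 20   # number of observations (columns)
B <- 200  # number of bootstrap iterations

# Toy data: row 1 has a non-zero mean, the other rows are null
X <- matrix(rnorm(m * n), nrow = m)
X[1, ] <- X[1, ] + 2

# Hypothetical helper: one-sample t-statistic for each row (null value 0)
tstat <- function(x) sqrt(ncol(x)) * rowMeans(x) / apply(x, 1, sd)

obs <- tstat(X)

# Bootstrap: resample columns with replacement and recompute the statistics,
# giving an m x B matrix (hypotheses in rows, iterations in columns)
boot <- replicate(B, tstat(X[, sample(n, replace = TRUE), drop = FALSE]))

# Simplified null distribution: center each row at the null value 0
null <- boot - rowMeans(boot)

# Raw p-values use each hypothesis's own null statistics (two-sided)
rawp <- sapply(seq_len(m), function(j) mean(abs(null[j, ]) >= abs(obs[j])))

# Single-step maxT: compare each observed statistic with the distribution
# of the maximum absolute null statistic over all hypotheses
max_null <- apply(abs(null), 2, max)
adjp <- sapply(abs(obs), function(t) mean(max_null >= t))

round(cbind(statistic = obs, rawp = rawp, adjp = adjp), 3)
```

Because the maximum over hypotheses is never smaller than any single null statistic, each adjusted p-value in this sketch is at least as large as the corresponding raw p-value.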
An object of class MTP, with the following slots:
statistic
Object of class numeric, observed test statistics for each hypothesis, specified by the values of the MTP arguments test, robust, standardize, and psi0.
estimate
For the test of single-parameter null hypotheses using t-statistics (i.e., not the F-tests), the numeric vector of estimated parameters corresponding to each hypothesis, e.g. means, differences in means, regression parameters.
sampsize
Object of class numeric, number of columns (i.e. observations) in the input data set.
rawp
Object of class numeric, unadjusted, marginal p-values for each hypothesis.
adjp
Object of class numeric, adjusted (for multiple testing) p-values for each hypothesis (computed only if the get.adjp argument is TRUE).
conf.reg
For the test of single-parameter null hypotheses using t-statistics (i.e., not the F-tests), the numeric array of lower and upper simultaneous confidence limits for the parameter vector, for each value of the nominal type I error rate alpha (computed only if the get.cr argument is TRUE).
cutoff
The numeric matrix of cutoffs for the vector of test statistics for each value of the nominal type I error rate alpha (computed only if the get.cutoff argument is TRUE).
reject
Object of class "matrix", rejection indicators (TRUE for a rejected null hypothesis), for each value of the nominal type I error rate alpha.
nulldist
The numeric matrix for the estimated test statistics null distribution (returned only if keep.nulldist=TRUE; option not currently available for the permutation null distribution, i.e., nulldist="perm"). By default (i.e., for nulldist="boot"), the entries of nulldist are the null-value shifted and scaled bootstrap test statistics, with one null test statistic value for each hypothesis (rows) and bootstrap iteration (columns).
call
Object of class call, the call to the MTP function.
seed
An integer specifying the state of the random number generator used to create the resampled data sets. The seed can be reused for reproducibility in a repeat call to MTP. This argument is currently used only for the bootstrap null distribution (i.e., for nulldist="boot"). See ?set.seed for details.
Katherine S. Pollard, http://lowelab.ucsc.edu/katie/, with design contributions from Sandrine Dudoit and Mark J. van der Laan.
M.J. van der Laan, S. Dudoit, K.S. Pollard (2004), Augmentation Procedures for Control of the Generalized Family-Wise Error Rate and Tail Probabilities for the Proportion of False Positives, Statistical Applications in Genetics and Molecular Biology, 3(1). http://www.bepress.com/sagmb/vol3/iss1/art15/
M.J. van der Laan, S. Dudoit, K.S. Pollard (2004), Multiple Testing. Part II. Step-Down Procedures for Control of the Family-Wise Error Rate, Statistical Applications in Genetics and Molecular Biology, 3(1). http://www.bepress.com/sagmb/vol3/iss1/art14/
S. Dudoit, M.J. van der Laan, K.S. Pollard (2004), Multiple Testing. Part I. Single-Step Procedures for Control of General Type I Error Rates, Statistical Applications in Genetics and Molecular Biology, 3(1). http://www.bepress.com/sagmb/vol3/iss1/art13/
K.S. Pollard, M.J. van der Laan (2003), Resampling-based Multiple Testing: Asymptotic Control of Type I Error and Applications to Gene Expression Data, U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 121 (June 24, 2003). http://www.bepress.com/ucbbiostat/paper121
MTP-class, MTP-methods, mt.minP, mt.maxT, ss.maxT, fwer2gfwer
# data
set.seed(99)
data <- matrix(rnorm(90), nrow = 9)
group <- c(rep(1, 5), rep(0, 5))

# FWER control with bootstrap null distribution (B=100 for speed)
m1 <- MTP(X = data, Y = group, alternative = "less", B = 100, method = "sd.minP")
print(m1)
summary(m1)
par(mfrow = c(2, 2))
plot(m1, top = 9)