bagging {GeneTS} R Documentation

Bagged Versions of Covariance and (Partial) Correlation Matrix

Description

bagged.cov, bagged.cor, and bagged.pcor calculate the bootstrap aggregated (= bagged) versions of the covariance and (partial) correlation estimators.

These estimators are especially advantageous for small sample size problems. For example, the bagged correlation matrix typically remains positive definite even when the sample size is much smaller than the number of variables.

In Schaefer and Strimmer (2004), the inverse of the bagged correlation matrix is used to estimate graphical Gaussian models from sparse microarray data. See also ggm.estimate.pcor for various strategies to estimate partial correlation coefficients.

Usage

bagged.cov(x, R=1000, ...)
bagged.cor(x, R=1000, ...)
bagged.pcor(x, R=1000, ...)

Arguments

x      data matrix or data frame
R      number of bootstrap replicates (default: 1000)
...    options passed to cov, cor, and partial.cor (e.g., to control handling of missing values)

Details

Bagging was first suggested by Breiman (1996) as a means to improve an estimator using the bootstrap. The bagged estimate is simply the mean of the bootstrap sampling distribution. Thus, bagging is essentially a variance reduction method. The bagged estimate may also be interpreted as an (approximate) posterior mean estimate assuming some implicit prior.
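
The idea of averaging over bootstrap replicates can be sketched in a few lines of base R. This is only an illustration and not necessarily how GeneTS implements it: the helper name bag_estimator, its arguments, and the simulated data are assumptions made for this example.

```r
# Hypothetical helper (not part of GeneTS): bag a matrix-valued estimator
# by averaging it over R bootstrap resamples of the rows of x.
bag_estimator <- function(x, estimator = cor, R = 200) {
  x <- as.matrix(x)
  n <- nrow(x)
  total <- matrix(0, ncol(x), ncol(x))
  for (b in seq_len(R)) {
    # draw n observations (rows) with replacement
    idx <- sample.int(n, n, replace = TRUE)
    # accumulate the estimate computed on the resampled data
    total <- total + estimator(x[idx, , drop = FALSE])
  }
  # the bagged estimate is the mean over all bootstrap replicates
  total / R
}

# simulated data (assumption for illustration): 20 observations, 4 variables
set.seed(1)
x <- matrix(rnorm(20 * 4), nrow = 20, ncol = 4)

b.cor <- bag_estimator(x, cor, R = 200)
isSymmetric(b.cor)
```

Passing cov instead of cor as the estimator argument gives the corresponding bagged covariance. With very few observations a resample can contain a constant column, in which case cor returns NA for that replicate; this sketch does not handle that case.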

Value

A symmetric matrix.

Author(s)

Juliane Schaefer (http://www.stat.uni-muenchen.de/~schaefer/) and Korbinian Strimmer (http://www.stat.uni-muenchen.de/~strimmer/).

References

Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123–140.

Schaefer, J., and Strimmer, K. (2004). An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics, in press.

See Also

cov, cor, partial.cor, ggm.estimate.pcor, robust.boot.

Examples

# load GeneTS library
library(GeneTS)

# small example data set 
data(caulobacter)
dat <- caulobacter[,1:15]
dim(dat)

# bagged estimates
b.cov <- bagged.cov(dat)
b.cor <- bagged.cor(dat)
b.pcor <- bagged.pcor(dat)

# total squared difference between bagged and standard estimates
sum((b.cov - cov(dat))^2)
sum((b.cor - cor(dat))^2)
sum((b.pcor - partial.cor(dat))^2)

# positive definiteness of bagged correlation
is.positive.definite(cor(dat))
is.positive.definite(b.cor)

[Package GeneTS version 2.3 Index]