scanone {qtl}                R Documentation

Genome scan with a single QTL model

Description

Genome scan with a single QTL model, with possible allowance for covariates, using any of several possible models for the phenotype and any of several possible numerical methods.

Usage

scanone(cross, chr, pheno.col=1, model=c("normal","binary","2part","np"),
        method=c("em","imp","hk","mr","mr-imp","mr-argmax"),
        addcovar=NULL, intcovar=NULL, weights=NULL,
        upper=FALSE, ties.random=FALSE, start=NULL, maxit=4000,
        tol=1e-4, n.perm, trace=TRUE)

Arguments

cross An object of class cross. See read.cross for details.
chr Vector indicating the chromosomes for which LOD scores should be calculated.
pheno.col Column number in the phenotype matrix which should be used as the phenotype.
model The phenotypic model: the usual normal model, a model for binary traits, a two-part model or non-parametric.
method Indicates whether to use the EM algorithm, imputation, Haley-Knott regression, or marker regression. Not all methods are available for all models. Marker regression is performed either by dropping individuals with missing genotypes ("mr"), or by first filling in missing data using a single imputation ("mr-imp") or by the Viterbi algorithm ("mr-argmax").
addcovar Additive covariates; allowed only for the normal model.
intcovar Interactive covariates (interact with QTL genotype); allowed only for the normal model.
weights Optional weights of individuals. Should be either NULL or a vector of length n.ind containing positive weights. Used only in the case model="normal".
upper Used only for the two-part model; if TRUE, the "undefined" phenotype is the maximum observed phenotype; otherwise, it is the smallest observed phenotype.
ties.random Used only for the non-parametric "model"; if TRUE, ties in the phenotypes are ranked at random. If FALSE, average ranks are used and a corrected LOD score is calculated.
start Used only for the EM algorithm with the normal model and no covariates. If NULL, use the usual starting values; if length 1, use random initial weights for EM; otherwise, this should be a vector of length n+1 (where n is the number of possible genotypes for the cross), giving the initial values for EM.
maxit Maximum number of iterations in the EM algorithm; used only in interval mapping.
tol Tolerance value for determining convergence in the EM algorithm; used only in interval mapping.
n.perm If specified, a permutation test is performed rather than an analysis of the observed data. This argument defines the number of permutation replicates.
trace In the case n.perm is specified, display information about the progress of the permutation tests.

Details

Use of the EM algorithm or Haley-Knott regression requires that multipoint genotype probabilities first be calculated using calc.genoprob. The imputation method uses the results of sim.geno.

Individuals with missing phenotypes are dropped.

In the case that n.perm is not missing, so that a permutation test is performed, the R function scanone is called repeatedly.
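The idea behind the permutation test (in the manner of Churchill and Doerge 1994) can be sketched in base R. The block below is only an illustration under simplifying assumptions: the markers are simulated independently, every genotype is observed, and the helper names lod.at and max.lod are hypothetical, not part of the qtl package.

```r
# Sketch only: simulated backcross, fully typed markers, marker regression.
set.seed(5)
n     <- 100                                   # individuals
n.mar <- 20                                    # markers
geno  <- matrix(rbinom(n * n.mar, 1, 0.5), nrow = n)
y     <- rnorm(n)                              # phenotype with no QTL

# LOD score at one marker: log10 likelihood ratio under the normal model
lod.at <- function(y, g) {
  rss0 <- sum((y - mean(y))^2)                 # null: common mean
  rss1 <- sum(resid(lm(y ~ g))^2)              # alternative: mean depends on genotype
  (length(y) / 2) * log10(rss0 / rss1)
}

# Genome-wide maximum LOD for a given phenotype vector
max.lod <- function(y) max(apply(geno, 2, function(g) lod.at(y, g)))

# Shuffle phenotypes relative to genotypes and record the maximum each time
n.perm    <- 100
perm.max  <- replicate(n.perm, max.lod(sample(y)))
threshold <- quantile(perm.max, 0.95)          # 5% genome-wide threshold
```

Each replicate keeps the genotype data fixed and permutes only the phenotypes, so the recorded maxima approximate the null distribution of the genome-wide maximum LOD score.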

See further details on the models, the methods and the use of covariates below.

Value

If n.perm is missing, the function returns a data.frame whose first two columns contain the chromosome IDs and cM positions. The third column contains the LOD score. In the case of the two-part model, the third column is LOD(p,mu), while the fourth and fifth columns are LOD(p) and LOD(mu). In the case of no covariates, further columns specify the parameter estimates. The data frame is given class "scanone" and attributes "model", "method" and "type" (the latter is the type of cross analyzed).

If n.perm is specified, the function returns a vector of length n.perm, containing the genome-wide maximum LOD scores for the permutation replicates. In the case of the two-part model, the return value is instead a matrix of size n.perm x 3, with columns corresponding to the three different LOD scores.

Models

The normal model is the standard model for QTL mapping. The residual phenotypic variation is assumed to follow a normal distribution, and analysis is analogous to linear regression.

The binary model is for the case of a binary phenotype, which must have values 0 and 1. The proportions of 1's in the different genotype groups are compared. Currently only methods em and mr are available for this model.

The two-part model is appropriate for the case of a spike in the phenotype distribution (for example, metastatic density when many individuals show no metastasis, or survival time following an infection when individuals may recover from the infection and so never die). The two-part model was described by Broman et al. (2000) and Boyartchuk et al. (2001). Individuals with QTL genotype g have probability p[g] of having an undefined phenotype (the spike), while if their phenotype is defined, it comes from a normal distribution with mean mu[g] and common standard deviation s. Three LOD scores are calculated: LOD(p,mu) is for the test of the hypothesis that p[g] = p and mu[g] = mu. LOD(p) is for the test that p[g] = p while the mu[g] may vary. LOD(mu) is for the test that mu[g] = mu while the p[g] may vary.
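At a fully typed marker (no missing genotypes), the two-part likelihood factors into a binary part and a normal part, so the three LOD scores can be computed with base R. The sketch below uses simulated backcross data; all variable names and the helper ll.bin are illustrative, and this is not the code that scanone runs.

```r
# Sketch only: two-part model at one fully typed backcross marker.
set.seed(4)
n <- 300
g <- factor(sample(c("AA", "AB"), n, replace = TRUE))

# p[g]: probability of an undefined phenotype (the spike), differing by genotype
p.true <- c(AA = 0.3, AB = 0.5)
spike  <- rbinom(n, 1, p.true[as.character(g)]) == 1
y      <- ifelse(spike, NA, rnorm(n, mean = 5 + 0.6 * (g == "AB")))

# log10 binomial likelihood at the MLE for a logical vector z
ll.bin <- function(z) {
  k <- sum(z); m <- length(z)
  if (k == 0 || k == m) return(0)
  k * log10(k / m) + (m - k) * log10(1 - k / m)
}

# LOD(p): does the spike probability depend on genotype?
lod.p <- sum(tapply(spike, g, ll.bin)) - ll.bin(spike)

# LOD(mu): among defined phenotypes, does the mean depend on genotype?
d      <- !spike
rss0   <- sum(resid(lm(y[d] ~ 1))^2)
rss1   <- sum(resid(lm(y[d] ~ g[d]))^2)
lod.mu <- (sum(d) / 2) * log10(rss0 / rss1)

# LOD(p,mu): joint test; with complete genotype data the two parts add
lod.pmu <- lod.p + lod.mu
```

In this complete-data case LOD(p,mu) is simply LOD(p) + LOD(mu). Between markers, where QTL genotypes are uncertain, that identity need not hold.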

With the non-parametric "model", an extension of the Kruskal-Wallis test is used; this is similar to the method described by Kruglyak and Lander (1995). In the case of incomplete genotype information (such as at locations between genetic markers), the Kruskal-Wallis statistic is modified so that the rank for each individual is weighted by the genotype probabilities, analogous to Haley-Knott regression. For this method, if the argument ties.random is TRUE, ties in the phenotypes are assigned random ranks; if it is FALSE, average ranks are used and a corrected LOD score is calculated. Currently the method argument is ignored for this model.
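For a marker with complete genotype data, the non-parametric analysis reduces to an ordinary Kruskal-Wallis test. The sketch below uses simulated intercross data and base R's kruskal.test, which ranks ties by their average and applies a tie correction. Dividing the statistic by 2 ln(10) is the usual conversion from a chi-square-type statistic to the LOD scale; treating it as the conversion used by scanone is an assumption here.

```r
# Sketch only: Kruskal-Wallis test at one fully typed intercross marker.
set.seed(2)
n <- 150
g <- factor(sample(c("AA", "AB", "BB"), n, replace = TRUE,
                   prob = c(0.25, 0.5, 0.25)))

# Skewed (non-normal) phenotype whose location shifts with genotype
y <- rexp(n) + 0.5 * (as.integer(g) - 1)

kw     <- kruskal.test(y ~ g)                 # average ranks, tie-corrected
lod.np <- unname(kw$statistic) / (2 * log(10))  # statistic on the LOD scale
```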

Methods

em: maximum likelihood is performed via the EM algorithm (Dempster et al. 1977), first used in this context by Lander and Botstein (1989).

imp: multiple imputation is used, as described by Sen and Churchill (2001).

hk: Haley-Knott regression is used (regression of the phenotypes on the multipoint QTL genotype probabilities), as described by Haley and Knott (1992).

mr: Marker regression is used. Analysis is performed only at the genetic markers, and individuals with missing genotypes are discarded.
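The difference between Haley-Knott regression and marker regression can be illustrated at a single position of a backcross. In the sketch below the genotype probabilities p.AB are made up for illustration (in practice they would come from calc.genoprob), and all variable names are hypothetical.

```r
# Sketch only: Haley-Knott regression at one position of a backcross.
set.seed(3)
n <- 200
g <- rbinom(n, 1, 0.5)                 # true QTL genotype (0 = AA, 1 = AB)
y <- 10 + 0.7 * g + rnorm(n)

# Made-up multipoint probabilities Pr(AB | marker data): informative but
# imperfect, as at a position between two markers
p.AB <- ifelse(g == 1, runif(n, 0.6, 1), runif(n, 0, 0.4))

rss0 <- sum(resid(lm(y ~ 1))^2)        # null model: common mean

# Haley-Knott: regress the phenotype on the genotype probabilities
rss.hk <- sum(resid(lm(y ~ p.AB))^2)
lod.hk <- (n / 2) * log10(rss0 / rss.hk)

# Marker regression would instead use observed genotypes at a marker
rss.mr <- sum(resid(lm(y ~ factor(g)))^2)
lod.mr <- (n / 2) * log10(rss0 / rss.mr)
```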

Covariates

Covariates are allowed only for the normal model, in which case the model is y = b[q] + A g + Z d[q] + e, where q is the unknown QTL genotype, A is a matrix of additive covariates, and Z is a matrix of covariates that interact with the QTL genotype. The columns of Z are forced to be contained in the matrix A.

The LOD score is calculated comparing the likelihood of the above model to that of the null model y = m + A g + e.

Covariates must be numeric matrices. Individuals with any missing covariates are discarded.
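When the QTL genotype is fully observed, the covariate model and its null model are ordinary linear regressions, and the LOD score is the log10 likelihood ratio between them. The sketch below uses simulated data with hypothetical covariates sex and age; it illustrates the comparison, not the internals of scanone.

```r
# Sketch only: LOD score with additive and interactive covariates
# at a fully typed backcross marker.
set.seed(1)
n   <- 200
q   <- rbinom(n, 1, 0.5)               # QTL genotype (0 = AA, 1 = AB)
sex <- rbinom(n, 1, 0.5)               # additive and interactive covariate
age <- rnorm(n)                        # additive covariate only
y   <- 1 + 0.5 * q + 0.8 * sex + 0.3 * age + 0.4 * q * sex + rnorm(n)

# Null model: y = m + A g + e
fit0 <- lm(y ~ sex + age)

# Full model: y = b[q] + A g + Z d[q] + e
# (sex appears both additively and in the interaction, as Z is contained in A)
fit1 <- lm(y ~ factor(q) + sex + age + factor(q):sex)

rss0 <- sum(resid(fit0)^2)
rss1 <- sum(resid(fit1)^2)
lod  <- (n / 2) * log10(rss0 / rss1)   # log10 likelihood ratio
```

The same value is obtained from (logLik(fit1) - logLik(fit0)) / log(10), since both fits use the maximum likelihood estimate of the residual variance.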

X chromosome

The X chromosome must be treated specially in QTL mapping.

If both males and females are included, male hemizygotes are allowed to be different from female homozygotes. Thus, in a backcross, we will fit separate means for the genotype classes AA, AB, AY, and BY. In such cases, sex differences in the phenotype could cause spurious linkage to the X chromosome, and so the null hypothesis must be changed to allow for a sex difference in the phenotype.

BC

Sexes        Null         Alternative    df
both sexes   sex          AA/AB/AY/BY    2
all female   grand mean   AA/AB          1
all male     grand mean   AY/BY          1

F2

Direction   Sexes        Null                   Alternative           df
Both        both sexes   femaleF/femaleR/male   AA/ABf/ABr/BB/AY/BY   3
            all female   pgm                    AA/ABf/ABr/BB         2
            all male     grand mean             AY/BY                 1
Forward     both sexes   sex                    AA/AB/AY/BY           2
            all female   grand mean             AA/AB                 1
            all male     grand mean             AY/BY                 1
Backward    both sexes   sex                    AB/BB/AY/BY           2
            all female   grand mean             AB/BB                 1
            all male     grand mean             AY/BY                 1
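The need for a sex term in the null hypothesis can be seen in a small simulation of a backcross with both sexes. In the sketch below the phenotype has a sex difference but no QTL; all names are illustrative, and the X-linked genotypes are assumed to be fully observed.

```r
# Sketch only: X chromosome marker in a backcross with both sexes.
set.seed(6)
n      <- 200
sex    <- factor(sample(c("female", "male"), n, replace = TRUE))
allele <- sample(c("A", "B"), n, replace = TRUE)

# Four genotype classes: AA/AB in females, AY/BY in males
xgeno <- factor(ifelse(sex == "female",
                       ifelse(allele == "A", "AA", "AB"),
                       ifelse(allele == "A", "AY", "BY")))

# Phenotype with a sex difference but no QTL effect
y <- 2 + 1.0 * (sex == "male") + rnorm(n)

rss <- function(fit) sum(resid(fit)^2)
fit.grand <- lm(y ~ 1)        # naive null: grand mean
fit.sex   <- lm(y ~ sex)      # appropriate null: separate mean for each sex
fit.alt   <- lm(y ~ xgeno)    # alternative: AA/AB/AY/BY

lod.naive <- (n / 2) * log10(rss(fit.grand) / rss(fit.alt))  # inflated by sex
lod.x     <- (n / 2) * log10(rss(fit.sex)   / rss(fit.alt))  # sex allowed for
df.x      <- length(coef(fit.alt)) - length(coef(fit.sex))   # 4 - 2 = 2
```

Against the grand-mean null, the sex difference alone produces a large LOD score; against the sex null, only genotype differences within each sex contribute.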

Author(s)

Karl W Broman, kbroman@jhsph.edu; Hao Wu, hao@jax.org

References

Boyartchuk, V. L., Broman, K. W., Mosher, R. E., D'Orazio, S. E. F., Starnbach, M. N. and Dietrich, W. F. (2001) Multigenic control of Listeria monocytogenes susceptibility in mice. Nature Genetics 27, 259–260.

Broman, K. W., Boyartchuk, V. L. and Dietrich, W. F. (2000) Mapping time-to-death quantitative trait loci in a mouse cross with high survival rates. Technical Report MS00-04, Department of Biostatistics, Johns Hopkins University, Baltimore, MD.

Churchill, G. A. and Doerge, R. W. (1994) Empirical threshold values for quantitative trait mapping. Genetics 138, 963–971.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977) Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B, 39, 1–38.

Haley, C. S. and Knott, S. A. (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69, 315–324.

Kruglyak, L. and Lander, E. S. (1995) A nonparametric approach for mapping quantitative trait loci. Genetics 139, 1421–1428.

Lander, E. S. and Botstein, D. (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121, 185–199.

Sen, S. and Churchill, G. A. (2001) A statistical framework for quantitative trait mapping. Genetics 159, 371–387.

Soller, M., Brody, T. and Genizi, A. (1976) On the power of experimental designs for the detection of linkage between marker loci and quantitative loci in crosses between inbred lines. Theor. Appl. Genet. 47, 35–39.

See Also

plot.scanone, summary.scanone, scantwo, calc.genoprob, sim.geno, max.scanone

Examples

###################
# Normal Model
###################
data(hyper)

# Genotype probabilities for EM and H-K
hyper <- calc.genoprob(hyper, step=2.5)
out.em <- scanone(hyper, method="em")
out.hk <- scanone(hyper, method="hk")

# Summarize results: peaks above 3
summary(out.em, 3)
summary(out.hk, 3)

# Plot the results
plot(out.hk, out.em)
plot(out.hk, out.em, chr=c(1,4), lty=1, col=c("blue","black"))

# Imputation; first need to run sim.geno
# Do just chromosomes 1 and 4, to save time
hyper.c1n4 <- sim.geno(subset(hyper, chr=c(1,4)),
                       step=2.5, n.draws=8)
out.imp <- scanone(hyper.c1n4, method="imp")
summary(out.imp, 3)

# Plot all three results
plot(out.imp, out.hk, out.em, chr=c(1,4), lty=1,
     col=c("red","blue","black"))

# Permutation tests
## Not run: 
permo <- scanone(hyper, method="hk", n.perm=1000)
# 95% genome-wide LOD threshold (needs permo from the line above)
quantile(permo, 0.95)
## End(Not run)

###################
# Non-parametric
###################
out.np <- scanone(hyper, model="np")
summary(out.np, 3)

# Plot with previous results
plot(out.np, chr=c(1,4), lty=1, col="green")
plot(out.imp, out.hk, out.em, chr=c(1,4), lty=1,
     col=c("red","blue","black"), add=TRUE)

###################
# Two-part Model
###################
data(listeria)

listeria <- calc.genoprob(listeria,step=2.5)
out.2p <- scanone(listeria, model="2part", upper=TRUE)
summary(out.2p, 5)

# Plot all three LOD scores together
plot(out.2p, out.2p, out.2p, lodcolumn=c(4,5,3), lty=1, chr=c(1,5,13),
     col=c("red","blue","black"))

# Permutation test
## Not run: 
permo <- scanone(listeria, model="2part", upper=TRUE,
                 n.perm=1000)
# 95% thresholds for each of the three LOD scores (needs permo from above)
apply(permo, 2, quantile, 0.95)
## End(Not run)

###################
# Binary model
###################
listeria <- subset(listeria, ind=!is.na(listeria$pheno[,1]))
listeria$pheno[,2] <- rep(0,nind(listeria))
listeria$pheno[listeria$pheno[,1]==264,2] <- 1
out.bin <- scanone(listeria, pheno.col=2, model="binary")
summary(out.bin, 3)

# Plot LOD for binary model with LOD(p) from 2-part model
plot(out.bin, out.2p, lodcolumn=c(3,4), lty=1, col=c("black", "red"),
     chr=c(1,5,13))

# Permutation test
## Not run: 
permo <- scanone(listeria, pheno.col=2, model="binary",
                 n.perm=1000)
# 95% genome-wide LOD threshold (needs permo from above)
quantile(permo, 0.95)
## End(Not run)

###################
# Covariates
###################
data(fake.bc)

plot(fake.bc)
fake.bc <- calc.genoprob(fake.bc, step=2.5)
# genome scans without covariates
out.nocovar <- scanone(fake.bc)
# genome scans with covariates
ac <- fake.bc$pheno[,c("sex","age")]
ic <- fake.bc$pheno[,"sex"]
out.covar <- scanone(fake.bc, pheno.col=1,
                     addcovar=ac, intcovar=ic)
summary(out.nocovar,3)
summary(out.covar,3)
plot(out.covar,out.nocovar,chr=c(2,5,10))

[Package qtl version 0.98-57]