A starting point {qtl} | R Documentation |

A brief introduction to the R/qtl package, with a walk-through of an analysis.

- In order to use the R/qtl package, you must type (within R)
`library(qtl)`

. You may wish to include this in a`.First`

function or`.Rprofile`

file. - Documention and several tutorials are available at the R archive (http://cran.r-project.org).
- Use the
`help.start`

function to start the html version of the R help. In Windows or MacOS, you may wish to use`options(htmlhelp=TRUE)`

to get easy access to the html version of the help files; this could be included in a`.First`

function or`.Rprofile`

file. - Type
`library(help=qtl)`

to get a list of the functions in R/qtl. - Use the
`example`

function to run examples of the various functions in R/qtl. - A tutorial on the use of R/qtl is distributed with the
package, ‘
`rqtltour.pdf`’. - Download the latest version of R/qtl from http://www.biostat.jhsph.edu/~kbroman/qtl.

Here we briefly describe the use of R/qtl to analyze an experimental
cross. A more extensive tutorial on its use is distributed with the
package, ‘`rqtltour.pdf`’.

A difficult first step in the use of most data analysis software is the
import of data. With R/qtl, one may import data in several different
formats by use of the function `read.cross`

. The
internal data structure used by R/qtl is rather complicated, and is
described in the help file for `read.cross`

. We won't
discuss data import any further here, except to say that the
comma-delimited format (`"csv"`

) is recommended. If you have
trouble importing data, send an email to Karl Broman,
kbroman@jhsph.edu, perhaps attaching examples of your data
files. (Such data will be kept confidential.)

We consider the example data `hyper`

, an experiment on
hypertension in the mouse, kindly provided
by Bev Paigen and Gary Churchill. Use the `data`

function to load the data.

`data(hyper)`

The `hyper`

data set has class `cross`

. The
function `summary.cross`

gives some summary information
on the data, and checks the data for internal consistency. A number
of other utility functions are available; hopefully these are
self-explanatory.

`summary(hyper)`

`nind(hyper)`

`nphe(hyper)`

`nchr(hyper)`

`nmar(hyper)`

`totmar(hyper)`

The function `plot.cross`

gives a graphical summary of
the data; it calls `plot.missing`

(to plot a matrix
displaying missing genotypes) and `plot.map`

(to plot
the genetic maps), and also displays histograms or barplots of the
phenotypes. The `plot.missing`

function can plot
individuals ordered by their phenotypes; you can see that for most
markers, only individuals with extreme phenotypes were genotyped.

`plot(hyper)`

`plot.missing(hyper)`

`plot.missing(hyper, reorder=TRUE)`

`plot.map(hyper)`

Note that one marker (on chromosome 14) has no genotype data. The
function `drop.nullmarkers`

removes such markers from
the data.

`hyper <- drop.nullmarkers(hyper)`

`totmar(hyper)`

The function `est.rf`

estimates the recombination
fraction between each pair of markers, and calculates a LOD score for
the test of *r* = 1/2. This is useful for identifying markers that
are placed on the wrong chromosome. Note that since, for these data,
many markers were typed only on recombinant individuals, the pairwise
recombination fractions show rather odd patterns.

`hyper <- est.rf(hyper)`

`plot.rf(hyper)`

`plot.rf(hyper, chr=c(1,4))`

To re-estimate the genetic map for an experimental cross, use the
function `est.map`

. The function
`plot.map`

, in addition to plotting a single map, can
plot the comparison of two genetic maps (as long as they are composed of
the same numbers of chromosomes and markers per chromosome). The
function `replace.map`

map be used to replace the
genetic map in a cross with a new one.

`newmap <- est.map(hyper, error.prob=0.01, trace=TRUE)`

`plot.map(hyper, newmap)`

`hyper <- replace.map(hyper, newmap)`

Before doing QTL analyses, a number of intermediate calculations may
need to be performed. The function `calc.genoprob`

calculates conditional genotype probabilities given the multipoint
marker data. `sim.geno`

simulates sequences of
genotypes from their joint distribution, given the observed marker data.
`argmax.geno`

calculates the most likely sequence of
underlying genotypes, given the observed marker data.

These three functions return a modified version of the input cross, with the intermediate calculations included.

`hyper <- calc.genoprob(hyper, step=2.5, error.prob=0.01)`

`hyper <- sim.geno(hyper, step=2.5, n.draws=64, error.prob=0.01)`

`hyper <- argmax.geno(hyper, error.prob=0.01)`

The function `calc.errorlod`

may be used to assist in
identifying possible genotyping errors; it calculates the error LOD
scores described by Lincoln and Lander (1992). It requires the results
of `calc.genoprob`

, run with a matching value for the
assumed genotyping error probability, `error.prob`

.

`hyper <- calc.errorlod(hyper, error.prob=0.01)`

`plot.errorlod(hyper)`

`top.errorlod(hyper)`

`plot.errorlod(hyper, chr=c(4,11,16))`

The function `plot.geno`

may be used to inspect the
observed genotypes for a chromosome, with likely genotyping errors
flagged.

`plot.geno(hyper, chr=16, ind=71:90, min.sep=4)`

The function `scanone`

performs a genome scan with a
single QTL model. By default, it performs standard interval mapping
(Lander and Botstein 1989): use of a normal model and the EM algorithm.
If one specifies `method="hk"`

, Haley-Knott regression is performed
(Haley and Knott 1992). These two methods require the results from
`calc.genoprob`

.

`out.em <- scanone(hyper)`

`out.hk <- scanone(hyper, method="hk")`

If one specifies `method="imp"`

, a genome scan is performed by the
multiple imputation method of Sen and Churchill (2001). This method
requires the results from `sim.geno`

.

`out.imp <- scanone(hyper, method="imp")`

The output of `scanone`

is a data.frame with class
`scanone`

. The function `plot.scanone`

may be
used to plot the results, and may plot up to three sets of results
against each other, as long as they conform appropriately.

`plot(out.em)`

`plot(out.hk, col="blue", add=TRUE)`

`plot(out.imp, col="red", add=TRUE)`

`plot(out.hk, out.imp, out.em, chr=c(1,4), lty=1, col=c("blue","red","black"))`

The function `summary.scanone`

may be used to list
information on the peak LOD for each chromosome for which the LOD
exceeds a specified threshold.

`summary(out.em)`

`summary(out.em, 3)`

`summary(out.hk, 3)`

`summary(out.imp, 3)`

The function `max.scanone`

returns the maximum LOD
score, genome-wide.

`max(out.em)`

`max(out.hk)`

`max(out.imp)`

One may also use `scanone`

to perform a permutation
test to get a genome-wide LOD significance threshold. This will take
some time.

`operm.hk <- scanone(hyper, method="hk", n.perm=100)`

`quantile(operm.hk, 0.95)`

We should say at this point that the function
`save.image`

will save your workspace to disk. You'll
wish you had used this if R crashes.

`save.image()`

The function `scantwo`

performs a two-dimensional
genome scan with a two-QTL model. Methods `"em"`

, `"hk"`

and
`"imp"`

are all available. `scantwo`

is
considerably slower than `scanone`

, and can require a
great deal of memory. Thus, you may wish to create a version of
`hyper`

for a more coarse grid.

`hyper.coarse <- calc.genoprob(hyper, step=10, err=0.01)`

`hyper.coarse <- sim.geno(hyper, step=10, n.draws=64, err=0.01)`

`out2.hk <- scantwo(hyper.coarse, method="hk")`

`out2.em <- scantwo(hyper.coarse)`

`out2.imp <- scantwo(hyper.coarse, method="imp")`

The output is an object with class `scantwo`

. The function
`plot.scantwo`

may be used to plot the results. The
upper triangle contains LOD scores for tests of epistasis, while the
lower triangle contains joint LOD scores.

`plot(out2.hk)`

`plot(out2.em)`

`plot(out2.imp)`

The function `summary.scantwo`

lists the interesting
aspects of the output. You must provide three LOD thresholds: for the joint
LOD, epistasis LOD, and conditional, single-QTL LOD scores. The locus
pairs giving the highest joint LOD for each pair of chromosomes are
inspected, and those whose LOD score exceed the joint LOD threshold and
for which either the interaction LOD exceeds its threshold or both the
conditional single-QTL LODs exceed their threshold, are printed.

`summary(out2.em, c(8, 3, 3))`

`summary(out2.em, c(0, 1000, 4))`

`summary(out2.em, c(0, 4, 1000))`

The function `max.scantwo`

returns the maximum joint
and interaction LODs for a two-dimensional genome scan.

`max(out2.em)`

Permutation tests may also performed with `scantwo`

;
it may take a few days of CPU time. The output is a matrix with two
columns: the maximum joint and epistasis LODs, across the
two-dimensional genome scan, for each simulation replicate.

`operm2 <- scantwo(hyper.coarse, method="hk", n.perm=100)`

`apply(operm2, 2, quantile, 0.95)`

`hist(operm.hk,breaks=20)`

To cite R/qtl in publications, use the Broman et al. (2003) reference listed below.

Also see the package BQTL (Bayesian QTL mapping toolkit) by Charles C. Berry. His package gave me the idea to have this ``A starting point'' help page.

Karl W Broman, kbroman@jhsph.edu

Broman, K. W., Wu, H., Sen, S. and Churchill, G. A. (2003) R/qtl: QTL
mapping in experimental crosses. *Bioinformatics* **19**,
889–890.

Haley, C. S. and Knott, S. A. (1992) A simple regression method for mapping
quantitative trait loci in line crosses using flanking markers.
*Heredity* **69**, 315–324.

Lander, E. S. and Botstein, D. (1989) Mapping Mendelian factors underlying
quantitative traits using RFLP linkage maps. *Genetics*
**121**, 185–199.

Lincoln, S. E. and Lander, E. S. (1992) Systematic detection of
errors in genetic linkage data. *Genomics* **14**, 604–610.

Sen, S. and Churchill, G. A. (2001) A statistical framework for quantitative
trait mapping. *Genetics* **159**, 371–387.

[Package *qtl* version 0.98-57 Index]