inbagg {ipred} | R Documentation |

Function to perform the indirect bagging and subagging.

inbagg.data.frame(formula, data, pFUN=NULL, cFUN=list(model = NULL, predict = NULL, training.set = NULL), nbagg = 25, ns = 0.5, replace = FALSE, ...)

`formula` |
formula. A `formula` specified as `y~w1+w2+w3~x1+x2+x3` describes how to model the intermediate variables `w1, w2, w3` and the response variable `y`, if no other formula is specified by the elements of `pFUN` or in `cFUN`. |

`data` |
data frame of explanatory, intermediate and response variables. |

`pFUN` |
list of lists describing the models for the intermediate variables; details are given below. |

`cFUN` |
either a fixed function with argument `newdata` that returns the class membership by default, or a list specifying a classifying model, similar to one element of `pFUN`. Details are given below. |

`nbagg` |
number of bootstrap samples. |

`ns` |
proportion of sample to be drawn from the learning sample. By default, subagging with 50% is performed, i.e. draw 0.5*n out of n without replacement. |

`replace` |
logical. Draw with or without replacement. |

`...` |
additional arguments (e.g. `subset`). |

A given data set is subdivided into three types of variables: explanatory, intermediate and response variables.
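
As an illustration of this three-way split, the two-tilde formula can be taken apart with base R alone. This is only a sketch of how such a formula parses, not necessarily how `inbagg` processes it internally; the helper name `split_inbagg_formula` is hypothetical.

```r
# Hypothetical helper: split a formula of the form y ~ w1 + w2 ~ x1 + x2
# into response, intermediate and explanatory variable names.
split_inbagg_formula <- function(f) {
  # `~` groups left to right, so f parses as (y ~ w...) ~ x...
  lhs <- f[[2]]   # the call y ~ w1 + w2 + w3
  rhs <- f[[3]]   # the expression x1 + x2 + x3
  list(
    response     = all.vars(lhs[[2]]),
    intermediate = all.vars(lhs[[3]]),
    explanatory  = all.vars(rhs)
  )
}

vars <- split_inbagg_formula(y ~ w1 + w2 + w3 ~ x1 + x2 + x3)
str(vars)
```
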

Here, each specified intermediate variable is modelled separately following `pFUN`, a list of lists with elements specifying an arbitrary number of models for the intermediate variables and an optional element `training.set = c("oob", "bag", "all")`. The element `training.set` determines whether the predictive models for the intermediate variables are calculated based on the out-of-bag sample (`"oob"`), the default, on the bag sample (`"bag"`) or on all available observations (`"all"`). The elements of `pFUN` specifying the models for the intermediate variables are lists as described in `inclass`. Note that, if no formula is given in these elements, the functional relationship of `formula` is used.
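
The three `training.set` choices can be pictured as index sets derived from a single bag draw. The sketch below uses base R only and assumes the bag is drawn with `sample()` using `ns` and `replace` as described for the arguments; the rounding of `ns * n` and all variable names are assumptions, not taken from the package source.

```r
set.seed(42)
n <- 20            # size of the learning sample
ns <- 0.5          # proportion drawn into the bag
replace <- FALSE   # FALSE gives subagging, TRUE gives bagging

# Assumption: the bag size is ns * n rounded to an integer
bag_size <- round(ns * n)
bindx <- sample(seq_len(n), size = bag_size, replace = replace)

# Candidate training sets for the intermediate-variable models
training_sets <- list(
  oob = setdiff(seq_len(n), bindx),  # observations not in the bag
  bag = bindx,                       # observations in the bag
  all = seq_len(n)                   # every available observation
)
sapply(training_sets, length)
```

Without replacement, the bag and out-of-bag sets partition the learning sample; with `replace = TRUE` the bag can contain repeated observations, so the out-of-bag set is usually larger than `n - bag_size`.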

The response variable is modelled following `cFUN`. This can either be a fixed classifying function as described in Peters et al. (2003) or a list which specifies the modelling technique to be applied. The list contains the arguments `model` (which model to be fitted), `predict` (optional, how to predict), `formula` (optional, of type `y~w1+w2+w3+x1+x2`, determines the variables the classifying function is based on) and the optional argument `training.set = c("fitted.bag", "original", "fitted.subset")` specifying whether the classifying function is trained on the predicted observations of the bag sample (`"fitted.bag"`), on the original observations (`"original"`) or on the predicted observations not included in a defined subset (`"fitted.subset"`). By default, the formula specified in `formula` determines the variables the classifying function is based on.
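
A fixed classifying function, the first of the two forms of `cFUN`, might look like the sketch below. The rule, the threshold and the class labels are invented for illustration, and it is an assumption that `newdata` holds columns named after the intermediate variables.

```r
# Hypothetical fixed rule: class "2" if the mean of the intermediates is positive
fixed_cFUN <- function(newdata) {
  score <- rowMeans(newdata[, c("w1", "w2", "w3")])
  factor(ifelse(score > 0, "2", "1"), levels = c("1", "2"))
}

# Two made-up observations of the intermediate variables
newdata <- data.frame(w1 = c(-1, 0.5), w2 = c(-2, 1), w3 = c(0, 1.5))
fixed_cFUN(newdata)
```

Because such a function is fixed in advance, nothing is estimated for the response; only the models for the intermediate variables are fitted on each sample.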

Note that the default of `cFUN = list(model = NULL, training.set = "fitted.bag")` uses the function `rpart` and the predict function `predict(object, newdata, type = "class")`.
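
What this default amounts to for a single classifier can be sketched directly with `rpart`. The data are simulated, and the call uses `rpart`'s own default control settings, which may differ from the settings `inbagg` passes internally.

```r
library("rpart")

set.seed(1)
# Simulated two-class problem with one intermediate variable
d <- data.frame(
  y  = factor(rep(c("1", "2"), each = 50)),
  w1 = c(rnorm(50, mean = -2), rnorm(50, mean = 2))
)

# Fit a classification tree and predict class labels, as in the default predict function
fit <- rpart(y ~ w1, data = d)
pred <- predict(fit, newdata = d, type = "class")
table(observed = d$y, predicted = pred)
```
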

An object of class `"inbagg"`, that is, a list with elements

`mtrees` |
a list of length `nbagg`, describing the prediction models corresponding to each bootstrap sample. Each element of `mtrees` is a list with elements `bindx` (observations of bag sample), `btree` (classifying function of bag sample) and `bfct` (predictive models for intermediates of bag sample). |

`y` |
vector of response values. |

`W` |
data frame of intermediate variables. |

`X` |
data frame of explanatory variables. |
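
The components listed above can be inspected on a fitted object. The sketch assumes the packages ipred, MASS and rpart are installed and reuses the setup of the example on this page with a smaller `nbagg`; the seed and the names `DATA` and `fit` are only illustrative.

```r
library("ipred")
library("MASS")
library("rpart")

set.seed(290875)
n <- 200
y <- as.factor(sample(1:2, n, replace = TRUE))
W <- mvrnorm(n = n, mu = rep(0, 3), Sigma = diag(3))
X <- mvrnorm(n = n, mu = rep(2, 3), Sigma = diag(3))
colnames(W) <- c("w1", "w2", "w3")
colnames(X) <- c("x1", "x2", "x3")
DATA <- data.frame(y, W, X)

pFUN <- list(list(formula = w1 ~ x1 + x2, model = lm, predict = mypredict.lm),
             list(model = rpart))
fit <- inbagg(y ~ w1 + w2 + w3 ~ x1 + x2 + x3, data = DATA,
              pFUN = pFUN, nbagg = 5)

class(fit)                    # class of the returned object
length(fit$mtrees)            # one entry per drawn sample
names(fit$mtrees[[1]])        # components stored for the first sample
head(fit$mtrees[[1]]$bindx)   # first few bag indices
```
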

Andrea Peters <Peters.Andrea@imbe.imed.uni-erlangen.de>

David J. Hand, Hua Gui Li, Niall M. Adams (2001),
Supervised classification with structured class definitions.
*Computational Statistics & Data Analysis* **36**,
209–225.

Andrea Peters, Berthold Lausen, Georg Michelson and Olaf Gefeller (2003),
Diagnosis of glaucoma by indirect classifiers.
*Methods of Information in Medicine* **1**, 99–103.

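    # MASS provides mvrnorm(), rpart provides rpart(),
    # ipred provides inbagg() and mypredict.lm()
    library("MASS")
    library("rpart")
    library("ipred")

    # Simulated response y, intermediate variables W and explanatory variables X;
    # y is drawn with 200 values so that it matches the 200 rows of W and X
    y <- as.factor(sample(1:2, 200, replace = TRUE))
    W <- mvrnorm(n = 200, mu = rep(0, 3), Sigma = diag(3))
    X <- mvrnorm(n = 200, mu = rep(2, 3), Sigma = diag(3))
    colnames(W) <- c("w1", "w2", "w3")
    colnames(X) <- c("x1", "x2", "x3")
    DATA <- data.frame(y, W, X)

    # Models for the intermediate variables: a linear model for w1,
    # and rpart with the functional relationship taken from the main formula
    pFUN <- list(list(formula = w1~x1+x2, model = lm, predict = mypredict.lm),
                 list(model = rpart))

    inbagg(y~w1+w2+w3~x1+x2+x3, data = DATA, pFUN = pFUN)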

[Package *ipred* version 0.8-1 Index]