aggregate {stats} | R Documentation

Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.

```r
aggregate(x, ...)

## Default S3 method:
aggregate(x, ...)

## S3 method for class 'data.frame':
aggregate(x, by, FUN, ...)

## S3 method for class 'ts':
aggregate(x, nfrequency = 1, FUN = sum, ndeltat = 1,
          ts.eps = getOption("ts.eps"), ...)
```

| Argument | Description |
|---|---|
| `x` | an R object. |
| `by` | a list of grouping elements, each as long as the variables in `x`. Names for the grouping variables are provided if they are not given. The elements of the list are coerced to factors if they are not already factors. |
| `FUN` | a scalar function (one returning a single value) to compute the summary statistics; it must be applicable to all data subsets. |
| `nfrequency` | new number of observations per unit of time; must be a divisor of the frequency of `x`. |
| `ndeltat` | new fraction of the sampling period between successive observations (equivalent to `1/nfrequency`); must be a whole multiple of the sampling interval `deltat(x)` of `x`. |
| `ts.eps` | tolerance used to decide whether `nfrequency` is a sub-multiple of the original frequency. |
| `...` | further arguments passed to or used by methods. |
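A minimal sketch of two argument conventions described above, using small hypothetical inputs (`d`, `g` and `m` are made up for illustration). It assumes a current R release, whose details may differ slightly from version 2.2.1: an unnamed `by` list gets automatic grouping names (`Group.1`, `Group.2`, and so on), and `ndeltat = 1/4` asks for the same quarterly spacing as `nfrequency = 4`.

```r
# Hypothetical toy data, chosen only to illustrate the arguments.
d <- data.frame(v = c(1, 2, 3, 4))
g <- c("a", "b", "a", "b")          # character; coerced to a factor internally

# 'by' given without names: the grouping column is named automatically.
unnamed <- aggregate(d, list(g), sum)
print(names(unnamed))               # "Group.1" "v"

# 'by' given with names: the supplied name is used instead.
named <- aggregate(d, list(grp = g), sum)
print(names(named))                 # "grp" "v"

# Hypothetical monthly series: nfrequency = 4 and ndeltat = 1/4 both
# request quarterly observations, i.e. blocks of 3 months.
m <- ts(1:12, frequency = 12)
by_freq  <- aggregate(m, nfrequency = 4, FUN = sum)
by_delta <- aggregate(m, ndeltat = 1/4, FUN = sum)
print(all.equal(by_freq, by_delta)) # TRUE
```

The grouped sums are 1 + 3 = 4 for `"a"` and 2 + 4 = 6 for `"b"`; the quarterly sums are 6, 15, 24 and 33.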

`aggregate` is a generic function with methods for data frames and time series.

The default method `aggregate.default` uses the time series method if `x` is a time series, and otherwise coerces `x` to a data frame and calls the data frame method.
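A short sketch of where calls end up, assuming a current R release and hypothetical toy inputs `v`, `grp` and `z`. A plain integer vector has no method of its own, so the default method coerces it to a one-column data frame and the result is a data frame; an object of class `ts` is routed to the time series method (here directly by S3 dispatch on its class), so the result is again a time series.

```r
# Hypothetical toy inputs, chosen only for illustration.
v <- 1:6                       # a plain vector, not a time series
grp <- list(g = rep(1:2, 3))   # groups 1, 2, 1, 2, 1, 2

# Not a time series: coerced to a data frame, then the data frame method runs.
res_df <- aggregate(v, grp, sum)
print(res_df)                  # group 1: 1 + 3 + 5 = 9; group 2: 2 + 4 + 6 = 12

# A time series with 4 observations per unit of time.
z <- ts(1:8, frequency = 4)
res_ts <- aggregate(z, nfrequency = 1, FUN = sum)
print(res_ts)                  # one value per unit of time: 10 and 26
```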

`aggregate.data.frame` is the data frame method. If `x` is not a data frame, it is coerced to one. Then each of the variables (columns) in `x` is split into subsets of cases (rows) with identical combinations of the components of `by`, and `FUN` is applied to each such subset with further arguments in `...` passed to it. (That is, `tapply(VAR, by, FUN, ..., simplify = FALSE)` is done for each variable `VAR` in `x`, conveniently wrapped into one call to `lapply()`.) Empty subsets are removed, and the result is reformatted into a data frame containing the variables in `by` and `x`. The variables arising from `by` contain the unique combinations of grouping values used to determine the subsets, and those arising from `x` contain the corresponding summary statistics for each subset of the respective variables in `x`.
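A minimal sketch of the data frame method on a hypothetical data frame `df` and factor `f` (both made up), assuming a current R release. It shows three points from the paragraph above: `na.rm = TRUE` travels through `...` to `mean()`, the unused factor level `"z"` (an empty subset) does not appear in the result, and a per-column `tapply()` wrapped in `lapply()` gives the same numbers. The manual version uses `tapply()`'s default `simplify = TRUE` for readability, so the empty cell shows up as `NA` there instead of being removed.

```r
# Hypothetical toy data frame, chosen only for illustration.
df <- data.frame(a = c(1, 2, 3, 4, NA),
                 b = c(10, 20, 30, 40, 30))
# The factor has an unused level "z", i.e. an empty subset.
f <- factor(c("x", "y", "x", "y", "y"), levels = c("x", "y", "z"))

# Each column is split by f and mean() is applied to every subset;
# na.rm = TRUE is passed on to mean() through '...'.
res <- aggregate(df, list(grp = f), mean, na.rm = TRUE)
print(res)   # rows for "x" (a = 2, b = 20) and "y" (a = 3, b = 30) only

# Roughly the same computation by hand: one tapply() per column via lapply().
manual <- lapply(df, function(col) tapply(col, f, mean, na.rm = TRUE))
print(manual$a)   # tapply() keeps the empty level "z" as NA
```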

`aggregate.ts` is the time series method. If `x` is not a time series, it is coerced to one. Then the variables in `x` are split into appropriate blocks of length `frequency(x) / nfrequency`, and `FUN` is applied to each such block, with further (named) arguments in `...` passed to it. The result returned is a time series with frequency `nfrequency` holding the aggregated values.
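A small worked sketch of the block rule, using a hypothetical monthly series `m` (the values 1 to 24 starting January 2000). Aggregating to `nfrequency = 4` gives blocks of `frequency(m) / nfrequency = 12 / 4 = 3` months, so averaging consecutive columns of a 3-row matrix by hand reproduces the result. This is an illustration of the arithmetic, not the method's internal code.

```r
# Hypothetical monthly series: the values 1..24 starting January 2000.
m <- ts(1:24, start = c(2000, 1), frequency = 12)

# Quarterly means: blocks of frequency(m) / nfrequency = 12 / 4 = 3 months.
q <- aggregate(m, nfrequency = 4, FUN = mean)
print(q)

# The same numbers by hand: lay the series out in columns of 3 and average.
blocks <- matrix(as.numeric(m), nrow = frequency(m) / 4)
print(colMeans(blocks))   # the eight quarterly means: 2, 5, 8, 11, 14, 17, 20, 23
```

The result keeps the start of the original series (first quarter of 2000) but now has frequency 4.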

Kurt Hornik


```r
## Compute the averages for the variables in 'state.x77', grouped
## according to the region (Northeast, South, North Central, West) that
## each state belongs to.
aggregate(state.x77, list(Region = state.region), mean)

## Compute the averages according to region and the occurrence of more
## than 130 days of frost.
aggregate(state.x77,
          list(Region = state.region,
               Cold = state.x77[, "Frost"] > 130),
          mean)
## (Note that no state in 'South' is THAT cold.)

## Compute the average annual approval ratings for American presidents.
aggregate(presidents, nfrequency = 1, FUN = mean)
## Give the summer less weight.
aggregate(presidents, nfrequency = 1, FUN = weighted.mean,
          w = c(1, 1, 0.5, 1))
```

[Package *stats* version 2.2.1]