svm {e1071}  R Documentation 
svm is used to train a support vector machine. It can be used to carry
out general regression and classification (of nu- and epsilon-type), as
well as density estimation. A formula interface is provided.
## S3 method for class 'formula':
svm(formula, data = NULL, ..., subset, na.action = na.omit, scale = TRUE)

## Default S3 method:
svm(x, y = NULL, scale = TRUE, type = NULL, kernel = "radial",
    degree = 3, gamma = 1 / ncol(as.matrix(x)), coef0 = 0, cost = 1,
    nu = 0.5, class.weights = NULL, cachesize = 40, tolerance = 0.001,
    epsilon = 0.1, shrinking = TRUE, cross = 0, probability = FALSE,
    fitted = TRUE, ..., subset, na.action = na.omit)
formula 
a symbolic description of the model to be fit. Note that an intercept is always included, whether given in the formula or not. 
data 
an optional data frame containing the variables in the model. By default the variables are taken from the environment which ‘svm’ is called from. 
x 
a data matrix, a vector, or a sparse matrix (object of class
matrix.csr as provided by the package SparseM). 
y 
a response vector with one label for each row/component of x. Can be either
a factor (for classification tasks) or a numeric vector (for
regression). 
scale 
A logical vector indicating the variables to be
scaled. If scale is of length 1, the value is recycled as
many times as needed.
By default, data are scaled internally (both x and y
variables) to zero mean and unit variance. The center and scale
values are returned and used for later predictions. 
type 
svm can be used as a classification
machine, as a regression machine, or for novelty detection.
Depending on whether y is
a factor or not, the default setting for type is C-classification or eps-regression, respectively, but may be overwritten by setting an explicit value. Valid options are:
C-classification
nu-classification
one-classification (for novelty detection)
eps-regression
nu-regression
kernel 
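the kernel used in training and predicting. You
might consider changing some of the following parameters, depending
on the kernel type.
linear: u'*v
polynomial: (gamma*u'*v + coef0)^degree
radial basis: exp(-gamma*|u-v|^2)
sigmoid: tanh(gamma*u'*v + coef0)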
degree 
parameter needed for kernel of type polynomial (default: 3) 
gamma 
parameter needed for all kernels except linear
(default: 1/(data dimension)) 
coef0 
parameter needed for kernels of type polynomial
and sigmoid (default: 0) 
cost 
cost of constraints violation (default: 1); it is the ‘C’-constant of the regularization term in the Lagrange formulation. 
nu 
parameter needed for nu-classification and one-classification 
class.weights 
a named vector of weights for the different classes, used for asymmetric class sizes. Not all factor levels have to be supplied (default weight: 1). All components have to be named. 
cachesize 
cache memory in MB (default 40) 
tolerance 
tolerance of termination criterion (default: 0.001) 
epsilon 
epsilon in the insensitive-loss function (default: 0.1) 
shrinking 
option whether to use the shrinking-heuristics
(default: TRUE) 
cross 
if an integer value k>0 is specified, a k-fold cross validation on the training data is performed to assess the quality of the model: the accuracy rate for classification and the Mean Squared Error for regression 
fitted 
logical indicating whether the fitted values should be computed
and included in the model or not (default: TRUE) 
probability 
logical indicating whether the model should allow for probability predictions. 
... 
additional parameters for the low level fitting function
svm.default 
subset 
An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.) 
na.action 
A function to specify the action to be taken if NAs are
found. The default action is na.omit, which leads to rejection of cases
with missing values on any required variable. An alternative
is na.fail, which causes an error if NA cases
are found. (NOTE: If given, this argument must be named.) 
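The kernel-related arguments above (degree, gamma, coef0) enter the model only through the kernel function. As a minimal sketch, the four kernel forms used by libsvm can be written as plain R functions; the names k_linear, k_polynomial, k_radial and k_sigmoid are illustrative and not part of e1071.

```r
# Illustrative kernel functions (base R only; not e1071's internal code).
k_linear <- function(u, v) sum(u * v)

k_polynomial <- function(u, v, gamma, coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree
}

k_radial <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

k_sigmoid <- function(u, v, gamma, coef0 = 0) {
  tanh(gamma * sum(u * v) + coef0)
}

u <- c(1, 2)
v <- c(3, 4)
gamma <- 1 / length(u)   # mirrors the default 1/(data dimension)

k_linear(u, v)             # 1*3 + 2*4 = 11
k_polynomial(u, v, gamma)  # (0.5 * 11)^3 = 166.375
k_radial(u, u, gamma)      # identical points give 1
k_sigmoid(u, v, gamma)     # tanh(5.5)
```

Note that the linear kernel ignores gamma, degree and coef0, which is why those arguments only matter for the other kernel types.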
For multiclass-classification with k levels, k>2, libsvm
uses the
‘one-against-one’-approach, in which k(k-1)/2 binary classifiers are
trained; the appropriate class is found by a voting scheme.
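To make the pair count and the voting step concrete, here is a small base-R sketch. It is not libsvm's implementation: the vector of pairwise winners is hypothetical, standing in for the outcomes of the binary classifiers for a single observation.

```r
# Illustrative only: enumerate the class pairs used by one-against-one.
classes <- c("setosa", "versicolor", "virginica")
k <- length(classes)

pairs <- t(combn(classes, 2))   # one row per binary classifier
n_classifiers <- nrow(pairs)    # equals k * (k - 1) / 2

# Hypothetical winners of the three pairwise classifiers, in the row
# order of 'pairs'; a real model derives these from decision values.
winners <- c("versicolor", "virginica", "versicolor")

# Majority vote: the class with the most pairwise wins is predicted.
votes <- table(factor(winners, levels = classes))
predicted <- names(votes)[which.max(votes)]

pairs
n_classifiers
predicted
```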
libsvm
internally uses a sparse data representation, which is
also high-level supported by the package SparseM.
If the predictor variables include factors, the formula interface must be used to get a
correct model matrix. plot.svm
allows a simple graphical
visualization of classification models.
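The remark about factor predictors can be illustrated with base R's model.matrix, which expands a factor into numeric indicator columns. This is only a sketch of the general mechanism on a made-up data frame (d, size and colour are hypothetical names); svm's formula method builds its own model matrix internally, and its exact coding may differ.

```r
# Made-up data with one numeric and one factor predictor.
d <- data.frame(
  y      = c(1, 2, 3, 4),
  size   = c(10, 20, 30, 40),
  colour = factor(c("red", "green", "blue", "red"))
)

# A formula turns the factor into numeric dummy columns.
mm <- model.matrix(y ~ ., data = d)
colnames(mm)

# Passing the raw data frame as a matrix would not do this:
# as.matrix() on mixed columns yields a character matrix.
is.character(as.matrix(d[, c("size", "colour")]))
```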
The probability model for classification fits a logistic distribution
using maximum likelihood to the decision values of all binary
classifiers, and computes the a-posteriori class probabilities for the
multi-class problem using quadratic optimization. The probabilistic
regression model assumes (zero-mean) Laplace-distributed errors for the
predictions, and estimates the scale parameter using maximum likelihood.
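A short usage sketch of the probability model for classification, assuming the e1071 package is installed. The model must be fitted with probability = TRUE, and the same flag is passed again to predict; the class probabilities are then attached to the prediction as an attribute.

```r
library(e1071)  # assumes the e1071 package is installed

# Fit with the probability model enabled.
m <- svm(Species ~ ., data = iris, probability = TRUE)

# Request probabilities at prediction time as well.
pred  <- predict(m, iris, probability = TRUE)
probs <- attr(pred, "probabilities")

head(probs, 3)        # one column per class
rowSums(probs)[1:3]   # each row sums to 1

# One logistic fit per binary classifier: k(k-1)/2 = 3 for three classes.
length(m$probA)
length(m$probB)
```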
An object of class "svm"
containing the fitted model, including:
SV 
The resulting support vectors (possibly scaled). 
index 
The index of the resulting support vectors in the data
matrix. Note that this index refers to the preprocessed data (after
the possible effect of na.omit and subset). 
coefs 
The corresponding coefficients times the training labels. 
rho 
The negative intercept. 
sigma 
In case of a probabilistic regression model, the scale parameter of the hypothesized (zero-mean) Laplace distribution estimated by maximum likelihood. 
probA, probB 
numeric vectors of length k(k-1)/2, k number of classes, containing the parameters of the logistic distributions fitted to the decision values of the binary classifiers (1 / (1 + exp(a x + b))). 
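The components SV, coefs and rho are enough to reconstruct the decision function in the simplest setting. The sketch below assumes a two-class problem, a linear kernel and scale = FALSE, so that no rescaling or kernel evaluation is needed; it assumes the e1071 package is installed, and the variable names (w, b, manual, reported) are illustrative.

```r
library(e1071)  # assumes the e1071 package is installed

# Two-class subset of iris, numeric predictors as a matrix.
d <- droplevels(subset(iris, Species != "setosa"))
x <- as.matrix(d[, 1:4])

m <- svm(x, d$Species, kernel = "linear", scale = FALSE)

# For a linear kernel the weight vector is the coefficient-weighted
# sum of the support vectors; the intercept is the negative of rho.
w <- t(m$coefs) %*% m$SV
b <- -m$rho

manual   <- as.vector(x %*% t(w) + b)
reported <- as.vector(
  attr(predict(m, x, decision.values = TRUE), "decision.values")
)

all.equal(manual, reported)
```

With scaling enabled or a non-linear kernel, the same idea applies, but the inputs must first be scaled with the stored center and scale values and the inner product replaced by the kernel.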
Data are scaled internally, usually yielding better results.
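As a rough illustration of what scaling to zero mean and unit variance involves, the following base-R sketch standardizes the numeric iris columns with scale(). It is not svm's internal code; the names xs, ctr, scl and new_obs are illustrative.

```r
# Standardize the four numeric iris columns (base R only).
x  <- as.matrix(iris[, 1:4])
xs <- scale(x, center = TRUE, scale = TRUE)

# Each column now has mean (numerically) 0 and standard deviation 1.
round(colMeans(xs), 10)
apply(xs, 2, sd)

# The centering and scaling values are kept as attributes, so the same
# transformation can be applied to new observations before prediction.
ctr <- attr(xs, "scaled:center")
scl <- attr(xs, "scaled:scale")

new_obs <- c(5.0, 3.5, 1.5, 0.2)   # a made-up observation
(new_obs - ctr) / scl
```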
David Meyer (based on C/C++-code by Chih-Chung Chang and Chih-Jen Lin)
david.meyer@ci.tuwien.ac.at
predict.svm
plot.svm
matrix.csr
(in package SparseM)
data(iris)
attach(iris)

## classification mode
# default with factor response:
model <- svm(Species ~ ., data = iris)

# alternatively the traditional interface:
x <- subset(iris, select = -Species)
y <- Species
model <- svm(x, y)

print(model)
summary(model)

# test with train data
pred <- predict(model, x)
# (same as:)
pred <- fitted(model)

# Check accuracy:
table(pred, y)

# compute decision values and probabilities:
pred <- predict(model, x, decision.values = TRUE)
attr(pred, "decision.values")[1:4,]

# visualize (classes by color, SV by crosses):
plot(cmdscale(dist(iris[,-5])),
     col = as.integer(iris[,5]),
     pch = c("o","+")[1:150 %in% model$index + 1])

## try regression mode on two dimensions

# create data
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + rnorm(x, sd = 0.2)

# estimate model and predict input values
m   <- svm(x, y)
new <- predict(m, x)

# visualize
plot(x, y)
points(x, log(x), col = 2)
points(x, new, col = 4)

## density-estimation

# create 2-dim. normal with rho=0:
X <- data.frame(a = rnorm(1000), b = rnorm(1000))
attach(X)

# traditional way:
m <- svm(X, gamma = 0.1)

# formula interface:
m <- svm(~., data = X, gamma = 0.1)
# or:
m <- svm(~ a + b, gamma = 0.1)

# test:
newdata <- data.frame(a = c(0, 4), b = c(0, 4))
predict(m, newdata)

# visualize:
plot(X, col = 1:1000 %in% m$index + 1, xlim = c(-5,5), ylim = c(-5,5))
points(newdata, pch = "+", col = 2, cex = 5)

# weights: (example not particularly sensible)
i2 <- iris
levels(i2$Species)[3] <- "versicolor"
summary(i2$Species)
wts <- 100 / table(i2$Species)
wts
m <- svm(Species ~ ., data = i2, class.weights = wts)
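One further sketch, for the cross argument, assuming the e1071 package is installed. With cross = 5 the fitted object additionally carries the cross-validation results: per-fold and total accuracy for classification, and per-fold and total mean squared error for regression. Fold assignment is random, so the numbers vary between runs.

```r
library(e1071)  # assumes the e1071 package is installed

## classification: accuracy per fold and overall (in percent)
mc <- svm(Species ~ ., data = iris, cross = 5)
mc$accuracies
mc$tot.accuracy

## regression: mean squared error per fold and overall
mr <- svm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris, cross = 5)
mr$MSE
mr$tot.MSE
```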