xcluster {ctc} R Documentation

## Hierarchical clustering

### Description

Performs a hierarchical cluster analysis on a set of dissimilarities.

### Usage

```
xcluster(data, distance="euclidean", clean=FALSE, tmp.in="tmp.txt", tmp.out="tmp.gtr")
```

### Arguments

• `data`: a matrix (or data frame) providing the data to analyze.
• `distance`: the distance measure used with Xcluster. This must be one of `"euclidean"`, `"pearson"` or `"notcenteredpearson"`. Any unambiguous substring can be given.
• `clean`: a logical value indicating whether you want the true distances (`clean=FALSE`) or a clean dendrogram (`clean=TRUE`).
• `tmp.in`, `tmp.out`: temporary files for Xcluster.

### Details

Available distance measures are (written for two vectors x and y):

• Euclidean: Usual square distance between the two vectors (2 norm).
• Pearson: 1 - cor(x,y)
• Pearson not centered: 1 - [ sum x_i y_i ] / sqrt[ sum x_i^2 * sum y_i^2 ]
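
The three formulas can be checked by hand on a pair of short vectors. The following is a minimal sketch in base R, not the code Xcluster itself runs: the vectors `x` and `y` are made-up values, and reading "Euclidean" as the plain 2-norm of the difference is an assumption, since the exact scaling used inside Xcluster may differ.

```r
# Two example vectors (arbitrary values chosen for illustration)
x <- c(1, 2, 3, 4)
y <- c(2, 1, 4, 3)

# Euclidean: 2-norm of the difference (assumed reading of the definition)
d_euclid <- sqrt(sum((x - y)^2))

# Pearson: 1 - cor(x, y)
d_pearson <- 1 - cor(x, y)

# Pearson not centered: 1 - sum(x*y) / sqrt(sum(x^2) * sum(y^2))
d_uncentered <- 1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))

print(c(euclidean = d_euclid,
        pearson = d_pearson,
        notcenteredpearson = d_uncentered))
```

The only difference between the two Pearson variants is that `cor()` subtracts the mean of each vector first, whereas the not-centered version works on the raw values.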

Xcluster does not use the usual agglomerative methods (single, average or complete linkage). Instead, it takes the distance between two groups to be the distance between their barycenters.

This causes a problem for data such as the following:

• A = (0, 0)
• B = (0, 1)
• C = (0.9, 0.5)

That is, the three points form a triangle in R^2 in which the distance between A and B (which is 1) is larger than the distance between the group {A, B} and C (which is 0.9 when measured from the barycenter (0, 0.5) with the Euclidean distance).

In that case it can be useful to set `clean=TRUE`, which means that A and B must not be considered as a group without C.
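
The arithmetic behind this example can be reproduced in base R. This is only a sketch of the barycenter rule described above, written with `dist` and `colMeans`; it does not call Xcluster, and the object names (`pts`, `bary_AB`, `d_AB_C`) are chosen here for illustration.

```r
# The three points from the example above
pts <- rbind(A = c(0, 0), B = c(0, 1), C = c(0.9, 0.5))

# Pairwise Euclidean distances: A-B is the closest pair, so it merges first
d <- as.matrix(dist(pts))

# Barycenter of the group {A, B}
bary_AB <- colMeans(pts[c("A", "B"), ])

# Distance from that barycenter to C
d_AB_C <- sqrt(sum((bary_AB - pts["C", ])^2))

print(d["A", "B"])  # height of the first merge
print(d_AB_C)       # height of the second merge
```

Because the second merge happens at a smaller height than the first, the resulting tree has an inversion, which is the situation `clean=TRUE` is meant to handle.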

### Value

An object of class hclust which describes the tree produced by the clustering process. The object is a list with components:

• `merge`: an n-1 by 2 matrix. Row i of `merge` describes the merging of clusters at step i of the clustering. If an element j in the row is negative, then observation -j was merged at this stage. If j is positive, then the merge was with the cluster formed at the (earlier) stage j of the algorithm. Thus negative entries in `merge` indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons.
• `height`: a set of n-1 non-decreasing real values. The clustering height: that is, the value of the criterion associated with the clustering `method` for the particular agglomeration.
• `order`: a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix `merge` will not have crossings of the branches.
• `labels`: labels for each of the objects being clustered.
• `call`: the call which produced the result.
• `method`: the cluster method that has been used.
• `dist.method`: the distance that has been used to create `d` (only returned if the distance object has a `"method"` attribute).
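
To see what these components look like in practice, the sketch below builds a small `hclust` object with `stats::hclust`. This is a stand-in: it assumes the object returned by `xcluster` has the same layout, which is what the class name suggests, but it does not run Xcluster. The data values and labels are made up.

```r
# Small illustrative data set: two tight pairs on a line (made-up values)
m <- matrix(c(0, 1, 10, 12), ncol = 1,
            dimnames = list(c("a", "b", "c", "d"), NULL))

# stats::hclust stands in for xcluster here (assumption: same object layout)
h <- hclust(dist(m), method = "average")

print(h$merge)   # negative entries: single observations; positive: earlier steps
print(h$height)  # one height per merge step
print(h$order)   # plotting order without branch crossings
print(h$labels)  # labels taken from the row names of the data
```

With four observations there are three merge steps, so `merge` has three rows and `height` has three entries.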

### Note

Xcluster is a C program made by Gavin Sherlock that performs hierarchical clustering, K-means and SOM.

### Author(s)

Antoine Lucas, http://genopole.toulouse.inra.fr/~lucas/R

### See Also

`r2xcluster`, `xcluster2r`, `hclust`

### Examples

```
#    Create data
set.seed(1)    # fix the random seed; assigning .Random.seed directly is not valid in current R
m <- matrix(rep(1,3*24),ncol=3)
m[9:16,3] <- 3 ; m[17:24,] <- 3    #create 3 groups