org.apache.commons.math.stat.clustering
Class KMeansPlusPlusClusterer<T extends Clusterable<T>>

java.lang.Object
  extended by org.apache.commons.math.stat.clustering.KMeansPlusPlusClusterer<T>
Type Parameters:
T - type of the points to cluster

public class KMeansPlusPlusClusterer<T extends Clusterable<T>>
extends java.lang.Object

Clustering algorithm based on David Arthur and Sergei Vassilvitskii's k-means++ algorithm.

Since:
2.0
Version:
$Revision: 771076 $ $Date: 2009-05-03 12:28:48 -0400 (Sun, 03 May 2009) $
See Also:
K-means++ (Wikipedia)

Field Summary
private  java.util.Random random
          Random generator for choosing initial centers.
 
Constructor Summary
KMeansPlusPlusClusterer(java.util.Random random)
          Build a clusterer.
 
Method Summary
private static
<T extends Clusterable<T>>
void
assignPointsToClusters(java.util.Collection<Cluster<T>> clusters, java.util.Collection<T> points)
          Adds each of the given points to its closest Cluster.
private static
<T extends Clusterable<T>>
java.util.List<Cluster<T>>
chooseInitialCenters(java.util.Collection<T> points, int k, java.util.Random random)
          Use K-means++ to choose the initial centers.
 java.util.List<Cluster<T>> cluster(java.util.Collection<T> points, int k, int maxIterations)
          Runs the K-means++ clustering algorithm.
private static
<T extends Clusterable<T>>
Cluster<T>
getNearestCluster(java.util.Collection<Cluster<T>> clusters, T point)
          Returns the nearest Cluster to the given point.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

random

private final java.util.Random random
Random generator for choosing initial centers.

Constructor Detail

KMeansPlusPlusClusterer

public KMeansPlusPlusClusterer(java.util.Random random)
Build a clusterer.

Parameters:
random - random generator to use for choosing initial centers

Method Detail

cluster

public java.util.List<Cluster<T>> cluster(java.util.Collection<T> points,
                                          int k,
                                          int maxIterations)
Runs the K-means++ clustering algorithm.

Parameters:
points - the points to cluster
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum is applied.
Returns:
a list of clusters containing the points
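
The iteration behind cluster, including the effect of a negative maxIterations, can be sketched as follows. This is a minimal standalone illustration, not the Commons Math source: the class IterationSketch, its method names, the use of plain double[] points, and the caller-supplied initial centers are all assumptions made for the example. It returns a cluster index per point instead of a list of Cluster objects.

```java
// Illustrative sketch only: plain double[] points and hypothetical names,
// not the Commons Math source.
class IterationSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the center nearest to p; ties go to the lowest index.
    static int nearest(double[][] centers, double[] p) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(centers[c], p) < squaredDistance(centers[best], p)) {
                best = c;
            }
        }
        return best;
    }

    // Refines the given centers in place and returns the cluster index of each point.
    // Assumes points is non-empty. A negative maxIterations means no iteration limit:
    // the loop then stops only when no point changes cluster.
    static int[] cluster(double[][] points, double[][] centers, int maxIterations) {
        int max = (maxIterations < 0) ? Integer.MAX_VALUE : maxIterations;
        int dim = points[0].length;

        // Initial assignment of every point to its nearest center.
        int[] assignment = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            assignment[i] = nearest(centers, points[i]);
        }

        for (int count = 0; count < max; count++) {
            // Update step: move each center to the mean of its assigned points.
            double[][] sums = new double[centers.length][dim];
            int[] sizes = new int[centers.length];
            for (int i = 0; i < points.length; i++) {
                sizes[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < centers.length; c++) {
                // An empty cluster keeps its previous center.
                for (int d = 0; d < dim && sizes[c] > 0; d++) {
                    centers[c][d] = sums[c][d] / sizes[c];
                }
            }

            // Assignment step: reassign points and record whether anything moved.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int n = nearest(centers, points[i]);
                if (n != assignment[i]) {
                    assignment[i] = n;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged before reaching the iteration limit
            }
        }
        return assignment;
    }
}
```

In this sketch, passing 0 for maxIterations returns the initial assignment unchanged, while passing -1 iterates until the assignment is stable.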

assignPointsToClusters

private static <T extends Clusterable<T>> void assignPointsToClusters(java.util.Collection<Cluster<T>> clusters,
                                                                      java.util.Collection<T> points)
Adds the given points to the closest Cluster.

Type Parameters:
T - type of the points to cluster
Parameters:
clusters - the Clusters to add the points to
points - the points to add to the given Clusters

chooseInitialCenters

private static <T extends Clusterable<T>> java.util.List<Cluster<T>> chooseInitialCenters(java.util.Collection<T> points,
                                                                                          int k,
                                                                                          java.util.Random random)
Use K-means++ to choose the initial centers.

Type Parameters:
T - type of the points to cluster
Parameters:
points - the points to choose the initial centers from
k - the number of centers to choose
random - random generator to use
Returns:
the initial centers
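
For orientation, k-means++ seeding picks the first center uniformly at random and each later center with probability proportional to its squared distance from the nearest center already chosen. The sketch below illustrates that rule on plain double[] points. It is an assumption-laden example, not the Commons Math implementation: the class SeedingSketch, its helper squaredDistance, and the representation of centers as raw points instead of Cluster objects are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch only: hypothetical names, not the Commons Math source.
class SeedingSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Chooses up to k initial centers from a non-empty list of points.
    static List<double[]> chooseInitialCenters(List<double[]> points, int k, Random random) {
        List<double[]> remaining = new ArrayList<double[]>(points);
        List<double[]> centers = new ArrayList<double[]>();

        // First center: uniformly at random.
        centers.add(remaining.remove(random.nextInt(remaining.size())));

        while (centers.size() < k && !remaining.isEmpty()) {
            // Squared distance from each remaining point to its nearest chosen center.
            double[] d2 = new double[remaining.size()];
            double total = 0.0;
            for (int i = 0; i < remaining.size(); i++) {
                double best = Double.POSITIVE_INFINITY;
                for (double[] c : centers) {
                    best = Math.min(best, squaredDistance(remaining.get(i), c));
                }
                d2[i] = best;
                total += best;
            }

            // Sample an index with probability d2[i] / total.
            double r = random.nextDouble() * total;
            int chosen = remaining.size() - 1; // fallback if rounding leaves no match
            double cumulative = 0.0;
            for (int i = 0; i < d2.length; i++) {
                cumulative += d2[i];
                if (cumulative > r) {
                    chosen = i;
                    break;
                }
            }
            centers.add(remaining.remove(chosen));
        }
        return centers;
    }
}
```

Because a point at distance zero from an existing center has zero weight, duplicates of a chosen center are picked only when no other candidates remain.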

getNearestCluster

private static <T extends Clusterable<T>> Cluster<T> getNearestCluster(java.util.Collection<Cluster<T>> clusters,
                                                                       T point)
Returns the nearest Cluster to the given point.

Type Parameters:
T - type of the points to cluster
Parameters:
clusters - the Clusters to search
point - the point to find the nearest Cluster for
Returns:
the nearest Cluster to the given point
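
A nearest-cluster lookup of this kind is a linear scan that keeps the smallest distance seen so far. The sketch below shows the idea on plain double[] centers and returns an index instead of a Cluster. The class NearestCenterSketch and its methods are hypothetical names for illustration, and the Euclidean distance is an assumption; the real method works on whatever distance the Clusterable type defines.

```java
import java.util.List;

// Illustrative sketch only: hypothetical names, not the Commons Math source.
class NearestCenterSketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the index of the center closest to point, or -1 if centers is empty.
    // Ties are resolved in favor of the earliest center.
    static int nearestCenterIndex(List<double[]> centers, double[] point) {
        int best = -1;
        double bestDistance = Double.MAX_VALUE;
        for (int i = 0; i < centers.size(); i++) {
            double d = distance(centers.get(i), point);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }
}
```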


Copyright (c) 2003-2009 Apache Software Foundation