nltk.cluster.gaac module
- class nltk.cluster.gaac.GAAClusterer
Bases: VectorSpaceClusterer
The Group Average Agglomerative Clusterer starts with each of the N vectors as a singleton cluster. It then iteratively merges the pair of clusters with the closest centroids, and continues until only one cluster remains. The order of merges gives rise to a dendrogram: a tree in which earlier merges sit lower than later ones. The membership of a given number of clusters c, 1 <= c <= N, can be found by cutting the dendrogram at depth c.
This clusterer supports only the cosine similarity metric, which allows the clustering process to be sped up.
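The merge loop described above can be sketched in plain Python. This is an illustrative sketch, not NLTK's implementation: the helper names (`cosine_distance`, `group_average_distance`, `agglomerate`) are hypothetical, and it assumes "group average" means the mean pairwise cosine distance between the members of two clusters, recomputed naively at every step.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def group_average_distance(vectors, cluster_a, cluster_b):
    """Mean pairwise cosine distance between members of two clusters."""
    total = sum(
        cosine_distance(vectors[i], vectors[j])
        for i in cluster_a
        for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(vectors, num_clusters=1):
    """Merge the closest pair of clusters until num_clusters remain.

    Returns (clusters, merges). Each cluster is a list of vector indices;
    merges records the pairs in the order they were joined, which is the
    information a dendrogram stores.
    """
    clusters = [[i] for i in range(len(vectors))]  # N singleton clusters
    merges = []
    while len(clusters) > max(num_clusters, 1):
        # combinations() yields index pairs (a, b) with a < b
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: group_average_distance(
                vectors, clusters[p[0]], clusters[p[1]]
            ),
        )
        merges.append((tuple(clusters[a]), tuple(clusters[b])))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges


# Toy data (assumed for illustration): two directions in the plane.
vectors = [(1.0, 0.1), (1.0, 0.3), (0.1, 1.0), (0.2, 1.0)]
clusters, merges = agglomerate(vectors, num_clusters=2)
```

Stopping the loop at `num_clusters` gives the same membership as building the full tree and cutting it where that many clusters remain.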
- __init__(num_clusters=1, normalise=True, svd_dimensions=None)
- Parameters
normalise (boolean) – whether vectors should be normalised to length 1
svd_dimensions (int) – number of dimensions to use when reducing vector dimensionality with SVD
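As a rough illustration of why normalising to length 1 pairs well with the cosine metric: once two vectors have unit length, their cosine similarity is just their dot product. The helpers below (`normalise`, `dot`) are hypothetical names for this sketch and are not part of the NLTK API.

```python
import math


def normalise(vector):
    """Scale a vector to length 1 (assumes it is not all zeros)."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


u = normalise([3.0, 4.0])
v = normalise([4.0, 3.0])

# For unit-length vectors no further division by the norms is needed.
cosine_similarity = dot(u, v)
```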
- cluster(vectors, assign_clusters=False, trace=False)
Assigns the vectors to clusters, learning the clustering parameters from the data. If assign_clusters is True, returns a cluster identifier for each vector.
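A minimal usage sketch, assuming NLTK and NumPy are installed. The toy vectors are invented for illustration, and the integer identifiers that come back are arbitrary labels, so only their grouping is meaningful.

```python
import numpy
from nltk.cluster import GAAClusterer

# Toy data (an assumption for illustration): two groups of vectors
# pointing in roughly orthogonal directions.
vectors = [
    numpy.array(v)
    for v in [
        [1.0, 0.1], [1.0, 0.2], [0.9, 0.1],
        [0.1, 1.0], [0.2, 1.0], [0.1, 0.9],
    ]
]

clusterer = GAAClusterer(num_clusters=2)

# assign_clusters=True asks for one cluster identifier per input vector.
labels = clusterer.cluster(vectors, assign_clusters=True)

# A fitted clusterer can also label a vector it has not seen before.
new_label = clusterer.classify(numpy.array([0.8, 0.1]))
```

Passing NumPy arrays rather than plain lists follows the pattern used in NLTK's own clustering demos.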
- cluster_vectorspace(vectors, trace=False)
Finds the clusters using the given set of vectors.