Module « scipy.cluster.hierarchy »
Signature de la fonction fclusterdata
def fclusterdata(X, t, criterion='inconsistent', metric='euclidean', depth=2, method='single', R=None)
Description
fclusterdata.__doc__
Cluster observation data using a given metric.
Clusters the original observations in the n-by-m data
matrix X (n observations in m dimensions), using the euclidean
distance metric to calculate distances between original observations,
performs hierarchical clustering using the single linkage algorithm,
and forms flat clusters using the inconsistency method with `t` as the
cut-off threshold.
A 1-D array ``T`` of length ``n`` is returned. ``T[i]`` is
the index of the flat cluster to which the original observation ``i``
belongs.
Parameters
----------
X : (N, M) ndarray
N by M data matrix with N observations in M dimensions.
t : scalar
For criteria 'inconsistent', 'distance' or 'monocrit',
this is the threshold to apply when forming flat clusters.
For 'maxclust' or 'maxclust_monocrit' criteria,
this would be max number of clusters requested.
criterion : str, optional
Specifies the criterion for forming flat clusters. Valid
values are 'inconsistent' (default), 'distance', or 'maxclust'
cluster formation algorithms. See `fcluster` for descriptions.
metric : str or function, optional
The distance metric for calculating pairwise distances. See
``distance.pdist`` for descriptions and linkage to verify
compatibility with the linkage method.
depth : int, optional
The maximum depth for the inconsistency calculation. See
`inconsistent` for more information.
method : str, optional
The linkage method to use (single, complete, average,
weighted, median centroid, ward). See `linkage` for more
information. Default is "single".
R : ndarray, optional
The inconsistency matrix. It will be computed if necessary
if it is not passed.
Returns
-------
fclusterdata : ndarray
A vector of length n. T[i] is the flat cluster number to
which original observation i belongs.
See Also
--------
scipy.spatial.distance.pdist : pairwise distance metrics
Notes
-----
This function is similar to the MATLAB function ``clusterdata``.
Examples
--------
>>> from scipy.cluster.hierarchy import fclusterdata
This is a convenience method that abstracts all the steps to perform in a
typical SciPy's hierarchical clustering workflow.
* Transform the input data into a condensed matrix with `scipy.spatial.distance.pdist`.
* Apply a clustering method.
* Obtain flat clusters at a user defined distance threshold ``t`` using `scipy.cluster.hierarchy.fcluster`.
>>> X = [[0, 0], [0, 1], [1, 0],
... [0, 4], [0, 3], [1, 4],
... [4, 0], [3, 0], [4, 1],
... [4, 4], [3, 4], [4, 3]]
>>> fclusterdata(X, t=1)
array([3, 3, 3, 4, 4, 4, 2, 2, 2, 1, 1, 1], dtype=int32)
The output here (for the dataset ``X``, distance threshold ``t``, and the
default settings) is four clusters with three data points each.
Améliorations / Corrections
Vous avez des améliorations (ou des corrections) à proposer pour ce document : je vous remerçie par avance de m'en faire part, cela m'aide à améliorer le site.
Emplacement :
Description des améliorations :