fclusterdata function - scipy.cluster.hierarchy module

Signature of the fclusterdata function

def fclusterdata(X, t, criterion='inconsistent', metric='euclidean', depth=2, method='single', R=None)

Description

help(scipy.cluster.hierarchy.fclusterdata)

Cluster observation data using a given metric.

Clusters the original observations in the n-by-m data
matrix X (n observations in m dimensions). The function computes
pairwise distances between observations using `metric` (Euclidean by
default), performs hierarchical clustering using the linkage `method`
(single linkage by default), and forms flat clusters using `criterion`
(the inconsistency method by default) with `t` as the cut-off threshold.

A 1-D array ``T`` of length ``n`` is returned. ``T[i]`` is
the index of the flat cluster to which the original observation ``i``
belongs.

Parameters
----------
X : (N, M) ndarray
    N by M data matrix with N observations in M dimensions.
t : scalar
    For the 'inconsistent', 'distance' or 'monocrit' criteria,
    this is the threshold to apply when forming flat clusters.
    For the 'maxclust' or 'maxclust_monocrit' criteria,
    this is the maximum number of clusters requested.
criterion : str, optional
    Specifies the criterion for forming flat clusters. Valid
    values are 'inconsistent' (default), 'distance', and 'maxclust'.
    See `fcluster` for descriptions.
metric : str or function, optional
    The distance metric for calculating pairwise distances. See
    `scipy.spatial.distance.pdist` for descriptions, and `linkage`
    to verify compatibility with the linkage method.
depth : int, optional
    The maximum depth for the inconsistency calculation. See
    `inconsistent` for more information.
method : str, optional
    The linkage method to use (single, complete, average,
    weighted, median, centroid, ward). See `linkage` for more
    information. Default is "single".
R : ndarray, optional
    The inconsistency matrix. It will be computed if necessary
    if it is not passed.
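
As a rough illustration of how ``t`` is interpreted under different
criteria, the sketch below runs the function on a small 12-point dataset
(four well-separated groups of three points). The thresholds and variable
names are illustrative choices, not recommendations.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

# Four well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# criterion='distance': t is a cut-off on the linkage (cophenetic) distance.
# Points inside a group are 1 apart, groups are 2 apart, so 1.5 separates them.
by_distance = fclusterdata(X, t=1.5, criterion='distance')

# criterion='maxclust': t is the maximum number of flat clusters.
by_count = fclusterdata(X, t=4, criterion='maxclust')

# A non-default linkage method, still asking for at most four clusters.
by_complete = fclusterdata(X, t=4, criterion='maxclust', method='complete')

# Count the distinct labels produced by each call.
n_distance = len(np.unique(by_distance))
n_count = len(np.unique(by_count))
n_complete = len(np.unique(by_complete))
```

For this dataset all three calls are expected to recover the four groups,
although the numeric labels assigned to each group may differ between calls.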

Returns
-------
fclusterdata : ndarray
    A vector of length n. T[i] is the flat cluster number to
    which original observation i belongs.
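
A minimal sketch of one way to use the returned label vector: grouping
the rows of ``X`` by cluster. The ``members`` and ``sizes`` names are
hypothetical helpers introduced only for this illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# One label per row of X; labels are cluster numbers starting at 1.
T = fclusterdata(X, t=1)

# Map each cluster label to the observations assigned to it.
members = {int(label): X[T == label] for label in np.unique(T)}

# Number of observations in each cluster.
sizes = {label: len(rows) for label, rows in members.items()}
```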

See Also
--------
scipy.spatial.distance.pdist : pairwise distance metrics

Notes
-----
This function is similar to the MATLAB function ``clusterdata``.

Examples
--------
>>> from scipy.cluster.hierarchy import fclusterdata

This is a convenience function that wraps all the steps of a
typical SciPy hierarchical clustering workflow.

* Transform the input data into a condensed matrix with
  `scipy.spatial.distance.pdist`.

* Apply a linkage method with `scipy.cluster.hierarchy.linkage`.

* Obtain flat clusters at a user-defined threshold ``t`` using
  `scipy.cluster.hierarchy.fcluster`.

>>> X = [[0, 0], [0, 1], [1, 0],
...      [0, 4], [0, 3], [1, 4],
...      [4, 0], [3, 0], [4, 1],
...      [4, 4], [3, 4], [4, 3]]

>>> fclusterdata(X, t=1)
array([3, 3, 3, 4, 4, 4, 2, 2, 2, 1, 1, 1], dtype=int32)

The output here (for the dataset ``X``, threshold ``t=1``, and the
default settings) is four clusters with three data points each.
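
The three workflow steps listed earlier can also be carried out by hand.
The following sketch assumes the default settings (Euclidean metric, single
linkage, 'inconsistent' criterion, depth 2) and compares the result with a
direct call to ``fclusterdata``; the variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster, fclusterdata

X = np.array([[0, 0], [0, 1], [1, 0],
              [0, 4], [0, 3], [1, 4],
              [4, 0], [3, 0], [4, 1],
              [4, 4], [3, 4], [4, 3]], dtype=float)

# Step 1: condensed pairwise distance matrix.
Y = pdist(X, metric='euclidean')

# Step 2: hierarchical clustering (linkage matrix).
Z = linkage(Y, method='single')

# Step 3: flat clusters from the linkage matrix.
T_manual = fcluster(Z, t=1, criterion='inconsistent', depth=2)

# The convenience function with the same (default) settings.
T_direct = fclusterdata(X, t=1)

# Both routes should assign the same labels.
same = np.array_equal(T_manual, T_direct)
```

Running the steps separately is useful when the linkage matrix ``Z`` is
needed for something else, such as plotting a dendrogram or trying several
thresholds without recomputing the distances.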
