Module « scipy.cluster.vq »

Function kmeans2 - module scipy.cluster.vq

Signature of the kmeans2 function

def kmeans2(data, k, iter=10, thresh=1e-05, minit='random', missing='warn', check_finite=True, *, rng=None) 

Description

help(scipy.cluster.vq.kmeans2)

    


Classify a set of observations into k clusters using the k-means algorithm.

The algorithm attempts to minimize the Euclidean distance between
observations and centroids. Several initialization methods are
included.
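
To make this concrete, here is a small editorial sketch (not part of the SciPy help text, using synthetic data chosen purely for illustration) that measures the quantity the algorithm tries to keep small: the Euclidean distance between each observation and its assigned centroid. It assumes SciPy 1.15 or later for the `rng` keyword; older releases take `seed` instead.

import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2))              # 200 observations in 2 dimensions

centroid, label = kmeans2(pts, 4, minit='++', rng=rng)

# Euclidean distance between each observation and its assigned centroid;
# the mean of these distances is the kind of quantity k-means drives down.
distances = np.linalg.norm(pts - centroid[label], axis=1)
print(distances.mean())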

Parameters
----------
data : ndarray
    An 'M' by 'N' array of 'M' observations in 'N' dimensions, or a
    length-'M' array of 'M' 1-D observations.
k : int or ndarray
    The number of clusters to form as well as the number of
    centroids to generate. If the `minit` initialization string is
    'matrix', or if an ndarray is given instead, it is
    interpreted as the initial centroids to use.
iter : int, optional
    Number of iterations of the k-means algorithm to run. Note
    that this differs in meaning from the `iter` parameter to
    the kmeans function.
thresh : float, optional
    (not used yet)
minit : str, optional
    Method for initialization. Available methods are 'random',
    'points', '++' and 'matrix':
    
    'random': generate k centroids from a Gaussian with mean and
    variance estimated from the data.
    
    'points': choose k observations (rows) at random from data for
    the initial centroids.
    
    '++': choose k observations according to the kmeans++ method
    (careful seeding).
    
    'matrix': interpret the k parameter as a k by N (or length k
    array for 1-D data) array of initial centroids.
missing : str, optional
    Method to deal with empty clusters. Available methods are
    'warn' and 'raise':
    
    'warn': give a warning and continue.
    
    'raise': raise a ClusterError and terminate the algorithm.
check_finite : bool, optional
    Whether to check that the input matrices contain only finite numbers.
    Disabling may give a performance gain, but may result in problems
    (crashes, non-termination) if the inputs do contain infinities or NaNs.
    Default: True
rng : {None, int, `numpy.random.Generator`}, optional
    If `rng` is passed by keyword, types other than `numpy.random.Generator` are
    passed to `numpy.random.default_rng` to instantiate a ``Generator``.
    If `rng` is already a ``Generator`` instance, then the provided instance is
    used. Specify `rng` for repeatable function behavior.

    If this argument is passed by position or `seed` is passed by keyword,
    legacy behavior for the argument `seed` applies:

    - If `seed` is None (or `numpy.random`), the `numpy.random.RandomState`
      singleton is used.
    - If `seed` is an int, a new ``RandomState`` instance is used,
      seeded with `seed`.
    - If `seed` is already a ``Generator`` or ``RandomState`` instance then
      that instance is used.

    .. versionchanged:: 1.15.0
        As part of the `SPEC-007 <https://scientific-python.org/specs/spec-0007/>`_
        transition from use of `numpy.random.RandomState` to
        `numpy.random.Generator`, this keyword was changed from `seed` to `rng`.
        For an interim period, both keywords will continue to work, although only one
        may be specified at a time. After the interim period, function calls using the
        `seed` keyword will emit warnings. The behavior of both `seed` and
        `rng` are outlined above, but only the `rng` keyword should be used in new code.
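
As an editorial illustration of the initialization methods listed above (not part of the SciPy help text), the sketch below runs the same synthetic data through the random initialization strategies ('random', 'points', '++') with an identically seeded generator, so that only the initialization method changes between runs; the 'matrix' option, which takes explicit centroids, is shown after the Examples section. The `rng` keyword again assumes SciPy 1.15 or later; on older versions, `seed` plays the same role.

import numpy as np
from scipy.cluster.vq import kmeans2

data = np.random.default_rng(42).normal(size=(150, 2))   # 150 observations, 2 dimensions

for method in ('random', 'points', '++'):
    # Re-seed for each run so the only difference is the initialization method.
    centroid, label = kmeans2(data, 3, minit=method, rng=np.random.default_rng(42))
    print(method, np.bincount(label))                     # cluster sizes per init method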
        

Returns
-------
centroid : ndarray
    A 'k' by 'N' array of centroids found at the last iteration of
    k-means.
label : ndarray
    label[i] is the code or index of the centroid the
    ith observation is closest to.
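
A short editorial sketch of how the returned pair is typically consumed (again, not part of the SciPy help text): indexing `centroid` with `label` attaches a centroid to every observation, and the companion `vq` function from the same module can reuse those centroids to classify observations that were not part of the original fit.

import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(7)
pts = rng.normal(size=(100, 2))

centroid, label = kmeans2(pts, 3, minit='points', rng=rng)

assigned = centroid[label]                # shape (100, 2): the centroid attached to each point
new_pts = rng.normal(size=(5, 2))
new_label, dist = vq(new_pts, centroid)   # assign unseen points to the nearest centroid
print(new_label, dist)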

See Also
--------

:func:`kmeans`

References
----------
.. [1] D. Arthur and S. Vassilvitskii, "k-means++: the advantages of
   careful seeding", Proceedings of the Eighteenth Annual ACM-SIAM Symposium
   on Discrete Algorithms, 2007.

Examples
--------
>>> from scipy.cluster.vq import kmeans2
>>> import matplotlib.pyplot as plt
>>> import numpy as np

Create z, an array with shape (100, 2) containing a mixture of samples
from three multivariate normal distributions.

>>> rng = np.random.default_rng()
>>> a = rng.multivariate_normal([0, 6], [[2, 1], [1, 1.5]], size=45)
>>> b = rng.multivariate_normal([2, 0], [[1, -1], [-1, 3]], size=30)
>>> c = rng.multivariate_normal([6, 4], [[5, 0], [0, 1.2]], size=25)
>>> z = np.concatenate((a, b, c))
>>> rng.shuffle(z)

Compute three clusters.

>>> centroid, label = kmeans2(z, 3, minit='points')
>>> centroid
array([[ 2.22274463, -0.61666946],  # may vary
       [ 0.54069047,  5.86541444],
       [ 6.73846769,  4.01991898]])

How many points are in each cluster?

>>> counts = np.bincount(label)
>>> counts
array([29, 51, 20])  # may vary

Plot the clusters.

>>> w0 = z[label == 0]
>>> w1 = z[label == 1]
>>> w2 = z[label == 2]
>>> plt.plot(w0[:, 0], w0[:, 1], 'o', alpha=0.5, label='cluster 0')
>>> plt.plot(w1[:, 0], w1[:, 1], 'd', alpha=0.5, label='cluster 1')
>>> plt.plot(w2[:, 0], w2[:, 1], 's', alpha=0.5, label='cluster 2')
>>> plt.plot(centroid[:, 0], centroid[:, 1], 'k*', label='centroids')
>>> plt.axis('equal')
>>> plt.legend(shadow=True)
>>> plt.show()
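
Finally, an editorial sketch of the 'matrix' initialization described in the Parameters section: `k` is passed as an explicit array of starting centroids (one row per cluster, one column per dimension), which makes the run deterministic for a given data set. The two Gaussian blobs reuse the means from the example above; the starting centroids are arbitrary guesses chosen for illustration.

import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
a = rng.multivariate_normal([0, 6], [[2, 1], [1, 1.5]], size=45)
b = rng.multivariate_normal([2, 0], [[1, -1], [-1, 3]], size=30)
z = np.concatenate((a, b))

# With minit='matrix', the k argument is interpreted as the initial centroids.
init = np.array([[0.0, 6.0],
                 [2.0, 0.0]])
centroid, label = kmeans2(z, init, minit='matrix')
print(centroid)
print(np.bincount(label))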

