cdist function - scipy.cluster.vq module

Signature of the cdist function

def cdist(XA, XB, metric='euclidean', *, out=None, **kwargs)

Description

help(scipy.cluster.vq.cdist)

Compute distance between each pair of the two collections of inputs.

See Notes for common calling conventions.

Parameters
----------
XA : array_like
    An :math:`m_A` by :math:`n` array of :math:`m_A`
    original observations in an :math:`n`-dimensional space.
    Inputs are converted to float type.
XB : array_like
    An :math:`m_B` by :math:`n` array of :math:`m_B`
    original observations in an :math:`n`-dimensional space.
    Inputs are converted to float type.
metric : str or callable, optional
    The distance metric to use. If a string, the distance function can be
    'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation',
    'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'jensenshannon',
    'kulczynski1', 'mahalanobis', 'matching', 'minkowski',
    'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener',
    'sokalsneath', 'sqeuclidean', 'yule'.
**kwargs : dict, optional
    Extra arguments to `metric`: refer to each metric documentation for a
    list of all possible arguments.

    Some possible arguments:

    p : scalar
    The p-norm to apply for Minkowski, weighted and unweighted.
    Default: 2.

    w : array_like
    The weight vector for metrics that support weights (e.g., Minkowski).

    V : array_like
    The variance vector for standardized Euclidean.
    Default: var(vstack([XA, XB]), axis=0, ddof=1)

    VI : array_like
    The inverse of the covariance matrix for Mahalanobis.
    Default: inv(cov(vstack([XA, XB]).T)).T

    out : ndarray
    The output array.
    If not None, the distance matrix Y is stored in this array.
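
The documented defaults for ``V`` and ``VI``, and the effect of ``w`` and ``out``, can be checked with a short sketch. The random data, shapes, and variable names below are illustrative assumptions, not part of the original documentation; the explicit ``V`` and ``VI`` are built from the stacked observations exactly as the defaults above describe.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
XA = rng.normal(size=(5, 3))   # m_A = 5 observations in 3 dimensions
XB = rng.normal(size=(4, 3))   # m_B = 4 observations in 3 dimensions
stacked = np.vstack([XA, XB])  # defaults are estimated from both inputs

# V: per-column sample variance of the stacked observations.
V = np.var(stacked, axis=0, ddof=1)
same_V = np.allclose(cdist(XA, XB, 'seuclidean'),
                     cdist(XA, XB, 'seuclidean', V=V))

# VI: inverse of the covariance matrix of the stacked observations.
VI = np.linalg.inv(np.cov(stacked.T)).T
same_VI = np.allclose(cdist(XA, XB, 'mahalanobis'),
                      cdist(XA, XB, 'mahalanobis', VI=VI))

# w: with unit weights, weighted Minkowski matches the unweighted form.
same_w = np.allclose(cdist(XA, XB, 'minkowski', p=3),
                     cdist(XA, XB, 'minkowski', p=3, w=np.ones(3)))

# out: a preallocated float64 array of shape (m_A, m_B) receives the result.
out = np.empty((5, 4), dtype=np.float64)
cdist(XA, XB, 'euclidean', out=out)
same_out = np.allclose(out, cdist(XA, XB, 'euclidean'))

print(same_V, same_VI, same_w, same_out)
```

Passing ``V`` or ``VI`` explicitly is mainly useful when the variance or covariance should come from a reference data set rather than from the two inputs being compared.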

Returns
-------
Y : ndarray
    A :math:`m_A` by :math:`m_B` distance matrix is returned.
    For each :math:`i` and :math:`j`, the metric
    ``dist(u=XA[i], v=XB[j])`` is computed and stored in the
    :math:`ij` th entry.
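
As a rough illustration of what the returned matrix holds, the following pure-Python sketch fills ``Y[i][j]`` with ``dist(XA[i], XB[j])``. The helper name ``cdist_reference`` is hypothetical and this is not how SciPy implements ``cdist``; it only mirrors the indexing described above, using ``math.dist`` as the Euclidean metric.

```python
import math

def cdist_reference(XA, XB, dist=math.dist):
    """Hypothetical reference: Y[i][j] = dist(XA[i], XB[j])."""
    # One row per observation in XA, one column per observation in XB.
    return [[dist(u, v) for v in XB] for u in XA]

XA = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]   # m_A = 3 observations
XB = [(0.0, 0.0), (3.0, 0.0)]               # m_B = 2 observations

Y = cdist_reference(XA, XB)
for row in Y:
    print(row)
```

The result has ``len(XA)`` rows and ``len(XB)`` columns, so swapping the two inputs transposes the matrix when the metric is symmetric.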

Raises
------
ValueError
    An exception is thrown if `XA` and `XB` do not have
    the same number of columns.
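
A minimal sketch of this failure mode, assuming two zero-filled arrays whose column counts differ (the shapes are chosen only for illustration):

```python
import numpy as np
from scipy.spatial.distance import cdist

XA = np.zeros((3, 2))   # three 2-D observations
XB = np.zeros((4, 3))   # four 3-D observations: column count differs

try:
    cdist(XA, XB)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# With matching column counts the result has shape (m_A, m_B).
Y = cdist(XA, np.zeros((4, 2)))
print(Y.shape)
```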

Notes
-----
The following are common calling conventions:

1. ``Y = cdist(XA, XB, 'euclidean')``

   Computes the distance between each of the :math:`m_A` points in XA
   and each of the :math:`m_B` points in XB, using Euclidean distance
   (2-norm) as the distance metric. The points are arranged as
   :math:`n`-dimensional row vectors in the matrices XA and XB.

2. ``Y = cdist(XA, XB, 'minkowski', p=2.)``

   Computes the distances using the Minkowski distance
   :math:`\|u-v\|_p` (:math:`p`-norm) where :math:`p > 0` (note
   that this is only a quasi-metric if :math:`0 < p < 1`).

3. ``Y = cdist(XA, XB, 'cityblock')``

   Computes the city block or Manhattan distance between the
   points.

4. ``Y = cdist(XA, XB, 'seuclidean', V=None)``

   Computes the standardized Euclidean distance. The standardized
   Euclidean distance between two n-vectors ``u`` and ``v`` is

   .. math::

      \sqrt{\sum_i {(u_i-v_i)^2 / V[i]}}.

   V is the variance vector; V[i] is the variance computed over all
   the i'th components of the points in XA and XB. If not passed, it
   is automatically computed.

5. ``Y = cdist(XA, XB, 'sqeuclidean')``

   Computes the squared Euclidean distance :math:`\|u-v\|_2^2` between
   the vectors.

6. ``Y = cdist(XA, XB, 'cosine')``

   Computes the cosine distance between vectors u and v,

   .. math::

      1 - \frac{u \cdot v}
               {{\|u\|}_2 {\|v\|}_2}

   where :math:`\|*\|_2` is the 2-norm of its argument ``*``, and
   :math:`u \cdot v` is the dot product of :math:`u` and :math:`v`.

7. ``Y = cdist(XA, XB, 'correlation')``

   Computes the correlation distance between vectors u and v. This is

   .. math::

      1 - \frac{(u - \bar{u}) \cdot (v - \bar{v})}
               {{\|(u - \bar{u})\|}_2 {\|(v - \bar{v})\|}_2}

   where :math:`\bar{v}` is the mean of the elements of vector v,
   and :math:`x \cdot y` is the dot product of :math:`x` and :math:`y`.


8. ``Y = cdist(XA, XB, 'hamming')``

   Computes the normalized Hamming distance, or the proportion of
   those vector elements between two n-vectors ``u`` and ``v``
   which disagree. To save memory, the matrices ``XA`` and ``XB``
   can be of type boolean.

9. ``Y = cdist(XA, XB, 'jaccard')``

   Computes the Jaccard distance between the points. Given two
   vectors, ``u`` and ``v``, the Jaccard distance is the
   proportion of those elements ``u[i]`` and ``v[i]`` that
   disagree where at least one of them is non-zero.

10. ``Y = cdist(XA, XB, 'jensenshannon')``

    Computes the Jensen-Shannon distance between two probability arrays.
    Given two probability vectors, :math:`p` and :math:`q`, the
    Jensen-Shannon distance is

    .. math::

       \sqrt{\frac{D(p \parallel m) + D(q \parallel m)}{2}}

    where :math:`m` is the pointwise mean of :math:`p` and :math:`q`
    and :math:`D` is the Kullback-Leibler divergence.

11. ``Y = cdist(XA, XB, 'chebyshev')``

    Computes the Chebyshev distance between the points. The
    Chebyshev distance between two n-vectors ``u`` and ``v`` is the
    maximum absolute difference between their respective elements.
    More precisely, the distance is given by

    .. math::

       d(u,v) = \max_i {|u_i-v_i|}.

12. ``Y = cdist(XA, XB, 'canberra')``

    Computes the Canberra distance between the points. The
    Canberra distance between two points ``u`` and ``v`` is

    .. math::

      d(u,v) = \sum_i \frac{|u_i-v_i|}
                           {|u_i|+|v_i|}.

13. ``Y = cdist(XA, XB, 'braycurtis')``

    Computes the Bray-Curtis distance between the points. The
    Bray-Curtis distance between two points ``u`` and ``v`` is


    .. math::

         d(u,v) = \frac{\sum_i (|u_i-v_i|)}
                       {\sum_i (|u_i+v_i|)}

14. ``Y = cdist(XA, XB, 'mahalanobis', VI=None)``

    Computes the Mahalanobis distance between the points. The
    Mahalanobis distance between two points ``u`` and ``v`` is
    :math:`\sqrt{(u-v)(1/V)(u-v)^T}` where :math:`(1/V)` (the ``VI``
    variable) is the inverse covariance. If ``VI`` is not None,
    ``VI`` will be used as the inverse covariance matrix; otherwise
    it is computed from the stacked observations of XA and XB.

15. ``Y = cdist(XA, XB, 'yule')``

    Computes the Yule distance between the boolean
    vectors. (see `yule` function documentation)

16. ``Y = cdist(XA, XB, 'matching')``

    Synonym for 'hamming'.

17. ``Y = cdist(XA, XB, 'dice')``

    Computes the Dice distance between the boolean vectors. (see
    `dice` function documentation)

18. ``Y = cdist(XA, XB, 'kulczynski1')``

    Computes the Kulczynski 1 distance between the boolean
    vectors. (see `kulczynski1` function documentation)

    .. deprecated:: 1.15.0
       This metric is deprecated and will be removed in SciPy 1.17.0.
       Replace usage of ``cdist(XA, XB, 'kulczynski1')`` with
       ``1 / cdist(XA, XB, 'jaccard') - 1``.

19. ``Y = cdist(XA, XB, 'rogerstanimoto')``

    Computes the Rogers-Tanimoto distance between the boolean
    vectors. (see `rogerstanimoto` function documentation)

20. ``Y = cdist(XA, XB, 'russellrao')``

    Computes the Russell-Rao distance between the boolean
    vectors. (see `russellrao` function documentation)

21. ``Y = cdist(XA, XB, 'sokalmichener')``

    Computes the Sokal-Michener distance between the boolean
    vectors. (see `sokalmichener` function documentation)

    .. deprecated:: 1.15.0
       This metric is deprecated and will be removed in SciPy 1.17.0.
       Replace usage of ``cdist(XA, XB, 'sokalmichener')`` with
       ``cdist(XA, XB, 'rogerstanimoto')``.

22. ``Y = cdist(XA, XB, 'sokalsneath')``

    Computes the Sokal-Sneath distance between the vectors. (see
    `sokalsneath` function documentation)

23. ``Y = cdist(XA, XB, f)``

    Computes the distance between each pair of vectors from XA and
    XB using the user-supplied 2-arity function f. For example,
    Euclidean distance between the vectors could be computed
    as follows::

      dm = cdist(XA, XB, lambda u, v: np.sqrt(((u-v)**2).sum()))

    Note that you should avoid passing a reference to one of
    the distance functions defined in this library. For example::

      dm = cdist(XA, XB, sokalsneath)

    would calculate the pair-wise distances between the vectors in
    XA and XB using the Python function `sokalsneath`. This would
    result in sokalsneath being called :math:`m_A m_B` times, which
    is inefficient. Instead, use the optimized C version, which is
    called with the following syntax::

      dm = cdist(XA, XB, 'sokalsneath')
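
Several of the conventions above can be cross-checked against one another. The sketch below is illustrative only: the random inputs and the helper ``counted_euclidean`` are assumptions introduced here, and the comparisons use ``np.allclose`` rather than exact equality.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
XA = rng.random((6, 4))   # m_A = 6 observations in 4 dimensions
XB = rng.random((3, 4))   # m_B = 3 observations in 4 dimensions

euclid = cdist(XA, XB, 'euclidean')

# Minkowski with p=2 is the Euclidean distance; with p=1 it is city block.
p2_matches = np.allclose(cdist(XA, XB, 'minkowski', p=2.), euclid)
p1_matches = np.allclose(cdist(XA, XB, 'minkowski', p=1.),
                         cdist(XA, XB, 'cityblock'))

# 'sqeuclidean' is the square of 'euclidean'.
sq_matches = np.allclose(cdist(XA, XB, 'sqeuclidean'), euclid ** 2)

# Chebyshev is the largest absolute coordinate difference, computed here
# with NumPy broadcasting over every (i, j) pair.
cheb_manual = np.abs(XA[:, None, :] - XB[None, :, :]).max(axis=2)
cheb_matches = np.allclose(cdist(XA, XB, 'chebyshev'), cheb_manual)

# A user-supplied callable is invoked once per (i, j) pair.
call_log = []

def counted_euclidean(u, v):
    call_log.append(1)
    return np.sqrt(((u - v) ** 2).sum())

callable_matches = np.allclose(cdist(XA, XB, counted_euclidean), euclid)

print(p2_matches, p1_matches, sq_matches, cheb_matches, callable_matches)
print(len(call_log), "calls for", XA.shape[0] * XB.shape[0], "pairs")
```

The call counter makes the cost of a Python callable visible: every pair triggers one Python-level function call, which is why the string form of a built-in metric is preferred.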

Examples
--------
Find the Euclidean distances between four 2-D coordinates:

>>> from scipy.spatial import distance
>>> import numpy as np
>>> coords = [(35.0456, -85.2672),
...           (35.1174, -89.9711),
...           (35.9728, -83.9422),
...           (36.1667, -86.7833)]
>>> distance.cdist(coords, coords, 'euclidean')
array([[ 0.    ,  4.7044,  1.6172,  1.8856],
       [ 4.7044,  0.    ,  6.0893,  3.3561],
       [ 1.6172,  6.0893,  0.    ,  2.8477],
       [ 1.8856,  3.3561,  2.8477,  0.    ]])


Find the Manhattan distance from a 3-D point to the corners of the unit
cube:

>>> a = np.array([[0, 0, 0],
...               [0, 0, 1],
...               [0, 1, 0],
...               [0, 1, 1],
...               [1, 0, 0],
...               [1, 0, 1],
...               [1, 1, 0],
...               [1, 1, 1]])
>>> b = np.array([[ 0.1,  0.2,  0.4]])
>>> distance.cdist(a, b, 'cityblock')
array([[ 0.7],
       [ 0.9],
       [ 1.3],
       [ 1.5],
       [ 1.5],
       [ 1.7],
       [ 2.1],
       [ 2.3]])
