Function chatterjeexi - module scipy.stats

Signature of the chatterjeexi function

def chatterjeexi(x, y, *, axis=0, y_continuous=False, method='asymptotic', nan_policy='propagate', keepdims=False) 

Description

help(scipy.stats.chatterjeexi)

Compute the xi correlation and perform a test of independence

The xi correlation coefficient is a measure of association between two
variables; the value tends to be close to zero when the variables are
independent and close to 1 when there is a strong association. Unlike
other correlation coefficients, the xi correlation is effective even
when the association is not monotonic.
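
To make the definition concrete, here is a minimal NumPy sketch of the tie-free form of the statistic given in [1]: sort the pairs by `x`, rank the reordered `y` values, and sum the absolute differences of consecutive ranks. The helper name `xi_no_ties` is hypothetical. The code assumes there are no ties and is an illustration, not SciPy's implementation.

```python
import numpy as np

def xi_no_ties(x, y):
    """Tie-free xi coefficient: 1 - 3 * sum|r[i+1] - r[i]| / (n**2 - 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Reorder y so that the corresponding x values are increasing.
    y_by_x = y[np.argsort(x, kind="stable")]
    # Rank of each reordered y value (1 = smallest); valid only without ties.
    ranks = np.argsort(np.argsort(y_by_x)) + 1
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)

x = np.linspace(0.0, 10.0, 100)
xi_sin = xi_no_ties(x, np.sin(x))   # deterministic but non-monotonic: high value
xi_lin = xi_no_ties(x, x)           # monotonic: equals (n - 2) / (n + 1)
print(xi_sin, xi_lin)
```

For a monotonic relationship the consecutive ranks differ by exactly one, so the formula reduces to `(n - 2) / (n + 1)`, which approaches 1 as `n` grows.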

Parameters
----------
x, y : array-like
    The samples: corresponding observations of the independent (`x`) and
    dependent (`y`) variables. The (N-d) arrays must be broadcastable.
axis : int or None, default: 0
    If an int, the axis of the input along which to compute the statistic.
    The statistic of each axis-slice (e.g. row) of the input will appear in a
    corresponding element of the output.
    If ``None``, the input will be raveled before computing the statistic.
method : 'asymptotic' or `PermutationMethod` instance, optional
    Selects the method used to calculate the *p*-value.
    Default is 'asymptotic'. The following options are available.
    
    * ``'asymptotic'``: compares the standardized test statistic
      against the normal distribution.
    * `PermutationMethod` instance. In this case, the p-value
      is computed using `permutation_test` with the provided
      configuration options and other appropriate settings.
y_continuous : bool, default: False
    Whether `y` is assumed to be drawn from a continuous distribution.
    If `y` is drawn from a continuous distribution, results are valid
    whether this is assumed or not, but enabling this assumption will
    result in faster computation and typically produce similar results.
nan_policy : {'propagate', 'omit', 'raise'}
    Defines how to handle input NaNs.
    
    - ``propagate``: if a NaN is present in the axis slice (e.g. row) along
      which the statistic is computed, the corresponding entry of the output
      will be NaN.
    - ``omit``: NaNs will be omitted when performing the calculation.
      If insufficient data remains in the axis slice along which the
      statistic is computed, the corresponding entry of the output will be
      NaN.
    - ``raise``: if a NaN is present, a ``ValueError`` will be raised.
keepdims : bool, default: False
    If this is set to True, the axes which are reduced are left
    in the result as dimensions with size one. With this option,
    the result will broadcast correctly against the input array.

Returns
-------
res : SignificanceResult
    An object containing attributes:
    
    statistic : float
        The xi correlation statistic.
    pvalue : float
        The associated *p*-value: the probability of a statistic at least as
        high as the observed value under the null hypothesis of independence.

See Also
--------

:func:`scipy.stats.pearsonr`, :func:`scipy.stats.spearmanr`, :func:`scipy.stats.kendalltau`

Notes
-----
There is currently no special handling of ties in `x`; they are broken arbitrarily
by the implementation.
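
The practical consequence of arbitrary tie-breaking is that the statistic can depend on the order of the input. This can be seen with a small NumPy sketch of the tie-free formula from [1]. The helper `xi_no_ties` is hypothetical and uses a stable sort, so tied `x` values keep their input order; SciPy's own tie-breaking may differ.

```python
import numpy as np

def xi_no_ties(x, y):
    """Tie-free xi formula; tied x values keep their input order."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    y_by_x = y[np.argsort(x, kind="stable")]
    ranks = np.argsort(np.argsort(y_by_x)) + 1
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)

x = np.zeros(4)                      # every x value is tied
a = xi_no_ties(x, [1, 2, 3, 4])
b = xi_no_ties(x, [1, 3, 2, 4])      # same (x, y) pairs, different order
print(a, b)                          # 0.4 0.0
```

Both calls receive the same set of pairs, yet the result changes with their order, so data with many repeated `x` values should be interpreted with care.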

[1]_ notes that the statistic is not symmetric in `x` and `y` *by design*:
"...we may want to understand if :math:`Y` is a function of :math:`X`, and not
just if one of the variables is a function of the other." See [1]_ Remark 1.
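
The asymmetry is easy to reproduce with a NumPy sketch of the tie-free formula from [1]. The helper `xi_no_ties` is hypothetical, and the grid is chosen (as an assumption of this example) so that neither variable has ties. Here `y` is a function of `x`, but `x` is not a function of `y`.

```python
import numpy as np

def xi_no_ties(x, y):
    """Tie-free xi formula: 1 - 3 * sum|r[i+1] - r[i]| / (n**2 - 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    y_by_x = y[np.argsort(x, kind="stable")]
    ranks = np.argsort(np.argsort(y_by_x)) + 1
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)

# Asymmetric grid so that x**2 takes 200 distinct values.
x = np.linspace(-1.0, 1.3, 200)
y = x**2

xi_xy = xi_no_ties(x, y)   # is y a function of x?  high
xi_yx = xi_no_ties(y, x)   # is x a function of y?  low
print(xi_xy, xi_yx)
```

Swapping the arguments therefore answers a different question, which is why the argument order matters when calling the function.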

Beginning in SciPy 1.9, ``np.matrix`` inputs (not recommended for new
code) are converted to ``np.ndarray`` before the calculation is performed. In
this case, the output will be a scalar or ``np.ndarray`` of appropriate shape
rather than a 2D ``np.matrix``. Similarly, while masked elements of masked
arrays are ignored, the output will be a scalar or ``np.ndarray`` rather than a
masked array with ``mask=False``.

References
----------
.. [1] Chatterjee, Sourav. "A new coefficient of correlation." Journal of
       the American Statistical Association 116.536 (2021): 2009-2022.
       :doi:`10.1080/01621459.2020.1758115`.

Examples
--------
Generate data in which `y` is a deterministic (but non-monotonic) function
of `x`, and observe that the xi correlation is close to 1.

>>> import numpy as np
>>> from scipy import stats
>>> rng = np.random.default_rng(348932549825235)
>>> x = rng.uniform(0, 10, size=100)
>>> y = np.sin(x)
>>> res = stats.chatterjeexi(x, y)
>>> res.statistic
np.float64(0.9012901290129013)

The probability of observing such a high value of the statistic under the
null hypothesis of independence is very low.

>>> res.pvalue
np.float64(2.2206974648177804e-46)

As noise is introduced, the correlation coefficient decreases.

>>> noise = rng.normal(scale=[[0.1], [0.5], [1]], size=(3, 100))
>>> res = stats.chatterjeexi(x, y + noise, axis=-1)
>>> res.statistic
array([0.79507951, 0.41824182, 0.16651665])

Because the distribution of `y` is continuous, it is valid to pass
``y_continuous=True``. The statistic is identical, and the p-value
(not shown) is only slightly different.

>>> stats.chatterjeexi(x, y + noise, y_continuous=True, axis=-1).statistic
array([0.79507951, 0.41824182, 0.16651665])
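
Why the two settings agree for continuous data can be checked with a NumPy sketch of the general, tie-aware formula from [1], in which `r[i]` counts the values `<= y[i]` and `l[i]` counts the values `>= y[i]`. The helper `xi_general` is hypothetical and uses an O(n**2) comparison for clarity; it is not SciPy's implementation.

```python
import numpy as np

def xi_general(x, y):
    """Tie-aware xi: 1 - n * sum|r[i+1] - r[i]| / (2 * sum(l * (n - l)))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    y_by_x = y[np.argsort(x, kind="stable")]
    # r[i]: number of j with y[j] <= y[i]; l[i]: number of j with y[j] >= y[i].
    r = (y_by_x[None, :] <= y_by_x[:, None]).sum(axis=1)
    l = (y_by_x[None, :] >= y_by_x[:, None]).sum(axis=1)
    return 1.0 - n * np.abs(np.diff(r)).sum() / (2.0 * (l * (n - l)).sum())

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = np.sin(x)
n = x.size

# Tie-free formula computed directly for comparison.
ranks = np.argsort(np.argsort(y[np.argsort(x)])) + 1
tie_free = 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)
print(np.isclose(xi_general(x, y), tie_free))   # True when y has no ties

# With heavy ties in y (a step function of x), only the general form applies.
xi_step = xi_general(np.arange(10), np.arange(10) >= 5)
print(xi_step)                                  # approximately 0.8
```

Without ties, `l[i] = n - r[i] + 1`, so the denominator `2 * sum(l * (n - l))` equals `n * (n**2 - 1) / 3` and the general formula collapses to the tie-free one.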

