Function cramervonmises_2samp - module scipy.stats

Signature of the function cramervonmises_2samp

def cramervonmises_2samp(x, y, method='auto', *, axis=0, nan_policy='propagate', keepdims=False)

Description

help(scipy.stats.cramervonmises_2samp)

    


Perform the two-sample Cramér-von Mises test for goodness of fit.

This is the two-sample version of the Cramér-von Mises test ([1]_):
for two independent samples :math:`X_1, ..., X_n` and
:math:`Y_1, ..., Y_m`, the null hypothesis is that the samples
come from the same (unspecified) continuous distribution.

Parameters
----------
x : array_like
    A 1-D array of observed values of the random variables :math:`X_i`.
    Must contain at least two observations.
y : array_like
    A 1-D array of observed values of the random variables :math:`Y_i`.
    Must contain at least two observations.
method : {'auto', 'asymptotic', 'exact'}, optional
    The method used to compute the p-value, see Notes for details.
    The default is 'auto'.
axis : int or None, default: 0
    If an int, the axis of the input along which to compute the statistic.
    The statistic of each axis-slice (e.g. row) of the input will appear in a
    corresponding element of the output.
    If ``None``, the input will be raveled before computing the statistic.
nan_policy : {'propagate', 'omit', 'raise'}
    Defines how to handle input NaNs.
    
    - ``propagate``: if a NaN is present in the axis slice (e.g. row) along
      which the  statistic is computed, the corresponding entry of the output
      will be NaN.
    - ``omit``: NaNs will be omitted when performing the calculation.
      If insufficient data remains in the axis slice along which the
      statistic is computed, the corresponding entry of the output will be
      NaN.
    - ``raise``: if a NaN is present, a ``ValueError`` will be raised.
keepdims : bool, default: False
    If this is set to True, the axes which are reduced are left
    in the result as dimensions with size one. With this option,
    the result will broadcast correctly against the input array.

Returns
-------
res : object with attributes
    statistic : float
        Cramér-von Mises statistic.
    pvalue : float
        The p-value.

See Also
--------

:func:`cramervonmises`, :func:`anderson_ksamp`, :func:`epps_singleton_2samp`, :func:`ks_2samp`

Notes
-----
.. versionadded:: 1.7.0

The statistic is computed according to equation 9 in [2]_. The
calculation of the p-value depends on the keyword `method`:

- ``asymptotic``: The p-value is approximated by using the limiting
  distribution of the test statistic.
- ``exact``: The exact p-value is computed from the distribution of the
  test statistic over all possible ways of splitting the pooled
  observations into the two samples, see [2]_.
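
For very small samples, the idea behind the exact approach can be sketched by brute force. The snippet below assumes that the exact p-value is the fraction of all splits of the pooled ranks into groups of sizes ``n`` and ``m`` whose statistic is at least as large as the observed one, and that the data contain no ties. The helpers ``u_statistic`` and ``exact_pvalue_by_enumeration`` are hypothetical names introduced for illustration; SciPy itself is not claimed to enumerate splits this way.

```python
from itertools import combinations
from math import comb

import numpy as np
from scipy import stats


def u_statistic(rx, ry):
    """Rank-based quantity U for sorted pooled ranks rx and ry."""
    n, m = len(rx), len(ry)
    u = n * sum((int(r) - i) ** 2 for i, r in enumerate(rx, start=1))
    u += m * sum((int(r) - j) ** 2 for j, r in enumerate(ry, start=1))
    return u


def exact_pvalue_by_enumeration(x, y):
    """Brute-force p-value; assumes no ties and tiny sample sizes."""
    n, m = len(x), len(y)
    ranks = stats.rankdata(np.concatenate([x, y])).astype(int)
    u_obs = u_statistic(sorted(ranks[:n]), sorted(ranks[n:]))
    all_ranks = range(1, n + m + 1)
    count = 0
    # Every subset of n pooled ranks is one possible "x" sample.
    for rx in combinations(all_ranks, n):
        chosen = set(rx)
        ry = [r for r in all_ranks if r not in chosen]
        # For fixed n and m the statistic increases with U,
        # so comparing U is equivalent to comparing the statistic.
        if u_statistic(rx, ry) >= u_obs:
            count += 1
    return count / comb(n + m, n)


rng = np.random.default_rng(2024)
x = rng.normal(size=5)
y = rng.normal(size=4)
brute = exact_pvalue_by_enumeration(x, y)
reference = stats.cramervonmises_2samp(x, y, method="exact").pvalue
print(brute, reference)
```

The number of splits grows combinatorially, so this enumeration is only practical for a handful of observations.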

If ``method='auto'``, the exact approach is used if both samples contain
20 or fewer observations; otherwise the asymptotic distribution is used.
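
The selection rule for ``method='auto'`` can be checked with a short sketch: with 15 and 20 observations the default should agree with ``method='exact'``, and with 15 and 21 observations it should agree with ``method='asymptotic'``. The sample sizes, the seed and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)

# Both samples have at most 20 observations: 'auto' uses the exact method.
small_x = rng.normal(size=15)
small_y = rng.normal(size=20)
auto_small = stats.cramervonmises_2samp(small_x, small_y)
exact_small = stats.cramervonmises_2samp(small_x, small_y, method="exact")

# One sample has 21 observations: 'auto' uses the asymptotic method.
large_x = rng.normal(size=15)
large_y = rng.normal(size=21)
auto_large = stats.cramervonmises_2samp(large_x, large_y)
asym_large = stats.cramervonmises_2samp(large_x, large_y, method="asymptotic")

print(auto_small.pvalue, exact_small.pvalue)
print(auto_large.pvalue, asym_large.pvalue)
```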

If the underlying distribution is not continuous, the p-value is likely to
be conservative (Section 6.2 in [3]_). When ranking the data to compute
the test statistic, midranks are used if there are ties.
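
The rank-based computation with midranks can be sketched as follows. The formula used here, :math:`T = U / (nm(n+m)) - (4nm - 1) / (6(n+m))` with :math:`U` built from the pooled ranks, is the commonly stated rank form and is assumed to correspond to equation 9 in [2]_; ``cvm_2samp_statistic`` is a hypothetical helper name, and the small data set with tied values is made up for illustration.

```python
import numpy as np
from scipy import stats


def cvm_2samp_statistic(x, y):
    """Two-sample Cramér-von Mises statistic T from pooled midranks."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    n, m = len(x), len(y)
    # Rank the pooled sample; tied values share their average rank.
    ranks = stats.rankdata(np.concatenate([x, y]), method="average")
    rx, ry = ranks[:n], ranks[n:]
    # U compares each pooled rank with the rank within its own sample.
    u = n * np.sum((rx - np.arange(1, n + 1)) ** 2)
    u += m * np.sum((ry - np.arange(1, m + 1)) ** 2)
    # Rescale U to obtain T.
    return u / (n * m * (n + m)) - (4 * n * m - 1) / (6 * (n + m))


x = [1.0, 2.0, 2.0, 3.5, 5.0]   # the value 2.0 is tied across both samples
y = [2.0, 4.0, 4.5, 6.0]
manual = cvm_2samp_statistic(x, y)
reference = stats.cramervonmises_2samp(x, y, method="asymptotic").statistic
print(manual, reference)
```

The statistic does not depend on ``method``; that keyword only affects how the p-value is obtained.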

Beginning in SciPy 1.9, ``np.matrix`` inputs (not recommended for new
code) are converted to ``np.ndarray`` before the calculation is performed. In
this case, the output will be a scalar or ``np.ndarray`` of appropriate shape
rather than a 2D ``np.matrix``. Similarly, while masked elements of masked
arrays are ignored, the output will be a scalar or ``np.ndarray`` rather than a
masked array with ``mask=False``.

References
----------
.. [1] https://en.wikipedia.org/wiki/Cramer-von_Mises_criterion
.. [2] Anderson, T.W. (1962). On the distribution of the two-sample
       Cramér-von Mises criterion. The Annals of Mathematical
       Statistics, pp. 1148-1159.
.. [3] Conover, W.J., Practical Nonparametric Statistics, 1971.

Examples
--------
Suppose we wish to test whether two samples generated by
``scipy.stats.norm.rvs`` have the same distribution. We choose a
significance level of alpha=0.05.

>>> import numpy as np
>>> from scipy import stats
>>> rng = np.random.default_rng()
>>> x = stats.norm.rvs(size=100, random_state=rng)
>>> y = stats.norm.rvs(size=70, random_state=rng)
>>> res = stats.cramervonmises_2samp(x, y)
>>> res.statistic, res.pvalue
(0.29376470588235293, 0.1412873014573014)

The p-value exceeds our chosen significance level, so we do not
reject the null hypothesis that the observed samples are drawn from the
same distribution.

For small sample sizes, one can compute the exact p-values:

>>> x = stats.norm.rvs(size=7, random_state=rng)
>>> y = stats.t.rvs(df=2, size=6, random_state=rng)
>>> res = stats.cramervonmises_2samp(x, y, method='exact')
>>> res.statistic, res.pvalue
(0.197802197802198, 0.31643356643356646)

The p-value based on the asymptotic distribution is a good approximation
even though the sample size is small.

>>> res = stats.cramervonmises_2samp(x, y, method='asymptotic')
>>> res.statistic, res.pvalue
(0.197802197802198, 0.2966041181527128)

Independent of the method, one would not reject the null hypothesis at the
chosen significance level in this example.
