Module "scipy.stats.mstats"

chisquare function - scipy.stats.mstats module

Signature of the chisquare function

def chisquare(f_obs, f_exp=None, ddof=0, axis=0, *, sum_check=True) 

Description

help(scipy.stats.mstats.chisquare)

Perform Pearson's chi-squared test.

Pearson's chi-squared test [1]_ is a goodness-of-fit test for a multinomial
distribution with given probabilities; that is, it assesses the null hypothesis
that the observed frequencies (counts) are obtained by independent
sampling of *N* observations from a categorical distribution with given
expected frequencies.

Parameters
----------
f_obs : array_like
    Observed frequencies in each category.
f_exp : array_like, optional
    Expected frequencies in each category. By default, the categories are
    assumed to be equally likely.
ddof : int, optional
    "Delta degrees of freedom": adjustment to the degrees of freedom
    for the p-value.  The p-value is computed using a chi-squared
    distribution with ``k - 1 - ddof`` degrees of freedom, where ``k``
    is the number of categories.  The default value of `ddof` is 0.
axis : int or None, optional
    The axis of the broadcast result of `f_obs` and `f_exp` along which to
    apply the test.  If axis is None, all values in `f_obs` are treated
    as a single data set.  Default is 0.
sum_check : bool, optional
    Whether to perform a check that ``sum(f_obs) - sum(f_exp) == 0``. If True
    (default), raise an error when the relative difference exceeds the square root
    of the precision of the data type. See Notes for rationale and possible
    exceptions.
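
As a rough sketch of the kind of check that `sum_check` controls, the following
compares the relative difference of the two totals against the square root of
double-precision machine epsilon. The helper name ``relative_sum_difference``
and the choice of denominator are assumptions made for illustration; they are
not taken from SciPy's implementation.

```python
import sys


def relative_sum_difference(f_obs, f_exp):
    """Relative difference between total observed and total expected counts."""
    total_obs = sum(f_obs)
    total_exp = sum(f_exp)
    # Assumption: normalize by the smaller total; SciPy may normalize differently.
    return abs(total_obs - total_exp) / min(abs(total_obs), abs(total_exp))


# Square root of float64 machine epsilon, roughly 1.5e-8.
rtol = sys.float_info.epsilon ** 0.5

observed = [16, 18, 16, 14, 12, 12]      # total is 88
matching = [16, 16, 16, 16, 16, 8]       # total is 88
mismatched = [16, 16, 16, 16, 16, 16]    # total is 96

sums_agree = relative_sum_difference(observed, matching) <= rtol
sums_differ = relative_sum_difference(observed, mismatched) > rtol
```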

Returns
-------
res : Power_divergenceResult
    An object containing attributes:

    statistic : float or ndarray
        The chi-squared test statistic.  The value is a float if `axis` is
        None or `f_obs` and `f_exp` are 1-D.
    pvalue : float or ndarray
        The p-value of the test.  The value is a float if `ddof` and the
        result attribute `statistic` are scalars.

See Also
--------
scipy.stats.power_divergence
scipy.stats.fisher_exact : Fisher exact test on a 2x2 contingency table.
scipy.stats.barnard_exact : An unconditional exact test. An alternative
    to chi-squared test for small sample sizes.
:ref:`hypothesis_chisquare` : Extended example

Notes
-----
This test is invalid when the observed or expected frequencies in each
category are too small.  A typical rule is that all of the observed
and expected frequencies should be at least 5. According to [2]_, the
total number of observations should be greater than 13; otherwise,
exact tests (such as Barnard's exact test) should be used because they
do not overreject.

The default degrees of freedom, k-1, are for the case when no parameters
of the distribution are estimated. If p parameters are estimated by
efficient maximum likelihood then the correct degrees of freedom are
k-1-p. If the parameters are estimated in a different way, then the
dof can be between k-1-p and k-1. However, it is also possible that
the asymptotic distribution is not chi-square, in which case this test
is not appropriate.
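
A minimal sketch of the ``k-1-p`` adjustment, using only the standard library.
The counts are hypothetical, and the Poisson model with one estimated parameter
(so ``p = 1``) is chosen purely for illustration.

```python
import math

# Hypothetical data: number of intervals in which 0, 1, 2, ... events occurred.
raw_counts = {0: 30, 1: 38, 2: 20, 3: 8, 4: 3, 5: 1}
n_obs = sum(raw_counts.values())

# One parameter (the Poisson mean) is estimated from the data, so p = 1.
lam = sum(value * count for value, count in raw_counts.items()) / n_obs

# Pool values >= 3 into one category so that expected counts stay above 5.
f_obs = [raw_counts[0], raw_counts[1], raw_counts[2],
         raw_counts[3] + raw_counts[4] + raw_counts[5]]
probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(3)]
probs.append(1.0 - sum(probs))           # tail probability P(X >= 3)
f_exp = [n_obs * p for p in probs]

statistic = sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))

n_categories = len(f_obs)
dof = n_categories - 1 - 1               # k - 1 - p with p = 1
# For exactly 2 degrees of freedom the chi-squared survival function
# reduces to exp(-x / 2).
pvalue = math.exp(-statistic / 2.0)
```

The corresponding SciPy call would be ``chisquare(f_obs, f_exp, ddof=1)``.
Because the mean here is estimated from the ungrouped values rather than by
maximum likelihood on the pooled categories, the caveat above applies: the
appropriate degrees of freedom may lie between k-1-p and k-1.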

For Pearson's chi-squared test, the total observed and expected counts must match
for the p-value to accurately reflect the probability of observing such an extreme
value of the statistic under the null hypothesis.
This function may be used to perform other statistical tests that do not require
the total counts to be equal. For instance, to test the null hypothesis that
``f_obs[i]`` is Poisson-distributed with expectation ``f_exp[i]``, set ``ddof=-1``
and ``sum_check=False``. This test follows from the fact that a Poisson random
variable with mean and variance ``f_exp[i]`` is approximately normal with the
same mean and variance; the chi-squared statistic standardizes, squares, and sums
the observations; and the sum of ``n`` squared standard normal variables follows
the chi-squared distribution with ``n`` degrees of freedom.
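
The Poisson variant described above can be sketched by hand as follows. The
counts and expectations are hypothetical, and the closed-form survival function
used here is valid only for an even number of degrees of freedom, which is why
four categories were chosen.

```python
import math

# Hypothetical counts and their hypothesized Poisson expectations.
f_obs = [12, 7, 25, 18]
f_exp = [10.0, 9.0, 20.0, 16.0]          # totals differ: 62 observed, 55 expected

# Standardize each count, square, and sum.
statistic = sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))

n_categories = len(f_obs)
dof = n_categories - 1 - (-1)            # ddof=-1 gives n_categories degrees of freedom

# Chi-squared survival function for even dof:
# exp(-x/2) * sum over i < dof/2 of (x/2)**i / i!
half = statistic / 2.0
pvalue = math.exp(-half) * sum(half ** i / math.factorial(i)
                               for i in range(dof // 2))
```

Following the text above, the equivalent SciPy call would be
``chisquare(f_obs, f_exp, ddof=-1, sum_check=False)``.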

References
----------
.. [1] "Pearson's chi-squared test".
       *Wikipedia*. https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test
.. [2] Pearson, Karl. "On the criterion that a given system of deviations from the probable
       in the case of a correlated system of variables is such that it can be reasonably
       supposed to have arisen from random sampling", Philosophical Magazine. Series 5. 50
       (1900), pp. 157-175.

Examples
--------
When only the mandatory `f_obs` argument is given, it is assumed that the
expected frequencies are uniform and given by the mean of the observed
frequencies:

>>> import numpy as np
>>> from scipy.stats import chisquare
>>> chisquare([16, 18, 16, 14, 12, 12])
Power_divergenceResult(statistic=2.0, pvalue=0.84914503608460956)

The optional `f_exp` argument gives the expected frequencies.

>>> chisquare([16, 18, 16, 14, 12, 12], f_exp=[16, 16, 16, 16, 16, 8])
Power_divergenceResult(statistic=3.5, pvalue=0.62338762774958223)
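
The two statistics above can be checked by hand with Pearson's formula, the sum
of ``(observed - expected)**2 / expected`` over the categories. The helper name
``chi_squared_statistic`` is hypothetical, and the sketch covers only the
statistic, not the p-value.

```python
def chi_squared_statistic(f_obs, f_exp=None):
    """Pearson's chi-squared statistic for 1-D sequences of counts."""
    if f_exp is None:
        # Default: all categories equally likely, so each expected
        # frequency is the mean of the observed frequencies.
        mean = sum(f_obs) / len(f_obs)
        f_exp = [mean] * len(f_obs)
    return sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp))


observed = [16, 18, 16, 14, 12, 12]
uniform_stat = chi_squared_statistic(observed)
custom_stat = chi_squared_statistic(observed, [16, 16, 16, 16, 16, 8])
```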

When `f_obs` is 2-D, by default the test is applied to each column.

>>> obs = np.array([[16, 18, 16, 14, 12, 12], [32, 24, 16, 28, 20, 24]]).T
>>> obs.shape
(6, 2)
>>> chisquare(obs)
Power_divergenceResult(statistic=array([2.        , 6.66666667]), pvalue=array([0.84914504, 0.24663415]))

By setting ``axis=None``, the test is applied to all data in the array,
which is equivalent to applying the test to the flattened array.

>>> chisquare(obs, axis=None)
Power_divergenceResult(statistic=23.31034482758621, pvalue=0.015975692534127565)
>>> chisquare(obs.ravel())
Power_divergenceResult(statistic=23.310344827586206, pvalue=0.01597569253412758)
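
To make the difference between the per-column test and ``axis=None`` concrete,
the statistics can be recomputed with plain Python lists. The helper
``uniform_statistic`` is a hypothetical name, and it assumes uniform expected
frequencies, as in the calls above.

```python
def uniform_statistic(counts):
    """Chi-squared statistic against equal expected frequencies."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# The two columns of `obs` from the example above.
columns = [[16, 18, 16, 14, 12, 12], [32, 24, 16, 28, 20, 24]]

# Default behaviour (axis=0): one statistic per column.
per_column = [uniform_statistic(col) for col in columns]

# axis=None: all twelve counts form a single data set.
flattened = [c for col in columns for c in col]
pooled = uniform_statistic(flattened)
```

The pooled statistic is much larger than either per-column value because the
two columns have different totals, so a single common expected frequency fits
neither of them well.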

`ddof` is the adjustment subtracted from the default degrees of freedom, ``k - 1``.

>>> chisquare([16, 18, 16, 14, 12, 12], ddof=1)
Power_divergenceResult(statistic=2.0, pvalue=0.7357588823428847)

The calculation of the p-values is done by broadcasting the
chi-squared statistic with `ddof`.

>>> chisquare([16, 18, 16, 14, 12, 12], ddof=[0, 1, 2])
Power_divergenceResult(statistic=2.0, pvalue=array([0.84914504, 0.73575888, 0.5724067 ]))
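
The three p-values above follow from evaluating the chi-squared survival
function at the same statistic with ``k - 1 - ddof`` degrees of freedom. The
sketch below uses the standard closed-form expressions for integer degrees of
freedom; ``chi2_sf`` is a hypothetical helper and not the routine SciPy uses
internally.

```python
import math


def chi2_sf(x, dof):
    """Survival function of the chi-squared distribution for integer dof >= 1."""
    if dof < 1 or dof != int(dof):
        raise ValueError("dof must be a positive integer")
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)**i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) plus
    # sqrt(2/pi) * exp(-x/2) * sum_{r=1}^{(dof-1)/2} x**(r - 1/2) / (1*3*...*(2r-1))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)


statistic = 2.0
n_categories = 6
p_values = [chi2_sf(statistic, n_categories - 1 - ddof) for ddof in (0, 1, 2)]
```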

`f_obs` and `f_exp` are also broadcast.  In the following, `f_obs` has
shape (6,) and `f_exp` has shape (2, 6), so the result of broadcasting
`f_obs` and `f_exp` has shape (2, 6).  To compute the desired chi-squared
statistics, we use ``axis=1``:

>>> chisquare([16, 18, 16, 14, 12, 12],
...           f_exp=[[16, 16, 16, 16, 16, 8], [8, 20, 20, 16, 12, 12]],
...           axis=1)
Power_divergenceResult(statistic=array([3.5 , 9.25]), pvalue=array([0.62338763, 0.09949846]))

For a more detailed example, see :ref:`hypothesis_chisquare`.

