Module « scipy.stats.mstats »
Signature de la fonction chisquare
def chisquare(f_obs, f_exp=None, ddof=0, axis=0)
Description
chisquare.__doc__
Calculate a one-way chi-square test.
The chi-square test tests the null hypothesis that the categorical data
has the given frequencies.
Parameters
----------
f_obs : array_like
Observed frequencies in each category.
f_exp : array_like, optional
Expected frequencies in each category. By default the categories are
assumed to be equally likely.
ddof : int, optional
"Delta degrees of freedom": adjustment to the degrees of freedom
for the p-value. The p-value is computed using a chi-squared
distribution with ``k - 1 - ddof`` degrees of freedom, where `k`
is the number of observed frequencies. The default value of `ddof`
is 0.
axis : int or None, optional
The axis of the broadcast result of `f_obs` and `f_exp` along which to
apply the test. If axis is None, all values in `f_obs` are treated
as a single data set. Default is 0.
Returns
-------
chisq : float or ndarray
The chi-squared test statistic. The value is a float if `axis` is
None or `f_obs` and `f_exp` are 1-D.
p : float or ndarray
The p-value of the test. The value is a float if `ddof` and the
return value `chisq` are scalars.
See Also
--------
scipy.stats.power_divergence
scipy.stats.fisher_exact : Fisher exact test on a 2x2 contingency table.
scipy.stats.barnard_exact : An unconditional exact test. An alternative
to chi-squared test for small sample sizes.
Notes
-----
This test is invalid when the observed or expected frequencies in each
category are too small. A typical rule is that all of the observed
and expected frequencies should be at least 5. According to [3]_, the
total number of samples is recommended to be greater than 13,
otherwise exact tests (such as Barnard's Exact test) should be used
because they do not overreject.
Also, the sum of the observed and expected frequencies must be the same
for the test to be valid; `chisquare` raises an error if the sums do not
agree within a relative tolerance of ``1e-8``.
The default degrees of freedom, k-1, are for the case when no parameters
of the distribution are estimated. If p parameters are estimated by
efficient maximum likelihood then the correct degrees of freedom are
k-1-p. If the parameters are estimated in a different way, then the
dof can be between k-1-p and k-1. However, it is also possible that
the asymptotic distribution is not chi-square, in which case this test
is not appropriate.
References
----------
.. [1] Lowry, Richard. "Concepts and Applications of Inferential
Statistics". Chapter 8.
https://web.archive.org/web/20171022032306/http://vassarstats.net:80/textbook/ch8pt1.html
.. [2] "Chi-squared test", https://en.wikipedia.org/wiki/Chi-squared_test
.. [3] Pearson, Karl. "On the criterion that a given system of deviations from the probable
in the case of a correlated system of variables is such that it can be reasonably
supposed to have arisen from random sampling", Philosophical Magazine. Series 5. 50
(1900), pp. 157-175.
Examples
--------
When just `f_obs` is given, it is assumed that the expected frequencies
are uniform and given by the mean of the observed frequencies.
>>> from scipy.stats import chisquare
>>> chisquare([16, 18, 16, 14, 12, 12])
(2.0, 0.84914503608460956)
With `f_exp` the expected frequencies can be given.
>>> chisquare([16, 18, 16, 14, 12, 12], f_exp=[16, 16, 16, 16, 16, 8])
(3.5, 0.62338762774958223)
When `f_obs` is 2-D, by default the test is applied to each column.
>>> obs = np.array([[16, 18, 16, 14, 12, 12], [32, 24, 16, 28, 20, 24]]).T
>>> obs.shape
(6, 2)
>>> chisquare(obs)
(array([ 2. , 6.66666667]), array([ 0.84914504, 0.24663415]))
By setting ``axis=None``, the test is applied to all data in the array,
which is equivalent to applying the test to the flattened array.
>>> chisquare(obs, axis=None)
(23.31034482758621, 0.015975692534127565)
>>> chisquare(obs.ravel())
(23.31034482758621, 0.015975692534127565)
`ddof` is the change to make to the default degrees of freedom.
>>> chisquare([16, 18, 16, 14, 12, 12], ddof=1)
(2.0, 0.73575888234288467)
The calculation of the p-values is done by broadcasting the
chi-squared statistic with `ddof`.
>>> chisquare([16, 18, 16, 14, 12, 12], ddof=[0,1,2])
(2.0, array([ 0.84914504, 0.73575888, 0.5724067 ]))
`f_obs` and `f_exp` are also broadcast. In the following, `f_obs` has
shape (6,) and `f_exp` has shape (2, 6), so the result of broadcasting
`f_obs` and `f_exp` has shape (2, 6). To compute the desired chi-squared
statistics, we use ``axis=1``:
>>> chisquare([16, 18, 16, 14, 12, 12],
... f_exp=[[16, 16, 16, 16, 16, 8], [8, 20, 20, 16, 12, 12]],
... axis=1)
(array([ 3.5 , 9.25]), array([ 0.62338763, 0.09949846]))
Améliorations / Corrections
Vous avez des améliorations (ou des corrections) à proposer pour ce document : je vous remerçie par avance de m'en faire part, cela m'aide à améliorer le site.
Emplacement :
Description des améliorations :