Module scipy.stats

Function median_test - module scipy.stats

Signature of the median_test function

def median_test(*samples, ties='below', correction=True, lambda_=1, nan_policy='propagate') 

Description

help(scipy.stats.median_test)

Perform a Mood's median test.

Test that two or more samples come from populations with the same median.

Let ``n = len(samples)`` be the number of samples.  The "grand median" of
all the data is computed, and a contingency table is formed by
classifying the values in each sample as being above or below the grand
median.  The contingency table, along with `correction` and `lambda_`,
are passed to `scipy.stats.chi2_contingency` to compute the test statistic
and p-value.
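
The steps described above can be reproduced by hand. The following is a minimal sketch, not the library's actual implementation: it assumes the default ``ties="below"`` and reuses the seed-count data from the Examples section. The names `samples`, `grand_median` and `table` are illustrative only.

```python
import numpy as np
from scipy.stats import chi2_contingency, median_test

g1 = [10, 14, 14, 18, 20, 22, 24, 25, 31, 31, 32, 39, 43, 43, 48, 49]
g2 = [28, 30, 31, 33, 34, 35, 36, 40, 44, 55, 57, 61, 91, 92, 99]
g3 = [0, 3, 9, 22, 23, 25, 25, 33, 34, 34, 40, 45, 46, 48, 62, 67, 84]
samples = [np.asarray(g) for g in (g1, g2, g3)]

# Grand median of all the data pooled together.
grand_median = np.median(np.concatenate(samples))

# Row 0: counts strictly above the grand median.
# Row 1: counts below it, with ties counted as "below" (assumed default).
table = np.array([
    [np.count_nonzero(s > grand_median) for s in samples],
    [np.count_nonzero(s <= grand_median) for s in samples],
])

# Pass the table on with the default `correction` and `lambda_` values.
stat, p, dof, expected = chi2_contingency(table, correction=True, lambda_=1)

# Compare with the result of median_test itself.
res = median_test(g1, g2, g3)
print(grand_median, res.median)
print(p, res.pvalue)
```

Under these assumptions the hand-built table and p-value should agree with those returned by `median_test`.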

Parameters
----------
sample1, sample2, ... : array_like
    The set of samples.  There must be at least two samples.
    Each sample must be a one-dimensional sequence containing at least
    one value.  The samples are not required to have the same length.
ties : str, optional
    Determines how values equal to the grand median are classified in
    the contingency table.  The string must be one of::

        "below":
            Values equal to the grand median are counted as "below".
        "above":
            Values equal to the grand median are counted as "above".
        "ignore":
            Values equal to the grand median are not counted.

    The default is "below".
correction : bool, optional
    If True, *and* there are just two samples, apply Yates' correction
    for continuity when computing the test statistic associated with
    the contingency table.  Default is True.
lambda_ : float or str, optional
    By default, the statistic computed in this test is Pearson's
    chi-squared statistic.  `lambda_` allows a statistic from the
    Cressie-Read power divergence family to be used instead.  See
    `power_divergence` for details.
    Default is 1 (Pearson's chi-squared statistic).
nan_policy : {'propagate', 'raise', 'omit'}, optional
    Defines how to handle input that contains nan. 'propagate' returns nan,
    'raise' raises an error, 'omit' performs the calculations ignoring nan
    values. Default is 'propagate'.
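
To illustrate the three `nan_policy` settings, here is a short sketch on made-up data. The samples `a` and `b` are hypothetical and are not taken from the documentation above.

```python
import numpy as np
from scipy.stats import median_test

# Hypothetical samples; `a` contains one nan.
a = [1.0, 2.0, 3.0, 4.0, np.nan]
b = [3.0, 4.0, 5.0, 6.0, 7.0]

# 'propagate' (the default): the result fields are nan and there is no table.
res_prop = median_test(a, b)
print(res_prop.pvalue, res_prop.table)

# 'omit': the nan is dropped before the grand median and table are computed.
res_omit = median_test(a, b, nan_policy='omit')
print(res_omit.median, res_omit.pvalue)

# 'raise': a ValueError is expected when nan is present.
try:
    median_test(a, b, nan_policy='raise')
    raised = False
except ValueError:
    raised = True
print(raised)
```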

Returns
-------
res : MedianTestResult
    An object containing attributes:

    statistic : float
        The test statistic.  The statistic that is returned is determined
        by `lambda_`.  The default is Pearson's chi-squared statistic.
    pvalue : float
        The p-value of the test.
    median : float
        The grand median.
    table : ndarray
        The contingency table.  The shape of the table is (2, n), where
        n is the number of samples.  The first row holds the counts of the
        values above the grand median, and the second row holds the counts
        of the values below the grand median.  The table allows further
        analysis with, for example, `scipy.stats.chi2_contingency`, or with
        `scipy.stats.fisher_exact` if there are two samples, without having
        to recompute the table.  If ``nan_policy`` is "propagate" and there
        are nans in the input, the return value for ``table`` is ``None``.
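
As a sketch of the further analysis mentioned above, the returned table can be handed directly to `scipy.stats.fisher_exact` when there are exactly two samples. The example reuses the first two groups from the Examples section; treating them as a two-sample problem is an assumption made here purely for illustration.

```python
from scipy.stats import fisher_exact, median_test

g1 = [10, 14, 14, 18, 20, 22, 24, 25, 31, 31, 32, 39, 43, 43, 48, 49]
g2 = [28, 30, 31, 33, 34, 35, 36, 40, 44, 55, 57, 61, 91, 92, 99]

# Two samples give a 2x2 contingency table.
res = median_test(g1, g2)
print(res.table)

# Reuse the table for an exact test instead of the chi-squared approximation.
odds_ratio, p_exact = fisher_exact(res.table)
print(p_exact, res.pvalue)
```

An exact test may be preferable when the counts in the table are small, since the chi-squared approximation is less reliable in that case.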

See Also
--------
kruskal : Compute the Kruskal-Wallis H-test for independent samples.
mannwhitneyu : Compute the Mann-Whitney rank test on samples x and y.

Notes
-----
.. versionadded:: 0.15.0

References
----------
.. [1] Mood, A. M., Introduction to the Theory of Statistics. McGraw-Hill
    (1950), pp. 394-399.
.. [2] Zar, J. H., Biostatistical Analysis, 5th ed. Prentice Hall (2010).
    See Sections 8.12 and 10.15.

Examples
--------
A biologist runs an experiment in which there are three groups of plants.
Group 1 has 16 plants, group 2 has 15 plants, and group 3 has 17 plants.
Each plant produces a number of seeds.  The seed counts for each group
are::

    Group 1: 10 14 14 18 20 22 24 25 31 31 32 39 43 43 48 49
    Group 2: 28 30 31 33 34 35 36 40 44 55 57 61 91 92 99
    Group 3:  0  3  9 22 23 25 25 33 34 34 40 45 46 48 62 67 84

The following code applies Mood's median test to these samples.

>>> g1 = [10, 14, 14, 18, 20, 22, 24, 25, 31, 31, 32, 39, 43, 43, 48, 49]
>>> g2 = [28, 30, 31, 33, 34, 35, 36, 40, 44, 55, 57, 61, 91, 92, 99]
>>> g3 = [0, 3, 9, 22, 23, 25, 25, 33, 34, 34, 40, 45, 46, 48, 62, 67, 84]
>>> from scipy.stats import median_test
>>> res = median_test(g1, g2, g3)

The median is

>>> res.median
34.0

and the contingency table is

>>> res.table
array([[ 5, 10,  7],
       [11,  5, 10]])

The p-value is too large to conclude that the medians are not the same:

>>> res.pvalue
0.12609082774093244

The "G-test" can be performed by passing ``lambda_="log-likelihood"`` to
`median_test`.

>>> res = median_test(g1, g2, g3, lambda_="log-likelihood")
>>> res.pvalue
0.12224779737117837

The median occurs several times in the data, so we'll get a different
result if, for example, ``ties="above"`` is used:

>>> res = median_test(g1, g2, g3, ties="above")
>>> res.pvalue
0.063873276069553273

>>> res.table
array([[ 5, 11,  9],
       [11,  4,  8]])

This example demonstrates that if the data set is not large and there
are values equal to the median, the p-value can be sensitive to the
choice of `ties`.
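
To see this sensitivity side by side, the sketch below runs the test once for each of the three documented `ties` options on the same data. The dictionary `results` is an illustrative name, not part of the SciPy API.

```python
from scipy.stats import median_test

g1 = [10, 14, 14, 18, 20, 22, 24, 25, 31, 31, 32, 39, 43, 43, 48, 49]
g2 = [28, 30, 31, 33, 34, 35, 36, 40, 44, 55, 57, 61, 91, 92, 99]
g3 = [0, 3, 9, 22, 23, 25, 25, 33, 34, 34, 40, 45, 46, 48, 62, 67, 84]

# Run the test once per tie-handling option and keep each result.
results = {
    ties: median_test(g1, g2, g3, ties=ties)
    for ties in ("below", "above", "ignore")
}

for ties, res in results.items():
    # With "ignore", values equal to the grand median are not counted,
    # so the table total is smaller than the number of observations.
    print(ties, res.table.tolist(), res.pvalue)
```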


