Vous êtes un professionnel et vous avez besoin d'une formation ?
Programmation Python
Les fondamentaux
Voir le programme détaillé
Module « scipy.stats »
Signature de la fonction combine_pvalues
def combine_pvalues(pvalues, method='fisher', weights=None, *, axis=0, nan_policy='propagate', keepdims=False)
Description
help(scipy.stats.combine_pvalues)
Combine p-values from independent tests that bear upon the same hypothesis.
These methods are intended only for combining p-values from hypothesis
tests based upon continuous distributions.
Each method assumes that under the null hypothesis, the p-values are
sampled independently and uniformly from the interval [0, 1]. A test
statistic (different for each method) is computed and a combined
p-value is calculated based upon the distribution of this test statistic
under the null hypothesis.
Parameters
----------
pvalues : array_like
Array of p-values assumed to come from independent tests based on
continuous distributions.
method : {'fisher', 'pearson', 'tippett', 'stouffer', 'mudholkar_george'}
Name of method to use to combine p-values.
The available methods are (see Notes for details):
* 'fisher': Fisher's method (Fisher's combined probability test)
* 'pearson': Pearson's method
* 'mudholkar_george': Mudholkar's and George's method
* 'tippett': Tippett's method
* 'stouffer': Stouffer's Z-score method
weights : array_like, optional
Optional array of weights used only for Stouffer's Z-score method.
Ignored by other methods.
axis : int or None, default: 0
If an int, the axis of the input along which to compute the statistic.
The statistic of each axis-slice (e.g. row) of the input will appear in a
corresponding element of the output.
If ``None``, the input will be raveled before computing the statistic.
nan_policy : {'propagate', 'omit', 'raise'}
Defines how to handle input NaNs.
- ``propagate``: if a NaN is present in the axis slice (e.g. row) along
which the statistic is computed, the corresponding entry of the output
will be NaN.
- ``omit``: NaNs will be omitted when performing the calculation.
If insufficient data remains in the axis slice along which the
statistic is computed, the corresponding entry of the output will be
NaN.
- ``raise``: if a NaN is present, a ``ValueError`` will be raised.
keepdims : bool, default: False
If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input array.
Returns
-------
res : SignificanceResult
An object containing attributes:
statistic : float
The statistic calculated by the specified method.
pvalue : float
The combined p-value.
Notes
-----
If this function is applied to tests with a discrete statistics such as
any rank test or contingency-table test, it will yield systematically
wrong results, e.g. Fisher's method will systematically overestimate the
p-value [1]_. This problem becomes less severe for large sample sizes
when the discrete distributions become approximately continuous.
The differences between the methods can be best illustrated by their
statistics and what aspects of a combination of p-values they emphasise
when considering significance [2]_. For example, methods emphasising large
p-values are more sensitive to strong false and true negatives; conversely
methods focussing on small p-values are sensitive to positives.
* The statistics of Fisher's method (also known as Fisher's combined
probability test) [3]_ is :math:`-2\sum_i \log(p_i)`, which is
equivalent (as a test statistics) to the product of individual p-values:
:math:`\prod_i p_i`. Under the null hypothesis, this statistics follows
a :math:`\chi^2` distribution. This method emphasises small p-values.
* Pearson's method uses :math:`-2\sum_i\log(1-p_i)`, which is equivalent
to :math:`\prod_i \frac{1}{1-p_i}` [2]_.
It thus emphasises large p-values.
* Mudholkar and George compromise between Fisher's and Pearson's method by
averaging their statistics [4]_. Their method emphasises extreme
p-values, both close to 1 and 0.
* Stouffer's method [5]_ uses Z-scores and the statistic:
:math:`\sum_i \Phi^{-1} (p_i)`, where :math:`\Phi` is the CDF of the
standard normal distribution. The advantage of this method is that it is
straightforward to introduce weights, which can make Stouffer's method
more powerful than Fisher's method when the p-values are from studies
of different size [6]_ [7]_.
* Tippett's method uses the smallest p-value as a statistic.
(Mind that this minimum is not the combined p-value.)
Fisher's method may be extended to combine p-values from dependent tests
[8]_. Extensions such as Brown's method and Kost's method are not currently
implemented.
.. versionadded:: 0.15.0
Beginning in SciPy 1.9, ``np.matrix`` inputs (not recommended for new
code) are converted to ``np.ndarray`` before the calculation is performed. In
this case, the output will be a scalar or ``np.ndarray`` of appropriate shape
rather than a 2D ``np.matrix``. Similarly, while masked elements of masked
arrays are ignored, the output will be a scalar or ``np.ndarray`` rather than a
masked array with ``mask=False``.
References
----------
.. [1] Kincaid, W. M., "The Combination of Tests Based on Discrete
Distributions." Journal of the American Statistical Association 57,
no. 297 (1962), 10-19.
.. [2] Heard, N. and Rubin-Delanchey, P. "Choosing between methods of
combining p-values." Biometrika 105.1 (2018): 239-246.
.. [3] https://en.wikipedia.org/wiki/Fisher%27s_method
.. [4] George, E. O., and G. S. Mudholkar. "On the convolution of logistic
random variables." Metrika 30.1 (1983): 1-13.
.. [5] https://en.wikipedia.org/wiki/Fisher%27s_method#Relation_to_Stouffer.27s_Z-score_method
.. [6] Whitlock, M. C. "Combining probability from independent tests: the
weighted Z-method is superior to Fisher's approach." Journal of
Evolutionary Biology 18, no. 5 (2005): 1368-1373.
.. [7] Zaykin, Dmitri V. "Optimally weighted Z-test is a powerful method
for combining probabilities in meta-analysis." Journal of
Evolutionary Biology 24, no. 8 (2011): 1836-1841.
.. [8] https://en.wikipedia.org/wiki/Extensions_of_Fisher%27s_method
Examples
--------
Suppose we wish to combine p-values from four independent tests
of the same null hypothesis using Fisher's method (default).
>>> from scipy.stats import combine_pvalues
>>> pvalues = [0.1, 0.05, 0.02, 0.3]
>>> combine_pvalues(pvalues)
SignificanceResult(statistic=20.828626352604235, pvalue=0.007616871850449092)
When the individual p-values carry different weights, consider Stouffer's
method.
>>> weights = [1, 2, 3, 4]
>>> res = combine_pvalues(pvalues, method='stouffer', weights=weights)
>>> res.pvalue
0.009578891494533616
Vous êtes un professionnel et vous avez besoin d'une formation ?
Machine Learning
avec Scikit-Learn
Voir le programme détaillé
Améliorations / Corrections
Vous avez des améliorations (ou des corrections) à proposer pour ce document : je vous remerçie par avance de m'en faire part, cela m'aide à améliorer le site.
Emplacement :
Description des améliorations :