Vous êtes un professionnel et vous avez besoin d'une formation ?
RAG (Retrieval-Augmented Generation)et Fine Tuning d'un LLM
Voir le programme détaillé
Module « scipy.stats »
Signature de la fonction false_discovery_control
def false_discovery_control(ps, *, axis=0, method='bh')
Description
help(scipy.stats.false_discovery_control)
Adjust p-values to control the false discovery rate.
The false discovery rate (FDR) is the expected proportion of rejected null
hypotheses that are actually true.
If the null hypothesis is rejected when the *adjusted* p-value falls below
a specified level, the false discovery rate is controlled at that level.
Parameters
----------
ps : 1D array_like
The p-values to adjust. Elements must be real numbers between 0 and 1.
axis : int
The axis along which to perform the adjustment. The adjustment is
performed independently along each axis-slice. If `axis` is None, `ps`
is raveled before performing the adjustment.
method : {'bh', 'by'}
The false discovery rate control procedure to apply: ``'bh'`` is for
Benjamini-Hochberg [1]_ (Eq. 1), ``'by'`` is for Benjaminini-Yekutieli
[2]_ (Theorem 1.3). The latter is more conservative, but it is
guaranteed to control the FDR even when the p-values are not from
independent tests.
Returns
-------
ps_adusted : array_like
The adjusted p-values. If the null hypothesis is rejected where these
fall below a specified level, the false discovery rate is controlled
at that level.
See Also
--------
combine_pvalues
statsmodels.stats.multitest.multipletests
Notes
-----
In multiple hypothesis testing, false discovery control procedures tend to
offer higher power than familywise error rate control procedures (e.g.
Bonferroni correction [1]_).
If the p-values correspond with independent tests (or tests with
"positive regression dependencies" [2]_), rejecting null hypotheses
corresponding with Benjamini-Hochberg-adjusted p-values below :math:`q`
controls the false discovery rate at a level less than or equal to
:math:`q m_0 / m`, where :math:`m_0` is the number of true null hypotheses
and :math:`m` is the total number of null hypotheses tested. The same is
true even for dependent tests when the p-values are adjusted accorded to
the more conservative Benjaminini-Yekutieli procedure.
The adjusted p-values produced by this function are comparable to those
produced by the R function ``p.adjust`` and the statsmodels function
`statsmodels.stats.multitest.multipletests`. Please consider the latter
for more advanced methods of multiple comparison correction.
References
----------
.. [1] Benjamini, Yoav, and Yosef Hochberg. "Controlling the false
discovery rate: a practical and powerful approach to multiple
testing." Journal of the Royal statistical society: series B
(Methodological) 57.1 (1995): 289-300.
.. [2] Benjamini, Yoav, and Daniel Yekutieli. "The control of the false
discovery rate in multiple testing under dependency." Annals of
statistics (2001): 1165-1188.
.. [3] TileStats. FDR - Benjamini-Hochberg explained - Youtube.
https://www.youtube.com/watch?v=rZKa4tW2NKs.
.. [4] Neuhaus, Karl-Ludwig, et al. "Improved thrombolysis in acute
myocardial infarction with front-loaded administration of alteplase:
results of the rt-PA-APSAC patency study (TAPS)." Journal of the
American College of Cardiology 19.5 (1992): 885-891.
Examples
--------
We follow the example from [1]_.
Thrombolysis with recombinant tissue-type plasminogen activator (rt-PA)
and anisoylated plasminogen streptokinase activator (APSAC) in
myocardial infarction has been proved to reduce mortality. [4]_
investigated the effects of a new front-loaded administration of rt-PA
versus those obtained with a standard regimen of APSAC, in a randomized
multicentre trial in 421 patients with acute myocardial infarction.
There were four families of hypotheses tested in the study, the last of
which was "cardiac and other events after the start of thrombolitic
treatment". FDR control may be desired in this family of hypotheses
because it would not be appropriate to conclude that the front-loaded
treatment is better if it is merely equivalent to the previous treatment.
The p-values corresponding with the 15 hypotheses in this family were
>>> ps = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
... 0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.000]
If the chosen significance level is 0.05, we may be tempted to reject the
null hypotheses for the tests corresponding with the first nine p-values,
as the first nine p-values fall below the chosen significance level.
However, this would ignore the problem of "multiplicity": if we fail to
correct for the fact that multiple comparisons are being performed, we
are more likely to incorrectly reject true null hypotheses.
One approach to the multiplicity problem is to control the family-wise
error rate (FWER), that is, the rate at which the null hypothesis is
rejected when it is actually true. A common procedure of this kind is the
Bonferroni correction [1]_. We begin by multiplying the p-values by the
number of hypotheses tested.
>>> import numpy as np
>>> np.array(ps) * len(ps)
array([1.5000e-03, 6.0000e-03, 2.8500e-02, 1.4250e-01, 3.0150e-01,
4.1700e-01, 4.4700e-01, 5.1600e-01, 6.8850e-01, 4.8600e+00,
6.3930e+00, 8.5785e+00, 9.7920e+00, 1.1385e+01, 1.5000e+01])
To control the FWER at 5%, we reject only the hypotheses corresponding
with adjusted p-values less than 0.05. In this case, only the hypotheses
corresponding with the first three p-values can be rejected. According to
[1]_, these three hypotheses concerned "allergic reaction" and "two
different aspects of bleeding."
An alternative approach is to control the false discovery rate: the
expected fraction of rejected null hypotheses that are actually true. The
advantage of this approach is that it typically affords greater power: an
increased rate of rejecting the null hypothesis when it is indeed false. To
control the false discovery rate at 5%, we apply the Benjamini-Hochberg
p-value adjustment.
>>> from scipy import stats
>>> stats.false_discovery_control(ps)
array([0.0015 , 0.003 , 0.0095 , 0.035625 , 0.0603 ,
0.06385714, 0.06385714, 0.0645 , 0.0765 , 0.486 ,
0.58118182, 0.714875 , 0.75323077, 0.81321429, 1. ])
Now, the first *four* adjusted p-values fall below 0.05, so we would reject
the null hypotheses corresponding with these *four* p-values. Rejection
of the fourth null hypothesis was particularly important to the original
study as it led to the conclusion that the new treatment had a
"substantially lower in-hospital mortality rate."
Vous êtes un professionnel et vous avez besoin d'une formation ?
Sensibilisation àl'Intelligence Artificielle
Voir le programme détaillé
Améliorations / Corrections
Vous avez des améliorations (ou des corrections) à proposer pour ce document : je vous remerçie par avance de m'en faire part, cela m'aide à améliorer le site.
Emplacement :
Description des améliorations :