Participer au site avec un Tip
Rechercher
 

Améliorations / Corrections

Vous avez des améliorations (ou des corrections) à proposer pour ce document : je vous remerçie par avance de m'en faire part, cela m'aide à améliorer le site.

Emplacement :

Description des améliorations :

Module « scipy.stats »

Fonction somersd - module scipy.stats

Signature de la fonction somersd

def somersd(x, y=None) 

Description

somersd.__doc__

Calculates Somers' D, an asymmetric measure of ordinal association.

    Like Kendall's :math:`\tau`, Somers' :math:`D` is a measure of the
    correspondence between two rankings. Both statistics consider the
    difference between the number of concordant and discordant pairs in two
    rankings :math:`X` and :math:`Y`, and both are normalized such that values
    close  to 1 indicate strong agreement and values close to -1 indicate
    strong disagreement. They differ in how they are normalized. To show the
    relationship, Somers' :math:`D` can be defined in terms of Kendall's
    :math:`\tau_a`:

    .. math::
        D(Y|X) = \frac{\tau_a(X, Y)}{\tau_a(X, X)}

    Suppose the first ranking :math:`X` has :math:`r` distinct ranks and the
    second ranking :math:`Y` has :math:`s` distinct ranks. These two lists of
    :math:`n` rankings can also be viewed as an :math:`r \times s` contingency
    table in which element :math:`i, j` is the number of rank pairs with rank
    :math:`i` in ranking :math:`X` and rank :math:`j` in ranking :math:`Y`.
    Accordingly, `somersd` also allows the input data to be supplied as a
    single, 2D contingency table instead of as two separate, 1D rankings.

    Note that the definition of Somers' :math:`D` is asymmetric: in general,
    :math:`D(Y|X) \neq D(X|Y)`. ``somersd(x, y)`` calculates Somers'
    :math:`D(Y|X)`: the "row" variable :math:`X` is treated as an independent
    variable, and the "column" variable :math:`Y` is dependent. For Somers'
    :math:`D(X|Y)`, swap the input lists or transpose the input table.

    Parameters
    ----------
    x: array_like
        1D array of rankings, treated as the (row) independent variable.
        Alternatively, a 2D contingency table.
    y: array_like
        If `x` is a 1D array of rankings, `y` is a 1D array of rankings of the
        same length, treated as the (column) dependent variable.
        If `x` is 2D, `y` is ignored.

    Returns
    -------
    res : SomersDResult
        A `SomersDResult` object with the following fields:

            correlation : float
               The Somers' :math:`D` statistic.
            pvalue : float
               The two-sided p-value for a hypothesis test whose null
               hypothesis is an absence of association, :math:`D=0`.
               See notes for more information.
            table : 2D array
               The contingency table formed from rankings `x` and `y` (or the
               provided contingency table, if `x` is a 2D array)

    See Also
    --------
    kendalltau : Calculates Kendall's tau, another correlation measure.
    weightedtau : Computes a weighted version of Kendall's tau.
    spearmanr : Calculates a Spearman rank-order correlation coefficient.
    pearsonr : Calculates a Pearson correlation coefficient.

    Notes
    -----
    This function follows the contingency table approach of [2]_ and
    [3]_. *p*-values are computed based on an asymptotic approximation of
    the test statistic distribution under the null hypothesis :math:`D=0`.

    Theoretically, hypothesis tests based on Kendall's :math:`tau` and Somers'
    :math:`D` should be identical.
    However, the *p*-values returned by `kendalltau` are based
    on the null hypothesis of *independence* between :math:`X` and :math:`Y`
    (i.e. the population from which pairs in :math:`X` and :math:`Y` are
    sampled contains equal numbers of all possible pairs), which is more
    specific than the null hypothesis :math:`D=0` used here. If the null
    hypothesis of independence is desired, it is acceptable to use the
    *p*-value returned by `kendalltau` with the statistic returned by
    `somersd` and vice versa. For more information, see [2]_.

    Contingency tables are formatted according to the convention used by
    SAS and R: the first ranking supplied (``x``) is the "row" variable, and
    the second ranking supplied (``y``) is the "column" variable. This is
    opposite the convention of Somers' original paper [1]_.

    References
    ----------
    .. [1] Robert H. Somers, "A New Asymmetric Measure of Association for
           Ordinal Variables", *American Sociological Review*, Vol. 27, No. 6,
           pp. 799--811, 1962.

    .. [2] Morton B. Brown and Jacqueline K. Benedetti, "Sampling Behavior of
           Tests for Correlation in Two-Way Contingency Tables", *Journal of
           the American Statistical Association* Vol. 72, No. 358, pp.
           309--315, 1977.

    .. [3] SAS Institute, Inc., "The FREQ Procedure (Book Excerpt)",
           *SAS/STAT 9.2 User's Guide, Second Edition*, SAS Publishing, 2009.

    .. [4] Laerd Statistics, "Somers' d using SPSS Statistics", *SPSS
           Statistics Tutorials and Statistical Guides*,
           https://statistics.laerd.com/spss-tutorials/somers-d-using-spss-statistics.php,
           Accessed July 31, 2020.

    Examples
    --------
    We calculate Somers' D for the example given in [4]_, in which a hotel
    chain owner seeks to determine the association between hotel room
    cleanliness and customer satisfaction. The independent variable, hotel
    room cleanliness, is ranked on an ordinal scale: "below average (1)",
    "average (2)", or "above average (3)". The dependent variable, customer
    satisfaction, is ranked on a second scale: "very dissatisfied (1)",
    "moderately dissatisfied (2)", "neither dissatisfied nor satisfied (3)",
    "moderately satisfied (4)", or "very satisfied (5)". 189 customers
    respond to the survey, and the results are cast into a contingency table
    with the hotel room cleanliness as the "row" variable and customer
    satisfaction as the "column" variable.

    +-----+-----+-----+-----+-----+-----+
    |     | (1) | (2) | (3) | (4) | (5) |
    +=====+=====+=====+=====+=====+=====+
    | (1) | 27  | 25  | 14  | 7   | 0   |
    +-----+-----+-----+-----+-----+-----+
    | (2) | 7   | 14  | 18  | 35  | 12  |
    +-----+-----+-----+-----+-----+-----+
    | (3) | 1   | 3   | 2   | 7   | 17  |
    +-----+-----+-----+-----+-----+-----+

    For example, 27 customers assigned their room a cleanliness ranking of
    "below average (1)" and a corresponding satisfaction of "very
    dissatisfied (1)". We perform the analysis as follows.

    >>> from scipy.stats import somersd
    >>> table = [[27, 25, 14, 7, 0], [7, 14, 18, 35, 12], [1, 3, 2, 7, 17]]
    >>> res = somersd(table)
    >>> res.statistic
    0.6032766111513396
    >>> res.pvalue
    1.0007091191074533e-27

    The value of the Somers' D statistic is approximately 0.6, indicating
    a positive correlation between room cleanliness and customer satisfaction
    in the sample.
    The *p*-value is very small, indicating a very small probability of
    observing such an extreme value of the statistic under the null
    hypothesis that the statistic of the entire population (from which
    our sample of 189 customers is drawn) is zero. This supports the
    alternative hypothesis that the true value of Somers' D for the population
    is nonzero.