Participer au site avec un Tip
Rechercher
 

Améliorations / Corrections

Vous avez des améliorations (ou des corrections) à proposer pour ce document : je vous remerçie par avance de m'en faire part, cela m'aide à améliorer le site.

Emplacement :

Description des améliorations :

Vous êtes un professionnel et vous avez besoin d'une formation ? Programmation Python
Les fondamentaux
Voir le programme détaillé
Module « scipy.stats »

Fonction entropy - module scipy.stats

Signature de la fonction entropy

def entropy(pk: Union[collections.abc.Buffer, numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], bool, int, float, complex, str, bytes, numpy._typing._nested_sequence._NestedSequence[bool | int | float | complex | str | bytes]], qk: Union[collections.abc.Buffer, numpy._typing._array_like._SupportsArray[numpy.dtype[Any]], numpy._typing._nested_sequence._NestedSequence[numpy._typing._array_like._SupportsArray[numpy.dtype[Any]]], bool, int, float, complex, str, bytes, numpy._typing._nested_sequence._NestedSequence[bool | int | float | complex | str | bytes], NoneType] = None, base: float | None = None, axis: int = 0, *, nan_policy='propagate', keepdims=False) -> numpy.number | numpy.ndarray 

Description

help(scipy.stats.entropy)

    


Calculate the Shannon entropy/relative entropy of given distribution(s).

If only probabilities `pk` are given, the Shannon entropy is calculated as
``H = -sum(pk * log(pk))``.

If `qk` is not None, then compute the relative entropy
``D = sum(pk * log(pk / qk))``. This quantity is also known
as the Kullback-Leibler divergence.

This routine will normalize `pk` and `qk` if they don't sum to 1.

Parameters
----------
pk : array_like
    Defines the (discrete) distribution. Along each axis-slice of ``pk``,
    element ``i`` is the  (possibly unnormalized) probability of event
    ``i``.
qk : array_like, optional
    Sequence against which the relative entropy is computed. Should be in
    the same format as `pk`.
base : float, optional
    The logarithmic base to use, defaults to ``e`` (natural logarithm).
axis : int or None, default: 0
    If an int, the axis of the input along which to compute the statistic.
    The statistic of each axis-slice (e.g. row) of the input will appear in a
    corresponding element of the output.
    If ``None``, the input will be raveled before computing the statistic.
nan_policy : {'propagate', 'omit', 'raise'}
    Defines how to handle input NaNs.
    
    - ``propagate``: if a NaN is present in the axis slice (e.g. row) along
      which the  statistic is computed, the corresponding entry of the output
      will be NaN.
    - ``omit``: NaNs will be omitted when performing the calculation.
      If insufficient data remains in the axis slice along which the
      statistic is computed, the corresponding entry of the output will be
      NaN.
    - ``raise``: if a NaN is present, a ``ValueError`` will be raised.
keepdims : bool, default: False
    If this is set to True, the axes which are reduced are left
    in the result as dimensions with size one. With this option,
    the result will broadcast correctly against the input array.

Returns
-------
S : {float, array_like}
    The calculated entropy.

Notes
-----
Informally, the Shannon entropy quantifies the expected uncertainty
inherent in the possible outcomes of a discrete random variable.
For example,
if messages consisting of sequences of symbols from a set are to be
encoded and transmitted over a noiseless channel, then the Shannon entropy
``H(pk)`` gives a tight lower bound for the average number of units of
information needed per symbol if the symbols occur with frequencies
governed by the discrete distribution `pk` [1]_. The choice of base
determines the choice of units; e.g., ``e`` for nats, ``2`` for bits, etc.

The relative entropy, ``D(pk|qk)``, quantifies the increase in the average
number of units of information needed per symbol if the encoding is
optimized for the probability distribution `qk` instead of the true
distribution `pk`. Informally, the relative entropy quantifies the expected
excess in surprise experienced if one believes the true distribution is
`qk` when it is actually `pk`.

A related quantity, the cross entropy ``CE(pk, qk)``, satisfies the
equation ``CE(pk, qk) = H(pk) + D(pk|qk)`` and can also be calculated with
the formula ``CE = -sum(pk * log(qk))``. It gives the average
number of units of information needed per symbol if an encoding is
optimized for the probability distribution `qk` when the true distribution
is `pk`. It is not computed directly by `entropy`, but it can be computed
using two calls to the function (see Examples).

See [2]_ for more information.

Beginning in SciPy 1.9, ``np.matrix`` inputs (not recommended for new
code) are converted to ``np.ndarray`` before the calculation is performed. In
this case, the output will be a scalar or ``np.ndarray`` of appropriate shape
rather than a 2D ``np.matrix``. Similarly, while masked elements of masked
arrays are ignored, the output will be a scalar or ``np.ndarray`` rather than a
masked array with ``mask=False``.

References
----------
.. [1] Shannon, C.E. (1948), A Mathematical Theory of Communication.
       Bell System Technical Journal, 27: 379-423.
       https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
.. [2] Thomas M. Cover and Joy A. Thomas. 2006. Elements of Information
       Theory (Wiley Series in Telecommunications and Signal Processing).
       Wiley-Interscience, USA.

Examples
--------
The outcome of a fair coin is the most uncertain:

>>> import numpy as np
>>> from scipy.stats import entropy
>>> base = 2  # work in units of bits
>>> pk = np.array([1/2, 1/2])  # fair coin
>>> H = entropy(pk, base=base)
>>> H
1.0
>>> H == -np.sum(pk * np.log(pk)) / np.log(base)
True

The outcome of a biased coin is less uncertain:

>>> qk = np.array([9/10, 1/10])  # biased coin
>>> entropy(qk, base=base)
0.46899559358928117

The relative entropy between the fair coin and biased coin is calculated
as:

>>> D = entropy(pk, qk, base=base)
>>> D
0.7369655941662062
>>> np.isclose(D, np.sum(pk * np.log(pk/qk)) / np.log(base), rtol=4e-16, atol=0)
True

The cross entropy can be calculated as the sum of the entropy and
relative entropy`:

>>> CE = entropy(pk, base=base) + entropy(pk, qk, base=base)
>>> CE
1.736965594166206
>>> CE == -np.sum(pk * np.log(qk)) / np.log(base)
True


Vous êtes un professionnel et vous avez besoin d'une formation ? Calcul scientifique
avec Python
Voir le programme détaillé