Classe « gaussian_kde »

Informations générales

Héritage

builtins.object
    gaussian_kde

Définition

class gaussian_kde(builtins.object):

help(gaussian_kde)

Representation of a kernel-density estimate using Gaussian kernels.

Kernel density estimation is a way to estimate the probability density
function (PDF) of a random variable in a non-parametric way.
`gaussian_kde` works for both uni-variate and multi-variate data.   It
includes automatic bandwidth determination.  The estimation works best for
a unimodal distribution; bimodal or multi-modal distributions tend to be
oversmoothed.

Parameters
----------
dataset : array_like
    Datapoints to estimate from. In case of univariate data this is a 1-D
    array, otherwise a 2-D array with shape (# of dims, # of data).
bw_method : str, scalar or callable, optional
    The method used to calculate the estimator bandwidth.  This can be
    'scott', 'silverman', a scalar constant or a callable.  If a scalar,
    this will be used directly as `kde.factor`.  If a callable, it should
    take a `gaussian_kde` instance as only parameter and return a scalar.
    If None (default), 'scott' is used.  See Notes for more details.
weights : array_like, optional
    weights of datapoints. This must be the same shape as dataset.
    If None (default), the samples are assumed to be equally weighted

Attributes
----------
dataset : ndarray
    The dataset with which `gaussian_kde` was initialized.
d : int
    Number of dimensions.
n : int
    Number of datapoints.
neff : int
    Effective number of datapoints.

    .. versionadded:: 1.2.0
factor : float
    The bandwidth factor, obtained from `kde.covariance_factor`. The square
    of `kde.factor` multiplies the covariance matrix of the data in the kde
    estimation.
covariance : ndarray
    The covariance matrix of `dataset`, scaled by the calculated bandwidth
    (`kde.factor`).
inv_cov : ndarray
    The inverse of `covariance`.

Methods
-------
evaluate
__call__
integrate_gaussian
integrate_box_1d
integrate_box
integrate_kde
pdf
logpdf
resample
set_bandwidth
covariance_factor

Notes
-----
Bandwidth selection strongly influences the estimate obtained from the KDE
(much more so than the actual shape of the kernel).  Bandwidth selection
can be done by a "rule of thumb", by cross-validation, by "plug-in
methods" or by other means; see [3]_, [4]_ for reviews.  `gaussian_kde`
uses a rule of thumb, the default is Scott's Rule.

Scott's Rule [1]_, implemented as `scotts_factor`, is::

    n**(-1./(d+4)),

with ``n`` the number of data points and ``d`` the number of dimensions.
In the case of unequally weighted points, `scotts_factor` becomes::

    neff**(-1./(d+4)),

with ``neff`` the effective number of datapoints.
Silverman's Rule [2]_, implemented as `silverman_factor`, is::

    (n * (d + 2) / 4.)**(-1. / (d + 4)).

or in the case of unequally weighted points::

    (neff * (d + 2) / 4.)**(-1. / (d + 4)).

Good general descriptions of kernel density estimation can be found in [1]_
and [2]_, the mathematics for this multi-dimensional implementation can be
found in [1]_.

With a set of weighted samples, the effective number of datapoints ``neff``
is defined by::

    neff = sum(weights)^2 / sum(weights^2)

as detailed in [5]_.

`gaussian_kde` does not currently support data that lies in a
lower-dimensional subspace of the space in which it is expressed. For such
data, consider performing principal component analysis / dimensionality
reduction and using `gaussian_kde` with the transformed data.

References
----------
.. [1] D.W. Scott, "Multivariate Density Estimation: Theory, Practice, and
       Visualization", John Wiley & Sons, New York, Chicester, 1992.
.. [2] B.W. Silverman, "Density Estimation for Statistics and Data
       Analysis", Vol. 26, Monographs on Statistics and Applied Probability,
       Chapman and Hall, London, 1986.
.. [3] B.A. Turlach, "Bandwidth Selection in Kernel Density Estimation: A
       Review", CORE and Institut de Statistique, Vol. 19, pp. 1-33, 1993.
.. [4] D.M. Bashtannyk and R.J. Hyndman, "Bandwidth selection for kernel
       conditional density estimation", Computational Statistics & Data
       Analysis, Vol. 36, pp. 279-298, 2001.
.. [5] Gray P. G., 1969, Journal of the Royal Statistical Society.
       Series A (General), 132, 272

Examples
--------
Generate some random two-dimensional data:

>>> import numpy as np
>>> from scipy import stats
>>> def measure(n):
...     "Measurement model, return two coupled measurements."
...     m1 = np.random.normal(size=n)
...     m2 = np.random.normal(scale=0.5, size=n)
...     return m1+m2, m1-m2

>>> m1, m2 = measure(2000)
>>> xmin = m1.min()
>>> xmax = m1.max()
>>> ymin = m2.min()
>>> ymax = m2.max()

Perform a kernel density estimate on the data:

>>> X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
>>> positions = np.vstack([X.ravel(), Y.ravel()])
>>> values = np.vstack([m1, m2])
>>> kernel = stats.gaussian_kde(values)
>>> Z = np.reshape(kernel(positions).T, X.shape)

Plot the results:

>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> ax.imshow(np.rot90(Z), cmap=plt.cm.gist_earth_r,
...           extent=[xmin, xmax, ymin, ymax])
>>> ax.plot(m1, m2, 'k.', markersize=2)
>>> ax.set_xlim([xmin, xmax])
>>> ax.set_ylim([ymin, ymax])
>>> plt.show()

Constructeur(s)

Signature du constructeur	Description
__init__(self, dataset, bw_method=None, weights=None)

Liste des propriétés

Nom de la propriété	Description
inv_cov
neff
weights

Liste des opérateurs

Opérateurs hérités de la classe object

__eq__, __ge__, __gt__, __le__, __lt__, __ne__

Liste des méthodes

Toutes les méthodes Méthodes d'instance Méthodes statiques

Signature de la méthode	Description
__call__(self, points)	Evaluate the estimated pdf on a set of points. _{[extrait de evaluate.__doc__]}
covariance_factor(self)	Computes the coefficient (`kde.factor`) that _{[extrait de scotts_factor.__doc__]}
evaluate(self, points)	Evaluate the estimated pdf on a set of points. _{[extrait de evaluate.__doc__]}
integrate_box(self, low_bounds, high_bounds, maxpts=None)	Computes the integral of a pdf over a rectangular interval. _{[extrait de integrate_box.__doc__]}
integrate_box_1d(self, low, high)
integrate_gaussian(self, mean, cov)
integrate_kde(self, other)
logpdf(self, x)
marginal(self, dimensions)	Return a marginal KDE distribution _{[extrait de marginal.__doc__]}
pdf(self, x)
resample(self, size=None, seed=None)	Randomly sample a dataset from the estimated pdf. _{[extrait de resample.__doc__]}
scotts_factor(self)	Computes the coefficient (`kde.factor`) that _{[extrait de scotts_factor.__doc__]}
set_bandwidth(self, bw_method=None)	Compute the estimator bandwidth with given method. _{[extrait de set_bandwidth.__doc__]}
silverman_factor(self)	Compute the Silverman factor. _{[extrait de silverman_factor.__doc__]}