Classe « LatinHypercube »

Informations générales

Héritage

builtins.object
    ABC
        QMCEngine
            LatinHypercube

Définition

class LatinHypercube(QMCEngine):

help(LatinHypercube)

Latin hypercube sampling (LHS).

A Latin hypercube sample [1]_ generates :math:`n` points in
:math:`[0,1)^{d}`. Each univariate marginal distribution is stratified,
placing exactly one point in :math:`[j/n, (j+1)/n)` for
:math:`j=0,1,...,n-1`. They are still applicable when :math:`n << d`.

Parameters
----------
d : int
    Dimension of the parameter space.
scramble : bool, optional
    When False, center samples within cells of a multi-dimensional grid.
    Otherwise, samples are randomly placed within cells of the grid.

    .. note::
        Setting ``scramble=False`` does not ensure deterministic output.
        For that, use the `rng` parameter.

    Default is True.

    .. versionadded:: 1.10.0

optimization : {None, "random-cd", "lloyd"}, optional
    Whether to use an optimization scheme to improve the quality after
    sampling. Note that this is a post-processing step that does not
    guarantee that all properties of the sample will be conserved.
    Default is None.

    * ``random-cd``: random permutations of coordinates to lower the
      centered discrepancy. The best sample based on the centered
      discrepancy is constantly updated. Centered discrepancy-based
      sampling shows better space-filling robustness toward 2D and 3D
      subprojections compared to using other discrepancy measures.
    * ``lloyd``: Perturb samples using a modified Lloyd-Max algorithm.
      The process converges to equally spaced samples.

    .. versionadded:: 1.8.0
    .. versionchanged:: 1.10.0
        Add ``lloyd``.

strength : {1, 2}, optional
    Strength of the LHS. ``strength=1`` produces a plain LHS while
    ``strength=2`` produces an orthogonal array based LHS of strength 2
    [7]_, [8]_. In that case, only ``n=p**2`` points can be sampled,
    with ``p`` a prime number. It also constrains ``d <= p + 1``.
    Default is 1.

    .. versionadded:: 1.8.0

rng : `numpy.random.Generator`, optional
    Pseudorandom number generator state. When `rng` is None, a new
    `numpy.random.Generator` is created using entropy from the
    operating system. Types other than `numpy.random.Generator` are
    passed to `numpy.random.default_rng` to instantiate a ``Generator``.

    .. versionchanged:: 1.15.0

        As part of the `SPEC-007 <https://scientific-python.org/specs/spec-0007/>`_
        transition from use of `numpy.random.RandomState` to
        `numpy.random.Generator`, this keyword was changed from `seed` to
        `rng`. For an interim period, both keywords will continue to work, although
        only one may be specified at a time. After the interim period, function
        calls using the `seed` keyword will emit warnings. Following a
        deprecation period, the `seed` keyword will be removed.

See Also
--------
:ref:`quasi-monte-carlo`

Notes
-----

When LHS is used for integrating a function :math:`f` over :math:`n`,
LHS is extremely effective on integrands that are nearly additive [2]_.
With a LHS of :math:`n` points, the variance of the integral is always
lower than plain MC on :math:`n-1` points [3]_. There is a central limit
theorem for LHS on the mean and variance of the integral [4]_, but not
necessarily for optimized LHS due to the randomization.

:math:`A` is called an orthogonal array of strength :math:`t` if in each
n-row-by-t-column submatrix of :math:`A`: all :math:`p^t` possible
distinct rows occur the same number of times. The elements of :math:`A`
are in the set :math:`\{0, 1, ..., p-1\}`, also called symbols.
The constraint that :math:`p` must be a prime number is to allow modular
arithmetic. Increasing strength adds some symmetry to the sub-projections
of a sample. With strength 2, samples are symmetric along the diagonals of
2D sub-projections. This may be undesirable, but on the other hand, the
sample dispersion is improved.

Strength 1 (plain LHS) brings an advantage over strength 0 (MC) and
strength 2 is a useful increment over strength 1. Going to strength 3 is
a smaller increment and scrambled QMC like Sobol', Halton are more
performant [7]_.

To create a LHS of strength 2, the orthogonal array :math:`A` is
randomized by applying a random, bijective map of the set of symbols onto
itself. For example, in column 0, all 0s might become 2; in column 1,
all 0s might become 1, etc.
Then, for each column :math:`i` and symbol :math:`j`, we add a plain,
one-dimensional LHS of size :math:`p` to the subarray where
:math:`A^i = j`. The resulting matrix is finally divided by :math:`p`.

References
----------
.. [1] Mckay et al., "A Comparison of Three Methods for Selecting Values
   of Input Variables in the Analysis of Output from a Computer Code."
   Technometrics, 1979.
.. [2] M. Stein, "Large sample properties of simulations using Latin
   hypercube sampling." Technometrics 29, no. 2: 143-151, 1987.
.. [3] A. B. Owen, "Monte Carlo variance of scrambled net quadrature."
   SIAM Journal on Numerical Analysis 34, no. 5: 1884-1910, 1997
.. [4]  Loh, W.-L. "On Latin hypercube sampling." The annals of statistics
   24, no. 5: 2058-2080, 1996.
.. [5] Fang et al. "Design and modeling for computer experiments".
   Computer Science and Data Analysis Series, 2006.
.. [6] Damblin et al., "Numerical studies of space filling designs:
   optimization of Latin Hypercube Samples and subprojection properties."
   Journal of Simulation, 2013.
.. [7] A. B. Owen , "Orthogonal arrays for computer experiments,
   integration and visualization." Statistica Sinica, 1992.
.. [8] B. Tang, "Orthogonal Array-Based Latin Hypercubes."
   Journal of the American Statistical Association, 1993.
.. [9] Seaholm, Susan K. et al. (1988). Latin hypercube sampling and the
   sensitivity analysis of a Monte Carlo epidemic model. Int J Biomed
   Comput, 23(1-2), 97-112. :doi:`10.1016/0020-7101(88)90067-0`

Examples
--------
Generate samples from a Latin hypercube generator.

>>> from scipy.stats import qmc
>>> sampler = qmc.LatinHypercube(d=2)
>>> sample = sampler.random(n=5)
>>> sample
array([[0.1545328 , 0.53664833], # random
        [0.84052691, 0.06474907],
        [0.52177809, 0.93343721],
        [0.68033825, 0.36265316],
        [0.26544879, 0.61163943]])

Compute the quality of the sample using the discrepancy criterion.

>>> qmc.discrepancy(sample)
0.0196... # random

Samples can be scaled to bounds.

>>> l_bounds = [0, 2]
>>> u_bounds = [10, 5]
>>> qmc.scale(sample, l_bounds, u_bounds)
array([[1.54532796, 3.609945 ], # random
        [8.40526909, 2.1942472 ],
        [5.2177809 , 4.80031164],
        [6.80338249, 3.08795949],
        [2.65448791, 3.83491828]])

Below are other examples showing alternative ways to construct LHS with
even better coverage of the space.

Using a base LHS as a baseline.

>>> sampler = qmc.LatinHypercube(d=2)
>>> sample = sampler.random(n=5)
>>> qmc.discrepancy(sample)
0.0196...  # random

Use the `optimization` keyword argument to produce a LHS with
lower discrepancy at higher computational cost.

>>> sampler = qmc.LatinHypercube(d=2, optimization="random-cd")
>>> sample = sampler.random(n=5)
>>> qmc.discrepancy(sample)
0.0176...  # random

Use the `strength` keyword argument to produce an orthogonal array based
LHS of strength 2. In this case, the number of sample points must be the
square of a prime number.

>>> sampler = qmc.LatinHypercube(d=2, strength=2)
>>> sample = sampler.random(n=9)
>>> qmc.discrepancy(sample)
0.00526...  # random

Options could be combined to produce an optimized centered
orthogonal array based LHS. After optimization, the result would not
be guaranteed to be of strength 2.

**Real-world example**

In [9]_, a Latin Hypercube sampling (LHS) strategy was used to sample a
parameter space to study the importance of each parameter of an epidemic
model. Such analysis is also called a sensitivity analysis.

Since the dimensionality of the problem is high (6), it is computationally
expensive to cover the space. When numerical experiments are costly, QMC
enables analysis that may not be possible if using a grid.

The six parameters of the model represented the probability of illness,
the probability of withdrawal, and four contact probabilities. The
authors assumed uniform distributions for all parameters and generated
50 samples.

Using `scipy.stats.qmc.LatinHypercube` to replicate the protocol,
the first step is to create a sample in the unit hypercube:

>>> from scipy.stats import qmc
>>> sampler = qmc.LatinHypercube(d=6)
>>> sample = sampler.random(n=50)

Then the sample can be scaled to the appropriate bounds:

>>> l_bounds = [0.000125, 0.01, 0.0025, 0.05, 0.47, 0.7]
>>> u_bounds = [0.000375, 0.03, 0.0075, 0.15, 0.87, 0.9]
>>> sample_scaled = qmc.scale(sample, l_bounds, u_bounds)

Such a sample was used to run the model 50 times, and a polynomial
response surface was constructed. This allowed the authors to study the
relative importance of each parameter across the range of possibilities
of every other parameter.

In this computer experiment, they showed a 14-fold reduction in the
number of samples required to maintain an error below 2% on their
response surface when compared to a grid sampling.

Constructeur(s)

Signature du constructeur	Description
__init__(self, d: int \| numpy.integer, *, scramble: bool = True, strength: int = 1, optimization: Optional[Literal['random-cd', 'lloyd']] = None, rng: int \| numpy.integer \| numpy.random._generator.Generator \| numpy.random.mtrand.RandomState \| None = None) -> None