Module "scipy.stats.qmc"
Class "LatinHypercube"
General information
Inheritance
builtins.object
ABC
QMCEngine
LatinHypercube
Definition
class LatinHypercube(QMCEngine):
help(LatinHypercube)
Latin hypercube sampling (LHS).
A Latin hypercube sample [1]_ generates :math:`n` points in
:math:`[0,1)^{d}`. Each univariate marginal distribution is stratified,
placing exactly one point in :math:`[j/n, (j+1)/n)` for
:math:`j=0,1,...,n-1`. Latin hypercube samples remain applicable when
:math:`n << d`.
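The stratification property can be checked numerically. The following is a minimal sketch, assuming a plain LHS (``strength=1``, no optimization); the helper name ``strata_counts`` is illustrative and not part of SciPy.

```python
import numpy as np
from scipy.stats import qmc


def strata_counts(sample):
    """Count the points falling in each stratum [j/n, (j+1)/n) per dimension."""
    n, d = sample.shape
    # floor(x * n) is the index j of the stratum that contains x.
    strata = np.floor(sample * n).astype(int)
    return np.stack(
        [np.bincount(strata[:, k], minlength=n) for k in range(d)], axis=1
    )


sampler = qmc.LatinHypercube(d=3)
sample = sampler.random(n=8)
counts = strata_counts(sample)  # shape (n, d); every entry is 1 for an LHS
```

Each column of ``counts`` corresponds to one dimension, so a column made only of ones means that every stratum of that marginal holds exactly one point.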
Parameters
----------
d : int
Dimension of the parameter space.
scramble : bool, optional
When False, center samples within cells of a multi-dimensional grid.
Otherwise, samples are randomly placed within cells of the grid.
.. note::
Setting ``scramble=False`` does not ensure deterministic output.
For that, use the `rng` parameter.
Default is True.
.. versionadded:: 1.10.0
optimization : {None, "random-cd", "lloyd"}, optional
Whether to use an optimization scheme to improve the quality after
sampling. Note that this is a post-processing step that does not
guarantee that all properties of the sample will be conserved.
Default is None.
* ``random-cd``: random permutations of coordinates to lower the
centered discrepancy. The best sample based on the centered
discrepancy is constantly updated. Centered discrepancy-based
sampling shows better space-filling robustness toward 2D and 3D
subprojections compared to using other discrepancy measures.
* ``lloyd``: Perturb samples using a modified Lloyd-Max algorithm.
The process converges to equally spaced samples.
.. versionadded:: 1.8.0
.. versionchanged:: 1.10.0
Add ``lloyd``.
strength : {1, 2}, optional
Strength of the LHS. ``strength=1`` produces a plain LHS while
``strength=2`` produces an orthogonal array based LHS of strength 2
[7]_, [8]_. In that case, only ``n=p**2`` points can be sampled,
with ``p`` a prime number. It also constrains ``d <= p + 1``.
Default is 1.
.. versionadded:: 1.8.0
rng : `numpy.random.Generator`, optional
Pseudorandom number generator state. When `rng` is None, a new
`numpy.random.Generator` is created using entropy from the
operating system. Types other than `numpy.random.Generator` are
passed to `numpy.random.default_rng` to instantiate a ``Generator``.
.. versionchanged:: 1.15.0
As part of the `SPEC-007 <https://scientific-python.org/specs/spec-0007/>`_
transition from use of `numpy.random.RandomState` to
`numpy.random.Generator`, this keyword was changed from `seed` to
`rng`. For an interim period, both keywords will continue to work, although
only one may be specified at a time. After the interim period, function
calls using the `seed` keyword will emit warnings. Following a
deprecation period, the `seed` keyword will be removed.
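The interaction between ``scramble`` and `rng` can be illustrated with a short sketch. The helper ``make_sampler`` is hypothetical; it falls back to the older `seed` keyword under the assumption that a ``TypeError`` on `rng` means a SciPy version older than 1.15.

```python
import numpy as np
from scipy.stats import qmc


def make_sampler(seed, **kwargs):
    """Build a 2-D sampler from an integer seed (hypothetical helper)."""
    try:
        return qmc.LatinHypercube(d=2, rng=seed, **kwargs)
    except TypeError:
        # Assumption: SciPy older than 1.15, which only knows `seed`.
        return qmc.LatinHypercube(d=2, seed=seed, **kwargs)


# Same seed, same sample; scramble=False alone would not guarantee this.
first = make_sampler(123, scramble=False).random(n=4)
second = make_sampler(123, scramble=False).random(n=4)
same = np.array_equal(first, second)

# With scramble=False every point sits at the center of its cell,
# so each coordinate has the form (j + 0.5) / n.
centered = np.allclose((first * 4) % 1, 0.5)
```

Even with centered points, the assignment of points to cells is still drawn at random, which is why a fixed seed is needed for reproducible output.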
See Also
--------
:ref:`quasi-monte-carlo`
Notes
-----
When LHS is used to integrate a function :math:`f` with :math:`n` points,
it is extremely effective on integrands that are nearly additive [2]_.
With a LHS of :math:`n` points, the variance of the integral is always
lower than plain MC on :math:`n-1` points [3]_. There is a central limit
theorem for LHS on the mean and variance of the integral [4]_, but not
necessarily for optimized LHS due to the randomization.
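The behavior on additive integrands can be observed empirically. This is a rough sketch under stated assumptions: the integrand ``additive``, the dimension, the sample size and the number of repetitions are all arbitrary choices made for illustration, and the measured variances change from run to run.

```python
import numpy as np
from scipy.stats import qmc


def additive(x):
    """Purely additive integrand; its exact integral over [0, 1)^d is d / 2."""
    return x.sum(axis=1)


d, n, repeats = 5, 50, 200
rng = np.random.default_rng(0)
sampler = qmc.LatinHypercube(d=d)

# Each estimate is the sample mean of the integrand over n points.
mc_estimates = [additive(rng.random((n, d))).mean() for _ in range(repeats)]
lhs_estimates = [additive(sampler.random(n=n)).mean() for _ in range(repeats)]

mc_variance = np.var(mc_estimates)
lhs_variance = np.var(lhs_estimates)
```

Comparing ``lhs_variance`` with ``mc_variance`` shows how much the stratification of each marginal helps when the integrand is a sum of one-dimensional terms.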
:math:`A` is called an orthogonal array of strength :math:`t` if in each
n-row-by-t-column submatrix of :math:`A`: all :math:`p^t` possible
distinct rows occur the same number of times. The elements of :math:`A`
are in the set :math:`\{0, 1, ..., p-1\}`, also called symbols.
The constraint that :math:`p` must be a prime number is to allow modular
arithmetic. Increasing strength adds some symmetry to the sub-projections
of a sample. With strength 2, samples are symmetric along the diagonals of
2D sub-projections. This may be undesirable, but on the other hand, the
sample dispersion is improved.
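The orthogonal array property described above can be checked on a sample of strength 2. The sketch below assumes that the symbol behind each coordinate can be recovered as ``floor(x * p)``, which follows from the final division by :math:`p` in the construction; the variable names are illustrative.

```python
import itertools

import numpy as np
from scipy.stats import qmc

p, d = 3, 3  # p prime, d <= p + 1
sampler = qmc.LatinHypercube(d=d, strength=2)
sample = sampler.random(n=p**2)

# Recover the symbol in {0, ..., p-1} behind each coordinate.
symbols = np.floor(sample * p).astype(int)

# Strength 2: in every pair of columns, each of the p**2 symbol pairs
# occurs the same number of times (exactly once when n = p**2).
pair_counts = {
    (a, b): len(np.unique(symbols[:, [a, b]], axis=0))
    for a, b in itertools.combinations(range(d), 2)
}
```

A count of ``p**2`` distinct rows for every pair of columns means that each 2D sub-projection places exactly one point in each cell of the :math:`p \times p` grid.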
Strength 1 (plain LHS) brings an advantage over strength 0 (MC) and
strength 2 is a useful increment over strength 1. Going to strength 3 is
a smaller increment and scrambled QMC like Sobol', Halton are more
performant [7]_.
To create a LHS of strength 2, the orthogonal array :math:`A` is
randomized by applying a random, bijective map of the set of symbols onto
itself. For example, in column 0, all 0s might become 2; in column 1,
all 0s might become 1, etc.
Then, for each column :math:`i` and symbol :math:`j`, we add a plain,
one-dimensional LHS of size :math:`p` to the subarray where
:math:`A^i = j`. The resulting matrix is finally divided by :math:`p`.
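The construction steps above can be sketched in NumPy. This is an illustrative reimplementation, not SciPy's actual code: the function name ``oa_lhs_sketch`` is hypothetical, and the base orthogonal array is built with one common modular construction that is assumed here rather than taken from the text.

```python
import itertools

import numpy as np


def oa_lhs_sketch(p, d, rng):
    """Sketch of a strength-2 OA-based LHS with n = p**2 points.

    Assumes p is prime and d <= p + 1.
    """
    # Base orthogonal array: columns 0 and 1 enumerate all symbol pairs,
    # the remaining columns are c0 + k * c1 (mod p) for k = 1, ..., p - 1.
    c0, c1 = np.divmod(np.arange(p * p), p)
    cols = [c0, c1] + [(c0 + k * c1) % p for k in range(1, p)]
    oa = np.stack(cols[:d], axis=1)

    sample = np.empty((p * p, d))
    for i in range(d):
        # Randomize: random bijection of the symbols onto themselves.
        column = rng.permutation(p)[oa[:, i]]
        for j in range(p):
            rows = column == j
            # Plain one-dimensional LHS of size p, added where the symbol is j.
            lhs_1d = (rng.permutation(p) + rng.uniform(size=p)) / p
            sample[rows, i] = column[rows] + lhs_1d
    # Final division by p maps the values into [0, 1).
    return sample / p


rng = np.random.default_rng(0)
sample = oa_lhs_sketch(p=5, d=4, rng=rng)  # 25 points in [0, 1)^4
```

Because each symbol appears exactly :math:`p` times per column, the inner one-dimensional LHS of size :math:`p` spreads those points over the :math:`p` sub-strata of the symbol's interval, which is what makes every column a valid LHS of size :math:`p^2`.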
References
----------
.. [1] McKay et al., "A Comparison of Three Methods for Selecting Values
of Input Variables in the Analysis of Output from a Computer Code."
Technometrics, 1979.
.. [2] M. Stein, "Large sample properties of simulations using Latin
hypercube sampling." Technometrics 29, no. 2: 143-151, 1987.
.. [3] A. B. Owen, "Monte Carlo variance of scrambled net quadrature."
SIAM Journal on Numerical Analysis 34, no. 5: 1884-1910, 1997.
.. [4] Loh, W.-L. "On Latin hypercube sampling." The annals of statistics
24, no. 5: 2058-2080, 1996.
.. [5] Fang et al. "Design and modeling for computer experiments".
Computer Science and Data Analysis Series, 2006.
.. [6] Damblin et al., "Numerical studies of space filling designs:
optimization of Latin Hypercube Samples and subprojection properties."
Journal of Simulation, 2013.
.. [7] A. B. Owen, "Orthogonal arrays for computer experiments,
integration and visualization." Statistica Sinica, 1992.
.. [8] B. Tang, "Orthogonal Array-Based Latin Hypercubes."
Journal of the American Statistical Association, 1993.
.. [9] Seaholm, Susan K. et al. (1988). Latin hypercube sampling and the
sensitivity analysis of a Monte Carlo epidemic model. Int J Biomed
Comput, 23(1-2), 97-112. :doi:`10.1016/0020-7101(88)90067-0`
Examples
--------
Generate samples from a Latin hypercube generator.
>>> from scipy.stats import qmc
>>> sampler = qmc.LatinHypercube(d=2)
>>> sample = sampler.random(n=5)
>>> sample
array([[0.1545328 , 0.53664833], # random
[0.84052691, 0.06474907],
[0.52177809, 0.93343721],
[0.68033825, 0.36265316],
[0.26544879, 0.61163943]])
Compute the quality of the sample using the discrepancy criterion.
>>> qmc.discrepancy(sample)
0.0196... # random
Samples can be scaled to bounds.
>>> l_bounds = [0, 2]
>>> u_bounds = [10, 5]
>>> qmc.scale(sample, l_bounds, u_bounds)
array([[1.54532796, 3.609945 ], # random
[8.40526909, 2.1942472 ],
[5.2177809 , 4.80031164],
[6.80338249, 3.08795949],
[2.65448791, 3.83491828]])
Below are other examples showing alternative ways to construct LHS with
even better coverage of the space.
Using a base LHS as a baseline.
>>> sampler = qmc.LatinHypercube(d=2)
>>> sample = sampler.random(n=5)
>>> qmc.discrepancy(sample)
0.0196... # random
Use the `optimization` keyword argument to produce a LHS with
lower discrepancy at higher computational cost.
>>> sampler = qmc.LatinHypercube(d=2, optimization="random-cd")
>>> sample = sampler.random(n=5)
>>> qmc.discrepancy(sample)
0.0176... # random
Use the `strength` keyword argument to produce an orthogonal array based
LHS of strength 2. In this case, the number of sample points must be the
square of a prime number.
>>> sampler = qmc.LatinHypercube(d=2, strength=2)
>>> sample = sampler.random(n=9)
>>> qmc.discrepancy(sample)
0.00526... # random
Options can be combined to produce an optimized, centered,
orthogonal array based LHS. After optimization, the result is not
guaranteed to be of strength 2.
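A minimal sketch of such a combination follows, using only the keyword arguments documented above. The comment about what ``random-cd`` preserves relies on its description as permutations of coordinates and should be read as an assumption about the current behavior.

```python
import numpy as np
from scipy.stats import qmc

n = 9  # square of the prime p = 3
sampler = qmc.LatinHypercube(
    d=2, scramble=False, strength=2, optimization="random-cd"
)
sample = sampler.random(n=n)

# random-cd permutes coordinates within each column, so the points stay
# centered and each column keeps one point per stratum; the strength-2
# property, however, may be lost.
centered = np.allclose((sample * n) % 1, 0.5)
discrepancy = qmc.discrepancy(sample)
```

The resulting ``discrepancy`` can be compared with the values obtained in the previous examples to judge whether the extra optimization cost is worthwhile for a given problem.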
**Real-world example**
In [9]_, a Latin Hypercube sampling (LHS) strategy was used to sample a
parameter space to study the importance of each parameter of an epidemic
model. Such analysis is also called a sensitivity analysis.
Since the dimensionality of the problem is high (6), it is computationally
expensive to cover the space. When numerical experiments are costly, QMC
enables analysis that may not be possible if using a grid.
The six parameters of the model represented the probability of illness,
the probability of withdrawal, and four contact probabilities. The
authors assumed uniform distributions for all parameters and generated
50 samples.
Using `scipy.stats.qmc.LatinHypercube` to replicate the protocol,
the first step is to create a sample in the unit hypercube:
>>> from scipy.stats import qmc
>>> sampler = qmc.LatinHypercube(d=6)
>>> sample = sampler.random(n=50)
Then the sample can be scaled to the appropriate bounds:
>>> l_bounds = [0.000125, 0.01, 0.0025, 0.05, 0.47, 0.7]
>>> u_bounds = [0.000375, 0.03, 0.0075, 0.15, 0.87, 0.9]
>>> sample_scaled = qmc.scale(sample, l_bounds, u_bounds)
Such a sample was used to run the model 50 times, and a polynomial
response surface was constructed. This allowed the authors to study the
relative importance of each parameter across the range of possibilities
of every other parameter.
In this computer experiment, they showed a 14-fold reduction in the
number of samples required to maintain an error below 2% on their
response surface when compared to a grid sampling.
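The scaling step of this protocol can be sanity-checked before running an expensive model. The sketch below reuses the bounds given above; the variable names ``within_bounds``, ``recovered`` and ``round_trip_ok`` are illustrative.

```python
import numpy as np
from scipy.stats import qmc

l_bounds = [0.000125, 0.01, 0.0025, 0.05, 0.47, 0.7]
u_bounds = [0.000375, 0.03, 0.0075, 0.15, 0.87, 0.9]

sample = qmc.LatinHypercube(d=6).random(n=50)
sample_scaled = qmc.scale(sample, l_bounds, u_bounds)

# Every scaled point should lie between the per-parameter bounds.
within_bounds = bool(
    np.all(sample_scaled >= l_bounds) and np.all(sample_scaled <= u_bounds)
)

# reverse=True maps the scaled sample back to the unit hypercube.
recovered = qmc.scale(sample_scaled, l_bounds, u_bounds, reverse=True)
round_trip_ok = np.allclose(recovered, sample)
```

Each row of ``sample_scaled`` is then one parameter set to feed to the model.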
Constructor(s)
List of operators
Operators inherited from class object
__eq__,
__ge__,
__gt__,
__le__,
__lt__,
__ne__
List of methods
Methods inherited from class QMCEngine
__init_subclass__, __subclasshook__, fast_forward, integers, random, reset
Methods inherited from class object
__delattr__,
__dir__,
__format__,
__getattribute__,
__getstate__,
__hash__,
__reduce__,
__reduce_ex__,
__repr__,
__setattr__,
__sizeof__,
__str__