Participer au site avec un Tip
Rechercher
 

Améliorations / Corrections

Vous avez des améliorations (ou des corrections) à proposer pour ce document : je vous remerçie par avance de m'en faire part, cela m'aide à améliorer le site.

Emplacement :

Description des améliorations :

Vous êtes un professionnel et vous avez besoin d'une formation ? Machine Learning
avec Scikit-Learn
Voir le programme détaillé
Module « pandas »

Fonction to_datetime - module pandas

Signature de la fonction to_datetime

def to_datetime(arg: 'DatetimeScalarOrArrayConvertible | DictConvertible', errors: 'DateTimeErrorChoices' = 'raise', dayfirst: 'bool' = False, yearfirst: 'bool' = False, utc: 'bool' = False, format: 'str | None' = None, exact: 'bool | lib.NoDefault' = <no_default>, unit: 'str | None' = None, infer_datetime_format: 'lib.NoDefault | bool' = <no_default>, origin: 'str' = 'unix', cache: 'bool' = True) -> 'DatetimeIndex | Series | DatetimeScalar | NaTType | None' 

Description

help(pandas.to_datetime)

Convert argument to datetime.

This function converts a scalar, array-like, :class:`Series` or
:class:`DataFrame`/dict-like to a pandas datetime object.

Parameters
----------
arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like
    The object to convert to a datetime. If a :class:`DataFrame` is provided, the
    method expects minimally the following columns: :const:`"year"`,
    :const:`"month"`, :const:`"day"`. The column "year"
    must be specified in 4-digit format.
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
    - If :const:`'raise'`, then invalid parsing will raise an exception.
    - If :const:`'coerce'`, then invalid parsing will be set as :const:`NaT`.
    - If :const:`'ignore'`, then invalid parsing will return the input.
dayfirst : bool, default False
    Specify a date parse order if `arg` is str or is list-like.
    If :const:`True`, parses dates with the day first, e.g. :const:`"10/11/12"`
    is parsed as :const:`2012-11-10`.

    .. warning::

        ``dayfirst=True`` is not strict, but will prefer to parse
        with day first.

yearfirst : bool, default False
    Specify a date parse order if `arg` is str or is list-like.

    - If :const:`True` parses dates with the year first, e.g.
      :const:`"10/11/12"` is parsed as :const:`2010-11-12`.
    - If both `dayfirst` and `yearfirst` are :const:`True`, `yearfirst` is
      preceded (same as :mod:`dateutil`).

    .. warning::

        ``yearfirst=True`` is not strict, but will prefer to parse
        with year first.

utc : bool, default False
    Control timezone-related parsing, localization and conversion.

    - If :const:`True`, the function *always* returns a timezone-aware
      UTC-localized :class:`Timestamp`, :class:`Series` or
      :class:`DatetimeIndex`. To do this, timezone-naive inputs are
      *localized* as UTC, while timezone-aware inputs are *converted* to UTC.

    - If :const:`False` (default), inputs will not be coerced to UTC.
      Timezone-naive inputs will remain naive, while timezone-aware ones
      will keep their time offsets. Limitations exist for mixed
      offsets (typically, daylight savings), see :ref:`Examples
      <to_datetime_tz_examples>` section for details.

    .. warning::

        In a future version of pandas, parsing datetimes with mixed time
        zones will raise an error unless `utc=True`.
        Please specify `utc=True` to opt in to the new behaviour
        and silence this warning. To create a `Series` with mixed offsets and
        `object` dtype, please use `apply` and `datetime.datetime.strptime`.

    See also: pandas general documentation about `timezone conversion and
    localization
    <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
    #time-zone-handling>`_.

format : str, default None
    The strftime to parse time, e.g. :const:`"%d/%m/%Y"`. See
    `strftime documentation
    <https://docs.python.org/3/library/datetime.html
    #strftime-and-strptime-behavior>`_ for more information on choices, though
    note that :const:`"%f"` will parse all the way up to nanoseconds.
    You can also pass:

    - "ISO8601", to parse any `ISO8601 <https://en.wikipedia.org/wiki/ISO_8601>`_
      time string (not necessarily in exactly the same format);
    - "mixed", to infer the format for each element individually. This is risky,
      and you should probably use it along with `dayfirst`.

    .. note::

        If a :class:`DataFrame` is passed, then `format` has no effect.

exact : bool, default True
    Control how `format` is used:

    - If :const:`True`, require an exact `format` match.
    - If :const:`False`, allow the `format` to match anywhere in the target
      string.

    Cannot be used alongside ``format='ISO8601'`` or ``format='mixed'``.
unit : str, default 'ns'
    The unit of the arg (D,s,ms,us,ns) denote the unit, which is an
    integer or float number. This will be based off the origin.
    Example, with ``unit='ms'`` and ``origin='unix'``, this would calculate
    the number of milliseconds to the unix epoch start.
infer_datetime_format : bool, default False
    If :const:`True` and no `format` is given, attempt to infer the format
    of the datetime strings based on the first non-NaN element,
    and if it can be inferred, switch to a faster method of parsing them.
    In some cases this can increase the parsing speed by ~5-10x.

    .. deprecated:: 2.0.0
        A strict version of this argument is now the default, passing it has
        no effect.

origin : scalar, default 'unix'
    Define the reference date. The numeric values would be parsed as number
    of units (defined by `unit`) since this reference date.

    - If :const:`'unix'` (or POSIX) time; origin is set to 1970-01-01.
    - If :const:`'julian'`, unit must be :const:`'D'`, and origin is set to
      beginning of Julian Calendar. Julian day number :const:`0` is assigned
      to the day starting at noon on January 1, 4713 BC.
    - If Timestamp convertible (Timestamp, dt.datetime, np.datetimt64 or date
      string), origin is set to Timestamp identified by origin.
    - If a float or integer, origin is the difference
      (in units determined by the ``unit`` argument) relative to 1970-01-01.
cache : bool, default True
    If :const:`True`, use a cache of unique, converted dates to apply the
    datetime conversion. May produce significant speed-up when parsing
    duplicate date strings, especially ones with timezone offsets. The cache
    is only used when there are at least 50 values. The presence of
    out-of-bounds values will render the cache unusable and may slow down
    parsing.

Returns
-------
datetime
    If parsing succeeded.
    Return type depends on input (types in parenthesis correspond to
    fallback in case of unsuccessful timezone or out-of-range timestamp
    parsing):

    - scalar: :class:`Timestamp` (or :class:`datetime.datetime`)
    - array-like: :class:`DatetimeIndex` (or :class:`Series` with
      :class:`object` dtype containing :class:`datetime.datetime`)
    - Series: :class:`Series` of :class:`datetime64` dtype (or
      :class:`Series` of :class:`object` dtype containing
      :class:`datetime.datetime`)
    - DataFrame: :class:`Series` of :class:`datetime64` dtype (or
      :class:`Series` of :class:`object` dtype containing
      :class:`datetime.datetime`)

Raises
------
ParserError
    When parsing a date from string fails.
ValueError
    When another datetime conversion error happens. For example when one
    of 'year', 'month', day' columns is missing in a :class:`DataFrame`, or
    when a Timezone-aware :class:`datetime.datetime` is found in an array-like
    of mixed time offsets, and ``utc=False``.

See Also
--------
DataFrame.astype : Cast argument to a specified dtype.
to_timedelta : Convert argument to timedelta.
convert_dtypes : Convert dtypes.

Notes
-----

Many input types are supported, and lead to different output types:

- **scalars** can be int, float, str, datetime object (from stdlib :mod:`datetime`
  module or :mod:`numpy`). They are converted to :class:`Timestamp` when
  possible, otherwise they are converted to :class:`datetime.datetime`.
  None/NaN/null scalars are converted to :const:`NaT`.

- **array-like** can contain int, float, str, datetime objects. They are
  converted to :class:`DatetimeIndex` when possible, otherwise they are
  converted to :class:`Index` with :class:`object` dtype, containing
  :class:`datetime.datetime`. None/NaN/null entries are converted to
  :const:`NaT` in both cases.

- **Series** are converted to :class:`Series` with :class:`datetime64`
  dtype when possible, otherwise they are converted to :class:`Series` with
  :class:`object` dtype, containing :class:`datetime.datetime`. None/NaN/null
  entries are converted to :const:`NaT` in both cases.

- **DataFrame/dict-like** are converted to :class:`Series` with
  :class:`datetime64` dtype. For each row a datetime is created from assembling
  the various dataframe columns. Column keys can be common abbreviations
  like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns']) or
  plurals of the same.

The following causes are responsible for :class:`datetime.datetime` objects
being returned (possibly inside an :class:`Index` or a :class:`Series` with
:class:`object` dtype) instead of a proper pandas designated type
(:class:`Timestamp`, :class:`DatetimeIndex` or :class:`Series`
with :class:`datetime64` dtype):

- when any input element is before :const:`Timestamp.min` or after
  :const:`Timestamp.max`, see `timestamp limitations
  <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
  #timeseries-timestamp-limits>`_.

- when ``utc=False`` (default) and the input is an array-like or
  :class:`Series` containing mixed naive/aware datetime, or aware with mixed
  time offsets. Note that this happens in the (quite frequent) situation when
  the timezone has a daylight savings policy. In that case you may wish to
  use ``utc=True``.

Examples
--------

**Handling various input formats**

Assembling a datetime from multiple columns of a :class:`DataFrame`. The keys
can be common abbreviations like ['year', 'month', 'day', 'minute', 'second',
'ms', 'us', 'ns']) or plurals of the same

>>> df = pd.DataFrame({'year': [2015, 2016],
...                    'month': [2, 3],
...                    'day': [4, 5]})
>>> pd.to_datetime(df)
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

Using a unix epoch time

>>> pd.to_datetime(1490195805, unit='s')
Timestamp('2017-03-22 15:16:45')
>>> pd.to_datetime(1490195805433502912, unit='ns')
Timestamp('2017-03-22 15:16:45.433502912')

.. warning:: For float arg, precision rounding might happen. To prevent
    unexpected behavior use a fixed-width exact type.

Using a non-unix epoch origin

>>> pd.to_datetime([1, 2, 3], unit='D',
...                origin=pd.Timestamp('1960-01-01'))
DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'],
              dtype='datetime64[ns]', freq=None)

**Differences with strptime behavior**

:const:`"%f"` will parse all the way up to nanoseconds.

>>> pd.to_datetime('2018-10-26 12:00:00.0000000011',
...                format='%Y-%m-%d %H:%M:%S.%f')
Timestamp('2018-10-26 12:00:00.000000001')

**Non-convertible date/times**

Passing ``errors='coerce'`` will force an out-of-bounds date to :const:`NaT`,
in addition to forcing non-dates (or non-parseable dates) to :const:`NaT`.

>>> pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
NaT

.. _to_datetime_tz_examples:

**Timezones and time offsets**

The default behaviour (``utc=False``) is as follows:

- Timezone-naive inputs are converted to timezone-naive :class:`DatetimeIndex`:

>>> pd.to_datetime(['2018-10-26 12:00:00', '2018-10-26 13:00:15'])
DatetimeIndex(['2018-10-26 12:00:00', '2018-10-26 13:00:15'],
              dtype='datetime64[ns]', freq=None)

- Timezone-aware inputs *with constant time offset* are converted to
  timezone-aware :class:`DatetimeIndex`:

>>> pd.to_datetime(['2018-10-26 12:00 -0500', '2018-10-26 13:00 -0500'])
DatetimeIndex(['2018-10-26 12:00:00-05:00', '2018-10-26 13:00:00-05:00'],
              dtype='datetime64[ns, UTC-05:00]', freq=None)

- However, timezone-aware inputs *with mixed time offsets* (for example
  issued from a timezone with daylight savings, such as Europe/Paris)
  are **not successfully converted** to a :class:`DatetimeIndex`.
  Parsing datetimes with mixed time zones will show a warning unless
  `utc=True`. If you specify `utc=False` the warning below will be shown
  and a simple :class:`Index` containing :class:`datetime.datetime`
  objects will be returned:

>>> pd.to_datetime(['2020-10-25 02:00 +0200',
...                 '2020-10-25 04:00 +0100'])  # doctest: +SKIP
FutureWarning: In a future version of pandas, parsing datetimes with mixed
time zones will raise an error unless `utc=True`. Please specify `utc=True`
to opt in to the new behaviour and silence this warning. To create a `Series`
with mixed offsets and `object` dtype, please use `apply` and
`datetime.datetime.strptime`.
Index([2020-10-25 02:00:00+02:00, 2020-10-25 04:00:00+01:00],
      dtype='object')

- A mix of timezone-aware and timezone-naive inputs is also converted to
  a simple :class:`Index` containing :class:`datetime.datetime` objects:

>>> from datetime import datetime
>>> pd.to_datetime(["2020-01-01 01:00:00-01:00",
...                 datetime(2020, 1, 1, 3, 0)])  # doctest: +SKIP
FutureWarning: In a future version of pandas, parsing datetimes with mixed
time zones will raise an error unless `utc=True`. Please specify `utc=True`
to opt in to the new behaviour and silence this warning. To create a `Series`
with mixed offsets and `object` dtype, please use `apply` and
`datetime.datetime.strptime`.
Index([2020-01-01 01:00:00-01:00, 2020-01-01 03:00:00], dtype='object')

|

Setting ``utc=True`` solves most of the above issues:

- Timezone-naive inputs are *localized* as UTC

>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00'], utc=True)
DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 13:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

- Timezone-aware inputs are *converted* to UTC (the output represents the
  exact same datetime, but viewed from the UTC time offset `+00:00`).

>>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'],
...                utc=True)
DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

- Inputs can contain both string or datetime, the above
  rules still apply

>>> pd.to_datetime(['2018-10-26 12:00', datetime(2020, 1, 1, 18)], utc=True)
DatetimeIndex(['2018-10-26 12:00:00+00:00', '2020-01-01 18:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)


Vous êtes un professionnel et vous avez besoin d'une formation ? Programmation Python
Les compléments
Voir le programme détaillé