
The « DataFrame » class

The pandas.DataFrame.to_parquet method

Signature of the to_parquet method

def to_parquet(self,
               path: 'Optional[FilePathOrBuffer]' = None,
               engine: 'str' = 'auto',
               compression: 'Optional[str]' = 'snappy',
               index: 'Optional[bool]' = None,
               partition_cols: 'Optional[List[str]]' = None,
               storage_options: 'StorageOptions' = None,
               **kwargs) -> 'Optional[bytes]'

Description

to_parquet.__doc__

Write a DataFrame to the binary parquet format.

This function writes the dataframe as a `parquet file
<https://parquet.apache.org/>`_. You can choose different parquet
backends, and have the option of compression. See
:ref:`the user guide <io.parquet>` for more details.

Parameters
----------
path : str or file-like object, default None
    If a string, it will be used as Root Directory path
    when writing a partitioned dataset. By file-like object,
    we refer to objects with a write() method, such as a file handle
    (e.g. via builtin open function) or io.BytesIO. The engine
    fastparquet does not accept file-like objects. If path is None,
    a bytes object is returned.

    .. versionchanged:: 1.2.0

    Previously this was "fname"

engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto'
    Parquet library to use. If 'auto', then the option
    ``io.parquet.engine`` is used. The default ``io.parquet.engine``
    behavior is to try 'pyarrow', falling back to 'fastparquet' if
    'pyarrow' is unavailable.
compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
    Name of the compression to use. Use ``None`` for no compression.
index : bool, default None
    If ``True``, include the dataframe's index(es) in the file output.
    If ``False``, they will not be written to the file.
    If ``None``, similar to ``True`` the dataframe's index(es)
    will be saved. However, instead of being saved as values,
    the RangeIndex will be stored as a range in the metadata so it
    doesn't require much space and is faster. Other indexes will
    be included as columns in the file output.

    .. versionadded:: 0.24.0

partition_cols : list, optional, default None
    Column names by which to partition the dataset.
    Columns are partitioned in the order they are given.
    Must be None if path is not a string.

    .. versionadded:: 0.24.0

storage_options : dict, optional
    Extra options that make sense for a particular storage connection, e.g.
    host, port, username, password, etc., if using a URL that will
    be parsed by ``fsspec``, e.g., starting "s3://", "gcs://". An error
    will be raised if providing this argument with a non-fsspec URL.
    See the fsspec and backend storage implementation docs for the set of
    allowed keys and values.

    .. versionadded:: 1.2.0

**kwargs
    Additional arguments passed to the parquet library. See
    :ref:`pandas io <io.parquet>` for more details.

Returns
-------
bytes if no path argument is provided else None

See Also
--------
read_parquet : Read a parquet file.
DataFrame.to_csv : Write a csv file.
DataFrame.to_sql : Write to a sql table.
DataFrame.to_hdf : Write to hdf.

Notes
-----
This function requires either the `fastparquet
<https://pypi.org/project/fastparquet>`_ or `pyarrow
<https://arrow.apache.org/docs/python/>`_ library.

Examples
--------
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip',
...               compression='gzip')  # doctest: +SKIP
>>> pd.read_parquet('df.parquet.gzip')  # doctest: +SKIP
   col1  col2
0     1     3
1     2     4
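
The index parameter controls whether the index is written alongside the data
columns. As a minimal, illustrative sketch (reusing the df defined above; the
file name is arbitrary), index=False keeps only the data columns, and reading
the file back produces a fresh default RangeIndex:

>>> df.to_parquet('df_no_index.parquet', index=False)  # doctest: +SKIP
>>> pd.read_parquet('df_no_index.parquet')  # doctest: +SKIP
   col1  col2
0     1     3
1     2     4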

If you want to get a buffer to the parquet content you can use an io.BytesIO
object, as long as you don't use partition_cols, which creates multiple files.

>>> import io
>>> f = io.BytesIO()
>>> df.to_parquet(f)
>>> f.seek(0)
0
>>> content = f.read()
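
When partition_cols is given, path is treated as a root directory and the
dataset is written as one sub-directory per value of each partition column
(hive-style "column=value" folders). A minimal sketch, assuming pyarrow is
installed and reusing the df from above (the directory name is arbitrary):

>>> import os
>>> df.to_parquet('df_partitioned', partition_cols=['col1'])  # doctest: +SKIP
>>> sorted(os.listdir('df_partitioned'))  # doctest: +SKIP
['col1=1', 'col1=2']

Finally, as stated in the Returns section, calling to_parquet without a path
returns the parquet content as a bytes object, which is equivalent to writing
into the io.BytesIO buffer shown above:

>>> data = df.to_parquet()  # doctest: +SKIP
>>> type(data)  # doctest: +SKIP
<class 'bytes'>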