superleaf.dataframe.standardize

Functions

standardize_columns(df[, to_datetime, ...])

Standardize DataFrame column names and optionally convert columns to datetime.

superleaf.dataframe.standardize.standardize_columns(df: DataFrame, to_datetime: bool | str | Iterable[str] | None = False, force_datetime: bool | None = None, quiet=False) → DataFrame

Standardize DataFrame column names and optionally convert columns to datetime.

This function returns a copy of the input DataFrame in which each column name is stripped of leading and trailing whitespace, lowercased, and has its spaces replaced by underscores. Optionally, the specified columns (or auto-detected date/time columns) are converted to pandas datetime.
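The renaming rules above can be approximated for a single name as follows. This is a minimal sketch, not superleaf's implementation: `normalize_column_name` is a hypothetical helper, it assumes column names are strings, and it replaces only literal space characters (tabs or other inner whitespace are left unchanged).

```python
def normalize_column_name(name: str) -> str:
    """Hypothetical helper approximating the documented renaming rules."""
    # 1. Strip leading/trailing whitespace: " Date " -> "Date"
    # 2. Lowercase:                          "Date"   -> "date"
    # 3. Replace each remaining space:       "unit price" -> "unit_price"
    return name.strip().lower().replace(" ", "_")


columns = [" Date ", "Value", "Unit Price"]
print([normalize_column_name(c) for c in columns])
# ['date', 'value', 'unit_price']
```

Applied to a whole frame, the same callable could be passed to pandas' `DataFrame.rename(columns=...)`, which accepts a function and returns a new DataFrame by default, in line with the copy-not-mutate behavior described above.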

Parameters:
  • df (pd.DataFrame) – The input DataFrame whose column names will be standardized.

  • to_datetime (bool, str, Iterable[str], or None, optional) – If False (default), no datetime conversion is attempted. If True, attempts to convert any columns named ‘date’, ‘datetime’, ‘time’, or ‘timestamp’. If a string or other non-iterable value, converts the column with that name. If an iterable of strings, converts each listed column.

  • force_datetime (bool or None, optional) – If True, forces conversion to datetime and raises on any value that cannot be parsed. If False, coerces unparseable values to NaT instead of raising, and may emit warnings. If None (default), resolves to True when specific columns are named (a string or an iterable) and to False otherwise.

  • quiet (bool, default False) – If False, prints a message when a column cannot be converted to datetime in non-forced mode.
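The interplay between `to_datetime` and the `force_datetime` default can be summarized as a small decision procedure. The sketch below is one reading of the parameter descriptions, not the library's code: `resolve_datetime_targets` is hypothetical, it treats `None` like `False`, and it assumes auto-detection matches the already standardized (lowercased) names and selects only columns that exist.

```python
from collections.abc import Iterable
from typing import Any, List, Optional, Tuple

# Names auto-detected when to_datetime=True (taken from the parameter description).
AUTO_DATETIME_NAMES = ("date", "datetime", "time", "timestamp")


def resolve_datetime_targets(
    columns: Iterable,
    to_datetime: Any = False,
    force_datetime: Optional[bool] = None,
) -> Tuple[List[Any], bool]:
    """Hypothetical helper: return (columns_to_convert, force) per the documented rules."""
    if to_datetime is None or to_datetime is False:
        # No conversion (assumption: None behaves like False).
        targets: List[Any] = []
        specific = False
    elif to_datetime is True:
        # Auto-detect among existing, already standardized column names.
        targets = [c for c in columns if c in AUTO_DATETIME_NAMES]
        specific = False
    elif isinstance(to_datetime, str) or not isinstance(to_datetime, Iterable):
        # A single column name; str is checked first because it is iterable.
        targets = [to_datetime]
        specific = True
    else:
        # An iterable of column names.
        targets = list(to_datetime)
        specific = True
    # force_datetime=None defaults to True only for explicitly named columns.
    force = specific if force_datetime is None else force_datetime
    return targets, force


cols = ["date", "value", "timestamp"]
print(resolve_datetime_targets(cols, True))     # (['date', 'timestamp'], False)
print(resolve_datetime_targets(cols, "value"))  # (['value'], True)
```

The string check must come before the iterable check: a Python `str` is itself iterable and would otherwise be split into single characters.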

Returns:

A new DataFrame with standardized column names and with any requested columns converted to datetime.

Return type:

pd.DataFrame

Examples

>>> import pandas as pd
>>> from superleaf.dataframe.standardize import standardize_columns
>>> df = pd.DataFrame({
...     ' Date ': ['2021-01-01', '2021-02-01'],
...     'Value': [10, 20]
... })
>>> standardize_columns(df, to_datetime=True)
        date  value
0 2021-01-01     10
1 2021-02-01     20
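The example above relies on auto-detection with valid dates. To illustrate the forced versus non-forced behavior described for `force_datetime`, the sketch below calls `pandas.to_datetime` directly. It assumes, without confirmation from the source, that the two modes correspond to pandas' `errors="raise"` and `errors="coerce"`; `to_datetime_column` is a hypothetical helper, not part of superleaf.

```python
import pandas as pd


def to_datetime_column(series: pd.Series, force: bool) -> pd.Series:
    """Hypothetical helper mirroring the documented force_datetime semantics."""
    # force=True  -> errors="raise": any unparseable value raises an exception.
    # force=False -> errors="coerce": unparseable values become NaT.
    return pd.to_datetime(series, errors="raise" if force else "coerce")


raw = pd.Series(["2021-01-01", "not a date"])

# Non-forced: the bad entry is coerced to NaT instead of failing.
coerced = to_datetime_column(raw, force=False)
print(coerced.isna().tolist())  # [False, True]

# Forced: the same input raises (pandas raises ValueError or a subclass of it).
try:
    to_datetime_column(raw, force=True)
except ValueError as exc:
    print("forced conversion failed:", exc)
```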