superleaf.dataframe.selection

Functions

dfilter(df, *filters, **col_filters)

Filters a DataFrame by applying provided conditions.

partition(df, *filters, **col_filters)

Partitions a DataFrame into two subsets based on provided filtering conditions.

reorder_columns(df, columns[, back, after, ...])

Reorders columns in a DataFrame based on the provided parameters.

superleaf.dataframe.selection.dfilter(df: DataFrame, *filters, **col_filters) → DataFrame

Filters a DataFrame by applying provided conditions.

Parameters:
  • df (pd.DataFrame) – The DataFrame to filter.

  • *filters – Variable positional arguments that can be:
      - Instances of ColOp.
      - Callables applied row-wise, returning boolean values.
      - Iterables of boolean values indicating row selection.

  • **col_filters – Keyword arguments mapping column names to conditions, which can be:
      - Values (equality filter).
      - Instances of ColOp.
      - Callables applied element-wise to the specified column.

Returns:

A copy of the DataFrame containing only rows satisfying all filters.

Return type:

pd.DataFrame

Examples

>>> filtered_df = dfilter(df, Col('age') > 30, status='active')
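
The filter semantics described above can be sketched in plain pandas. This is an illustration of the documented behaviour (all conditions combined with logical AND, result returned as a copy), not superleaf's implementation; the sample data and variable names are made up for the example.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "age": [25, 41, 35, 52],
    "status": ["active", "active", "inactive", "active"],
})

# Positional condition on a column (what Col('age') > 30 expresses).
over_30 = df["age"] > 30
# Keyword value: an equality filter on the named column.
is_active = df["status"] == "active"

# A row must satisfy every condition, so the masks are combined with AND.
filtered_df = df[over_30 & is_active].copy()

# A row-wise callable and an element-wise column callable yield the same masks.
over_30_rowwise = df.apply(lambda row: row["age"] > 30, axis=1)
is_active_elementwise = df["status"].map(lambda s: s == "active")
```

Here filtered_df keeps only the rows for Ben and Dee, and the original df is left unchanged.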

superleaf.dataframe.selection.partition(df: DataFrame, *filters, **col_filters) → tuple[DataFrame, DataFrame]

Partitions a DataFrame into two subsets based on provided filtering conditions.

Parameters:
  • df (pd.DataFrame) – The DataFrame to partition.

  • *filters – Variable positional arguments (see dfilter documentation).

  • **col_filters – Keyword arguments (see dfilter documentation).

Returns:

  • First DataFrame contains rows matching the provided filters.

  • Second DataFrame contains rows that do not match.

Return type:

tuple[pd.DataFrame, pd.DataFrame]

Example

>>> passed_df, failed_df = partition(df, score=lambda x: x > 50)
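
As a rough plain-pandas equivalent of the example above, the two subsets can be obtained from a single boolean mask and its complement. This sketch assumes that "do not match" means the exact complement of the matching rows; the sample data is hypothetical and the code is not superleaf's implementation.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "student": ["Ana", "Ben", "Cy", "Dee"],
    "score": [72, 45, 50, 88],
})

# Element-wise callable applied to the 'score' column.
mask = df["score"].map(lambda x: x > 50)

passed_df = df[mask].copy()    # rows matching the condition
failed_df = df[~mask].copy()   # assumed complement: all remaining rows
```

Under that assumption every row of df lands in exactly one of the two frames, so their lengths add up to len(df).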

superleaf.dataframe.selection.reorder_columns(df: DataFrame, columns: str | Sequence[str], back=False, after=None, before=None) → DataFrame

Reorders columns in a DataFrame based on the provided parameters.

Parameters:
  • df (pd.DataFrame) – The DataFrame whose columns to reorder.

  • columns (Union[str, Sequence[str]]) – Column name or sequence of column names to reorder.

  • back (bool, optional) – If True, moves specified columns to the end. Default is False.

  • after (str, optional) – Column name after which the specified columns should be placed. Default is None.

  • before (str, optional) – Column name before which the specified columns should be placed. Default is None.

Returns:

A new DataFrame with reordered columns.

Return type:

pd.DataFrame

Notes

At most one of back, after, or before can be used at a time.

Raises:

ValueError – If more than one of the back, after, or before parameters is provided.

Examples

>>> reordered_df = reorder_columns(df, ['age', 'name'], after='id')
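
The placement rules can be illustrated with a small helper that works on a list of column names. reordered_names is a hypothetical function written for this sketch, not part of superleaf. It assumes that the moved columns keep the order in which they are given, and that with none of back, after, or before set they go to the front; neither behaviour is confirmed by the documentation above.

```python
from typing import List, Optional, Sequence, Union


def reordered_names(
    names: Sequence[str],
    columns: Union[str, Sequence[str]],
    back: bool = False,
    after: Optional[str] = None,
    before: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: return the column names in their new order."""
    # Mirror the documented error: the three options are mutually exclusive.
    if sum([bool(back), after is not None, before is not None]) > 1:
        raise ValueError("Use at most one of back, after, or before.")

    moved = [columns] if isinstance(columns, str) else list(columns)
    rest = [name for name in names if name not in moved]

    if back:
        return rest + moved
    if after is not None:
        i = rest.index(after) + 1
    elif before is not None:
        i = rest.index(before)
    else:
        i = 0  # assumption: with no option given, columns move to the front
    return rest[:i] + moved + rest[i:]


names = ["id", "name", "city", "age"]
after_id = reordered_names(names, ["age", "name"], after="id")
at_back = reordered_names(names, "id", back=True)
```

With these inputs after_id is ['id', 'age', 'name', 'city'] and at_back is ['name', 'city', 'age', 'id']. In pandas, indexing with such a list (df[after_id]) returns a new DataFrame with the columns in that order.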