Module compose (1.20.0)

Build composite transformers on heterogeneous data. This module is modeled after scikit-learn's compose module: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.compose.

Classes

ColumnTransformer

ColumnTransformer(
    transformers: typing.Iterable[
        typing.Tuple[
            str,
            typing.Union[
                bigframes.ml.preprocessing.OneHotEncoder,
                bigframes.ml.preprocessing.StandardScaler,
                bigframes.ml.preprocessing.MaxAbsScaler,
                bigframes.ml.preprocessing.MinMaxScaler,
                bigframes.ml.preprocessing.KBinsDiscretizer,
                bigframes.ml.preprocessing.LabelEncoder,
                bigframes.ml.preprocessing.PolynomialFeatures,
                bigframes.ml.impute.SimpleImputer,
                bigframes.ml.compose.SQLScalarColumnTransformer,
            ],
            typing.Union[str, typing.Iterable[str]],
        ]
    ]
)

Applies transformers to columns of BigQuery DataFrames.

This estimator lets different columns or column subsets of the input be transformed separately. The features generated by each transformer are concatenated into a single feature space. This is useful for heterogeneous or columnar data, where several feature extraction mechanisms or transformations need to be combined into a single transformer.
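
To make the "transform separately, then concatenate" behavior concrete, here is a toy, pure-Python sketch of the idea. It is not how bigframes implements ColumnTransformer, which runs in BigQuery. The `column_transform` function, the dict-of-lists table, and the `increment` and `max_abs_scale` helpers are all hypothetical and exist only for illustration. Each entry mirrors the `(name, transformer, columns)` tuples from the signature above, with a plain function standing in for the transformer.

```python
from typing import Callable, Dict, Iterable, List, Tuple, Union

# Hypothetical table representation: column name -> list of values.
Table = Dict[str, List[float]]
# Hypothetical per-column transformer: returns (output_name, output_values).
ColumnFn = Callable[[str, List[float]], Tuple[str, List[float]]]


def column_transform(
    transformers: Iterable[Tuple[str, ColumnFn, Union[str, Iterable[str]]]],
    table: Table,
) -> Table:
    """Apply each transformer to its columns and concatenate all outputs."""
    result: Table = {}
    for _name, fn, columns in transformers:
        # Mirror the signature: a single column may be passed as a plain str.
        if isinstance(columns, str):
            columns = [columns]
        for col in columns:
            out_name, out_values = fn(col, table[col])
            result[out_name] = out_values
    return result


def increment(col: str, values: List[float]) -> Tuple[str, List[float]]:
    # Hypothetical transformer: add 1 to every value.
    return f"inc_{col}", [v + 1 for v in values]


def max_abs_scale(col: str, values: List[float]) -> Tuple[str, List[float]]:
    # Hypothetical transformer: divide by the largest absolute value
    # (falls back to 1.0 so an all-zero column does not divide by zero).
    m = max(abs(v) for v in values) or 1.0
    return f"scaled_{col}", [v / m for v in values]


data: Table = {"a": [1.0, 2.0], "b": [-4.0, 2.0], "c": [9.0, 9.0]}
out = column_transform(
    [("inc", increment, "a"), ("scale", max_abs_scale, ["b"])],
    data,
)
print(out)  # {'inc_a': [2.0, 3.0], 'scaled_b': [-1.0, 0.5]}
```

In this sketch, columns that no transformer lists (here `c`) do not appear in the output.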

SQLScalarColumnTransformer

SQLScalarColumnTransformer(sql: str, target_column: str = "transformed_{0}")

Wrapper for plain SQL code contained in a ColumnTransformer.

Create a single-column transformer in plain SQL. This transformer can only be used inside a ColumnTransformer.

When creating an instance, '{0}' can be used as a placeholder for the column to transform:

SQLScalarColumnTransformer("{0}+1")

By default, the target column name is the input column name prefixed with 'transformed_'. A different target column name can be given when creating an instance:

SQLScalarColumnTransformer("{0}+1", "inc_{0}")
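
The `{0}` syntax suggests Python `str.format`-style substitution, applied once per input column to both the SQL template and the target column template. The sketch below illustrates that reading. It assumes the column name is inserted as a backtick-quoted GoogleSQL identifier. The `expand_sql` and `quote_identifier` helpers are hypothetical and not part of the bigframes API.

```python
from typing import Iterable, List


def quote_identifier(name: str) -> str:
    # Assumption: columns are referenced as backtick-quoted GoogleSQL identifiers,
    # with any embedded backtick escaped.
    return "`" + name.replace("`", "\\`") + "`"


def expand_sql(sql: str, target_column: str, columns: Iterable[str]) -> List[str]:
    """Hypothetical helper: fill '{0}' in both templates once per column."""
    exprs = []
    for col in columns:
        expr = sql.format(quote_identifier(col))
        alias = quote_identifier(target_column.format(col))
        exprs.append(f"{expr} AS {alias}")
    return exprs


print(expand_sql("{0}+1", "inc_{0}", ["col2"]))
# ['`col2`+1 AS `inc_col2`']
print(expand_sql("{0}+1", "transformed_{0}", ["col2"]))
# ['`col2`+1 AS `transformed_col2`']
```

Because this sketch relies on `str.format`, any literal braces in its SQL template would have to be doubled (`{{`, `}}`).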

Examples:

>>> from bigframes.ml.compose import ColumnTransformer, SQLScalarColumnTransformer
>>> import bigframes.pandas as bpd
>>> df = bpd.DataFrame({'name': ["James", None, "Mary"], 'city': ["New York", "Boston", None]})
>>> col_trans = ColumnTransformer([
...     ("strlen",
...      SQLScalarColumnTransformer("CASE WHEN {0} IS NULL THEN 15 ELSE LENGTH({0}) END"),
...      ['name', 'city']),
... ])
>>> col_trans = col_trans.fit(df)
>>> df_transformed = col_trans.transform(df)
>>> df_transformed
   transformed_name  transformed_city
0                 5                 8
1                15                 6
2                 4                15
<BLANKLINE>
[3 rows x 2 columns]
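
As a sanity check on the output above, the `CASE WHEN {0} IS NULL THEN 15 ELSE LENGTH({0}) END` expression can be reproduced locally in plain Python on the same data. This only mirrors the SQL logic. It does not involve bigframes or BigQuery, and `strlen_or_15` is a hypothetical name.

```python
from typing import Dict, List, Optional


def strlen_or_15(value: Optional[str]) -> int:
    # Python equivalent of: CASE WHEN {0} IS NULL THEN 15 ELSE LENGTH({0}) END
    return 15 if value is None else len(value)


data: Dict[str, List[Optional[str]]] = {
    "name": ["James", None, "Mary"],
    "city": ["New York", "Boston", None],
}

# Apply the expression to every listed column, using the default "transformed_{0}" naming.
transformed = {
    f"transformed_{col}": [strlen_or_15(v) for v in values]
    for col, values in data.items()
}
print(transformed)
# {'transformed_name': [5, 15, 4], 'transformed_city': [8, 6, 15]}
```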

SQLScalarColumnTransformer can be combined with other transformers, like StandardScaler:

>>> from bigframes.ml import preprocessing
>>> col_trans = ColumnTransformer([
...     ("identity", SQLScalarColumnTransformer("{0}", target_column="{0}"), ["col1", "col5"]),
...     ("increment", SQLScalarColumnTransformer("{0}+1", target_column="inc_{0}"), "col2"),
...     ("stdscale", preprocessing.StandardScaler(), "col3"),
...     # ...
... ])