Module model_selection (0.15.0)

Functions for train/test splitting and model tuning. This module is modeled on Scikit-Learn's model_selection module: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection

Functions

train_test_split

train_test_split(
    *arrays: typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series],
    test_size: typing.Optional[float] = None,
    train_size: typing.Optional[float] = None,
    random_state: typing.Optional[int] = None
) -> typing.List[typing.Union[bigframes.dataframe.DataFrame, bigframes.series.Series]]

Splits dataframes or series into random train and test subsets.

Parameters
Name	Description
`*arrays`	`bigframes.dataframe.DataFrame or bigframes.series.Series` A sequence of BigQuery DataFrames or Series that can be joined on their indexes.
`test_size`	`default None` The proportion of the dataset to include in the test split. If None, it defaults to the complement of train_size. If both are None, it is set to 0.25.
`train_size`	`default None` The proportion of the dataset to include in the train split. If None, this will default to the complement of test_size.
`random_state`	`default None` A seed used to randomly choose the rows of each split. If not set, a different random split is generated on each call.
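
The defaulting rules for `test_size` and `train_size`, and the effect of `random_state`, can be illustrated with a small pure-Python sketch. This is not the BigQuery DataFrames implementation: `resolve_split_sizes` and `seeded_split` are hypothetical helpers written only for this illustration, and they operate on plain Python sequences rather than DataFrames.

```python
import random
from typing import List, Optional, Sequence, Tuple


def resolve_split_sizes(
    test_size: Optional[float] = None,
    train_size: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (train_size, test_size) following the documented defaults.

    Hypothetical helper for illustration; not part of the library.
    """
    if test_size is None:
        # Neither given: test split is 0.25. Only train_size given: complement.
        test_size = 0.25 if train_size is None else 1.0 - train_size
    if train_size is None:
        # Only test_size known at this point: train split is its complement.
        train_size = 1.0 - test_size
    return train_size, test_size


def seeded_split(
    rows: Sequence[int],
    test_size: Optional[float] = None,
    train_size: Optional[float] = None,
    random_state: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Shuffle rows with an optional seed, then cut into train and test parts."""
    train_frac, test_frac = resolve_split_sizes(test_size, train_size)
    # random.Random(None) seeds from system entropy, so unseeded calls differ.
    rng = random.Random(random_state)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]
```

Calling `seeded_split(range(100), test_size=0.2, random_state=7)` twice yields the same two lists, while omitting `random_state` generally gives a different shuffle on each call.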

Returns
Type	Description
`List[Union[bigframes.dataframe.DataFrame, bigframes.series.Series]]`	A list of BigQuery DataFrames or Series.
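
A minimal usage sketch follows. It rests on several assumptions that this page does not state: that the module is importable as `bigframes.ml.model_selection`, that the returned list follows the Scikit-Learn ordering (train then test for each input, in input order), and that a BigQuery session with valid credentials is available. The wrapper `split_features_and_label` is a hypothetical helper, not part of the library.

```python
def split_features_and_label(df, label_column, test_size=0.2, seed=42):
    """Split a BigQuery DataFrame into train/test features and label.

    Hypothetical wrapper for illustration only.
    """
    # Imported lazily so the sketch can be loaded without the package installed.
    # Assumption: this is the module's import path.
    from bigframes.ml.model_selection import train_test_split

    features = df.drop(columns=[label_column])
    label = df[[label_column]]

    # Assumption: results are ordered as in Scikit-Learn, i.e.
    # [features_train, features_test, label_train, label_test].
    features_train, features_test, label_train, label_test = train_test_split(
        features, label, test_size=test_size, random_state=seed
    )
    return features_train, features_test, label_train, label_test


# Example call (requires BigQuery access; the table and column names are assumptions):
#   import bigframes.pandas as bpd
#   df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")
#   X_train, X_test, y_train, y_test = split_features_and_label(df, "body_mass_g")
```

Passing a fixed `seed` keeps the split stable across runs, which is useful when comparing models on the same held-out rows.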