Changelog

PyPI History

1.16.0 (2024-09-04)

Features

Add DataFrame.struct.explode to add struct subfields to a DataFrame (#916) (ad2f75e)
Implement bigframes.bigquery.json_extract_array (#910) (575a29e)
Recover struct column from exploded Series (#904) (7dd304c)

Bug Fixes

Fix issue with iterating on >10gb dataframes (#949) (2b0f0fa)
Improve Series.replace for dict input (#907) (4208044)
NullIndex in ML model.predict error (#917) (612271d)
Struct field non-nullable type issue. (#914) (149d5ff)
Unordered mode errors in ml train_test_split (#925) (85d7c21)

Performance Improvements

Improve repr performance (#918) (46f2dd7)

Dependencies

Re-introduce support for numpy 1.24.x (#931) (3d71913)
Update minimum support to Pandas 1.5.3 and Pyarrow 10.0.1 (#903) (7ed3962)

Documentation

Add Claude3 ML and RemoteFunc notebooks (#930) (cfd16c1)
Create sample notebook to manipulate struct and array data (#883) (3031903)
Update struct examples. (#953) (d632cd0)
Use unstack() from BigQuery DataFrames instead of pandas in the PyPI sample notebook (#890) (d1883cc)

1.15.0 (2024-08-20)

Features

Add llm.TextEmbeddingGenerator to support new embedding models (#905) (6bc6a41)
Add ml.llm.Claude3TextGenerator model (#901) (7050038)

Documentation

Add columns for “requires ordering/index” to supported APIs summary (#892) (d2fc51a)
Remove duplicate description for kms_key_name (#898) (1053d56)
Update embedding model notebooks (#906) (d9b8ef5)

1.14.0 (2024-08-14)

Features

Implement bigframes.bigquery.json_extract (#868) (3dbf84b)
Implement Series.str.__getitem__ (#897) (e027b7e)

Bug Fixes

Fix caching from generating row numbers in partial ordering mode (#872) (52b7786)

Performance Improvements

Generate SQL with fewer CTEs (#877) (eb60804)
Speed up compilation by reducing redundant type normalization (#896) (e0b11bc)

Documentation

Add streaming html docs (#884) (171da6c)
Fix the DisplayOptions doc rendering (#893) (3eb6a17)
Update streaming notebook (#887) (6e6f9df)

1.13.0 (2024-08-05)

Features

df.apply(axis=1) to support remote function with multiple params (#851) (2158818)
Allow windowing in ‘partial’ ordering mode (#861) (ca26fe5)
Create a separate OrderingModePartialPreviewWarning for more fine-grained warning filters (#879) (8753bdd)

Bug Fixes

Fix issue with invalid sql generated by ml distance functions (#865) (9959fc8)

Documentation

Create sample notebook using ordering_mode="partial" (#880) (c415eb9)
Update streaming notebook (#875) (e9b0557)

1.12.0 (2024-07-31)

Features

Add bigframes-mode label to query jobs (#832) (c9eaff0)
Add config option to set partial ordering mode (#855) (823c0ce)
Add stratify param support to ml.model_selection.train_test_split method (#815) (27f8631)
Add streaming.StreamingDataFrame class (#864) (a7d7197)
Allow DataFrame.join for self-join on Null index (#860) (e950533)
Support remote function cleanup with session.close (#818) (ed06436)
Support to_csv/parquet/json to local files/objects (#858) (d0ab9cc)

Bug Fixes

Fewer relation joins from df self-operations (#823) (0d24f73)
Fix ‘sql’ property for null index (#844) (1b6a556)
Fix unordered mode using ordered path to print frame (#839) (93785cb)
Reduce redundant remote_function deployments (#856) (cbf2d42)

Documentation

Add partner attribution steps to integrations sample notebook (#835) (d7b333f)
Make get_global_session/close_session/reset_session appear in the docs (#847) (01d6bbb)

1.11.1 (2024-07-08)

Documentation

Remove session and connection in llm notebook (#821) (74170da)
Remove the experimental flask icon from the public docs (#820) (067ff17)

1.11.0 (2024-07-01)

Features

Add .agg support for size (#792) (87e6018)
Add bigframes.bigquery.json_set (#782) (1b613e0)
Add bigframes.streaming.to_pubsub method to create continuous query that writes to Pub/Sub (#801) (b47f32d)
Add DataFrame.to_arrow to create Arrow Table from DataFrame (#807) (1e3feda)
Add PolynomialFeatures support to to_gbq and pipelines (#805) (57d98b9)
Add Series.peek to preview data efficiently (#727) (580e1b9)
Expose gcf memory param in remote_function (#803) (014765c)
More informative error when query plan too complex (#811) (136dc24)

Bug Fixes

Include internally required packages in remote_function hash (#799) (4b8fc15)

Documentation

Document dtype limitation on row processing remote_function (#800) (487dff6)

1.10.0 (2024-06-21)

Features

Add dataframe.insert (#770) (e8bab68)
Add groupby head API (#791) (44202bc)
Add ml.preprocessing.PolynomialFeatures class (#793) (b4fbb51)
bigframes.streaming module for continuous queries (#703) (0433a1c)
Include index columns in DataFrame.sql if they are named (#788) (c8d16c0)

Bug Fixes

Allow __repr__ to work with uninitialized DataFrame/Series/Index (#778) (e14c7a9)
df.loc with the 2nd input as bigframes boolean Series (#789) (a4ac82e)
Ensure numpy version matches in remote_function deployment (#798) (324d93c)
Fix temp table creation retries by not throwing if table already exists. (#787) (0e57d1f)
Self-join optimization doesn’t needlessly invalidate caching (#797) (1b96b80)

1.9.0 (2024-06-10)

Features

Allow functions returned from bpd.read_gbq_function to execute outside of apply (#706) (ad7d8ac)
Support bigquery.vector_search() (#736) (dad66fd)
Support score() in GeminiTextGenerator (#740) (b2c7d8b)
Support bytes type in remote_function (#761) (4915424)
Support fit() in GeminiTextGenerator (#758) (d751f5c)

Bug Fixes

ARIMAPlus loads auto_arima_min_order param (#752) (39d7013)
Improve to_pandas_batches for large results (#746) (61f18cb)
Resolve issue with unset thread-local options (#741) (d93dbaf)

Documentation

Fix ML.EVALUATE spelling (#749) (7899749)
Remove LogisticRegression normal_equation strategy (#753) (ea5d367)

1.8.0 (2024-05-31)

Features

merge only generates a default index if both inputs already have an index (#733) (25d049c)
Add +, - as unary ops, ^ binary op (#724) (968d825)
Add GroupBy.size() to get number of rows in each group (#479) (1fca588)
Add DataFrame ~ operator (#721) (354abc1)
Add GeminiText 1.5 Preview models (#737) (56cbd3b)
Add slot_millis and add stats to session object (#725) (72e9583)
Add bigframes.bigquery.array_to_string to convert array elements to delimited strings (#731) (f12c906)
Allow functions decorated with bpd.remote_function() to execute locally (#704) (d850da6)
Ensure "bigframes-api" label is always set on jobs, even if the API is unknown (#722) (1832778)
Support ml.SimpleImputer in bigframes (#708) (4c4415f)
Support type annotations to supply input and output types to bpd.remote_function() decorator (#717) (4a12e3c)
Support type annotations with bpd.remote_function() and axis=1 (a preview feature) (#730) (e5a2992)

Bug Fixes

Correct index labels in multiple aggregations for DataFrameGroupBy (#723) (6a78c89)
Fix Null index assign series to column (#711) (ffb4b57)
Set the input_types and output_types defaults of bpd.remote_function() to None so they can be omitted when type annotations are present (#729) (0e25a3b)
Warn and disable time travel for linked datasets (#712) (085fa9d)

Performance Improvements

Optimize dataframe-series alignment on axis=1 (#732) (3d39221)

Documentation

Add examples to DataFrameGroupBy and SeriesGroupBy (#701) (e7da0f0)

1.7.0 (2024-05-20)

Features

read_gbq_query supports filters (9386373)
read_gbq suggests a correct column name when one is not found (9386373)
Add DefaultIndexKind.NULL to use as index_col in read_gbq*, creating an indexless DataFrame/Series (#662) (29e4886)
bigframes.bigquery.array_agg(SeriesGroupBy|DataFrameGroupBy) (#663) (412f28b)
to_datetime supports utc=False for string inputs (#579) (adf9889)

Bug Fixes

read_gbq_table respects primary keys even when filters are set (#689) (9386373)
Fix type error in test_cluster (#698) (14d81c1)
Improve escaping of literals and identifiers (#682) (da9b136)
Properly identify non-unique index in tables without primary keys (#699) (6e0f4d8)
Remove a usage of the resource package when not available, such as on Windows (#681) (96243f2)
The imported samples error and use peek() (#688) (1a0b744)

Performance Improvements

Don’t run query immediately from read_gbq_table if filters is set (9386373)
Use a LIMIT clause when max_results is set (9386373)

Documentation

Add code snippets for imported onnx tutorials (#684) (cb36e46)
Add code snippets for imported tensorflow model (#679) (b02c401)
Use class_weight="balanced" in the logistic regression prediction tutorial (#678) (b951549)

1.6.0 (2024-05-13)

Features

Add DataFrame.__delitem__ (#673) (2218c21)
Add Series.case_when() (#673) (2218c21)
Add strategy="quantile" in KBinsDiscretizer (#654) (c6c487f)
Add Series.combine (#680) (2fd1b81)
Series.str.split (#675) (6eb19a7)
Suggest correct options in bpd.options.bigquery.location (#666) (57ccabc)
Support axis=1 in df.apply for scalar outputs (#629) (f6bdc4a)
Support gcf vpc connector in remote_function (#677) (9ca92d0)
Warn with a more specific DefaultLocationWarning category when no location can be detected (#648) (e084e54)

Bug Fixes

Include index_col when selecting columns and filters in read_gbq_table (#648) (e084e54)

Dependencies

Add jellyfish as a dependency for spelling correction (57ccabc)

Documentation

Add code snippets for llm text generation (#669) (93416ed)
Add logistic regression samples (#673) (2218c21)
Address lint errors in code samples (#665) (4fc8964)
Document inlining of small data in read_* APIs (#670) (306953a)

1.5.0 (2024-05-07)

Features

bigframes.options and bigframes.option_context now use thread-local variables to prevent context managers in separate threads from affecting each other (#652) (651fd7d)
Add ARIMAPlus.coef_ property exposing ML.ARIMA_COEFFICIENTS functionality (#585) (81d1262)
Add a unique session_id to Session and allow cleaning up sessions (#553) (c8d4e23)
Add the bigframes.bigquery sub-package with a bigframes.bigquery.array_length function (#630) (9963f85)
Always do a query dry run when option.repr_mode == "deferred" (#652) (651fd7d)
Custom query labels for compute options (#638) (f561799)
Warn with DefaultIndexWarning from read_gbq on clustered/partitioned tables with no index_col or filters set (#631, #658) (2715d2b, 73064dd)
Support index_col=False in read_csv and engine="bigquery" (73064dd)
Support gcf max instance count in remote_function (#657) (36578ab)

Bug Fixes

Don’t raise UnknownLocationWarning for US or EU multi-regions (#653) (8e4616b)
Fix bug with na in the column labels in stack (#659) (4a34293)
Use explicit session in PaLM2TextGenerator (#651) (e4f13c3)

Documentation

Add python code sample for multiple forecasting time series (#531) (16866d2)
Fix the PaLM2TextGenerator output token size (#649) (c67e501)

1.4.0 (2024-04-29)

Features

Add .cache() method to persist intermediate dataframe (#626) (a5c94ec)
Add transpose support for small homogeneously typed DataFrames. (#621) (054075d)
Allow single input type in remote_function (#641) (3aa643f)
Expose gcf max timeout in remote_function (#639) (dfeaad0)
Series binary ops compatible with more types (#618) (518d315)
Support the score method for PaLM2TextGenerator (#634) (3ffc1d2)

Bug Fixes

Allow to_pandas to download more than 10GB (#637) (ce56495)
Extend row hash to 128 bits to guarantee unique row id (#632) (9005c6e)
LLM fine-tuning tests (#627) (4724a1a)
LLM PaLM score tests (#643) (cf4ec3a)

Performance Improvements

Automatically condense internal expression representation (#516) (03c1b0d)
Cache transpose to allow performant retranspose (#635) (44b738d)

Documentation

Add supported pandas apis on the main page (#628) (8d2a51c)
Add the first sample for the Single time-series forecasting from Google Analytics data tutorial (#623) (2b84c4f)
Address more technical writers’ feedback (#640) (1e7793c)

1.3.0 (2024-04-22)

Features

Add Series.struct.dtypes property (#599) (d924ec2)
Add fine-tuning fit() for PaLM2TextGenerator (#616) (9c106bd)
Add quantile statistic (#613) (bc82804)
Expose max_batching_rows in remote_function (#622) (240a1ac)
Support primary key(s) in read_gbq by using as the index_col by default (#625) (75bb240)
Warn if location is set to unknown location (#609) (3706b4f)

Bug Fixes

Address technical writers’ feedback (#611) (9f8f181)
Infer narrowest numeric type when combining numeric columns (#602) (8f9ece6)
Use exact median implementation by default (#619) (9d205ae)

Documentation

Fix rendering of examples for multiple apis (#620) (9665e39)
Set index_cols in read_gbq as a best practice (#624) (70015b7)

1.2.0 (2024-04-15)

Features

Add hasnans, combine_first, update to Series (#600) (86e0f38)
Add MultiIndex subclass. (#596) (5d0f149)
Add pivot_table for DataFrame. (#473) (5f1d670)
Add Series.autocorr (#605) (4ec8034)
Support list of numerics in pandas.cut (#580) (290f95d)

Bug Fixes

Address more technical writers’ feedback (#581) (4b08d92)
Error for object dtype on read_pandas (#570) (8702dcf)
Inverting int now does bitwise inversion rather than sign flip (#574) (5f1db8b)
loc setitem dtype issue. (#603) (b94bae9)
Toc menu missing plotting name (#591) (eed12c1)
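
To illustrate the int-inversion fix listed above: bitwise inversion and sign flip give different results on integers. The sketch below uses plain Python integers only; it does not call BigQuery DataFrames, and it assumes the library's `~` on integer data now follows the same two's-complement rule.

```python
# Plain Python integers, used only to contrast the two behaviours.
values = [0, 1, 5, -3]

# Two's-complement bitwise inversion: ~v == -v - 1
bitwise_inverted = [~v for v in values]

# Sign flip, which is what the changelog entry says `~` used to do
sign_flipped = [-v for v in values]

print(bitwise_inverted)
print(sign_flipped)
```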

Documentation

(Series|DataFrame).dtypes (#598) (edef48f)
Add code samples for str accessor methods (#594) (a557ea2)
Add docs for DataFrame and Series dunder methods (#562) (8fc26c4)
Add examples for at/iat (#582) (3be4a2e)

1.1.0 (2024-04-04)

Features

(Series|DataFrame).explode (#556) (9e32f57)
Add DataFrame.eval and DataFrame.query (#361) (5e28ebd)
Add ColumnTransformer save/load (#541) (9d8cf67)
Add ml.metrics.mean_squared_error (#559) (853c25e)
Add support for numpy expm1, log1p, floor, ceil, arctan2 ops (#505) (e8e66cf)
Add transformers save/load (#552) (d805241)
Allow DataFrame binary ops to align on either axis and with loc… (#544) (6d8f3af)
Expose DataFrame.bqclient to assist in integrations (#519) (0be8911)
read_pandas accepts pandas Series and Index objects (#573) (f8821fe)
Support ML.GENERATE_EMBEDDING in PaLM2TextEmbeddingGenerator (#539) (1156c1e)
Support max_columns in repr and make repr more efficient (#515) (54e49cf)

Bug Fixes

Assign NaN scalar to column error. (#513) (0a4153c)
Don’t download 100gb onto local python machine in load test (#537) (082c58b)
Exclude list-like s parameter in plot.scatter (#568) (1caac27)
Fix case where df.peek would fail to execute even with force=True (#511) (8eca99a)
Fix error in Series.drop(0) (#575) (75dd786)
Include all names in MultiIndex repr (#564) (b188146)
plot.scatter s parameter cannot accept float-like column (#563) (8d39187)
Product operation produces float result for all input types (#501) (6873b30)
Reloaded transformer .transform error (#569) (39fe474)
Rename PaLM2TextEmbeddingGenerator.predict output columns to be backward compatible (#561) (4995c00)
Respect hard stack size limit and swallow limit change exception. (#558) (4833908)
Restore string to date/time type coercion (#565) (4ae0262)
Sync the notebook with embedding changes (#550) (347f2dd)
Use bytes limit on frame inlining rather than element count (#576) (659a161)

Performance Improvements

Add multi-query execution capability for complex dataframes (#427) (d2d7e33)

Dependencies

Include pyarrow as a dependency (#529) (9b1525a)

Documentation

bigframes.options.bigquery.project and location are optional in some circumstances (#548) (90bcec5)
Add “Supported pandas APIs” reference to the documentation (#542) (74c3915)
Add General Availability banner to README (#507) (262ff59)
Add operations in API docs (#557) (ea95761)
Add progress_bar code sample (#508) (92a1af3)
Add the code samples for metrics{auc, roc_auc_score, roc_curve} (#520) (5f37b09)
Address more comments from technical writers to meet legal purposes (#571) (9084df3)
Fix docs of ARIMAPlus.predict (#512) (3b80f95)
Include Index in table-of-contents (#564) (b188146)
Mark Gemini model as Pre-GA (#543) (769868b)
Migrate the overview page to Bigframes official landing page (#536) (a0fb8bb)

1.0.0 (2024-03-25)

⚠ BREAKING CHANGES

rename model parameter min_rel_progress to tol
early_stop setting no longer supported, always uses True
rename model parameter n_parallell_trees to n_estimators
rename class_weights to class_weight
rename learn_rate to learning_rate
PCA n_components supports float value and None, default to None
rename various ml model parameters for consistency with sklearn (https://github.com/googleapis/python-bigquery-dataframes/pull/491)
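
One way to carry pre-1.0 keyword arguments across the renames listed above is a small translation table. This is a sketch only: `migrate_kwargs` is a hypothetical helper, not part of bigframes, `max_iterations` is just an illustrative pass-through argument, and which models accept which parameter is not stated here. The tree-count rename is left out; add it using the names from the list above if you need it.

```python
# Hypothetical helper, not part of bigframes: rewrite pre-1.0 keyword
# arguments using renames named in the breaking-changes list.
RENAMED = {
    "min_rel_progress": "tol",
    "class_weights": "class_weight",
    "learn_rate": "learning_rate",
}
REMOVED = {"early_stop"}  # no longer supported; always behaves as True


def migrate_kwargs(old_kwargs):
    """Return a copy of old_kwargs that uses the 1.0.0 parameter names."""
    new_kwargs = {}
    for name, value in old_kwargs.items():
        if name in REMOVED:
            continue  # drop settings that 1.0.0 no longer accepts
        new_kwargs[RENAMED.get(name, name)] = value
    return new_kwargs


migrated = migrate_kwargs(
    {"learn_rate": 0.1, "early_stop": False, "max_iterations": 20}
)
print(migrated)
```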

Features

Add configuration option to read_gbq (#401) (85cede2)
Add ml ARIMAPlus model params (#488) (352cb85)
Add ml KMeans model params (#477) (23a8d9a)
Add ml LogisticRegression model params (#481) (f959b65)
Add ml PCA model params (#474) (fb5d83b)
Add params for LinearRegression model (#464) (21b2188)
Add support for Python 3.12 (#231) (df2976f)
Allow assigning directly to Series.name property (#495) (ad0e99e)
Ensure Series.str.len() can get length of array columns (#497) (10c0446)
Option to use bq connection without check (#460) (0b3f8e5)
PCA n_components supports float value and None, default to None (65c6f47)
Rename class_weights to class_weight (65c6f47)
Rename learn_rate to learning_rate (65c6f47)
Rename model parameter min_rel_progress to tol (65c6f47)
Rename model parameter n_parallell_trees to n_estimators (65c6f47)
Rename various ml model parameters for consistency with sklearn (https://github.com/googleapis/python-bigquery-dataframes/pull/491) (65c6f47)
Support BQ regional endpoints for europe-west9, europe-west3, us-east4, and us-west1 (#504) (fbada4a)
Support dataframe.cov (#498) (c4beafd)
Support Series.dt.floor (#493) (2dd01c2)
Support Series.dt.normalize (#483) (0bf1e91)
Update plot sample to 1000 rows (#458) (60d4a7b)

Bug Fixes

early_stop setting no longer supported, always uses True (65c6f47)
Fix -1 offset lookups failing (#463) (2dfb9c2)
plot.scatter c argument functionalities (#494) (d6ee994)
Properly support format param for numerical input. (#486) (ae20c35)
Re-enable to_csv and to_json related tests (#468) (2b9a01d)
Sampling plot cannot preserve ordering if index is not ordered (#475) (a5345fe)
Use actual BigQuery types rather than ibis types in to_pandas (#500) (82b4f91)

Dependencies

Support pandas 2.2 (#492) (e2cf50e)

Documentation

Add code samples for metrics.{accuracy_score, confusion_matrix} (#478) (3e3329a)
Add code samples for metrics.{recall_score, precision_score, f1_score} (#502) (370fe90)
Improve API documentation (#489) (751266e)
Update bigquery connection documentation (#499) (4bfe094)
Update LLM + K-means notebook to handle partial failures (#496) (97afad9)

0.26.0 (2024-03-20)

⚠ BREAKING CHANGES

exclude remote models for .register() (#465)

Features

(Series|DataFrame).plot (#438) (1c3e668)
read_gbq_table supports LIKE as an operator in filters (#454) (d2d425a)
Add DataFrame.pipe() method (#421) (95f5a6e)
Set force=True by default in DataFrame.peek() (#469) (4e8e97d)
Support datetime related casting in (Series|DataFrame|Index).astype (#442) (fde339b)
Support Series.dt.strftime (#453) (8f6e955)

Bug Fixes

Any() on empty set now correctly returns False (#471) (f55680c)
df.dropna preserves column dtypes (#457) (3bab1a9)
Disable to_json and to_csv related tests (#462) (874026d)
Exclude remote models for .register() (#465) (73fe0f8)
Fix broken link in covid notebook (#450) (adadb06)
Fix broken multiindex loc cases (#467) (b519197)
Fix grouping series on multiple other series (#455) (3971bd2)
Groupby aggregates no longer check if grouping keys are numeric (#472) (4fbf938)
Raise ValueError when read_pandas() receives a bigframes DataFrame (#447) (b28f9fd)
Series.(to_csv|to_json) leverages bq export (#452) (718a00c)
Warn when read_gbq / read_gbq_table uses the snapshot time cache (#441) (e16a8c0)

Documentation

Add code samples for ml.metrics.r2_score (#459) (85fefa2)
Add the docs for loc and iloc indexers (#446) (14ab8d8)
Add the pages for at and iat indexers (#456) (340f0b5)
Add version information to bug template (#437) (91bd39e)
Indicate that project and location are optional in example notebooks (#451) (1df0140)

0.25.0 (2024-03-14)

Features

(Series|DataFrame).plot.(line|area|scatter) (#431) (0772510)
Support CMEK for remote_function cloud functions (#430) (2fd69f4)

0.24.0 (2024-03-12)

⚠ BREAKING CHANGES

read_parquet uses a “pandas” engine to parse files by default. Use engine="bigquery" for the previous behavior

Features

(Series|DataFrame).plot.hist() (#420) (4aadff4)
Add detect_anomalies to ml ARIMAPlus and KMeans models (#426) (6df28ed)
Add engine parameter to read_parquet (#413) (31325a1)
Add ml PCA.detect_anomalies method (#422) (8d82945)
Support BYOSA in remote_function (#407) (d92ced2)
Support CMEK for BQ tables (#403) (9a678e3)

Bug Fixes

Move third_party.bigframes_vendored to bigframes_vendored (#424) (763edeb)
Only do row identity based joins when joining by index (#356) (76b252f)
read_pandas inline respects location (#412) (ae0e3ea)

Documentation

Add predict sample to samples/snippets/bqml_getting_started_test.py (#388) (6a3b0cc)
Document minimum IAM requirement (#416) (36173b0)
Fix the note rendering for DataFrames methods: nlargest, nsmallest (#417) (38bd2ba)

0.23.0 (2024-03-05)

Features

Add ml.metrics.pairwise.euclidean_distance (#397) (1726588)
Add TextEmbedding model version support (#394) (e0f1ab0)

Bug Fixes

Code exception in remote_function now prevents retry and surfaces in the client (#387) (dd3643d)
Docs link for metrics.pairwise (#400) (a60aba7)

Dependencies

Update ibis to version 8.0.0 and refactor remote_function to use ibis UDF method (#277) (350499b)

Documentation

Update README to point to new summary pages (#402) (bfe2b23)

0.22.0 (2024-02-27)

⚠ BREAKING CHANGES

rename cosine_similarity to paired_cosine_distances (#393)
move model optional args to kwargs (#381)

Features

Add DataFrame.corr() method (#379) (67fd434)
Add ml.metrics.pairwise.manhattan_distance (#392) (9d31865)
Enable regional endpoints for me-central2 (#386) (469674d)

Bug Fixes

Avoid ibis warning for “database” table() method argument (#390) (a0490a4)
Correct the numeric literal dtype (#365) (93b02cd)
Rename cosine_similarity to paired_cosine_distances (#393) (81ece46)

Performance Improvements

Inline read_pandas for small data (#383) (59b446b)

Dependencies

Add minimum version constraint for sqlglot to 19.9.0 (#389) (8b62d77)

Documentation

Add a code sample for creating a kmeans model (#267) (4291d65)
Fix bigframes.pandas.concat documentation (#382) (234b61c)

Miscellaneous Chores

Release 0.22.0 (#396) (8f73d9e)

Code Refactoring

Move model optional args to kwargs (#381) (4037992)

0.21.0 (2024-02-13)

Features

Add Series.cov method (#368) (443db22)
Add ml.llm.GeminiTextGenerator model (#370) (de1e0a4)
Add ml.metrics.pairwise.cosine_similarity function (#374) (126f566)
Add XGBoostModel (#363) (d5518b2)
Limited support of lambdas in Series.apply (#345) (208e081)
Support bigframes.pandas.to_datetime for scalars, iterables and series. (#372) (ffb0d15)
Support read_gbq wildcard table path (#377) (90caf86)

Bug Fixes

Error message fix. (#375) (930cf6b)

Documentation

Clarify ADC pre-auth in a non-interactive environment (#348) (99a9e6e)

0.20.1 (2024-02-06)

Performance Improvements

Make repr cache the block where appropriate (#350) (068879f)

Documentation

Add a sample to demonstrate the evaluation results (#364) (cff0919)
Fix the DataFrame.apply code sample (#366) (1866a26)

0.20.0 (2024-01-30)

Features

Add DataFrame.peek() as an efficient alternative to head() results preview (#318) (9c34d83)
Add ARIMA_EVALUATE options in forecasting models (#336) (73e997b)
Add Index constructor, repr, copy, get_level_values, to_series (#334) (e5d054e)
Improve error message for drive based BQ table reads (#344) (0794788)
Update cut to work without labels = False and show intervals as dict (#335) (4ff53db)

Bug Fixes

Change default connection name in getting_started.ipynb (#347) (677f014)
Series iteration correctly returns values instead of index (#339) (2c6af9b)

Documentation

Add code samples for Series.{between, cumprod} (#353) (09a52fd)

0.19.2 (2024-01-22)

Bug Fixes

read_gbq large response issue (#332) (b8178b9)
Use object dtype for ARRAY columns in to_pandas() with pandas 1.x (#329) (374ddb5)

Documentation

Add DataFrame.applymap documentation (#326) (bd531a1)
Add code samples for series methods (#323) (32cc6fa)
Add remote model requirements (#333) (c91f70c)

0.19.1 (2024-01-17)

Bug Fixes

Handle multi-level columns for df aggregates properly (#305) (5bb45ba)
Update max_output_token limitation. (#308) (5cccd36)

Documentation

Add code samples for Series.corr (#316) (9150c16)

0.19.0 (2024-01-09)

Features

Add ‘columns’ as an alias for ‘col_order’ (#298) (a01b271)
Add Series dt.tz and dt.unit properties (#303) (2e1a403)
Add to_gbq() method for LLM models (#299) (dafbc1b)
Allow manually set clustering_columns in dataframe.to_gbq (#302) (9c21323)
Support assigning to columns like a property (#304) (f645c56)
Support upcasting numeric columns in concat (#294) (e3a056a)

Bug Fixes

DF.drop tuple input as multi-index (#301) (21391a9)
Fix bug converting non-string labels to sql ids (#296) (a61c5fe)

Documentation

Add code samples for Series.ffill and DataFrame.ffill (#307) (1c63b45)

0.18.0 (2024-01-02)

Features

Add dataframe.to_html (#259) (2cd6489)
Add IntervalIndex support to bigframes.pandas.cut (#254) (6c1969a)
Add replace method to DataFrame (#261) (5092215)
Specific pyarrow mappings for decimal, bytes types (#283) (a1c0631)

Bug Fixes

Dataframes to_gbq now creates dataset if it doesn’t exist (#222) (bac62f7)
Exclude pandas 2.2.0rc0 to unblock prerelease tests (#292) (ac1a745)
Fix DataFrameGroupBy.agg() issue with as_index=False (#273) (ab49350)
Make Series.str.replace work for simple strings (#285) (ad67465)
Update dataframe.to_gbq to dedup column names. (#286) (746115d)
Use setuptools.find_namespace_packages (#246) (9ec352a)

Dependencies

Migrate to ibis-framework >= "7.1.0" (#53) (9798a2b)

Documentation

Add code snippets for explore query result page (#278) (7cbbb7d)
Code samples for astype common to DataFrame and Series (#280) (95b673a)
Code samples for DataFrame.copy and Series.copy (#290) (7cbc2b0)
Code samples for drop and fillna (#284) (9c5012e)
Code samples for isna, isnull, dropna, isin (#289) (ad51035)
Code samples for rename, size (#293) (eb69f60)
Code samples for reset_index and sort_values (#282) (acc0eb7)
Code samples for sample, get, Series.round (#295) (c2b1892)
Code samples for Series.{add, replace, unique, T, transpose} (#287) (0e1bbfc)
Code samples for Series.{map, to_list, count} (#290) (7cbc2b0)
Code samples for Series.{name, std, agg} (#293) (eb69f60)
Code samples for Series.groupby and Series.{sum,mean,min,max} (#280) (95b673a)
Code samples for DataFrame set_index, items (#295) (c2b1892)
Fix the rendering for get_dummies (#291) (252f3a2)

0.17.0 (2023-12-14)

Features

Add filters argument to read_gbq for enhanced data querying (#198) (034f71f)
Add module/class level api tracking (#272) (4f3db3d)
Deprecate use_regional_endpoints (#199) (319a1f2)

Bug Fixes

Increase recursion limit, cache compilation tree hashes (#184) (b54791c)
Replaced raise NotImplementedError with return NotImplemented (#258) (a133822)

Documentation

Add code samples for values and value_counts (#249) (f247d95)
Add sample for getting started with BQML (#141) (fb14f54)

0.16.0 (2023-12-12)

Features

Add ARIMAPlus.predict parameters (#264) (99598c7)
Add DataFrame from_dict and from_records methods (#244) (8d81e24)
Add DataFrame.select_dtypes method (#242) (1737acc)
Add nunique method to Series/DataFrameGroupBy (#256) (c8ec245)
Support dataframe.loc with conditional columns selection (#233) (3febea9)

Bug Fixes

Enforce pandas version requirement <2.1.4 (#265) (9dd63f6)
Exclude pandas 2.1.4 from prerelease tests to unblock e2e tests (b02fc2c)
Fix value_counts column label for normalize=True (#245) (d3fa6f2)
Migrate e2e tests to bigframes-load-testing project (8766ac6)
ml.sql logic (#262) (68c6fdf)
Update the llm_kmeans notebook (#247) (66d1839)

Documentation

Add code samples for shape and head (#257) (5bdcc65)
Add example for dataframe.melt, dataframe.pivot, dataframe.stac… (#252) (8c63697)
Add example to dataframe.nlargest, dataframe.nsmallest, datafra… (#234) (e735412)
Add examples for dataframe.cummin, dataframe.cummax, dataframe.cumsum, dataframe.cumprod (#243) (0523a31)
Add examples for dataframe.nunique, dataframe.diff, dataframe.a… (#251) (77074ec)
Correct the docs for option_context (#263) (d21c6dd)
Correct the params rendering for ml.remote and ml.ensemble modules (#248) (c2829e3)
Fix return annotation in API docstrings (#253) (89a1c67)

0.15.0 (2023-11-29)

⚠ BREAKING CHANGES

model.predict returns all the columns (#204)

Features

Add info and memory_usage methods to dataframe (#219) (9d6613d)
Add remote vertex model support (#237) (0bfc4fb)
Add the recent api method for ML component (#225) (ed8876d)
Model.predict returns all the columns (#204) (416171a)
Send warnings on LLM prediction partial failures (#216) (81125f9)

Bug Fixes

Add df snapshots lookup for read_gbq (#229) (d0d9b84)
Avoid unnecessary row_number() on sort key for io (#211) (a18d40e)
Dedup special character (#209) (dd78acb)
Invalid JSON type of the notebook (#215) (a729831)
Make to_pandas override enable_downsampling when sampling_method is manually set. (#200) (ae03756)
Polish the llm+kmeans notebook (#208) (e8532b1)
Update the llm+kmeans notebook with recent change (#236) (f8917ab)
Use anonymous dataset to create remote_function (#205) (69b016e)

Documentation

Add code samples for index and column properties (#212) (c88d38e)
Add code samples for df reshaping, function, merge, and join methods (#203) (010486c)
Add examples for dataframe.kurt, dataframe.std, dataframe.count (#232) (f9c6e72)
Add examples for dataframe.mean, dataframe.median, dataframe.va… (#228) (edd0522)
Add examples for dataframe.min, dataframe.max and dataframe.sum (#227) (3a375e8)
Code samples for Series.dot and DataFrame.dot (#226) (b62a07a)
Code samples for Series.where and Series.mask (#217) (52dfad2)
Code samples for dataframe.any, dataframe.all and dataframe.prod (#223) (d7957fa)
Make the code samples reflect default bq connection usage (#206) (71844b0)

Miscellaneous Chores

Release 0.15.0 (#241) (6c899be)

0.14.1 (2023-11-16)

Bug Fixes

Correctly handle null values when initializing fingerprint ordering (#210) (8324f13)

Documentation

Add an example notebook about line graphs (#197) (f957b27)

0.14.0 (2023-11-14)

Features

Add ‘cross’ join support (#176) (765446a)
Add ‘index’, ‘pad’, ‘nearest’ interpolate methods (#162) (6a28403)
Add series.sample (identical to existing dataframe.sample) (#187) (37914a4)
Add unordered sql compilation (#156) (58f420c)
Log most recent API calls as recent-bigframes-api-xx labels on BigQuery jobs (#145) (4ea33b7)
read_gbq creates order deterministically without table copy (#191) (8ab81de)
Support date_series.astype("string[pyarrow]") to cast DATE to STRING (#186) (aee0e8e)
Support series.at[row_label] = scalar (#173) (0c8bd33)
Temporary resources no longer use BigQuery Sessions (#194) (4a02cac)

Bug Fixes

All sort operations are now stable (#195) (3a2761f)
Default to 7 days expiration for read_csv, read_json, read_parquet (#193) (03606cd)
Deprecate the remote_service_type in llm model (#180) (a8a409a)
For reset_index on unnamed multiindex, always use level_[n] label (#182) (f95000d)
Match pandas behavior when assigning listlike to empty dfs (#172) (c1d1f42)
Use anonymous dataset instead of session dataset for temp tables (#181) (800d44e)
Use random table for read_pandas (#192) (741c75e)
Use random table when loading data for read_csv, read_json, read_parquet (#175) (9d2e6dc)
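
As a reminder of what "stable" means for the sort fix listed above: rows that compare equal on the sort key keep their original relative order. The sketch below uses Python's built-in `sorted`, which is documented as stable. It only illustrates the property and does not call BigQuery DataFrames.

```python
# (key, original_position) pairs; two rows share each key.
rows = [("b", 2), ("a", 1), ("b", 1), ("a", 2)]

# Sort on the key only. A stable sort leaves ties in input order,
# so ("b", 2) stays ahead of ("b", 1).
by_key = sorted(rows, key=lambda row: row[0])

print(by_key)
```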

Documentation

Add code samples for read_gbq_function using community UDFs (#188) (7506eab)
Add docstring code samples for Series.apply and DataFrame.map (#185) (c816d84)
Add llm kmeans notebook as an included example (#177) (d49ae42)
Use head() to get top n results, not to preview results (#190) (87f84c9)

0.13.0 (2023-11-07)

Features

to_gbq without a destination table writes to a temporary table (#158) (e1817c9)
Add DataFrame.__iter__, DataFrame.iterrows, DataFrame.itertuples, and DataFrame.keys methods (#164) (c065071)
Add Series.__iter__ method (#164) (c065071)
Add interpolate() to series and dataframe (#157) (b9cb55c)
Support 32k text-generation and multilingual embedding models (#161) (5f0ea37)

Bug Fixes

Update default temp table expiration to 7 days (#174) (4ff26cd)

0.12.0 (2023-11-01)

Features

Add DataFrame.melt (#113) (4e4409c)
Add DataFrame.to_pandas_batches() to download large DataFrame objects (#136) (3afd4a3)
Add bigframes.options.compute.maximum_bytes_billed option that sets maximum bytes billed on query jobs (#133) (63c7919)
Add pandas.qcut (#104) (8e44518)
Add pd.get_dummies (#149) (d8baad5)
Add unstack to series, add level param (#115) (5edcd19)
Implement operator @ for DataFrame.dot (#139) (79a638e)
Populate ibis version in user agent (#140) (c639a36)

Bug Fixes

Don’t override the global logging config (#138) (2ddbf74)
Fix bug with column names under repeated column assignment (#150) (29032d0)
Resolve plotly rendering issue by using ipython html for job pro… (#134) (39df43e)
Use indexee’s session for loc listlike cases (#152) (27c5725)

Documentation

Add arithmetic df sample code (#153) (ac44ccd)
Fix indentation on read_gbq_function code sample (#163) (0801d96)
Link to ML.EVALUATE BQML page for score() methods (#137) (45c617f)

0.11.0 (2023-10-26)

Features

Add back reset_session as an alias for close_session (#124) (694a85a)
Change query parameter to query_or_table in read_gbq (#127) (f9bb3c4)

Bug Fixes

Expose bigframes.pandas.reset_session as a public API (#128) (b17e1f4)
Use series’s own session in series.reindex listlike case (#135) (95bff3f)

Documentation

Add runnable code samples for DataFrames I/O methods and property (#129) (6fea8ef)
Add runnable code samples for reading methods (#125) (a669919)

0.10.0 (2023-10-19)

Features

Implement DataFrame.dot for matrix multiplication (#67) (29dd414)

0.9.0 (2023-10-18)

⚠ BREAKING CHANGES

Rename bigframes.pandas.reset_session to close_session (#101)

Features

Add bigframes.options.bigquery.application_name for partner attribution (#117) (52d64ff)
Add AtIndexer getitems (#107) (752b01f)
Rename bigframes.pandas.reset_session to close_session (#101) (36693bf)
Send BigQuery cancel request when canceling bigframes process (#103) (e325fbb)
Support external packages in remote_function (#98) (ec10c4a)
Use ArrowDtype for STRUCT columns in to_pandas (#85) (9238fad)

Bug Fixes

Support multiindex for three loc getitem overloads (#113) (68e3cd3)

Performance Improvements

If primary keys are defined, read_gbq avoids copying table data (#112) (e6c0cd1)

Documentation

Add documentation for Series.struct.field and Series.struct.explode (#114) (a6dab9c)
Add open-source link in API doc (#106) (db51fe3)
Update ML overview API doc (#105) (1b3f3a5)

0.8.0 (2023-10-12)

⚠ BREAKING CHANGES

The default behavior of to_parquet is changing from no compression to 'snappy' compression.

Features

Support compression in to_parquet (a8c286f)

Bug Fixes

Create session dataset for remote functions only when needed (#94) (1d385be)

0.7.0 (2023-10-11)

Features

Add aliases for several series properties (#80) (c0efec8)
Add equals methods to series/dataframe (#76) (636a209)
Add iat and iloc accessing by tuples of integers (#90) (228aeba)
Add level param to DataFrame.stack (#88) (97b8bec)
Allow df.drop to take an index object (#68) (740c451)
Use default session connection (#87) (4ae4ef9)

Bug Fixes

Change the invalid url in docs (#93) (969800d)

Documentation

Add more preprocessing models into the docs menu. (#97) (1592315)

0.6.0 (2023-10-04)

Features

Add df.unstack (#63) (4a84714)
Add idxmin, idxmax to series, dataframe (#74) (781307e)
Add ml.preprocessing.KBinsDiscretizer (#81) (24c6256)
Add multi-column dataframe merge (#73) (c9fa85c)
Add update and align methods to dataframe (#57) (bf050cf)
Support STRUCT data type with Series.struct.field to extract child fields (#71) (17afac9)

Bug Fixes

Avoid “403 response too large to return” error with read_gbq and large query results (#77) (8f3b5b2)
Change return type of Series.loc[scalar] (#40) (fff3d45)
Fix df/series.iloc by list with multiindex (#79) (971d091)

0.5.0 (2023-09-28)

Features

Add DataFrame.kurtosis / DataFrame.kurt method (c1900c2)
Add DataFrame.rolling and DataFrame.expanding methods (c1900c2)
Add items, apply methods to DataFrame. (#43) (3adc1b3)
Add axis param to simple df aggregations (#52) (9cf9972)
Add index dtype, astype, drop, fillna, aggregate attributes. (#38) (1a254a4)
Add ml.preprocessing.LabelEncoder (#50) (2510461)
Add ml.preprocessing.MaxAbsScaler (#56) (14b262b)
Add ml.preprocessing.MinMaxScaler (#64) (392113b)
Add more index methods (#54) (a6e32aa)
Support calculate_p_values parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support class_weights="balanced" in LogisticRegression model (c1900c2)
Support df[column_name] = df_only_one_column (c1900c2)
Support early_stop parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support enable_global_explain parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support l2_reg parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support learn_rate_strategy parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support ls_init_learn_rate parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support max_iterations parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support min_rel_progress parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support optimize_strategy parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)
Support casting string to integer or float (#59) (3502f83)

Bug Fixes

Fix header skipping logic in read_csv (#49) (d56258c)
Generate unique ids on join to avoid id collisions (#65) (7ab65e8)
LabelEncoder params consistent with Sklearn (#60) (632caec)
Loosen filter items tests to accommodate shifting pandas impl (#41) (edabdbb)

Performance Improvements

Add ability to cache dataframe and series to session table (#51) (416d7cb)
Inline small Series and DataFrames in query text (#45) (5e199ec)
Reimplement unpivot to use cross join rather than union (#47) (f9a93ce)
Simplify join order to use multiple order keys instead of string. (#36) (5056da6)

Documentation

Link to Remote Functions code samples from README and API reference (c1900c2)

0.4.0 (2023-09-16)

Features

Add axis parameter to droplevel and reorder_levels (7c6b0dd)
Add bfill and ffill to DataFrame and Series (7c6b0dd)
Add DataFrame.combine and DataFrame.combine_first (#27) (7c6b0dd)
Add DataFrame.nlargest, nsmallest (7c6b0dd)
Add DataFrame.pct_change and Series.pct_change (7c6b0dd)
Add DataFrame.skew and GroupBy.skew (7c6b0dd)
Add DataFrame.to_dict, to_excel, to_latex, to_records, to_string, to_markdown, to_pickle, to_orc (7c6b0dd)
Add diff method to DataFrame and GroupBy (7c6b0dd)
Add filter and reindex to Series and DataFrame (7c6b0dd)
Add reindex_like to DataFrame and Series (7c6b0dd)
Add swaplevel to DataFrame and Series (7c6b0dd)
Add partial support for Series.replace (7c6b0dd)
Support DataFrame.loc[bool_series, column] = scalar (7c6b0dd)
Support a persistent name in remote_function (7c6b0dd)

Bug Fixes

remote_function uses same credentials as other APIs (7c6b0dd)
Add type hints to models (7c6b0dd)
Raise error when ARIMAPlus is used with Pipeline (7c6b0dd)
Remove transforms parameter in model.fit (breaking change) (7c6b0dd)
Support column joins with “None indexer” (7c6b0dd)
Use Int64Dtype for literals in cut (7c6b0dd)
Use lowercase strings for parameter literals in bigframes.ml (breaking change) (7c6b0dd)

Performance Improvements

Add bigframes-api label to I/O query jobs (7c6b0dd)

Documentation

Document possible parameter values for PaLM2TextGenerator (7c6b0dd)
Document region logic in README (7c6b0dd)
Fix OneHotEncoder sample (7c6b0dd)

0.3.2 (2023-09-06)

Bug Fixes

Make release.sh script for PyPI upload executable (#20) (9951610)

0.3.1 (2023-09-05)

Bug Fixes

release: Use correct directory name for release build config (#17) (3dd25b3)

0.3.0 (2023-09-02)

Features

Add bigframes.get_global_session() and bigframes.reset_session() aliases (a32b747)
Add bigframes.pandas.read_pickle function (a32b747)
Add components_, explained_variance_, and explained_variance_ratio_ properties to bigframes.ml.decomposition.PCA (89b9503)
Add fit_transform to bigquery.ml transformers (a32b747)
Add Series.dropna and DataFrame.fillna (8fab755)
Add Series.str methods isalpha, isdigit, isdecimal, isalnum, isspace, islower, isupper, zfill, center (a32b747)
Support bigframes.pandas.merge() (8fab755)
Support DataFrame.isin with list and dict inputs (8fab755)
Support DataFrame.pivot (a32b747)
Support DataFrame.stack (89b9503)
Support DataFrame-DataFrame binary operations (8fab755)
Support df[my_column] = [a python list] (89b9503)
Support Index.is_monotonic (8fab755)
Support np.arcsin, np.arccos, np.arctan, np.sinh, np.cosh, np.tanh, np.arcsinh, np.arccosh, np.arctanh, np.exp with Series argument (89b9503)
Support np.sin, np.cos, np.tan, np.log, np.log10, np.sqrt, np.abs with Series argument (89b9503)
Support pow() and power operator in DataFrame and Series (8fab755)
Support read_json with engine=bigquery for newline-delimited JSON files (89b9503)
Support Series.corr (89b9503)
Support Series.map (8fab755)
Support for np.add, np.subtract, np.multiply, np.divide, np.power (8fab755)
Support MultiIndex for DataFrame columns (a32b747)
Use pandas.Index for column labels (a32b747)
Use default session and connection in ml.llm and ml.imported (8fab755)

Bug Fixes

Add error message to set_index (a32b747)
Align column names with pandas in DataFrame.agg results (89b9503)
Allow (but still not recommended) ORDER BY in read_gbq input when an index_col is defined (89b9503)
Check for IAM role on the BigQuery connection when initializing a remote_function (89b9503)
Check that types are specified in read_gbq_function (a32b747)
Don’t use query cache for Session construction (a32b747)
Include survey link in abstract NotImplementedError exception messages (89b9503)
Label temp table creation jobs with source=bigquery-dataframes-temp label (89b9503)
Make X_train argument names consistent across methods (8fab755)
Raise AttributeError for unimplemented pandas methods (89b9503)
Raise exception for invalid function in read_gbq_function (a32b747)
Support spaces in column names in DataFrame initializer (89b9503)

Performance Improvements

Add local cache for __repr_*__ methods (a32b747)
Lazily instantiate client library objects (89b9503)
Use row_number() filter for head / tail (8fab755)

Documentation

Add ML section under Overview (a32b747)
Add release status to table of contents (a32b747)
Add samples and best practices to read_gbq docs (a32b747)
Correct the return types of DataFrame and Series (a32b747)
Create subfolders for notebooks (a32b747)
Fix link to GitHub (89b9503)
Highlight bigframes is open-source (a32b747)
Sample ML Drug Name Generation notebook (a32b747)
Set options.bigquery.project in sample code (89b9503)
Transform remote function user guide into sample code (a32b747)
Update remote function notebook with read_gbq_function usage (8fab755)

0.2.0 (2023-08-17)

Features

Add KMeans.cluster_centers_.
Allow column labels to be any type handled by bq df; column labels can now be integers.
Add dataframegroupby.agg().
Add Series properties is_monotonic_increasing and is_monotonic_decreasing.
Add match, fullmatch, get, pad str methods.
Add series isin function.

Bug Fixes

Update ML package to use sessions for queries.
Optimize read_gbq with index_col set to cluster by index_col.
Raise ValueError if the location is mismatched.
read_gbq no longer uses ‘time travel’ with query inputs.

Documentation

Add docstring to _uniform_sampling to discourage users from calling it.

0.1.1 (2023-08-14)

Documentation

Correct link to code repository in setup.py and use correct terminology for console.cloud.google.com links.

0.1.0 (2023-08-11)

Features

Add bigframes.pandas package with an API compatible with pandas. Supported data sources include: BigQuery SQL queries, BigQuery tables, CSV (local and GCS), Parquet (local and Cloud Storage), and more.
Add bigframes.ml package with an API inspired by scikit-learn. Train machine learning models and run batch prediction, powered by BigQuery ML.

0.0.0 (2023-02-22)

Empty package to reserve package name.