About Vertex AI Feature Store

Vertex AI Feature Store is a managed, cloud-native feature store service that's integral to Vertex AI. It streamlines your ML feature management and online serving processes by letting you manage your feature data in a BigQuery table or view. You can then serve features online directly from the BigQuery data source.

Vertex AI Feature Store provisions resources that let you set up online serving by specifying your feature data sources. It then acts as a metadata layer interfacing with the BigQuery data sources and serves the latest feature values directly from BigQuery for online predictions at low latencies.

In Vertex AI Feature Store, the BigQuery tables or views containing the feature data collectively form the offline store. You can maintain feature values, including historical feature data, in the offline store. Because all the feature data is maintained in BigQuery, Vertex AI Feature Store doesn't need to provision a separate offline store within Vertex AI. Moreover, if you want to use the data in the offline store to train ML models, you can use the APIs and capabilities in BigQuery to export or fetch the data.

The workflow to set up and start online serving using Vertex AI Feature Store can be summarized as follows:

  1. Prepare your data source in BigQuery.

  2. Optional: Register your data sources by creating feature groups and features.

  3. Set up online store and feature view resources to connect the feature data sources with online serving clusters.

  4. Serve the latest feature values online from a feature view.

Vertex AI Feature Store data model and resources

This section explains the data model and resources associated with the following aspects of Vertex AI Feature Store: data source preparation in BigQuery, Feature Registry setup, and online serving setup.

Data source preparation in BigQuery

During online serving, Vertex AI Feature Store uses feature data from BigQuery data sources. Before you set up Feature Registry or online serving resources, you must store your feature data in one or more BigQuery tables or views.

Within a BigQuery table or view, each column represents a feature. Each row contains feature values corresponding to a unique ID. For more information about how to prepare the feature data in BigQuery, see Prepare data source.

For example, in figure 1, the BigQuery table includes the following columns:

  • f1 and f2: Feature columns.

  • entity_id: An ID column containing the unique IDs to identify each feature record.

  • feature_timestamp: A timestamp column.

Figure 1. Example of a BigQuery data source containing features f1 and f2 in a time-series format.

Because you prepare the data source in BigQuery and not in Vertex AI, you don't need to create any Vertex AI resources at this stage.
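
For illustration, the following sketch shows one way to shape an existing table into this layout by creating a BigQuery view with the BigQuery client library for Python. The project, dataset, table, and column names are placeholders, not part of Vertex AI Feature Store.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Expose the raw table as a view with feature, ID, and timestamp columns.
sql = """
CREATE OR REPLACE VIEW `my-project.my_dataset.product_features` AS
SELECT
  product_id  AS entity_id,          -- unique ID column
  price       AS f1,                 -- feature column
  rating      AS f2,                 -- feature column
  update_time AS feature_timestamp   -- timestamp column
FROM `my-project.my_dataset.raw_products`
"""

client.query(sql).result()  # run the DDL statement and wait for completion
```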

Feature Registry setup

After you've prepared your data sources in BigQuery, you can register those data sources, including specific feature columns, in the Feature Registry.

Registering your features is optional. You can serve features online even if you don't add your BigQuery data sources to the Feature Registry. However, registering your features is advantageous in the following scenarios:

  • Your data contains multiple instances of the same entity ID and you need to prepare your data in a time-series format with a timestamp column. When you register your features, Vertex AI Feature Store looks up the timestamp and serves only the latest feature values.

  • You want to register specific feature columns from a data source.

  • You want to aggregate specific columns from multiple data sources to define a feature view instance.

  • You want to monitor the feature statistics and detect feature drift.

There are two types of Vertex AI Feature Store resources in the Feature Registry: resources for feature data and resources for feature monitoring.

Feature Registry resources for feature data

To register your feature data in the Feature Registry, you need to create the following Vertex AI Feature Store resources:

  • Feature group (FeatureGroup): A FeatureGroup resource is associated with a specific BigQuery source table or view. It represents a logical grouping of feature columns, which are represented by Feature resources. A feature group also contains one or more entity ID columns to identify the feature records. If the feature data is in a time-series format, the feature group must also contain a timestamp column. For information about how to create a feature group, see Create a feature group.

  • Feature (Feature): A Feature resource represents a specific column containing feature values from the feature data source associated with its parent FeatureGroup resource. For information about how to create features within a feature group, see Create a feature.

For example, figure 2 illustrates a feature group including feature columns f1 and f2, sourced from a BigQuery table associated with the feature group. The BigQuery data source contains four feature columns—two columns are aggregated to form the feature group. The feature group also contains an entity ID column and a feature timestamp column.

Figure 2. Example of a FeatureGroup containing two Feature columns (f1 and f2) sourced from a BigQuery data source.
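
As a rough sketch, the following Python code registers a feature group and two features for the data source shown in figure 2, assuming the FeatureRegistryServiceClient in the google-cloud-aiplatform library (aiplatform_v1). The project, region, BigQuery URI, and resource IDs are placeholders.

```python
from google.cloud import aiplatform_v1

REGION = "us-central1"
PARENT = f"projects/my-project/locations/{REGION}"

registry_client = aiplatform_v1.FeatureRegistryServiceClient(
    client_options={"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
)

# Create a feature group that points at the BigQuery view and names the ID column.
feature_group = aiplatform_v1.FeatureGroup(
    big_query=aiplatform_v1.FeatureGroup.BigQuery(
        big_query_source=aiplatform_v1.BigQuerySource(
            input_uri="bq://my-project.my_dataset.product_features"
        ),
        entity_id_columns=["entity_id"],
    )
)
group_op = registry_client.create_feature_group(
    parent=PARENT,
    feature_group=feature_group,
    feature_group_id="product_features",
)
group = group_op.result()  # wait for the long-running operation

# Register each feature column (f1 and f2) under the feature group.
for column in ["f1", "f2"]:
    registry_client.create_feature(
        parent=group.name,
        feature=aiplatform_v1.Feature(version_column_name=column),
        feature_id=column,
    ).result()
```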

Feature Registry resources for feature monitoring

Feature monitoring resources let you monitor the feature data registered using FeatureGroup and Feature resources. You can create the following resources related to feature monitoring:

  • Feature monitor (FeatureMonitor): A FeatureMonitor resource is associated with a FeatureGroup resource and one or more features within that feature group. It specifies the monitoring schedule. You can create multiple feature monitor resources to set up different monitoring schedules for the same set of features within a feature group. For example, if the features f1 and f2 are updated every hour, but the features f3 and f4 are updated every day, you can create two feature monitor resources to efficiently monitor these features:

    • Feature monitor fm1 that runs a monitoring job every hour on the features f1 and f2.

    • Feature monitor fm2 that runs a monitoring job every day on the features f3 and f4.

  • Feature monitor job (FeatureMonitorJob): A FeatureMonitorJob resource contains the feature statistics and information retrieved when a feature monitoring job is run. It can also contain information about anomalies, such as feature drift, detected in the feature data.

For more information about how to create feature monitoring resources, see Monitor features for anomalies.

Online serving setup

To serve features for online predictions, you must define and configure at least one online serving cluster, and associate it with your feature data source or Feature Registry resources. In Vertex AI Feature Store, the online serving cluster is called an online store instance. An online store instance can contain multiple feature view instances, where each feature view is associated with a feature data source.

Online serving resources

To set up online serving, you must create the following Vertex AI Feature Store resources:

  • Online store (FeatureOnlineStore): A FeatureOnlineStore resource represents an online serving cluster instance and contains the online serving configuration, such as the number of online serving nodes. An online store instance doesn't specify the source of the feature data, but contains FeatureView resources that specify the feature data sources in either BigQuery or the Feature Registry. For information about how to create an online store instance, see Create an online store instance.

  • Feature view (FeatureView): A FeatureView resource is a logical collection of features in an online store instance. When you create a feature view, you can specify the location of the feature data source in either of the following ways:

    • Associate one or more feature groups and features from the Feature Registry. A feature group specifies the location of the BigQuery data source. A feature within the feature group points to a specific feature column within that data source.

    • Associate a BigQuery source table or view directly.

    For information about how to create feature view instances within an online store, see Create a feature view.

For example, figure 3 illustrates a feature view comprising feature columns f2 and f4, which are sourced from two separate feature groups associated with a BigQuery table.

Figure 3. Example of a FeatureView containing features f2 and f4 sourced from two separate feature groups.
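
The following sketch creates the two online serving resources described in this section, again assuming the aiplatform_v1 admin API; the store ID, feature view ID, sync schedule, and feature group reference are placeholders that build on the earlier registration sketch.

```python
from google.cloud import aiplatform_v1

REGION = "us-central1"
PARENT = f"projects/my-project/locations/{REGION}"

admin_client = aiplatform_v1.FeatureOnlineStoreAdminServiceClient(
    client_options={"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
)

# Create an Optimized online store instance.
store_op = admin_client.create_feature_online_store(
    parent=PARENT,
    feature_online_store_id="my_online_store",
    feature_online_store=aiplatform_v1.FeatureOnlineStore(
        optimized=aiplatform_v1.FeatureOnlineStore.Optimized()
    ),
)
online_store = store_op.result()  # wait for the long-running operation

# Create a feature view that pulls features f1 and f2 from a registered
# feature group and syncs them into the online store every hour.
view_op = admin_client.create_feature_view(
    parent=online_store.name,
    feature_view_id="my_feature_view",
    feature_view=aiplatform_v1.FeatureView(
        feature_registry_source=aiplatform_v1.FeatureView.FeatureRegistrySource(
            feature_groups=[
                aiplatform_v1.FeatureView.FeatureRegistrySource.FeatureGroup(
                    feature_group_id="product_features",
                    feature_ids=["f1", "f2"],
                )
            ]
        ),
        sync_config=aiplatform_v1.FeatureView.SyncConfig(cron="0 * * * *"),
    ),
)
feature_view = view_op.result()
```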

Online serving

Vertex AI Feature Store provides the following types of online serving for real-time online predictions:

  • Bigtable online serving is useful for serving large data volumes (terabytes of data). It's similar to online serving in Vertex AI Feature Store (Legacy) and provides improved caching to mitigate hotspotting. Bigtable online serving doesn't support embeddings. If you need to serve large volumes of data that are frequently updated and don't need to serve embeddings, use Bigtable online serving.

  • Optimized online serving lets you serve features online at ultra-low latencies. Although online serving latencies depend on the workload, Optimized online serving can provide lower latencies than Bigtable online serving and is recommended for most scenarios. Optimized online serving also supports embedding management.

    To use Optimized online serving, you need to configure either a public endpoint or a dedicated Private Service Connect endpoint.

To learn how to set up online serving in Vertex AI Feature Store after you set up features, see Online serving types.
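
As an illustrative sketch, the following Python code fetches the latest feature values for a single entity ID from a feature view, assuming the FeatureOnlineStoreServiceClient in aiplatform_v1. The endpoint and resource names are placeholders; for Optimized online serving with a public endpoint, the client typically targets the online store's public endpoint domain name, while Bigtable online serving uses the regional service endpoint.

```python
from google.cloud import aiplatform_v1

# Placeholder endpoint. For Optimized online serving, use the online store's
# public endpoint domain name; for Bigtable online serving, use
# "us-central1-aiplatform.googleapis.com" (or your region's equivalent).
data_client = aiplatform_v1.FeatureOnlineStoreServiceClient(
    client_options={"api_endpoint": "PUBLIC_ENDPOINT_DOMAIN_NAME"}
)

response = data_client.fetch_feature_values(
    request=aiplatform_v1.FetchFeatureValuesRequest(
        feature_view=(
            "projects/my-project/locations/us-central1/"
            "featureOnlineStores/my_online_store/featureViews/my_feature_view"
        ),
        data_key=aiplatform_v1.FeatureViewDataKey(key="product_123"),
    )
)
print(response)  # latest values of f1 and f2 for entity ID product_123
```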

Offline serving for batch predictions or model training

Because you don't need to copy or import your feature data from BigQuery to a separate offline store in Vertex AI, you can use the data management and export capabilities of BigQuery to fetch or export feature data for batch predictions or ML model training.

For more information about machine learning using BigQuery, see BigQuery ML introduction.
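
For example, the following sketch builds a simple training snapshot directly from the BigQuery data source by keeping the latest feature record per entity ID as of a cutoff time. The table and column names reuse the placeholders from the earlier data preparation example.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Keep only the most recent feature record per entity as of the cutoff time.
sql = """
SELECT entity_id, f1, f2, feature_timestamp
FROM `my-project.my_dataset.product_features`
WHERE feature_timestamp <= TIMESTAMP('2024-01-01 00:00:00 UTC')
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY entity_id ORDER BY feature_timestamp DESC
) = 1
"""

training_df = client.query(sql).to_dataframe()  # requires pandas and db-dtypes
```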

Vertex AI Feature Store terms

feature engineering
  • Feature engineering is the process of transforming raw machine learning (ML) data into features that can be used to train ML models or to make predictions.

feature
  • In machine learning (ML), a feature is a characteristic or attribute of an instance or entity that's used as an input to train an ML model or to make predictions.

feature timestamp
  • A feature timestamp indicates when the set of feature values in a specific feature record for an entity was generated.

feature record
  • A feature record is an aggregation of all feature values that describe the attributes of a unique entity at a specific point in time.

Terms related to Feature Registry

feature registry
  • A feature registry is a central interface for recording feature data sources that you want to serve for online predictions. For more information, see Feature Registry setup.

feature group
  • A feature group is a feature registry resource that corresponds to a BigQuery source table or view containing feature data. A feature group contains features and can be thought of as a logical grouping of feature columns in the data source.

feature serving
  • Feature serving is the process of exporting or fetching feature values for training or inference. In Vertex AI, there are two types of feature serving—online serving and offline serving. Online serving retrieves the latest feature values of a subset of the feature data source for online predictions. Offline or batch serving exports high volumes of feature data for offline processing, such as ML model training.

offline store
  • The offline store is a storage facility that stores recent and historical feature data, which is typically used for training ML models. An offline store also contains the latest feature values, which you can serve for online predictions.

online store
  • In feature management, an online store is a storage facility for the latest feature values to be served for online predictions.

feature view
  • A feature view is a logical collection of features materialized from a BigQuery data source to an online store instance. A feature view stores a copy of the feature data, which is refreshed periodically from the BigQuery source. A feature view is associated with the feature data source either directly or through associations to feature registry resources.

Location constraints

All Vertex AI Feature Store resources must be located in the same region or the same multi-regional location as your BigQuery data source. For example, if the feature data source is located in us-central1, you must create your FeatureOnlineStore instance only in us-central1 or in the US multi-region location.
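
A minimal sketch of keeping the locations aligned, using placeholder project and dataset names: define the region once and use it for both the BigQuery dataset and the Vertex AI resource paths.

```python
from google.cloud import bigquery

REGION = "us-central1"

# BigQuery dataset that will hold the feature data source.
bq_client = bigquery.Client(project="my-project")
dataset = bigquery.Dataset("my-project.feature_data")
dataset.location = REGION  # or a matching multi-region such as "US"
bq_client.create_dataset(dataset, exists_ok=True)

# Vertex AI Feature Store resources are then created under the same region.
VERTEX_PARENT = f"projects/my-project/locations/{REGION}"
```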

Feature metadata

Vertex AI Feature Store is integrated with Dataplex to provide feature governance capabilities, including feature metadata. Online store instances, feature views, and feature groups are automatically registered as data assets in Data Catalog, a Dataplex feature that catalogs metadata from these resources. You can then use the metadata search capability of Dataplex to search for, view, and manage the metadata for these resources. For more information about searching for Vertex AI Feature Store resources in Dataplex, see Search for resource metadata in Data Catalog.

Feature labels

You can add labels to resources during or after the resource creation. For more information about adding labels to existing Vertex AI Feature Store resources, see Update labels.
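
As an illustration, the following sketch adds labels to an existing online store instance by updating only the labels field, assuming the aiplatform_v1 admin API; the resource name and label values are placeholders.

```python
from google.cloud import aiplatform_v1
from google.protobuf import field_mask_pb2

admin_client = aiplatform_v1.FeatureOnlineStoreAdminServiceClient(
    client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"}
)

store = aiplatform_v1.FeatureOnlineStore(
    name=(
        "projects/my-project/locations/us-central1/"
        "featureOnlineStores/my_online_store"
    ),
    labels={"team": "recommendations", "env": "prod"},
)

# Only the labels field is updated; other fields keep their current values.
admin_client.update_feature_online_store(
    feature_online_store=store,
    update_mask=field_mask_pb2.FieldMask(paths=["labels"]),
).result()
```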

Resource version metadata

Vertex AI Feature Store supports only version 0 for features.

Feature monitoring

Vertex AI Feature Store lets you set up feature monitoring to retrieve feature statistics and detect anomalies in feature data. You can either set up monitoring schedules to periodically run monitoring jobs, or manually run a monitoring job. For more information about setting up feature monitoring and running feature monitoring jobs, see Monitor features for anomalies.

Embedding management and vector retrieval

Optimized online serving in Vertex AI Feature Store supports embedding management. You can store embeddings in BigQuery as regular double arrays. Using the embedding management capabilities of Vertex AI Feature Store, you can perform vector similarity searches to retrieve entities that are approximate nearest neighbors for a specified entity or embedding value.

To use embedding management in Vertex AI Feature Store, you need to use Optimized online serving and configure the feature view to support vector retrieval.

For information about how to perform a vector similarity search in Vertex AI Feature Store, see Perform a vector search for entities.
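
The following rough sketch runs a vector similarity search against a feature view that serves embeddings, assuming the search_nearest_entities method in the v1beta1 online serving API; the endpoint, resource names, and entity ID are placeholders.

```python
from google.cloud import aiplatform_v1beta1

# Placeholder endpoint; for Optimized online serving this is the online
# store's public endpoint domain name.
data_client = aiplatform_v1beta1.FeatureOnlineStoreServiceClient(
    client_options={"api_endpoint": "PUBLIC_ENDPOINT_DOMAIN_NAME"}
)

response = data_client.search_nearest_entities(
    request=aiplatform_v1beta1.SearchNearestEntitiesRequest(
        feature_view=(
            "projects/my-project/locations/us-central1/"
            "featureOnlineStores/my_online_store/featureViews/my_embeddings_view"
        ),
        query=aiplatform_v1beta1.NearestNeighborQuery(
            entity_id="product_123",  # find neighbors of this entity's embedding
            neighbor_count=5,         # number of approximate nearest neighbors
        ),
    )
)
print(response)
```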

Data retention

Vertex AI Feature Store retains the latest feature values for a unique ID, based on the timestamp associated with the feature values in the data source. There's no data retention limit in the online store.

Because the offline store is provisioned by BigQuery, data retention limits or quotas from BigQuery might apply to the feature data source, including historical feature values. Learn more about quotas and limits in BigQuery.

Quotas and limits

Vertex AI Feature Store enforces quotas and limits to help you manage resources by setting usage limits, and to protect the community of Google Cloud users by preventing unforeseen spikes in usage. To efficiently use Vertex AI Feature Store resources without hitting these constraints, review the Vertex AI Feature Store quotas and limits.

Pricing

For information about resource usage pricing for Vertex AI Feature Store, see Vertex AI Feature Store pricing.

Notebook tutorials

Use the following samples and tutorials to learn more about Vertex AI Feature Store.

Online feature serving and fetching of BigQuery data with Vertex AI Feature Store Bigtable online serving

In this tutorial, you learn how to use Bigtable online serving in Vertex AI Feature Store for online serving and fetching of feature values in BigQuery.

Online feature serving and fetching of BigQuery data with Vertex AI Feature Store Optimized online serving

In this tutorial, you learn how to use Optimized online serving in Vertex AI Feature Store for serving and fetching of feature values from BigQuery.

Online feature serving and vector retrieval of BigQuery data with Vertex AI Feature Store

In this tutorial, you learn how to use Vertex AI Feature Store for online serving and vector retrieval of feature values in BigQuery.

Vertex AI Feature Store feature view Service Agents

In this tutorial, you learn how to enable feature view Service Agents and grant each feature view access to the specific source data that is used.

Vertex AI Feature Store based LLM grounding tutorial

In this tutorial, you learn how to chunk user-provided data, and then generate embedding vectors for each chunk using a Large Language Model (LLM) that has embedding generation capabilities. The resulting embedding vector dataset can then be loaded into Vertex AI Feature Store, enabling fast feature retrieval and efficient online serving.

Build a GenAI RAG application with Vertex AI Feature Store and BigQuery

In this tutorial, you learn how to build a low-latency vector search system for your Gen AI application using BigQuery vector search and Vertex AI Feature Store.

Configure IAM Policy in Vertex AI Feature Store

In this tutorial, you learn how to configure an IAM policy to control access to resources and data stored within Vertex AI Feature Store.

What's next