Data lineage is a Dataplex feature that lets you track how data moves through your systems: where it comes from, where it is passed to, and what transformations are applied to it. For detailed information about data lineage, see About data lineage in the Dataplex documentation.
You can track lineage for both Data Catalog entries and Dataplex Catalog entries. When you use data lineage with Data Catalog, be aware of the differences that are described in this document.
For more information about how to transition your Data Catalog content and usage to Dataplex Catalog, see Transition from Data Catalog to Dataplex Catalog.
Enable data lineage
To use data lineage with Data Catalog, in the project where you view lineage, enable the Data Catalog API and the Data Lineage API.
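As an illustration, enabling both APIs from a script could look like the following minimal sketch. It only builds the `gcloud services enable` command line and does not run it. The helper name `enable_lineage_apis_command` and the project ID are hypothetical. The sketch assumes that `datacatalog.googleapis.com` and `datalineage.googleapis.com` are the service names for the two APIs.

```python
import shlex

# Assumed service names for the two APIs named in this section.
REQUIRED_SERVICES = (
    "datacatalog.googleapis.com",  # Data Catalog API
    "datalineage.googleapis.com",  # Data Lineage API
)


def enable_lineage_apis_command(project_id: str) -> list[str]:
    """Return the gcloud argv that enables both APIs in one project.

    Hypothetical helper: it only constructs the command, it does not run it.
    """
    if not project_id:
        raise ValueError("project_id must be a non-empty string")
    return [
        "gcloud", "services", "enable",
        *REQUIRED_SERVICES,
        f"--project={project_id}",
    ]


if __name__ == "__main__":
    # "my-lineage-project" is a placeholder for the project where you view lineage.
    print(shlex.join(enable_lineage_apis_command("my-lineage-project")))
```

To run the command, pass the returned list to `subprocess.run(..., check=True)` in an environment where the gcloud CLI is installed and authenticated.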
IAM roles
To use data lineage with Data Catalog, you must be granted the Data Catalog Viewer role (roles/datacatalog.viewer) to access the Data Catalog entry, instead of the Dataplex Catalog Viewer role (roles/dataplex.catalogViewer).
Other required roles to view lineage and manipulate lineage information are the same as when using data lineage with Dataplex Catalog. For more information, see Predefined roles for data lineage.
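The role difference described above can be sketched as a small lookup. The catalog keys (`data_catalog`, `dataplex_catalog`), the helper names, and the member value are hypothetical and used only for illustration. Granting the role at the project level is also an assumption; your organization might grant it at a different level.

```python
# Viewer role needed to access the catalog entry, by catalog type.
# The dictionary keys are hypothetical labels; the role IDs come from this section.
VIEWER_ROLES = {
    "data_catalog": "roles/datacatalog.viewer",
    "dataplex_catalog": "roles/dataplex.catalogViewer",
}


def entry_viewer_role(catalog: str) -> str:
    """Return the viewer role required to access entries in the given catalog."""
    try:
        return VIEWER_ROLES[catalog]
    except KeyError:
        raise ValueError(f"unknown catalog type: {catalog!r}") from None


def grant_viewer_command(project_id: str, member: str, catalog: str) -> list[str]:
    """Build (but do not run) a project-level gcloud IAM binding command."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member={member}",
        f"--role={entry_viewer_role(catalog)}",
    ]


if __name__ == "__main__":
    # Placeholder project ID and member.
    print(grant_viewer_command("my-lineage-project", "user:analyst@example.com", "data_catalog"))
```

The sketch covers only the entry viewer role. The other roles needed to view and manipulate lineage are not modeled here.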
Limitations
- For entries that were created in Data Catalog, the Google Cloud console shows detailed information about the source and target using the Data Catalog entry.
- For entries that were created in Dataplex Catalog, the Google Cloud console doesn't show detailed information if there isn't an equivalent Data Catalog entry.
Locations
Data lineage for Data Catalog entries is available in the same Dataplex locations as data lineage for Dataplex Catalog entries. In addition, data lineage for Data Catalog entries (but not for Dataplex Catalog entries) is available in the following multi-regions:
- asia (Asia)
- eu (Europe)
- us (US)
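The location rule above can be expressed as a short check. This is a sketch under stated assumptions: the function name and catalog labels are hypothetical, and because this document doesn't list the Dataplex locations that support lineage, the caller must supply that set. The region names in the usage example are illustrative placeholders, not an authoritative list.

```python
# Multi-regions where lineage is available for Data Catalog entries only.
DATA_CATALOG_ONLY_MULTI_REGIONS = frozenset({"asia", "eu", "us"})

CATALOG_TYPES = frozenset({"data_catalog", "dataplex_catalog"})  # hypothetical labels


def lineage_available(location: str, catalog: str, dataplex_locations: set[str]) -> bool:
    """Return True if lineage is available for this catalog type in this location.

    dataplex_locations is the caller-supplied set of Dataplex locations that
    support data lineage; it is not defined in this document.
    """
    if catalog not in CATALOG_TYPES:
        raise ValueError(f"unknown catalog type: {catalog!r}")
    loc = location.strip().lower()
    if loc in dataplex_locations:
        # Both catalog types share the regular Dataplex locations.
        return True
    # The multi-regions apply to Data Catalog entries only.
    return catalog == "data_catalog" and loc in DATA_CATALOG_ONLY_MULTI_REGIONS


if __name__ == "__main__":
    sample_locations = {"us-central1", "europe-west1"}  # illustrative placeholders
    print(lineage_available("eu", "data_catalog", sample_locations))
    print(lineage_available("eu", "dataplex_catalog", sample_locations))
```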
Billing impact
When you enable the Data Lineage API on a project, review the impact on your billing charges, because the Data Lineage API is enabled on a per-project basis.
For multi-regions, such as European Union (eu), Asia (asia), and United States (us), and for BigQuery Omni, lineage processing is distributed to specific regions, and costs depend on the regions where the processing is performed.