How to find column level lineage information with in the snowflake

1.7k Views Asked by At

I am trying to find column level lineage information with in the snowflake. Few blogs say's that we can build lineage from the data present in the Access_History view which is under Account Usage Schema but I could not find the relevant info from this view. Can anybody clarify on this.

2

There are 2 best solutions below

0
NickW On
0
Lukasz Szozda On

Column Lineage

Column lineage (i.e. access history for columns) extends the Account Usage ACCESS_HISTORY view to specify how data flows from the source column to the target column in a write operation. Snowflake tracks the data from the source columns through all subsequent table objects that reference data from the source columns (e.g. INSERT, MERGE, CTAS) provided that objects in the lineage chain are not dropped. Snowflake makes column lineage accessible by enhancing the OBJECTS_MODIFIED column in the ACCESS_HISTORY view.

Example: Column Lineage

The following example queries the ACCESS_HISTORY view and uses the FLATTEN function to flatten the OBJECTS_MODIFIED column.

...

select
  directSources.value: "objectId" as source_object_id,
  directSources.value: "objectName" as source_object_name,
  directSources.value: "columnName" as source_column_name,
  'DIRECT' as source_column_type,
  om.value: "objectName" as target_object_name,
  columns_modified.value: "columnName" as target_column_name
from
  (
    select
      *
    from
      snowflake.account_usage.access_history
  ) t,
  lateral flatten(input => t.OBJECTS_MODIFIED) om,
  lateral flatten(input => om.value: "columns", outer => true) columns_modified,
  lateral flatten(
    input => columns_modified.value: "directSources",
    outer => true
  ) directSources

union


select
  baseSources.value: "objectId" as source_object_id,
  baseSources.value: "objectName" as source_object_name,
  baseSources.value: "columnName" as source_column_name,
  'BASE' as source_column_type,
  om.value: "objectName" as target_object_name,
  columns_modified.value: "columnName" as target_column_name
from
  (
    select
      *
    from
      snowflake.account_usage.access_history
  ) t,
  lateral flatten(input => t.OBJECTS_MODIFIED) om,
  lateral flatten(input => om.value: "columns", outer => true) columns_modified,
  lateral flatten(
    input => columns_modified.value: "baseSourceColumns",
    outer => true
  ) baseSources
;