Databricks metadata collector

  • Release version: Australia
  • Updated March 12, 2026
    The Databricks metadata collector provides read-only access to metadata from an external Databricks account.

    The collector harvests metadata from data assets in the Databricks Hive metastore, Unity Catalog (including Delta Lake), Workflows, and Notebooks.

    Metadata cataloged

    The Databricks collector catalogs the following information.

    Table 1. Metadata harvested
    Object Information cataloged
    Columns

    Name, Description, JDBC type, Column Type, Is Nullable, Default Value, Column size, Column index

    Extended metadata: Tags

    Note:
    Deprecated columns and any lineage related to these deprecated columns are not cataloged.
    Table

    Name, Description, Schema, Primary key, Foreign key

    Extended metadata: Tags, Owner, Type, Creation date, Last Modified, Location, Provider, Version, Size, File Count, Partition Columns, Properties

    Model

    Name, Owner, Description, Created By, Created At, Last Modified By, Last Modified At, Securable Kind, Securable Type

    Views

    Name, Description, SQL definition, Tags

    Schema

    Name

    Extended metadata: Tags

    Database

    Type, Name, Server, Port, Environment, JDBC URL

    Extended metadata: Tags

    Notebook

    Notebook ID, Path, Language Type (SQL, Python, Scala, R)

    Function

    Name, Description, Function Type

    Job

    Title, Description, Creator, Created At, Job run as, Format, Max Concurrent Runs, Notification On Start, Timeouts (sec), Notification On Success, Schedule, Git Source, Notification on Failure, Tags, List of tasks, List of clusters

    Cluster

    Name, Description, Node Type ID, Driver Node Type ID, Spark Version, Number of Workers, Autoscale Max Workers, Autoscale Min Workers, AWS Attributes, Tags

    Task

    Task Key, Type of Task (Notebook, dbt, Spark jar, Python script, Python wheel, Pipeline task, SQL), Task timeout, Retry interval, Cluster used by the task, Max retries, Depends on, Libraries, Notifications (On start, On success, On failure), Notebook File Path, Notebook Source, Notebook Parameters, Spark Jar Main Class Name, Spark Jar Parameters, Python Script File path, Python Script Parameters, Spark Submit Parameters, Pipeline ID, Pipeline Full Refresh, Python Wheel Package Name, Python Wheel Entry Point, Python Wheel Parameters, SQL Warehouse, SQL Query ID, SQL Dashboard ID, SQL Alert ID, Dbt Project Directory, Dbt Profiles Directory, Dbt warehouse, Dbt catalog, Dbt schema, Dbt commands

    External Location

    Name, External URL, Description, Data Source Type, Created Date, Created By, Owner

    Storage Credential

    Name, Description, Credential, Created Date, Created By, Owner

    Volume

    Name, Description, Type, Owner, Created By, Created At, Last Modified By, Last Modified At, Metastore ID

    Materialized View

    Name, SQL Definition, Created, Last Modified

    Metric View

    Name, Description, YAML Definition, Source Table, Source Table Type, Filter, Created, Last Modified

    The following additional information is cataloged when you run the collector with the Enable Governance Metadata Collection option.

    Table 2. Cataloging Databricks Governance Policies
    Object Information Cataloged
    Row Filter Access Control

    Name

    Column Mask Access Control

    Name

    Attribute Based Access Control

    Name, Description, Created by, Created at, Modified by, Modified at, On securable type, For securable type, To principals, Except principals

    Workspace bindings

    Workspace ID, Binding type

    Privileges

    Granted to, Granted by, Privilege type, Granted on object, Inherited from

    Relationships between objects

    The harvested metadata includes catalog pages for the following data asset types. Each catalog page has a relationship to the other related data asset types.

    Table 3. Relationships between harvested data asset pages
    Data asset page Relationships
    Table
    • Columns contained in Table
    • Table Indexes
    • Has privileges

    Schema
    • Database that contains Schema
    • Table that is part of Schema
    • Has privileges
    Database
    • Schema contained in Database
    Columns
    • Table Indexes
    • Table containing Column
    Table Indexes
    • Columns
    Job
    • Clusters used by tasks in Job
    • Tasks contained within Job
    Cluster
    • Cluster contained in job
    • Task using Cluster
    Task
    • Job containing Task
    • Cluster used by Task
    • Tasks depending on Task
    Notebook
    • Folder containing Notebook
    • Task sourcing data from Notebook
    Folder
    • Folders contained in Folder
    • Notebooks contained in Folder
    External Location
    • Uses storage credential
    • Connects to datasource (S3 bucket, S3 Object, Azure container or Azure blob)
    • Has workspace bindings
    Storage Credential
    • Has workspace bindings
    • Used by External Location
    Model
    • Registered in schema
    • Stored in data assets (S3 Bucket, S3 Object)
    Volume
    • Contained within schema
    • Stored in data assets (S3 Bucket, S3 Object)
    Materialized View
    • Schema that contains Materialized Views
    • Columns that are part of Materialized Views
    Metric View
    • Schema that contains Metric Views
    • Columns that are part of Metric Views
    Pipeline
    • Copies data to Databricks Table/Database Schema
    • Ingests data from Database Table/Database Schema
    Unity Catalog Metastore
    • Databases contained in metastore
    Row Filter Access Control
    • Applies to table
    • Uses function
    • Using column
    • Contained within schema
    Column Mask Access Control
    • Applies to column
    • Uses function
    • Contained within schema
    Attribute Based Access Control
    • Applies to catalog, schema and table
    • Defined on catalog, schema and table
    • Uses function
    Catalog
    • Has workspace bindings
    • Has privileges
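
    The Task relationships above include "Tasks depending on Task", which is the inverse of the "Depends on" field cataloged for each Task. The following is a minimal sketch of that inversion, not the collector's implementation. The job name and task keys are hypothetical, and the `task_key` / `depends_on` shape is assumed to follow the Databricks Jobs API.

```python
from collections import defaultdict

# Hypothetical job definition; the job name and task keys are invented.
# The "task_key" / "depends_on" shape is assumed from the Databricks Jobs API.
job = {
    "name": "nightly_sales_load",
    "tasks": [
        {"task_key": "ingest", "depends_on": []},
        {"task_key": "clean", "depends_on": [{"task_key": "ingest"}]},
        {"task_key": "report", "depends_on": [{"task_key": "clean"}]},
        {"task_key": "audit", "depends_on": [{"task_key": "clean"}]},
    ],
}


def dependents_by_task(job_definition):
    """Invert each task's "Depends on" list into "Tasks depending on Task"."""
    dependents = defaultdict(list)
    for task in job_definition["tasks"]:
        for upstream in task.get("depends_on", []):
            dependents[upstream["task_key"]].append(task["task_key"])
    return dict(dependents)


task_dependents = dependents_by_task(job)
print(task_dependents)
```

    Tasks that nothing depends on (here, `report` and `audit`) do not appear as keys in the result.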

    Lineage for Databricks

    The following lineage information is collected by the Databricks collector.

    Note:
    The collector does not support lineage for SQL statements defined via variable statements.
    Table 4. Object Lineage Availability
    Object Lineage available
    Column in view
    The collector identifies the associated columns in an upstream view or table, for both Hive metastore and Unity Catalog. These are the columns:
    • From which the data is sourced
    • That sort the rows via ORDER BY
    • That filter the rows via WHERE/HAVING
    • That aggregate the rows via GROUP BY
    Note:
    Deprecated columns and any lineage related to these deprecated columns are not cataloged.
    Notebook
    • Tasks that reference the Notebook (only if Databricks Unity Catalog is enabled)
    Table
    • The collector identifies the upstream and downstream tables and their external locations (S3 and ADLS Gen2 data assets) along with the intermediate Job.
    • The collector harvests the Databricks table lineage to S3 object.
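
    To illustrate the four kinds of upstream columns listed for "Column in view", the sketch below pairs a hypothetical view definition with a hand-written mapping of upstream columns to lineage roles. The view, table, and column names and the role labels are invented for illustration. The mapping is written by hand and is not produced by the collector or by parsing the SQL.

```python
# Hypothetical view definition; schema, table, and column names are invented.
VIEW_SQL = """
CREATE VIEW sales.regional_totals AS
SELECT region, SUM(amount) AS total_amount
FROM sales.orders
WHERE status = 'complete'
GROUP BY region
ORDER BY region
"""

# Hand-written mapping (not derived by parsing VIEW_SQL) from each upstream
# column to the roles it plays in the view. Role labels are hypothetical.
UPSTREAM_COLUMN_ROLES = {
    "sales.orders.region": {"source", "group_by", "order_by"},
    "sales.orders.amount": {"source"},
    "sales.orders.status": {"filter"},
}


def columns_with_role(role):
    """Return the upstream columns that play the given role, sorted by name."""
    return sorted(
        column for column, roles in UPSTREAM_COLUMN_ROLES.items() if role in roles
    )


for role in ("source", "order_by", "filter", "group_by"):
    print(role, columns_with_role(role))
```

    In this example, `status` never appears in the view's output, yet it still counts as an associated upstream column because it filters the rows.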

    Authentication supported

    The Databricks collector supports personal access token authentication and OAuth service principal authentication.
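
    As a rough sketch of how the two methods differ at the Databricks REST API level, the code below builds (but does not send) the request details for each. It does not show how the collector itself is configured. The host, token, and client credentials are placeholders, and the token endpoint path and `all-apis` scope are assumptions based on Databricks' documented OAuth machine-to-machine flow, so verify them against the current Databricks documentation.

```python
import base64
from urllib.parse import urlencode


def pat_headers(token):
    """Headers for a REST call authenticated with a personal access token."""
    return {"Authorization": f"Bearer {token}"}


def oauth_token_request(workspace_host, client_id, client_secret):
    """Describe the client-credentials token request for a service principal.

    Assumption: endpoint path and scope follow Databricks' OAuth M2M docs.
    Nothing is sent over the network; the request is only described.
    """
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return {
        "method": "POST",
        "url": f"https://{workspace_host}/oidc/v1/token",
        "headers": {
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        "body": urlencode({"grant_type": "client_credentials", "scope": "all-apis"}),
    }


# Placeholder values only; substitute real workspace details and secrets.
print(pat_headers("<personal-access-token>"))
print(oauth_token_request("<workspace-host>", "<client-id>", "<client-secret>")["url"])
```

    With a personal access token, the token is sent directly on each request. With a service principal, the client ID and secret are first exchanged for a short-lived access token, which is then sent as a bearer token in the same way.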