Databricks metadata collector

Release version: Australia

Updated March 12, 2026

The Databricks metadata collector provides read-only access to metadata from an external Databricks account.

The collector harvests metadata from data assets in the Databricks Hive metastore, Unity Catalog (including Delta Lake), Workflows, and Notebooks.

Metadata cataloged

The Databricks collector catalogs the following information.

Table 1. Metadata harvested
Object	Information cataloged
Columns	Name, Description, JDBC type, Column Type, Is Nullable, Default Value, Column size, Column index. Extended metadata: Tags. Note: Deprecated columns and any lineage related to these deprecated columns are not cataloged.
Table	Name, Description, Schema, Primary key, Foreign key. Extended metadata: Tags, Owner, Type, Creation date, Last Modified, Location, Provider, Version, Size, File Count, Partition Columns, Properties
Model	Name, Owner, Description, Created By, Created At, Last Modified By, Last Modified At, Securable Kind, Securable Type
Views	Name, Description, SQL definition, Tags
Schema	Name. Extended metadata: Tags
Database	Type, Name, Server, Port, Environment, JDBC URL. Extended metadata: Tags
Notebook	Notebook ID, Path, Language Type (SQL, Python, Scala, R)
Function	Name, Description, Function Type
Job	Title, Description, Creator, Created At, Job run as, Format, Max Concurrent Runs, Notification On Start, Timeouts (sec), Notification On Success, Schedule, Git Source, Notification on Failure, Tags, List of tasks, List of clusters
Cluster	Name, Description, Node Type ID, Driver Node Type ID, Spark Version, Number of Workers, Autoscale Max Workers, Autoscale Min Workers, AWS Attributes, Tags
Task	Task Key, Type of Task (Notebook, dbt, Spark jar, Python script, Python wheel, Pipeline task, SQL), Task timeout, Retry interval, Cluster used by the task, Max retries, Depends on, Libraries, Notifications (On start, On success, On failure), Notebook File Path, Notebook Source, Notebook Parameters, Spark Jar Main Class Name, Spark Jar Parameters, Python Script File path, Python Script Parameters, Spark Submit Parameters, Pipeline ID, Pipeline Full Refresh, Python Wheel Package Name, Python Wheel Entry Point, Python Wheel Parameters, SQL Warehouse, SQL Query ID, SQL Dashboard ID, SQL Alert ID, Dbt Project Directory, Dbt Profiles Directory, Dbt warehouse, Dbt catalog, Dbt schema, Dbt commands
External Location	Name, External URL, Description, Data Source Type, Created Date, Created By, Owner
Storage Credential	Name, Description, Credential, Created Date, Created By, Owner
Volume	Name, Description, Type, Owner, Created By, Created At, Last Modified By, Last Modified At, Metastore ID
Materialized View	Name, SQL Definition, Created, Last Modified
Metric View	Name, Description, YAML Definition, Source Table, Source Table Type, Filter, Created, Last Modified

The following additional information is cataloged when you run the collector with the Enable Governance Metadata Collection option.

Table 2. Cataloging Databricks Governance Policies
Object	Information Cataloged
Row Filter Access Control	Name
Column Mask Access Control	Name
Attribute Based Access Control	Name, Description, Created by, Created at, Modified by, Modified at, On securable type, For securable type, To principals, Except principals
Workspace bindings	Workspace ID, Binding type
Privileges	Granted to, Granted by, Privilege type, Granted on object, Inherited from

Relationships between objects

The harvested metadata includes catalog pages for the following data asset types. Each catalog page has a relationship to the other related data asset types.

Table 3. Relationships between harvested data asset pages
Data asset page	Relationships
Table	Columns contained in Table; Table Indexes; Has privileges
Schema	Database that contains Schema; Table that is part of Schema; Has privileges
Database	Schema contained in Database
Columns	Table Indexes; Table containing Column
Table Indexes	Columns
Job	Clusters used by tasks in Job; Tasks contained within Job
Cluster	Cluster contained in job; Task using Cluster
Task	Job containing Task; Cluster used by Task; Tasks depending on Task
Notebook	Folder containing Notebook; Task sourcing data from Notebook
Folder	Folders contained in Folder; Notebooks contained in Folder
External Location	Uses storage credential; Connects to datasource (S3 bucket, S3 Object, Azure container or Azure blob); Has workspace bindings
Storage Credential	Has workspace bindings; Used by External Location
Model	Registered in schema; Stored in data assets (S3 Bucket, S3 Object)
Volume	Contained within schema; Stored in data assets (S3 Bucket, S3 Object)
Materialized View	Schema that contains Materialized Views; Columns that are part of Materialized Views
Metric View	Schema that contains Metric Views; Columns that are part of Metric Views
Pipeline	Copies data to Databricks Table/Database Schema; Ingests data from Database Table/Database Schema
Unity Catalog Metastore	Databases contained in metastore
Row Filter Access Control	Applies to table; Uses function; Using column; Contained within schema
Column Mask Access Control	Applies to column; Uses function; Contained within schema
Attribute Based Access Control	Applies to catalog, schema and table; Defined on catalog, schema and table; Uses function
Catalog	Has workspace bindings; Has privileges
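
The relationships in Table 3 can be read as a directed graph between catalog page types. The following is a minimal sketch, not collector output: it copies the Job, Cluster, Task, Notebook, and Folder rows from the table into a dictionary and walks them to list which page types are reachable from a starting page. The names RELATIONSHIPS and reachable are hypothetical and exist only for this illustration.

```python
from collections import deque

# Subset of Table 3, copied by hand: page type -> {relationship label: related page type}.
# This is an illustration of the table, not data produced by the collector.
RELATIONSHIPS = {
    "Job": {
        "Clusters used by tasks in Job": "Cluster",
        "Tasks contained within Job": "Task",
    },
    "Cluster": {
        "Cluster contained in job": "Job",
        "Task using Cluster": "Task",
    },
    "Task": {
        "Job containing Task": "Job",
        "Cluster used by Task": "Cluster",
        "Tasks depending on Task": "Task",
    },
    "Notebook": {
        "Folder containing Notebook": "Folder",
        "Task sourcing data from Notebook": "Task",
    },
    "Folder": {
        "Folders contained in Folder": "Folder",
        "Notebooks contained in Folder": "Notebook",
    },
}


def reachable(start):
    """Return the page types reachable from `start` by following relationships (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in RELATIONSHIPS.get(page, {}).values():
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(seen - {start})
```

In this sketch, starting from Notebook reaches Folder, Task, Job, and Cluster, while starting from Job reaches only Cluster and Task, because the Task row in Table 3 does not list a relationship back to Notebook.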

Lineage for Databricks

The following lineage information is collected by the Databricks collector.

Note: The collector does not support lineage for SQL statements defined via variable statements.

Table 4. Object Lineage Availability
Object	Lineage available
Column in view	For both Hive metastore and Unity Catalog, the collector identifies the associated columns in an upstream view or table: columns where the data is sourced from; columns that sort the rows via ORDER BY; columns that filter the rows via WHERE/HAVING; columns that aggregate the rows via GROUP BY. Note: Deprecated columns and any lineage related to these deprecated columns are not cataloged.
Notebook	Tasks that reference the Notebook (only if Databricks Unity Catalog is enabled).
Table	The collector identifies the upstream and downstream tables and their external locations (S3 and ADLS Gen2 data assets) along with the intermediate Job. The collector also harvests lineage from Databricks tables to S3 objects.
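
To make the four kinds of column lineage for a view more concrete, the sketch below annotates a hypothetical view by hand. The view sales.regional_totals, the table sales.orders, and the names VIEW_SQL, LINEAGE_ROLES, and roles_of are all invented for this illustration; the annotation is written manually and is neither a SQL parser nor the collector's actual output format.

```python
# Hypothetical view definition, used only to illustrate the lineage kinds in Table 4.
VIEW_SQL = """
CREATE VIEW sales.regional_totals AS
SELECT region, SUM(amount) AS total_amount
FROM sales.orders
WHERE status = 'shipped'
GROUP BY region
HAVING SUM(amount) > 0
ORDER BY region
"""

# Hand-written annotation: which upstream columns play which role for the view above.
LINEAGE_ROLES = {
    "source": ["sales.orders.region", "sales.orders.amount"],    # data is sourced from
    "sort": ["sales.orders.region"],                             # ORDER BY
    "filter": ["sales.orders.status", "sales.orders.amount"],    # WHERE / HAVING
    "aggregate": ["sales.orders.region"],                        # GROUP BY
}


def roles_of(column):
    """Return the sorted lineage roles that an upstream column plays in the view."""
    return sorted(role for role, cols in LINEAGE_ROLES.items() if column in cols)
```

Here sales.orders.status appears only as a filter column, while sales.orders.region is a source, sort, and aggregate column at the same time, which shows that one upstream column can carry several lineage roles.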

Authentication supported

The Databricks collector supports personal access token authentication and OAuth service principal authentication.
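
As a rough illustration of how these two methods differ at the HTTP level, the sketch below builds, but does not send, the kind of requests the Databricks REST API generally accepts. It does not show how the collector itself is configured or implemented. The function names, the workspace URL, and the credential values are placeholders; the /oidc/v1/token and /api/2.1/unity-catalog/catalogs paths are assumptions taken from the general Databricks REST API conventions, not from this page.

```python
import base64
import urllib.parse
import urllib.request


def pat_request(workspace_url, token, path):
    """Build a GET request that carries a personal access token as a bearer token."""
    url = workspace_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def oauth_token_request(workspace_url, client_id, client_secret):
    """Build the client-credentials token request for a service principal.

    The service principal's client ID and secret are sent with HTTP Basic auth;
    the response (not fetched here) would contain a short-lived access token
    that is then used as a bearer token, like a personal access token.
    """
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    return urllib.request.Request(
        workspace_url.rstrip("/") + "/oidc/v1/token",  # assumed token endpoint
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

The practical difference is that a personal access token is a long-lived secret tied to a user, whereas the service principal flow exchanges a client ID and secret for a token that expires and must be refreshed.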