Create a Databricks metadata collector

Release version: Australia

Updated March 12, 2026

Create a collector to import metadata from Databricks.

Before you begin

Before you begin, verify the following:

A MID Server is set up for the collectors. For more information, see MID Server for metadata collectors.
All prerequisite tasks are completed. For more information, see Prepare to run the Databricks collector.
Role required: connection-admin

Procedure

Navigate to All > Workflow Data Fabric > Workflow Data Fabric Home.
Select the Connect Hub icon in the left sidebar.
Select Create > Metadata collector.
From the System list, select Databricks.
From the Connection type list, select one of the following:
1. Select New connection to configure a new connection.
2. Select Existing connection to reuse an existing connection, and then select it from the Connections list.
  The configuration form is prefilled with details from the existing connection. The word Copy is appended to the connection name, and sensitive details such as passwords aren't copied.
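The copy behavior described in this step can be pictured with the following minimal Python sketch. It assumes a connection is a plain dictionary and that the sensitive fields are named password, access_token, and client_secret; these field names and the copy_connection function are hypothetical and don't reflect how the platform actually stores connections.

```python
from copy import deepcopy

# Hypothetical names of fields treated as sensitive; the real list isn't documented.
SENSITIVE_FIELDS = ("password", "access_token", "client_secret")


def copy_connection(existing: dict) -> dict:
    """Prefill a new connection from an existing one (illustrative sketch only)."""
    copied = deepcopy(existing)  # leave the source connection untouched
    # The word "Copy" is appended to the connection name.
    copied["name"] = f"{existing['name']} Copy"
    # Sensitive values aren't carried over; the user must re-enter them.
    for field in SENSITIVE_FIELDS:
        copied.pop(field, None)
    return copied
```

Copying "Prod Databricks" this way yields "Prod Databricks Copy" with the server kept and the token dropped, so the form still needs a credential before it can connect.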

On the form, fill in the fields.

Table 1. Databricks metadata collector form
Field	Description
Connection name	Unique identifier for the connection. This field can't be modified once the connection is established.
Short description	Purpose and details of the connection.

Enter the Databricks configuration details.

Table 2. Configuration details
Field	Description
Server	Hostname of the database server to connect to.

Configure the schema collection options by selecting either Collect all schemas or Specify which schema to collect.

Table 3. Schema collection options
Field	Description
Collect all schemas
Collect all schemas	Catalog all schemas to which the user has access.
Exclude Schema	Name or regular expression matching the database schemas to exclude.
Include Information Schema	Include the database's Information Schema in catalog collection.
Specify which schema to collect
Specify which schema to collect	Catalog only the specified schemas.
Schema	Name of the database schema to catalog.
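As an illustration of how these options interact, the following Python sketch filters a list of schema names. It assumes exclusion patterns must match the whole schema name (re.fullmatch) and that the Information Schema is named information_schema; the select_schemas function and its parameters are hypothetical, and the collector's actual matching rules may differ.

```python
import re
from typing import Iterable, List


def select_schemas(available: Iterable[str], *, collect_all: bool,
                   exclude: Iterable[str] = (),
                   include_information_schema: bool = False,
                   specified: Iterable[str] = ()) -> List[str]:
    """Return the schemas to catalog, in listed order (hypothetical helper)."""
    if not collect_all:
        # "Specify which schema to collect": keep only the named schemas.
        wanted = set(specified)
        return [s for s in available if s in wanted]
    # "Collect all schemas": drop any schema whose full name matches an exclusion.
    patterns = [re.compile(p) for p in exclude]
    selected = []
    for schema in available:
        if any(p.fullmatch(schema) for p in patterns):
            continue
        if schema.lower() == "information_schema" and not include_information_schema:
            continue
        selected.append(schema)
    return selected
```

For example, with the schemas sales, sales_tmp, hr, and information_schema, an exclusion of `.*_tmp` leaves only sales and hr under these assumptions.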

Enter the remaining Databricks configuration details.

Table 4. Configuration details
Field	Description
Server port	Port of the database server (if not the default).
Database	Name of the database to connect to. Specify multiple databases by adding one value per line.
Databricks HTTP Path	HTTP path of the Databricks compute resource, such as a SQL warehouse or cluster. For details, see the Databricks documentation.
Excluded database	Name or regular expression matching the databases to exclude. Note: This parameter applies only when the Database field is empty and is ignored if the Database field is specified.
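The following Python sketch shows one way to read the Database and Excluded database fields together: one database per line, with the exclusion pattern applied only when no database is listed. It assumes that, when the Database field is empty, the collector enumerates every database it can access and matches the exclusion pattern against the whole name; resolve_databases is a hypothetical helper, not part of the product.

```python
import re
from typing import List


def resolve_databases(database_field: str, excluded_database: str,
                      available: List[str]) -> List[str]:
    """Decide which databases to harvest (hypothetical helper)."""
    # The Database field holds one database name per line; blank lines are ignored.
    listed = [line.strip() for line in database_field.splitlines() if line.strip()]
    if listed:
        # Explicit databases win: Excluded database is ignored in this case.
        return listed
    pattern_text = excluded_database.strip()
    if not pattern_text:
        return list(available)
    # Assumption: the exclusion pattern must match the whole database name.
    pattern = re.compile(pattern_text)
    return [db for db in available if not pattern.fullmatch(db)]
```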

Configure the server details and authentication options.

Table 5. Server and authentication details
Field	Description
Server details
Server	Hostname of the database server to connect to.
Authentication options
Authenticate using personal access token	Option to authenticate using a Databricks personal access token. For details, see the Databricks documentation.
Authenticate using Databricks Service Principal	Option to authenticate using the client ID and client secret of a Databricks service principal.
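Because the collector accepts JDBC properties, the server, HTTP path, and authentication fields can be pictured as combining into a JDBC connection string. The sketch below follows the connection-string format documented for the Databricks JDBC driver: AuthMech=3 with the user name token for a personal access token, and AuthMech=11 with Auth_Flow=1 for OAuth machine-to-machine authentication with a service principal. It assumes a default port of 443 and that JDBC properties arrive as key-value pairs; build_jdbc_url is hypothetical, the collector isn't documented to build its URL this way, and setting names should be checked against the driver version in use.

```python
from typing import Dict, Optional

DEFAULT_PORT = 443  # Assumption: Databricks endpoints are served over HTTPS on port 443.


def build_jdbc_url(server: str, http_path: str, *, port: Optional[int] = None,
                   token: Optional[str] = None,
                   client_id: Optional[str] = None,
                   client_secret: Optional[str] = None,
                   jdbc_properties: Optional[Dict[str, str]] = None) -> str:
    """Assemble a Databricks JDBC URL from collector-style fields (hypothetical helper)."""
    settings = {"httpPath": http_path}
    if token is not None:
        # Personal access token: AuthMech=3 with the literal user name "token".
        settings.update(AuthMech="3", UID="token", PWD=token)
    elif client_id is not None and client_secret is not None:
        # Service principal via OAuth machine-to-machine: AuthMech=11, Auth_Flow=1.
        settings.update(AuthMech="11", Auth_Flow="1",
                        OAuth2ClientId=client_id, OAuth2Secret=client_secret)
    else:
        raise ValueError("provide a personal access token or a client ID and secret")
    # Pass-through driver properties are appended last.
    settings.update(jdbc_properties or {})
    params = ";".join(f"{key}={value}" for key, value in settings.items())
    return f"jdbc:databricks://{server}:{port or DEFAULT_PORT};{params}"
```

The resulting string embeds the secret, so in practice it belongs in protected storage rather than in logs.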

Configure the statistics and sampling options.

Table 6. Statistics and sampling options
Field	Description
Enable column statistics collection	Enable harvesting of column statistics (data profiling). Note: Enabling profiling can increase the collector's runtime because the collector must read table data to generate profiling metadata.
Target sample size for column statistics	Number of rows sampled for computation of column statistics and string-value histograms. For example, to sample 1000 rows, set the parameter to 1000. Default: 100000
Disable Lineage collection	Skip harvesting of intra-database lineage metadata.
Disable Extended Metadata collection	Skip harvesting of extended metadata for data asset types such as databases, schemas, tables, columns, functions, stored procedures, user-defined types, and synonyms. Basic metadata for these data asset types is still harvested.
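To show why column statistics depend on the target sample size, the following Python sketch profiles one string column from a bounded sample: null count, distinct count, minimum, maximum, and a most-frequent-values histogram. It takes the first N values because the collector's actual sampling method isn't documented here; column_stats and its returned keys are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Any, Dict, Iterable, Optional


def column_stats(values: Iterable[Optional[str]], target_sample_size: int = 100_000,
                 top_k: int = 10) -> Dict[str, Any]:
    """Profile one string column from a bounded sample (hypothetical helper)."""
    # Assumption: use the first N values; the real sampling method isn't documented.
    sample = list(islice(values, target_sample_size))
    non_null = [v for v in sample if v is not None]
    return {
        "sampled_rows": len(sample),
        "null_count": len(sample) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min": min(non_null, default=None),
        "max": max(non_null, default=None),
        # Most frequent values first; ties keep first-seen order.
        "top_values": Counter(non_null).most_common(top_k),
    }
```

Every row in the sample must be read from the table, which is why raising the target sample size, or enabling profiling at all, tends to lengthen collector runs.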

Configure the harvesting scope and limits options.

Table 7. Harvesting scope and limits options
Field	Description
Disable Harvesting Workflows	Skip harvesting of Databricks workflows and their lineage metadata.
Harvest Lineage from Other Schemas	Harvest lineage from other schemas.
Enable Sample String Values collection	Enable sampling and storage of sample values for string-valued columns.
Exclude system functions	Exclude harvesting of built-in Databricks system functions.
Disable Harvesting Notebook Content	Skip harvesting notebook content.
Page Size for Harvesting Queries	Specify the page size for harvesting queries. Default: 1000
Page Size for Databricks API Responses	Specify the page size for Databricks API responses. Default: 100
Enable Metric Views Harvesting	Enable harvesting of metric views. Metric view information is extracted from a table's extended metadata and is available only when extended metadata harvesting is enabled.

Configure the connection and reliability options.

Table 8. Connection and reliability options
Field	Description
Server environment	Friendly name for the environment in which your database server runs. Used when the server name is localhost to differentiate this server from those in other environments.
Database ID	Unique identifier for this database. Used to generate the database ID when the database name isn't sufficiently unique.
JDBC properties	JDBC driver properties to pass through to the driver connection.
Max retries	The number of times the system retries a failed API call. Default: 5
Retry delay	The number of seconds to wait between retry attempts for a failed API call. Default: 2 seconds
Disable Model Collection	Skip harvesting machine learning models.
Databricks account ID	The Databricks account ID for Unity Catalog access.
External Workspace URL	The external workspace URL for cross-workspace access.
Enable Governance Metadata Collection	Enable harvesting of governance metadata, including privileges, workspace bindings, ABAC policies, row filters, and column masking policies.
Workspace ID to URL Mapping	Specify workspace ID to workspace URL mapping. Provide the option multiple times for multiple mappings.
SQL parsing timeout	Timeout in seconds for SQL parsing during lineage collection. Default: 60
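Max retries and Retry delay can be read as a fixed-delay retry loop: one initial attempt plus up to Max retries further attempts, pausing Retry delay seconds between them. Under that reading, with the defaults (5 retries, 2 seconds) a call that keeps failing is attempted six times and waits 10 seconds in total before the error surfaces. The sketch below assumes only connection and timeout errors are retried; call_with_retries and that choice of exceptions are hypothetical.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T], max_retries: int = 5,
                      retry_delay: float = 2.0,
                      retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying up to `max_retries` times with a fixed delay (hypothetical helper)."""
    retries = 0
    while True:
        try:
            return call()
        except retry_on:
            if retries >= max_retries:
                raise  # retries exhausted: surface the last error
            retries += 1
            sleep(retry_delay)  # fixed wait between attempts, in seconds
```

Injecting `sleep` keeps the loop testable without real waiting; a fixed delay matches the single Retry delay setting, although exponential backoff is a common alternative elsewhere.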

Select Save.

Result

The metadata collector is created and appears on the Connectors page with a Configured status. It is now ready to connect to the source system and harvest metadata.

What to do next

After creating the collector, you can perform any of the following tasks:

Run the collector manually to harvest metadata immediately. See Run metadata collectors manually.
Automate metadata collection by scheduling regular collector runs. See Schedule metadata collector runs.
Monitor execution status and troubleshoot issues by viewing the runtime logs. See View runtime logs for collector runs.
Discover and evaluate the harvested data assets in the Data Catalog. See Governing the Data Catalog.