Azure Data Factory metadata collector
Summary of Azure Data Factory metadata collector
The Azure Data Factory (ADF) metadata collector provides ServiceNow customers with read-only access to metadata from an external Azure Data Factory account. It enables the harvesting of detailed metadata including pipelines, datasets, dataflows, linked services, triggers, integration runtimes, and global parameters. This collector also captures lineage information within ADF datasets and between ADF and external data sources such as Snowflake and Databricks.
Key Features
- Comprehensive Metadata Cataloging: Collects detailed attributes for ADF factories, pipelines, activities, linked services, datasets, dataflows, triggers, integration runtimes, and global parameters. For example, it collects factory configuration, pipeline variables, activity policies, linked service connection details (excluding SFTP connection strings), dataset schema, and compute runtime properties.
- Relationship Mapping: Displays relationships between data assets, such as pipelines containing activities, datasets using linked services, and triggers activating pipelines. This helps visualize dependencies and data flow within ADF.
- Lineage Tracking: Tracks data lineage by identifying sources and sinks for datasets and tables. It extracts lineage for data sources such as Snowflake, Databricks, PostgreSQL, MySQL, Oracle, Teradata, DB2, and SQL Server, particularly when Copy Activities move data between datasets.
- Authentication: Uses Azure Service Principal for secure authentication when connecting to Azure Data Factory accounts.
Practical Use for ServiceNow Customers
This collector allows customers to integrate ADF metadata into ServiceNow’s data catalog and governance workflows, enabling improved visibility into data pipelines, dependencies, and data lineage. By harvesting detailed metadata and relationships, customers can better manage, audit, and understand their data engineering processes within Azure Data Factory.
Preparation and Setup
Before running the collector, customers must prepare their Azure data assets and configure authentication with an Azure Service Principal. After setup, they can create and run the Azure Data Factory metadata collector within ServiceNow to import metadata and lineage information.
The Azure Data Factory metadata collector provides read-only access to metadata from an external Azure Data Factory account.
Use this collector to harvest metadata from ADF, including pipelines, datasets, dataflows, linked services, triggers, integration runtimes, and global parameters. It gathers lineage information between ADF datasets and between ADF and external sources such as Snowflake.
Metadata cataloged
The Azure Data Factory collector catalogs the following information.
| Object | Information cataloged |
|---|---|
| Factory | ID, Name, ETag, Location, Create Time, Provisioning State, Version, Public Network Access, Factory Tags, Repository configuration (Account name, Collaboration Branch, Repository Name, Disable Publish, Root Folder, Host Name, Client ID, Project Name, Last Commit ID, Tenant ID, Repo Configuration Type). |
| Pipeline | ID, Name, Description, Etag, Concurrency, Folder, Parameters, Metric Policy Duration, Variables |
| Pipeline Activity | Name, Description, Type, Inactivity Status, State, User Properties, Activity Policy (Retry, Timeout, Retry Interval In Secs, Secure Input, Secure Output) |
| Linked Service | ID, Name, Description, Type, Etag, Connection String, Domain, Parameters. Note: Harvesting of Connection String for SFTP Linked Services is not supported. |
| Dataset | ID, Name, Etag, Type, Database, Schema, Table, Folder, Container, File Name, Parameters |
| Dataflow | ID, Name, Etag, Type, Description, Folder |
| Trigger | ID, Name, Etag, Type, State, Description, Frequency, Interval, Start time, End time |
| Integration Runtime | ID, Etag, Name, Type, Description, State, Compute Properties (Node Size, Number of Nodes, Max Parallel Execution Per Node, Core Count, Compute Type, Clean up, Number of External Nodes, Number of Pipeline Nodes), SSIS properties (Catalog Server Endpoint, Catalog Admin Username, Catalog Pricing Tier, License Type, Dual Standby Pair Name, Edition) |
| Global Parameter | ID, Name, Value, Type |
| ADF Table | ID, Name |
| ADF Column | ID, Name, Type, Precision, Scale |
| Pipeline Activity | Query |
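To make the object types above concrete, the following is a minimal sketch of how the read-only list endpoints for these objects could be addressed in the Azure Resource Manager REST API. It is an illustration only, not the collector's implementation. The function name `list_url`, the `COLLECTIONS` mapping, and the placeholder subscription, resource group, and factory values are assumptions; the URL layout and the `2018-06-01` API version follow the public Azure Data Factory REST API.

```python
from urllib.parse import urlencode

ARM_BASE = "https://management.azure.com"
API_VERSION = "2018-06-01"  # public ADF REST API version

# Cataloged object type -> collection segment under the factory resource.
# (Hypothetical mapping for illustration; the collector may work differently.)
COLLECTIONS = {
    "Pipeline": "pipelines",
    "Linked Service": "linkedservices",
    "Dataset": "datasets",
    "Dataflow": "dataflows",
    "Trigger": "triggers",
    "Integration Runtime": "integrationRuntimes",
    "Global Parameter": "globalParameters",
}


def list_url(subscription_id, resource_group, factory_name, object_type):
    """Build the GET URL that lists all objects of one type in a factory."""
    segment = COLLECTIONS[object_type]
    path = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
        f"/{segment}"
    )
    return f"{ARM_BASE}{path}?{urlencode({'api-version': API_VERSION})}"


# Placeholder identifiers; only GET requests are needed for read-only harvesting.
for object_type in COLLECTIONS:
    print(object_type, "->", list_url("my-sub", "my-rg", "my-factory", object_type))
```

Each URL is fetched with an HTTP GET and a bearer token, which is consistent with the read-only access the collector requires.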
Relationships between objects
Catalog pages show relationships between the following data asset types:
| Data asset page | Relationship |
|---|---|
| Factory | Contains Global Parameter, Contains Pipeline, Contains Dataset, Contains Dataflow, Contains Trigger, Contains Integration Runtime |
| Pipeline | Has Tag (also known as Annotation), Contains Activity |
| Activity | Belongs to Pipeline, Contains Activity, Depends on Activity, Uses Linked Service, Uses Integration Runtime, Uses Dataset |
| Linked Service | Uses Integration Runtime, Has Tag (also known as Annotation), Connects to database |
| Dataset | Uses Linked Service, Has Tabular Datasource, Has Tag (also known as Annotation) |
| Dataflow | Uses Dataflow, Imports Data From Linked Service, Exports Data From Linked Service, Imports Data From Dataset, Exports Data From Dataset, Has Tag (also known as Annotation) |
| Integration Runtime | Uses Integration Runtime, Uses Linked Service |
| Trigger | Triggers Pipeline, Has Tag (also known as Annotation) |
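The relationships in the table can be read as a directed graph of asset types. The sketch below models a subset of them as (source, relation, target) triples and walks the graph to find everything a given asset type depends on. The names `RELATIONSHIPS`, `build_index`, and `reachable` are hypothetical and chosen for illustration; the catalog's internal model is not described here.

```python
from collections import defaultdict

# A subset of the relationships listed in the table above.
RELATIONSHIPS = [
    ("Factory", "Contains", "Pipeline"),
    ("Pipeline", "Contains", "Activity"),
    ("Activity", "Uses", "Dataset"),
    ("Activity", "Uses", "Linked Service"),
    ("Dataset", "Uses", "Linked Service"),
    ("Linked Service", "Uses", "Integration Runtime"),
    ("Trigger", "Triggers", "Pipeline"),
]


def build_index(triples):
    """Group outgoing (relation, target) pairs by source asset type."""
    index = defaultdict(list)
    for source, relation, target in triples:
        index[source].append((relation, target))
    return index


def reachable(index, start):
    """Return every asset type reachable from `start` by following relationships."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in index.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


index = build_index(RELATIONSHIPS)
print(sorted(reachable(index, "Trigger")))
```

Starting from a trigger, the walk reaches the pipeline it activates, that pipeline's activities, and the datasets, linked services, and integration runtimes those activities depend on.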
Lineage for Azure Data Factory
Collected lineage information:
| Object | Lineage available |
|---|---|
| Dataset | The collector identifies the source or sink of the dataset. |
| ADF table | The collector identifies the associated upstream table that the data is sourced from or written to. |
| ADF column | The collector identifies the associated column in the upstream table that the data is sourced from or written to. |
Supported data sources for cross-system lineage:
- Snowflake
- Databricks
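As an illustration of dataset-level lineage, the sketch below derives source-to-sink pairs from the `inputs` and `outputs` dataset references of Copy activities in a pipeline definition. The JSON shape follows the public ADF pipeline format; the function name `copy_lineage` and the sample pipeline and dataset names are invented for this example, and the collector's actual lineage logic may differ.

```python
def copy_lineage(pipeline):
    """Return (source_dataset, sink_dataset) pairs for each Copy activity."""
    edges = []
    for activity in pipeline.get("properties", {}).get("activities", []):
        if activity.get("type") != "Copy":
            continue  # only Copy activities move data between datasets here
        sources = [ref["referenceName"] for ref in activity.get("inputs", [])]
        sinks = [ref["referenceName"] for ref in activity.get("outputs", [])]
        edges.extend((source, sink) for source in sources for sink in sinks)
    return edges


# Hypothetical pipeline definition with one Copy activity and one Wait activity.
sample_pipeline = {
    "name": "LoadOrders",
    "properties": {
        "activities": [
            {
                "name": "CopyOrders",
                "type": "Copy",
                "inputs": [{"referenceName": "SnowflakeOrders", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "DatabricksOrders", "type": "DatasetReference"}],
            },
            {"name": "Pause", "type": "Wait", "typeProperties": {"waitTimeInSeconds": 5}},
        ]
    },
}

print(copy_lineage(sample_pipeline))
```

Resolving each dataset name to its linked service, and from there to a table in a system such as Snowflake or Databricks, is what would extend these pairs into cross-system lineage.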
Authentication types supported
The Azure Data Factory collector authenticates using Azure Service Principal.
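For orientation, service principal authentication typically uses the OAuth 2.0 client credentials flow against Microsoft Entra ID. The sketch below only builds the token request and does not send it. The function name `build_token_request` and the placeholder tenant, client, and secret values are assumptions; the token endpoint and the `https://management.azure.com/.default` scope are the standard ones for Azure Resource Manager. How the collector itself performs this step is not described here.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            # Scope for Azure Resource Manager, which fronts the ADF REST API.
            "scope": "https://management.azure.com/.default",
        }
    ).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values; real ones come from the service principal's app registration.
request = build_token_request("my-tenant-id", "my-client-id", "my-client-secret")
print(request.get_method(), request.full_url)
```

The response to this request carries an access token, which is then sent as a bearer token on read-only calls to the Data Factory endpoints.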