Azure Data Factory metadata collector

  • Release version: Australia
  • Updated April 1, 2026

    The Azure Data Factory metadata collector provides read-only access to metadata from an external Azure Data Factory account.

    Use this collector to harvest metadata from ADF, including pipelines, datasets, dataflows, linked services, triggers, integration runtimes, and global parameters. It also gathers lineage information between ADF datasets, and between ADF and external sources such as Snowflake and Databricks.

    Metadata cataloged

    The Azure Data Factory collector catalogs the following information.

    Table 1. Metadata harvested
    Object | Information cataloged
    Factory | ID, Name, ETag, Location, Create Time, Provisioning State, Version, Public Network Access, Factory Tags, Repository Configuration (Account Name, Collaboration Branch, Repository Name, Disable Publish, Root Folder, Host Name, Client ID, Project Name, Last Commit ID, Tenant ID, Repo Configuration Type)
    Pipeline | ID, Name, Description, Etag, Concurrency, Folder, Parameters, Metric Policy Duration, Variables
    Pipeline Activity | Name, Description, Type, Inactivity Status, State, User Properties, Activity Policy (Retry, Timeout, Retry Interval In Secs, Secure Input, Secure Output)
    Linked Service | ID, Name, Description, Type, Etag, Connection String, Domain, Parameters
    Note: Harvesting of Connection String for SFTP Linked Services is not supported.
    Dataset | ID, Name, Etag, Type, Database, Schema, Table, Folder, Container, File Name, Parameters
    Dataflow | ID, Name, Etag, Type, Description, Folder
    Trigger | ID, Name, Etag, Type, State, Description, Frequency, Interval, Start Time, End Time
    Integration Runtime | ID, Etag, Name, Type, Description, State, Compute Properties (Node Size, Number of Nodes, Max Parallel Execution Per Node, Core Count, Compute Type, Clean Up, Number of External Nodes, Number of Pipeline Nodes), SSIS Properties (Catalog Server Endpoint, Catalog Admin Username, Catalog Pricing Tier, License Type, Dual Standby Pair Name, Edition)
    Global Parameter | ID, Name, Value, Type
    ADF Table | ID, Name
    ADF Column | ID, Name, Type, Precision, Scale
    Pipeline Activity Query
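
    To illustrate how the Pipeline Activity fields in Table 1 relate to an ADF activity definition, the following is a minimal sketch, not the collector's implementation. The `ActivityRecord` class and `to_activity_record` function are hypothetical names, and the sample activity is invented. The JSON keys (`policy`, `retry`, `timeout`, `retryIntervalInSeconds`, `secureInput`, `secureOutput`) follow the ADF pipeline JSON format; how the collector maps them internally is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ActivityRecord:
    """Hypothetical record mirroring part of the Pipeline Activity row of Table 1."""
    name: str
    type: str
    description: Optional[str]
    state: Optional[str]
    retry: Any                      # ADF allows an integer or an expression here
    timeout: Optional[str]
    retry_interval_secs: Optional[int]
    secure_input: Optional[bool]
    secure_output: Optional[bool]


def to_activity_record(activity: dict[str, Any]) -> ActivityRecord:
    """Flatten one ADF activity definition into an ActivityRecord."""
    policy = activity.get("policy", {})  # not every activity type has a policy
    return ActivityRecord(
        name=activity["name"],
        type=activity["type"],
        description=activity.get("description"),
        state=activity.get("state"),
        retry=policy.get("retry"),
        timeout=policy.get("timeout"),
        retry_interval_secs=policy.get("retryIntervalInSeconds"),
        secure_input=policy.get("secureInput"),
        secure_output=policy.get("secureOutput"),
    )


# Invented sample in the shape of an ADF Copy activity definition.
SAMPLE_ACTIVITY = {
    "name": "CopyOrders",
    "type": "Copy",
    "description": "Copy raw orders to the warehouse",
    "policy": {
        "timeout": "0.12:00:00",
        "retry": 2,
        "retryIntervalInSeconds": 30,
        "secureInput": False,
        "secureOutput": False,
    },
}

record = to_activity_record(SAMPLE_ACTIVITY)
print(record.name, record.retry, record.timeout)
```

    Fields that are absent from the definition come back as `None` instead of raising an error, so activities without a policy block can still be recorded.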

    Relationships between objects

    Catalog pages show relationships between the following data asset types:

    Table 2. Relationships between harvested data asset pages
    Data asset page | Relationship
    Factory | Contains Global Parameter, Contains Pipeline, Contains Dataset, Contains Dataflow, Contains Trigger, Contains Integration Runtime
    Pipeline | Has Tag (also known as Annotation), Contains Activity
    Activity | Belongs to Pipeline, Contains Activity, Depends on Activity, Uses Linked Service, Uses Integration Runtime, Uses Dataset
    Linked Service | Uses Integration Runtime, Has Tag (also known as Annotation), Connects to Database
    Dataset | Uses Linked Service, Has Tabular Datasource, Has Tag (also known as Annotation)
    Dataflow | Uses Dataflow, Imports Data From Linked Service, Exports Data From Linked Service, Imports Data From Dataset, Exports Data From Dataset, Has Tag (also known as Annotation)
    Integration Runtime | Uses Integration Runtime, Uses Linked Service
    Trigger | Triggers Pipeline, Has Tag (also known as Annotation)
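
    The relationships in Table 2 form a directed graph, which makes dependency questions such as "what does this trigger ultimately touch?" a graph traversal. The following is a minimal sketch under that assumption. The asset names are invented, the relationship labels are simplified from Table 2, and `build_index` and `reachable` are hypothetical helpers, not part of the collector or the catalog.

```python
from collections import defaultdict

# (subject, relationship, object) triples; all asset names are hypothetical.
RELATIONSHIPS = [
    ("factory:sales", "Contains", "pipeline:load_orders"),
    ("trigger:nightly", "Triggers", "pipeline:load_orders"),
    ("pipeline:load_orders", "Contains", "activity:copy_orders"),
    ("activity:copy_orders", "Uses", "dataset:orders_raw"),
    ("activity:copy_orders", "Uses", "linked_service:snowflake_ls"),
    ("dataset:orders_raw", "Uses", "linked_service:blob_ls"),
    ("linked_service:blob_ls", "Uses", "integration_runtime:auto_ir"),
]


def build_index(triples):
    """Group outgoing (relationship, object) pairs by subject."""
    index = defaultdict(list)
    for subject, relationship, obj in triples:
        index[subject].append((relationship, obj))
    return index


def reachable(index, start):
    """Return the set of assets reachable from start by following relationships."""
    visited = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for _, target in index.get(node, []):
            if target not in visited:
                visited.add(target)
                stack.append(target)
    return visited


index = build_index(RELATIONSHIPS)
for asset in sorted(reachable(index, "trigger:nightly")):
    print(asset)
```

    Starting from the trigger, the traversal reaches the pipeline, its activity, the dataset, both linked services, and the integration runtime, but not the factory, because the sketch follows relationships only in the direction they are stated.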

    Lineage for Azure Data Factory

    Collected lineage information:

    Table 3. Lineage availability by object
    Object | Lineage available
    Dataset | The collector identifies the source or sink of the dataset:
    • when the source or sink is Snowflake, Databricks, PostgreSQL, MySQL, Oracle, Teradata, DB2, or SQL Server.
    • when a Copy Activity Run copies data between two datasets.
    ADF table | The collector identifies the associated upstream table that the data is sourced from or sunk to.
    ADF column | The collector identifies the associated column in the upstream table that the data is sourced from or sunk to.

    Supported data sources for cross-system lineage:

    • Snowflake
    • Databricks
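
    The Copy Activity case in Table 3 can be sketched as follows. This is an illustration, not the collector's logic: the collector is described as using Copy Activity runs, whereas this sketch reads the `inputs` and `outputs` dataset references from a pipeline definition, which is an assumption made to keep the example self-contained. The function name `copy_lineage_edges` and the sample pipeline are hypothetical.

```python
from typing import Any


def copy_lineage_edges(pipeline: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (source dataset, sink dataset) pairs for each Copy activity."""
    edges = []
    activities = pipeline.get("properties", {}).get("activities", [])
    for activity in activities:
        if activity.get("type") != "Copy":
            continue  # only Copy activities move data between datasets here
        sources = [ref["referenceName"] for ref in activity.get("inputs", [])]
        sinks = [ref["referenceName"] for ref in activity.get("outputs", [])]
        edges.extend((src, dst) for src in sources for dst in sinks)
    return edges


# Invented pipeline in the shape of an ADF pipeline definition.
SAMPLE_PIPELINE = {
    "name": "LoadOrders",
    "properties": {
        "activities": [
            {
                "name": "CopyOrders",
                "type": "Copy",
                "inputs": [{"referenceName": "OrdersRaw", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "OrdersSnowflake", "type": "DatasetReference"}],
            },
            {"name": "WaitABit", "type": "Wait"},
        ]
    },
}

for source, sink in copy_lineage_edges(SAMPLE_PIPELINE):
    print(f"{source} -> {sink}")
```

    Each pair is one dataset-level lineage edge. Resolving a dataset to a table or column in Snowflake or Databricks would additionally require the dataset's linked service and its Database, Schema, and Table attributes from Table 1.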

    Authentication types supported

    The Azure Data Factory collector authenticates using an Azure service principal.
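
    For orientation, a service principal is typically identified by a tenant ID, a client (application) ID, and a client secret, which are exchanged for an access token through the Microsoft identity platform client credentials flow. The sketch below only builds that token request and does not send it. How the collector performs this step internally is not described here, so treat this as an assumption; the function name `build_token_request` and the placeholder credential values are hypothetical.

```python
from urllib.parse import urlencode

# Scope for Azure Resource Manager, which fronts the Data Factory management API.
ARM_SCOPE = "https://management.azure.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> tuple[str, str]:
    """Return the token endpoint URL and form-encoded body for a client credentials request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": ARM_SCOPE,
        }
    )
    return url, body


# Placeholder values only; real credentials should come from a secret store.
url, body = build_token_request("my-tenant-id", "my-client-id", "my-client-secret")
print(url)
```

    The body would be sent as an `application/x-www-form-urlencoded` POST. Because the collector is read-only, a role limited to read access on the factory is the natural fit for the service principal, although the exact role requirement is not stated in this section.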