Executive Summary
ServiceNow does not natively support AVRO file ingestion. Import Sets accept CSV, Excel, XML, and JSON only. However, the problem should not be framed as “how do I get AVRO into ServiceNow?” but rather “where should format transformation occur in the enterprise data architecture?” The answer, for long-term success, is to push format conversion upstream — ideally into a data lake like Databricks — and have ServiceNow consume clean, transformed data through platform-aligned channels.
- Strategic recommendation: Use Databricks as the intermediary. The publishing team loads AVRO into Databricks (which has built-in AVRO support), transforms and stores it as Delta Lake tables, and ServiceNow consumes via Zero Copy connectors through Workflow Data Fabric or via Reverse ETL to the Import Set/Table API.
- Why now: ServiceNow and Databricks announced a Zero Copy partnership (October 2024) enabling bi-directional, high-bandwidth, secure integration via Delta Sharing. Workflow Data Fabric, launched in Zurich, connects to Databricks, Snowflake, Oracle, and other platforms without duplicating data.
- The principle: Never force ServiceNow to parse binary formats. Separation of concerns dictates that data engineering (AVRO parsing, schema evolution, quality checks) belongs in the data platform, while ServiceNow focuses on workflow, ITSM, and operational action.
- Fallback patterns: For organizations without Databricks, this guide covers six additional patterns including Kafka streaming, MID Server Java processing, iPaaS middleware, and batch pre-conversion.
The Fundamental Architecture Question
When an architect encounters “how do I get Format X into System Y?” the instinct is often to build a custom parser inside System Y. But this creates tight coupling, maintenance burden, and forces a workflow platform to behave like a data engineering tool.
The better question is: “Who should own the format transformation, and where should it happen?” In a well-designed enterprise data architecture, the answer follows three principles:
- Producer responsibility: The team publishing AVRO files owns the data contract. They should be responsible for making that data available in formats consumable by downstream systems, or for loading it into a shared data platform.
- Data platform as intermediary: A data lake (Databricks, Snowflake) or data warehouse serves as the canonical transformation layer. It ingests any format, applies quality rules, and exposes clean data via APIs, Delta Sharing, or reverse ETL.
- Consumer simplicity: ServiceNow should consume data through its strongest channels: REST APIs, Import Sets, Kafka streaming (Stream Connect), or Zero Copy connectors. It should never need to understand AVRO binary encoding.
This separation yields long-term benefits: schema evolution is handled in Databricks (not ServiceNow), data quality rules are centralized, multiple consumers (not just ServiceNow) can access the same cleaned dataset, and the publishing team’s format choices don’t cascade into downstream integration rework.
Recommended: Databricks as the Intermediary
Complexity: Low-Medium (assuming the Databricks infrastructure already exists) | Best for: Strategic long-term architecture, any volume, any cadence
This is the architecturally cleanest pattern and aligns with ServiceNow’s own strategic direction. Databricks has built-in AVRO support through Apache Spark’s spark-avro module (built into Databricks Runtime since version 5.0), reducing AVRO ingestion to a couple of lines of code:
# Read all AVRO files from the landing path, then persist them as a Delta table
df = spark.read.format("avro").load("/path/to/files/*.avro")
df.write.format("delta").saveAsTable("catalog.schema.my_table")
Once data lands in Delta Lake, ServiceNow can consume it through multiple channels:
Option A: Zero Copy via Workflow Data Fabric (Strategic)
ServiceNow’s Workflow Data Fabric connects enterprise data sources without duplicating or moving data. Zero Copy connectors to Databricks, Snowflake, Google BigQuery, Oracle, Amazon Redshift, and others deliver secure, real-time access. The October 2024 partnership announcement confirmed that Databricks’ Delta Sharing enables ServiceNow to offer bi-directional, high-bandwidth integration via the Databricks Data Intelligence Platform.
In this model, Databricks publishes a Delta Sharing share containing the transformed tables. ServiceNow’s Workflow Data Fabric accesses this data in real time to fuel AI agents, workflows, and analytics — without any data copying. This is the forward-looking pattern that ServiceNow is investing in, and it eliminates the Import Set bottleneck entirely for read scenarios.
Key advantages: No data duplication, real-time access, centralized governance in Databricks Unity Catalog, bi-directional capability via RaptorDB, and alignment with ServiceNow’s Zurich release roadmap.
Limitation: Zero Copy integration availability may vary by release. Confirm your entitlement and version compatibility. Best suited for read/query scenarios; for bulk record creation in ServiceNow tables, Option B is needed.
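On the Databricks side, publishing the transformed table through Delta Sharing is a few SQL statements. The sketch below (runnable from a notebook where spark is the active session) uses illustrative share, recipient, and table names; the ServiceNow side of the Zero Copy connection is configured separately in Workflow Data Fabric, and your recipient setup may differ depending on whether you use open or Databricks-to-Databricks sharing.

# Illustrative Databricks-side setup for Option A: expose the cleaned table via Delta Sharing.
# Share, recipient, and table names are placeholders.
spark.sql("CREATE SHARE IF NOT EXISTS servicenow_share")
spark.sql("ALTER SHARE servicenow_share ADD TABLE catalog.schema.my_table")

# Grant the share to a recipient representing the ServiceNow integration.
spark.sql("CREATE RECIPIENT IF NOT EXISTS servicenow_recipient")
spark.sql("GRANT SELECT ON SHARE servicenow_share TO RECIPIENT servicenow_recipient")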
Option B: Reverse ETL from Databricks to ServiceNow (Operational)
When data must be physically written into ServiceNow tables (creating incidents, populating CMDB, updating user records), a Reverse ETL pattern pushes data from Delta Lake into ServiceNow via the Table API or Import Set API.
The pipeline is: AVRO files → Databricks (spark.read.format("avro")) → Delta Lake table → Databricks job converts to JSON → REST calls to ServiceNow Import Set API (/api/now/import/{stagingTable}) → Transform Map → target table.
Databricks supports Change Data Feed (CDF) on Delta tables, enabling incremental sync: only new/changed rows are pushed to ServiceNow on each run. This avoids full-table reloads and respects ServiceNow’s REST API concurrency limits (~16 active + 150 queued transactions per node).
Key advantages: Data quality and transformation in Databricks, incremental sync via CDF, scheduling via Databricks Workflows or Airflow, and the publishing team only needs to land AVRO in a known location.
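The following is a minimal sketch of the Reverse ETL step, assuming Change Data Feed is enabled on the Delta table and a staging table named u_avro_staging exists in ServiceNow. The instance URL, credentials (read here from an assumed Databricks secret scope), starting version, and row-by-row POST loop are illustrative; a production job would persist the version watermark between runs and batch or throttle calls against the instance’s concurrency limits.

import requests

# Read only the rows that changed since the last processed table version (CDF).
changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 101)          # track this watermark per run
           .table("catalog.schema.my_table")
           .filter("_change_type IN ('insert', 'update_postimage')"))

# Push each changed row to the Import Set API; the Transform Map takes it from there.
url = "https://<instance>.service-now.com/api/now/import/u_avro_staging"
session = requests.Session()
session.auth = ("integration_user", dbutils.secrets.get("sn_scope", "password"))  # placeholder secret scope
session.headers.update({"Content-Type": "application/json", "Accept": "application/json"})

clean = changes.drop("_change_type", "_commit_version", "_commit_timestamp")
for row in clean.toLocalIterator():
    resp = session.post(url, json=row.asDict())
    resp.raise_for_status()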
Option C: Kafka Bridge — Databricks + Kafka + ServiceNow Stream Connect
For real-time streaming needs, Databricks can read AVRO files or Kafka streams, transform them, and republish to Kafka topics as JSON (or AVRO with schema). ServiceNow Stream Connect then consumes these topics natively via Kafka Message Flow Triggers, ETL Consumers, or Transform Map Consumers.
This pattern is ideal when the publishing team is already writing AVRO to Kafka and you need sub-minute latency into ServiceNow. The Confluent Delta Lake Sink Connector or Databricks structured streaming writes to Delta, while a separate producer publishes transformed data to a ServiceNow-consumable Kafka topic.
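A sketch of the Databricks side of this bridge, using Auto Loader to pick up AVRO files as they land, re-serializing each record as JSON, and publishing to a Kafka topic that Stream Connect subscribes to. The paths, broker address, and topic name are placeholders.

from pyspark.sql.functions import to_json, struct

# Incrementally ingest AVRO files with Auto Loader (schema tracked at schemaLocation).
stream = (spark.readStream.format("cloudFiles")
          .option("cloudFiles.format", "avro")
          .option("cloudFiles.schemaLocation", "/schemas/servicenow_inbound")
          .load("/landing/avro/"))

# Re-serialize each record as a JSON string and publish to a Stream Connect topic.
(stream.select(to_json(struct("*")).alias("value"))
 .writeStream
 .format("kafka")
 .option("kafka.bootstrap.servers", "broker1:9092")
 .option("topic", "servicenow.inbound")
 .option("checkpointLocation", "/checkpoints/servicenow_inbound")
 .start())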
Alternative Patterns (When Databricks Is Not Available)
Not every organization has Databricks. The following patterns address different infrastructure realities, ranked by long-term viability:
| Pattern | Complexity | Latency | Volume Ceiling | Key Dependency |
|---|---|---|---|---|
| External Pre-Conversion | Low | Batch | ~100K records | Python/Java CLI, cron/CI |
| MID Server + Java JAR | Medium-Low | Batch | ~1M records | MID Server, Java dev |
| Confluent Kafka Sink | Medium | Real-time | Unlimited | Confluent Platform/Cloud |
| Stream Connect (Native) | Medium | Near-RT | 150–300 msg/s/thread | Automation Engine license |
| iPaaS (MuleSoft/Informatica) | Medium-High | Batch/RT | Unlimited | iPaaS license + expertise |
| Parallel Import Engine | High | Batch | 4M+ records in ~3 hrs | Deep SN platform knowledge |
1. External Script Pre-Conversion
Convert AVRO to JSON/CSV outside ServiceNow using Apache Avro Tools CLI, Python fastavro, or the avroconvert utility. Upload via Import Set REST API. Automate with cron or CI/CD pipeline. Simplest approach but no real-time capability and requires external orchestration.
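A minimal sketch of this pattern using fastavro and requests. The instance URL, credentials, file path, and the u_avro_staging table name are placeholders; wire the credentials into your scheduler’s secret store rather than hard-coding them.

# Convert AVRO files to JSON records and push them to an Import Set staging table.
# Requires: pip install fastavro requests
import glob
import requests
from fastavro import reader

URL = "https://<instance>.service-now.com/api/now/import/u_avro_staging"

session = requests.Session()
session.auth = ("integration_user", "password")  # placeholder; use a vault/CI secret
session.headers.update({"Content-Type": "application/json", "Accept": "application/json"})

for path in glob.glob("/data/incoming/*.avro"):
    with open(path, "rb") as fo:
        for record in reader(fo):          # fastavro yields each row as a dict
            resp = session.post(URL, json=record)
            resp.raise_for_status()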
2. MID Server with Custom Java JAR
Deploy the Apache AVRO Java library on the MID Server. Build a fat JAR with Maven Assembly Plugin, upload via MID Server > JAR Files (syncs to extlib/). Trigger via JavascriptProbe through the ECC Queue. The MID Server converts AVRO to JSON and pushes back via Import Set API. Caveat: no out-of-box file import from MID Server; the MID must call back to the instance REST API.
3. Confluent ServiceNow Sink Connector
For Kafka-based AVRO streams, the Confluent Sink Connector reads AVRO messages, deserializes via Schema Registry (using io.confluent.connect.avro.AvroConverter), and writes directly to ServiceNow tables. Supports AVRO, JSON Schema, and Protobuf inputs with at-least-once delivery, dead-letter queues, and mTLS. Set "input.data.format": "AVRO" in connector config. This is the most production-ready path for streaming AVRO if you’re in a Confluent ecosystem.
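On self-managed Confluent Platform, the connector is registered through the Kafka Connect REST API. The sketch below shows the shape of that call; the property names are illustrative and should be confirmed against the Confluent ServiceNow Sink documentation for your version (the fully managed Confluent Cloud connector uses a different key set, e.g. "input.data.format": "AVRO").

# Register the ServiceNow Sink on a self-managed Connect cluster.
# Property names are illustrative — verify against the Confluent docs for your version.
import requests

connector = {
    "name": "servicenow-sink",
    "config": {
        "connector.class": "io.confluent.connect.servicenow.ServiceNowSinkConnector",
        "topics": "servicenow.inbound",
        "servicenow.url": "https://<instance>.service-now.com",
        "servicenow.table": "u_avro_staging",
        "servicenow.user": "integration_user",
        "servicenow.password": "********",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://schema-registry:8081",
    },
}

resp = requests.post("http://connect:8083/connectors", json=connector)
resp.raise_for_status()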
4. ServiceNow Stream Connect
ServiceNow’s native Kafka integration supports AVRO schemas through its Schema Management module (importable from Confluent Registry or created locally). Consumer types include Kafka Message Flow Trigger, ETL Consumer, Transform Map Consumer, and Script Consumer. Throughput: 150–300 msg/s/thread, max 2 MB message size, 30 topics/instance. Important caveat: Stream Connect’s AVRO support operates at schema-definition level, not full binary wire-format deserialization. Community reports indicate data corruption risks with third-party Kafka AVRO messages.
5. iPaaS Middleware (MuleSoft / Informatica)
MuleSoft’s DataWeave 2.2+ natively reads AVRO (application/avro MIME type) and the certified ServiceNow Connector v6.18+ handles all CRUD operations — the cleanest single-platform solution for file-based AVRO. Informatica offers native AVRO across Kafka/S3/GCS connectors with documented batch-size tuning (200 → 10,000 yields 2.5x read improvement). Azure Data Factory has AVRO support but its ServiceNow connector is read-only.
6. High-Performance Parallel Import
For 1M+ records, a documented Community case study achieved 4 million records in ~3 hours. The technique caches data as JSON in the sysevent table, distributes work via sys_trigger with System ID = ALL NODES, and processes in parallel across all application nodes using Script Actions. Critical tuning: glide.import_set_load_batch_size, 1:1 field name matching to skip transform overhead, and deferring complex calculations to post-import processing.
Platform Considerations
Import Set format support: CSV, CSV (tab), Excel (.xls/.xlsx), XML, JSON, Custom (Parse by Script), Custom (Load by Script). No binary format support. API payload limit: 10 MB default (25 MB max). Best practice: 100K records per Import Set.
Attachment whitelist: The glide.attachment.extensions property must include .avro if any pattern requires uploading raw AVRO files to ServiceNow. It is not in the default allowed list.
AVRO complex types: Nested records, arrays, maps, and unions have no direct equivalent in ServiceNow’s flat table structure. Flatten during conversion or store as JSON strings in ServiceNow string fields.
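One simple flattening approach, sketched in Python against the dicts fastavro produces: nested records become underscore-joined scalar fields, and arrays are serialized to JSON strings that fit ServiceNow string columns. The naming convention is illustrative; a map with free-form keys could likewise be stored whole as a JSON string.

import json

def flatten(record, prefix=""):
    # Nested records -> underscore-joined keys; lists -> JSON strings.
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        elif isinstance(value, (list, tuple)):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat

# {"id": 1, "owner": {"name": "a", "dept": "b"}, "tags": ["x", "y"]}
# becomes {"id": 1, "owner_name": "a", "owner_dept": "b", "tags": '["x", "y"]'}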
Schema evolution: AVRO’s schema evolution (fields added/removed over time) is handled natively by Databricks and Kafka Schema Registry. For batch file patterns, build schema-version-aware conversion logic.
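For the batch-file patterns, one way to stay schema-version-aware is to resolve every file against a single target (reader) schema, so files written under older or newer writer schemas all come out with the same fields. A sketch with fastavro, where the .avsc file name is a placeholder:

from fastavro import reader
from fastavro.schema import load_schema

target_schema = load_schema("target_v3.avsc")  # the shape the conversion expects

def records(path):
    with open(path, "rb") as fo:
        # fastavro applies AVRO schema-resolution rules (defaults, dropped fields)
        # between each file's writer schema and this reader schema.
        yield from reader(fo, reader_schema=target_schema)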
Upgrade safety: Custom Java JARs on MID Server may require revalidation during ServiceNow upgrades. Use scoped applications for all custom ServiceNow-side logic to maintain deployment independence.
Workflow Data Fabric availability: Zero Copy connectors to Databricks, Snowflake, BigQuery, Oracle, Redshift, Cloudera, SQL Server, and Teradata are being expanded with each release. Confirm your entitlement via the Workflow Data Fabric Hub in the ServiceNow Store (free, requires custom table entitlement).
Decision Framework
- Organization has Databricks: Use Databricks intermediary (recommended). AVRO publishers load into Databricks, ServiceNow consumes via Zero Copy (Workflow Data Fabric) for read scenarios or Reverse ETL for record creation. This aligns with ServiceNow’s strategic direction.
- Organization has Kafka with AVRO: Use Confluent Sink Connector (Pattern 3) for highest reliability, or ServiceNow Stream Connect (Pattern 4) for native workflow integration. Consider Databricks as a Kafka consumer that also provides analytics.
- Organization has MuleSoft/Informatica: Use iPaaS Pattern 5. MuleSoft’s DataWeave provides the cleanest single-step AVRO-to-ServiceNow transformation.
- No data platform, batch files only: Pattern 1 (external pre-conversion) for quick wins under 100K records. Pattern 2 (MID Server) for on-premise sources up to 1M records.
- Millions of records with SLAs: Combine any upstream conversion with Pattern 6 parallel import techniques.
Conclusion
The architecturally sound answer to “how do I consume AVRO in ServiceNow?” is: don’t. Push format conversion to where it belongs — the data platform — and let ServiceNow consume clean data through its strongest channels.
Databricks as an intermediary is the recommended strategic pattern because it natively handles AVRO, provides Delta Lake for reliable storage and incremental processing, supports schema evolution, and is now officially partnered with ServiceNow for Zero Copy bi-directional integration via Workflow Data Fabric. The publishing team loads AVRO into Databricks, and ServiceNow consumes the result — no custom parsers, no binary format gymnastics, no tight coupling.
For organizations not yet on Databricks, the Confluent Kafka Sink Connector and MuleSoft iPaaS patterns offer strong production-grade alternatives. The key principle remains the same: AVRO deserialization must always occur outside ServiceNow’s core engine, and the integration architecture should be designed so that ServiceNow never needs to know or care that the original data was in AVRO format.
Sources
- ServiceNow Press Room: ServiceNow and Databricks Announce Zero Copy Partnership (October 2024)
- ServiceNow Newsroom: Workflow Data Fabric Ecosystem for AI Agents (Knowledge 2025, May 2025)
- Databricks Docs: ServiceNow Connector; AVRO File Data Source
- ServiceNow Community: Stream Connect Implementation Tips; AVRO with Stream Connect Producer
- Confluent Docs: ServiceNow Sink Connector for Confluent Cloud/Platform
- ServiceNow Community: Import 4 million Records in 3 Hours; Importing & Transforming External Data
- ServiceNow Docs: Workflow Data Fabric Hub (Zurich); Stream Connect for Apache Kafka
