Concurrent imports
Summary of Concurrent imports
Concurrent imports in ServiceNow Yokohama enable splitting large incoming data sets into multiple import sets that are transformed concurrently. This feature helps reduce overall import processing time, particularly for large data sets with complex scripts. It is best used when the order of processing does not matter or when imports can be partitioned to maintain order within subsets.
Concurrent imports introduce additional processing and monitoring overhead, so they should only be enabled after optimizing other parameters such as database indexes and transformation logic.
Scheduling and Execution
To enable concurrent imports, select the Concurrent Import option on the Scheduled Data Import form. When the schedule runs, data is loaded into a temporary staging table and then transformed into the target table by multiple import sets, up to the limit defined by the system property glide.scheduled_import.max.concurrent.import_sets (default is 10). The number of import sets scales with the cluster size.
Each active node runs Import Set Transformer jobs every minute, which concurrently process import sets from the job queue based on available worker threads.
Monitoring and Management
- A Concurrent Import Set record tracks all related import sets, jobs, and transform histories for each concurrent import, allowing you to monitor progress, resume, or reprocess import sets as needed.
- The Concurrent Import Sets Jobs queue manages the processing status and job types for each import set.
Partitioning and Hierarchical Imports
Partitioning allows maintaining processing order within subsets of data by assigning rows with the same partition key to the same import set, which is then processed sequentially. By default, records are allocated in a round-robin manner, but custom scripts can define partition keys.
Hierarchical imports enable scheduling child import sets to run sequentially after parent imports complete. In concurrent imports, the last Import Set Transformer job triggers the next import in the hierarchy using a generated execution plan.
Synchronized Inserts and Coalesce
Concurrent imports use coalesce fields to define record uniqueness. During transformation, if a record with matching coalesce values exists, it is updated; otherwise, a new record is inserted. Write locks prevent multiple import sets from inserting duplicate records simultaneously.
Key Tables Involved
- Concurrent Import Set (sys_concurrent_import_set): Stores details of each concurrent import set.
- Concurrent Import Set Jobs (sys_concurrent_import_set_job): Lists import sets pending processing.
- Execution Context for Scheduled import (sys_execution_context): Defines the next scheduled import for hierarchical imports.
- Hierarchical scheduled import execution plan (sys_execution_plan): Contains the execution tree of parent and child scheduled imports.
Domain Separation
Domain separation can be enabled for concurrent imports by adding the sys_domain field to the scheduled import table. Both data loading and transformation jobs respect the specified domain context to ensure proper data segregation.
Split incoming data into multiple import sets and transform the import sets concurrently to reduce processing time.
Running a concurrent import can be helpful when order does not matter and imports take a long time due to large data sets with time-consuming scripts. If order matters, you can split the import into multiple partitions to ensure that each partition is processed in order.
Enable concurrent imports only after fine-tuning all other parameters, such as database indexes and transformations.
Scheduling concurrent imports
You enable concurrent imports by selecting Concurrent Import on the Scheduled Data Import form. For instructions, see Schedule a data import.
When the schedule runs a concurrent import, the system pulls the data from databases, Excel spreadsheets, CSV files, or other sources to a temporary staging table, and then transforms the data from the staging table to the target table.
When you run a concurrent import, the system creates multiple import sets, up to the value of the glide.scheduled_import.max.concurrent.import_sets system property (default = 10). For example, a two-node cluster produces four import sets, and a ten-node cluster produces ten import sets.
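The two examples above are consistent with two import sets per node, capped by the system property. The following Python sketch encodes that reading. The per-node factor of 2 is an assumption inferred from the examples, not a documented formula, and `import_set_count` is a hypothetical helper rather than a platform API.

```python
def import_set_count(active_nodes: int, max_concurrent: int = 10, per_node: int = 2) -> int:
    """Estimate how many import sets a concurrent import creates.

    Assumption: `per_node` import sets per active node, capped at the value of
    glide.scheduled_import.max.concurrent.import_sets (`max_concurrent`).
    """
    if active_nodes < 1:
        raise ValueError("active_nodes must be at least 1")
    return min(active_nodes * per_node, max_concurrent)


print(import_set_count(2))   # 4, matching the two-node example
print(import_set_count(10))  # 10, matching the ten-node example
```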
Import Set Transformer job
Each active node runs two Import Set Transformer jobs every minute, and those jobs poll the Concurrent Import Sets Jobs queue, pick import sets from the queue, and transform those import sets. All jobs run concurrently, depending on the availability of worker threads.
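The polling behavior can be modeled as several workers draining one shared queue. This is a conceptual Python sketch only: the function names, the in-memory queue, and the thread pool are illustrative assumptions and do not reflect how the platform implements Import Set Transformer jobs.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def transformer_job(job_queue, results, job_name):
    """Hypothetical transformer job: pick import sets from the queue until it is empty."""
    while True:
        try:
            import_set = job_queue.get_nowait()
        except queue.Empty:
            return
        # Stand-in for the real transform; records which job handled the set.
        results.append((job_name, import_set))


def run_polling_cycle(import_sets, nodes, jobs_per_node=2):
    """Model one polling cycle: every node runs `jobs_per_node` jobs concurrently."""
    job_queue = queue.Queue()
    for import_set in import_sets:
        job_queue.put(import_set)

    results = []
    # The pool size stands in for the available worker threads.
    with ThreadPoolExecutor(max_workers=nodes * jobs_per_node) as pool:
        for node in range(nodes):
            for job in range(jobs_per_node):
                pool.submit(transformer_job, job_queue, results, f"node{node}-job{job}")
    return results


handled = run_polling_cycle(["ISET0001", "ISET0002", "ISET0003", "ISET0004"], nodes=2)
print(sorted(import_set for _, import_set in handled))
```

Because each job removes an import set from the queue before transforming it, every import set is handled by exactly one job, regardless of which node picks it up.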
Concurrent Import Set record
Each concurrent import creates a Concurrent Import Set record. The form view shows all related import sets, concurrent import set jobs, and transform histories.
You can resume or reprocess any import set. For more information, see Monitor concurrent import sets.
Concurrent Import Sets Jobs queue
After loading data, the system adds the import sets to the Concurrent Import Sets Jobs table. The Concurrent Import Sets Jobs table indicates the job type and status of each concurrent import set job.
For more information, see Monitor concurrent import set jobs.
Partitioning concurrent imports
You can partition import sets to maintain the processing order within each partition.
By default, the system allocates records to import sets in a round-robin fashion. However, you can write a custom script to define a partition key that identifies the target import set. Every row with the same partition key is added to the same import set, and the data in that import set is processed in sequential order.
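The difference between the two allocation strategies can be illustrated with a short Python sketch. Everything here is hypothetical: the function names, the row shape, and the use of a CRC32 hash to map a partition key to an import set are assumptions for illustration, not the platform's partition script interface.

```python
import zlib
from collections import defaultdict
from itertools import cycle


def allocate_round_robin(rows, set_count):
    """Default behavior: deal rows out to import sets in turn."""
    sets = defaultdict(list)
    for index, row in zip(cycle(range(set_count)), rows):
        sets[index].append(row)
    return dict(sets)


def allocate_by_key(rows, set_count, key_fn):
    """Partitioned behavior: rows sharing a key land in the same import set.

    Assumption: the key is mapped to a set with a stable hash (CRC32 here).
    """
    sets = defaultdict(list)
    for row in rows:
        key = str(key_fn(row)).encode("utf-8")
        sets[zlib.crc32(key) % set_count].append(row)
    return dict(sets)


rows = [
    {"user": "alice", "seq": 1},
    {"user": "bob", "seq": 1},
    {"user": "alice", "seq": 2},
    {"user": "bob", "seq": 2},
]
print(allocate_round_robin(rows, 2))
print(allocate_by_key(rows, 2, lambda row: row["user"]))
```

With round-robin allocation, rows for the same user can end up in different import sets and be transformed in any relative order. With a partition key, each user's rows stay together in their original order, which is what preserves ordering within a partition.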
Hierarchical imports
You can create a scheduled import set hierarchy by scheduling an import to run after another import set completes. One parent scheduled import can have many child scheduled imports, and each child scheduled import executes in the order specified. For concurrent scheduled imports, child scheduled imports can be started only after all Import Set Transformer jobs complete.
The system generates an execution plan at the beginning of the parent import process. Each import process uses the execution plan to fetch the next process to invoke. For concurrent imports, the last Import Set Transformer job fetches the next import in the hierarchy and starts it.
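The hand-off from one import to the next can be sketched in Python as a countdown: each transformer job decrements a shared counter, and the job that reaches zero looks up the next import in the plan. This is a conceptual model under stated assumptions. The class and function names are hypothetical, and flattening the hierarchy in depth-first order is an assumption about how "the order specified" resolves, not documented behavior.

```python
import threading


def build_plan(tree, root):
    """Flatten a parent -> children tree into a 'next import' lookup.

    Assumption: depth-first order, with children in the order listed.
    """
    order = []

    def visit(node):
        order.append(node)
        for child in tree.get(node, []):
            visit(child)

    visit(root)
    return dict(zip(order, order[1:]))


class ConcurrentImport:
    """Tracks outstanding transformer jobs for one scheduled import."""

    def __init__(self, name, job_count, on_complete):
        self.name = name
        self._remaining = job_count
        self._lock = threading.Lock()
        self._on_complete = on_complete

    def transformer_job_finished(self):
        with self._lock:
            self._remaining -= 1
            is_last = self._remaining == 0
        if is_last:  # only the last job to finish triggers the hand-off
            self._on_complete(self.name)


def run_hierarchy(tree, root, job_count=3):
    plan = build_plan(tree, root)
    started = []

    def start(name):
        started.append(name)
        imp = ConcurrentImport(name, job_count, on_complete)
        jobs = [threading.Thread(target=imp.transformer_job_finished) for _ in range(job_count)]
        for job in jobs:
            job.start()
        for job in jobs:
            job.join()

    def on_complete(name):
        next_import = plan.get(name)
        if next_import is not None:
            start(next_import)

    start(root)
    return started


tree = {"parent": ["child_a", "child_b"], "child_a": ["grandchild"]}
print(run_hierarchy(tree, "parent"))
```

The counter is guarded by a lock so that exactly one job observes zero, which mirrors the requirement that a child import starts only after all transformer jobs of the preceding import complete.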
Synchronized inserts
Coalesce fields help define uniqueness among records. The transformation process checks for an existing record with the coalesce values and updates the existing record, if it exists, or inserts a new record if none exists. For more information, see Updating records using coalesce.
By default, concurrent imports allow each running import set to insert new records. When an import set inserts a record, it establishes a write lock on the target table to prevent other import sets from inserting the same record.
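A minimal Python sketch of the coalesce-then-lock idea follows. It assumes an in-memory dictionary as the target table and a single lock that covers both the existence check and the write. `TargetTable` and `upsert` are hypothetical names, and the platform's actual locking granularity may differ.

```python
import threading


class TargetTable:
    """In-memory stand-in for a target table with coalesce fields."""

    def __init__(self, coalesce_fields):
        self.coalesce_fields = coalesce_fields
        self.records = {}
        self.inserts = 0
        self.updates = 0
        self._write_lock = threading.Lock()

    def upsert(self, row):
        # The coalesce values identify the record.
        key = tuple(row[field] for field in self.coalesce_fields)
        # Holding the lock across check and write prevents two import sets
        # from both deciding the record is missing and inserting it twice.
        with self._write_lock:
            if key in self.records:
                self.records[key].update(row)
                self.updates += 1
            else:
                self.records[key] = dict(row)
                self.inserts += 1


def transform_import_set(table, rows):
    for row in rows:
        table.upsert(row)


table = TargetTable(coalesce_fields=["email"])
rows = [{"email": f"user{i}@example.com", "name": f"User {i}"} for i in range(50)]

# Four import sets that all contain the same 50 logical records.
threads = [threading.Thread(target=transform_import_set, args=(table, rows)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(table.records), table.inserts, table.updates)
```

Each logical record is inserted once and then updated by the other import sets, so no duplicates appear even though all four import sets race to write the same coalesce values.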
Tables for concurrent imports
| Table | Description |
|---|---|
| Concurrent Import Set (sys_concurrent_import_set) | Stores details of each concurrent import set in import set records. |
| Concurrent Import Set Jobs (sys_concurrent_import_set_job) | Lists the import sets to be processed. |
| Execution Context for Scheduled import (sys_execution_context) | Specifies the execution context for each scheduled import. The execution context specifies the next scheduled import to use when processing a hierarchical scheduled import. |
| Hierarchical scheduled import execution plan (sys_execution_plan) | Stores the execution plan for hierarchical imports. The execution plan is a tree structure that identifies which scheduled import runs after the preceding scheduled import. |
Domain separation with concurrent imports
You can add the sys_domain field to a scheduled import table to enable domain separation for the import set. Both import set loading and transform jobs run in the domain specified in the scheduled import set job.