Concurrent imports

  • Release version: Yokohama
  • Updated January 30, 2025

    Split incoming data into multiple import sets and transform the import sets concurrently to reduce processing time.

    Running a concurrent import can be helpful when order does not matter and imports take a long time due to large data sets with time-consuming scripts. If order matters, you can split the import into multiple partitions to ensure that each partition is processed in order.

    Note:
    Concurrent imports add processing and monitoring overhead. Use them only with large data sets.

    Enable concurrent imports only after fine-tuning all other parameters, such as database indexes and transformations.

    Scheduling concurrent imports

    You enable concurrent imports by selecting Concurrent Import on the Scheduled Data Import form. For instructions, see Schedule a data import.

    When the schedule runs a concurrent import, the system pulls the data from databases, Excel spreadsheets, CSV files, or other sources into a temporary staging table, and then transforms the data from the staging table into the target table.

    When you run a concurrent import, the system creates multiple import sets, up to the value of the glide.scheduled_import.max.concurrent.import_sets system property (default = 10). For example, a two-node cluster produces four import sets, and a ten-node cluster produces ten import sets.
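
The two examples above are consistent with a simple cap: the number of import sets grows with the cluster and stops at the property value. The sketch below models that arithmetic in Python. It is not ServiceNow code; the function name and the assumption of two import sets per node (taken from the two-node example) are illustrative only.

```python
def import_set_count(active_nodes: int,
                     max_concurrent: int = 10,
                     sets_per_node: int = 2) -> int:
    """Estimate how many import sets a concurrent import creates.

    Assumption (not confirmed by the documentation): the count is
    sets_per_node * active_nodes, capped at the value of
    glide.scheduled_import.max.concurrent.import_sets (default 10).
    """
    if active_nodes < 1:
        raise ValueError("a cluster needs at least one active node")
    return min(active_nodes * sets_per_node, max_concurrent)


print(import_set_count(2))   # 4
print(import_set_count(10))  # 10
```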

    Import Set Transformer job

    Each active node runs two Import Set Transformer jobs every minute. These jobs poll the Concurrent Import Sets Jobs queue, pick import sets from the queue, and transform them. All jobs run concurrently, subject to the availability of worker threads.
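
The polling behavior described above resembles a standard worker-pool pattern. The following Python sketch simulates it with threads and an in-memory queue. All names here (`run_transformers`, the dictionary layout of an import set, the uppercase "transform") are hypothetical stand-ins, not platform APIs.

```python
import queue
import threading


def run_transformers(import_sets, worker_threads=4):
    """Simulate transformer jobs that drain a shared job queue.

    Each worker repeatedly picks the next import set from the queue and
    transforms it until the queue is empty.
    """
    jobs = queue.Queue()
    for import_set in import_sets:
        jobs.put(import_set)

    transformed = []
    results_lock = threading.Lock()

    def transformer():
        while True:
            try:
                import_set = jobs.get_nowait()
            except queue.Empty:
                return  # nothing left to process
            # Stand-in for a real transform map: uppercase every row.
            rows = [row.upper() for row in import_set["rows"]]
            with results_lock:
                transformed.append((import_set["name"], rows))

    workers = [threading.Thread(target=transformer)
               for _ in range(worker_threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return transformed
```

Because the workers run concurrently, the order of the returned results is not guaranteed, which mirrors why concurrent imports suit data where processing order does not matter.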

    Concurrent Import Set record

    Each concurrent import creates a Concurrent Import Set record. The form view shows all related import sets, concurrent import set jobs, and transform histories.

    You can resume or reprocess any import set. For more information, see Monitor concurrent import sets.

    Concurrent Import Sets Jobs queue

    After loading data, the system adds the import sets to the Concurrent Import Sets Jobs table. The Concurrent Import Sets Jobs table indicates the job type and status of each concurrent import set job.

    For more information, see Monitor concurrent import set jobs.

    Partitioning concurrent imports

    You can partition import sets to maintain the processing order within each partition.

    By default, the system allocates records to import sets in a round-robin fashion. However, you can write a custom script to define a custom partition key that identifies the target import set. Every row with the same partition key is added to the same import set, and the data in that import set is processed in sequential order.
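
To illustrate the difference between the two allocation strategies, here is a minimal Python sketch. It is a model only: the function names are invented, and mapping a partition key to an import set with a CRC32 hash is an assumption made for the example, not the platform's documented mechanism.

```python
import zlib


def round_robin(rows, set_count):
    """Default-style allocation: deal rows out to import sets in turn."""
    import_sets = [[] for _ in range(set_count)]
    for index, row in enumerate(rows):
        import_sets[index % set_count].append(row)
    return import_sets


def by_partition_key(rows, set_count, key_fn):
    """Keyed allocation: rows sharing a key land in the same import set.

    Rows keep their original relative order inside each import set, so
    rows for one key can be processed sequentially.
    """
    import_sets = [[] for _ in range(set_count)]
    for row in rows:
        key = str(key_fn(row)).encode("utf-8")
        # A stable hash (unlike Python's salted hash()) keeps runs repeatable.
        import_sets[zlib.crc32(key) % set_count].append(row)
    return import_sets


rows = [
    {"user": "alice", "seq": 1},
    {"user": "bob", "seq": 1},
    {"user": "alice", "seq": 2},
    {"user": "bob", "seq": 2},
]
partitioned = by_partition_key(rows, 3, lambda row: row["user"])
```

With round-robin allocation, the two `alice` rows could end up in different import sets and be transformed out of order. With the keyed allocation, they always share an import set.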

    Hierarchical imports

    You can create a scheduled import set hierarchy by scheduling an import to run after another import set completes. One parent scheduled import can have many child scheduled imports, and each child scheduled import executes in the order specified. For concurrent scheduled imports, child scheduled imports can be started only after all Import Set Transformer jobs complete.

    The last Import Set Transformer job starts the next import in the hierarchy.

    The system generates an execution plan at the beginning of the parent import process. Each import process uses the execution plan to fetch the next process to invoke. For concurrent imports, the last Import Set Transformer job fetches the next import and executes it.
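
The hand-off described above can be modeled as a countdown: every transformer job reports completion, and whichever job finishes last looks up the next import in the plan. The Python sketch below is a simplified simulation. The class and function names are hypothetical, and flattening the tree in depth-first order is an assumption made for illustration.

```python
import threading


def build_plan(tree, root):
    """Flatten a parent -> ordered-children mapping into a run order."""
    plan = []

    def visit(name):
        plan.append(name)
        for child in tree.get(name, []):
            visit(child)

    visit(root)
    return plan


class ConcurrentImport:
    """Tracks outstanding transformer jobs for one scheduled import."""

    def __init__(self, name, job_count, on_complete):
        self.name = name
        self._remaining = job_count
        self._lock = threading.Lock()
        self._on_complete = on_complete

    def job_finished(self):
        with self._lock:
            self._remaining -= 1
            is_last = self._remaining == 0
        if is_last:
            # Only the last job to finish starts the next import.
            self._on_complete(self.name)


def run_hierarchy(tree, root, jobs_per_import=3):
    plan = build_plan(tree, root)
    started = []

    def start(index):
        if index >= len(plan):
            return
        started.append(plan[index])
        current = ConcurrentImport(plan[index], jobs_per_import,
                                   lambda _name: start(index + 1))
        jobs = [threading.Thread(target=current.job_finished)
                for _ in range(jobs_per_import)]
        for job in jobs:
            job.start()
        for job in jobs:
            job.join()

    start(0)
    return started
```

Calling `run_hierarchy({"parent": ["child_a", "child_b"]}, "parent")` starts the parent first and each child only after every job of the preceding import has finished.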

    Synchronized inserts

    Coalesce fields help define uniqueness among records. The transformation process checks for an existing record with matching coalesce values. If one exists, the process updates it; otherwise, it inserts a new record. For more information, see Updating records using coalesce.

    By default, concurrent imports allow each running import set to insert new records. When an import set inserts a record, it establishes a write lock on the target table to prevent other import sets from inserting the same record.
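
The following Python sketch shows why a write lock matters when several import sets coalesce on the same fields. It is a simplified in-memory model: `TargetTable` and `transform` are invented names, and guarding the whole check-then-write step with one lock is an assumption that stands in for the platform's locking.

```python
import threading


class TargetTable:
    """In-memory stand-in for a target table with coalesce fields."""

    def __init__(self, coalesce_fields):
        self.coalesce_fields = tuple(coalesce_fields)
        self.records = {}
        self.inserts = 0
        self.updates = 0
        self._write_lock = threading.Lock()

    def upsert(self, row):
        key = tuple(row[field] for field in self.coalesce_fields)
        # Without the lock, two import sets could both see "no match"
        # and both insert, producing a duplicate record.
        with self._write_lock:
            if key in self.records:
                self.records[key].update(row)
                self.updates += 1
            else:
                self.records[key] = dict(row)
                self.inserts += 1


def transform(table, import_set):
    for row in import_set:
        table.upsert(row)


table = TargetTable(["email"])
set_a = [{"email": "a@example.com", "name": "A"},
         {"email": "b@example.com", "name": "B"}]
set_b = [{"email": "a@example.com", "name": "A2"},
         {"email": "b@example.com", "name": "B2"}]
threads = [threading.Thread(target=transform, args=(table, s))
           for s in (set_a, set_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever import set reaches a given key first performs the insert, and the other performs an update, so the table ends with one record per coalesce value.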

    Tables for concurrent imports

    Table | Description
    Concurrent Import Set (sys_concurrent_import_set) | Stores details of each concurrent import set in import set records.
    Concurrent Import Set Jobs (sys_concurrent_import_set_job) | Lists the import sets to be processed.
    Execution Context for Scheduled import (sys_execution_context) | Specifies the execution context for each scheduled import. The execution context specifies the next scheduled import to use when processing a hierarchical scheduled import.
    Hierarchical scheduled import execution plan (sys_execution_plan) | Stores the execution plan for hierarchical imports. The execution plan is a tree structure that identifies which scheduled import runs after the preceding scheduled import.

    Domain separation with concurrent imports

    You can add the sys_domain field to a scheduled import table to enable domain separation for the import set. Both import set loading and transform jobs run in the domain specified in the scheduled import set job.