

Here’s a breakdown of how we technically implemented a Centralized Outbound API Maintenance module in ServiceNow — the same one that finally ended those 2 AM manual retries and “who re-sent that payload?” conversations.

  1. Architecture Overview

At the core, the design revolves around three major components:

  1. Outbound API Log Table – captures every outbound call made from the platform.
  2. Retry & Recovery Engine – a scheduled job that re-processes failed calls based on retry policies.
  3. Health Check Monitor – a proactive job that halts retries when a downstream system is unavailable.

Together, these provide full visibility, self-healing retries, and audit-ready traceability.

 

  2. Outbound API Log Table

We created a custom table, Outbound API Log (u_outbound_api_log), to capture every outbound call.

Key Fields

  • u_api_name – Integration point / API name.
  • u_http_method – GET/POST/PUT/PATCH.
  • u_endpoint_url – Target endpoint URL.
  • u_request_payload – The JSON/XML payload sent.
  • u_response_body – Response returned by the target system.
  • u_status_code – HTTP status code for quick filtering.
  • u_result_state – Success, Failed, Retrying, Maxed Out, Skipped.
  • u_retry_count – Number of retry attempts so far.
  • u_next_retry_time – Timestamp for the next scheduled retry.
  • u_integration_owner – Reference to the team responsible for the integration.

This table acts as a single pane of glass for all outbound API traffic — whether it’s triggered via Scripted REST APIs, IntegrationHub, or Flow Designer actions.
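Because everything lands in one table, ad-hoc triage becomes a simple query. As a quick illustration (the API name below is just an example), a background script can pull today's failures for a single integration:

// List today's failed calls for one integration – the API name here is illustrative
var failed = new GlideRecord('u_outbound_api_log');
failed.addQuery('u_api_name', 'CRM Incident Sync');
failed.addQuery('u_result_state', 'Failed');
failed.addQuery('sys_created_on', '>=', gs.beginningOfToday());
failed.orderByDesc('sys_created_on');
failed.query();

while (failed.next()) {
    gs.info(failed.u_http_method + ' ' + failed.u_endpoint_url + ' -> HTTP ' + failed.u_status_code);
}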

 

  3. Logging from Integrations

Instead of embedding retry logic in every Script Include, we centralized it into a utility script:

var OutboundAPIUtils = Class.create();
OutboundAPIUtils.prototype = {
    initialize: function() {},

    // Writes one record to the Outbound API Log for every call made from the platform
    logOutboundCall: function(apiName, method, url, payload, response, status) {
        var log = new GlideRecord('u_outbound_api_log');
        log.initialize();
        log.u_api_name = apiName;
        log.u_http_method = method;
        log.u_endpoint_url = url;
        log.u_request_payload = JSON.stringify(payload);
        log.u_response_body = response ? response.getBody() : '';
        log.u_status_code = response ? response.getStatusCode() : '';
        log.u_result_state = (status == 'success') ? 'Success' : 'Failed';
        log.insert();
    },

    type: 'OutboundAPIUtils'
};

Every outbound integration simply calls:

new OutboundAPIUtils().logOutboundCall('CRM Incident Sync', 'POST', targetURL, requestBody, response, result);

This ensures every API transaction is captured consistently — with no developer guesswork.
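For context, here is a minimal sketch of what one calling integration might look like end to end. The endpoint URL and payload values are illustrative, not taken from the actual CRM integration:

// Illustrative caller: send a payload, then log the outcome with one line
var requestBody = { number: 'INC0010001', short_description: 'Example incident' }; // sample data
var targetURL = 'https://crm.example.com/api/incidents'; // illustrative endpoint

var rm = new sn_ws.RESTMessageV2();
rm.setHttpMethod('POST');
rm.setEndpoint(targetURL);
rm.setRequestHeader('Content-Type', 'application/json');
rm.setRequestBody(JSON.stringify(requestBody));

var response = null;
var result = 'failed';
try {
    response = rm.execute();
    if (response.getStatusCode() >= 200 && response.getStatusCode() < 300) {
        result = 'success';
    }
} catch (ex) {
    result = 'failed';
}

// One call captures the transaction, regardless of how it went
new OutboundAPIUtils().logOutboundCall('CRM Incident Sync', 'POST', targetURL, requestBody, response, result);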

 

  4. Retry & Recovery Engine

Next, we built a Scheduled Script Job that runs every 30 minutes.

Logic Summary:

  1. Query u_outbound_api_log where
    • u_result_state = Failed
    • u_retry_count < Max_Retries
    • u_next_retry_time <= now()
  2. For each record:
    • Attempt to resend using the original payload.
    • Increment u_retry_count.
    • If successful, mark as Success.
    • If not, reschedule next retry with exponential backoff (e.g., 15 → 30 → 60 minutes).
    • If the max retry limit is reached, set to Maxed Out.

Code Snippet (simplified):

var MAX_RETRIES = 3;

var retry = new GlideRecord('u_outbound_api_log');
retry.addQuery('u_result_state', 'Failed');
retry.addQuery('u_retry_count', '<', MAX_RETRIES);
retry.addQuery('u_next_retry_time', '<=', new GlideDateTime());
retry.query();

while (retry.next()) {
    try {
        // Rebuild the original request from the logged fields
        var request = new sn_ws.RESTMessageV2();
        request.setHttpMethod(retry.u_http_method.toString());
        request.setEndpoint(retry.u_endpoint_url.toString());
        request.setRequestBody(retry.u_request_payload.toString());
        var res = request.execute();

        if (res.getStatusCode() == 200) {
            retry.u_result_state = 'Success';
        } else {
            scheduleNextAttempt(retry);
        }
    } catch (ex) {
        scheduleNextAttempt(retry);
    }
    retry.update();
}

// Increment the counter, then either push the next attempt out or mark the record Maxed Out
function scheduleNextAttempt(rec) {
    var attempts = (parseInt(rec.getValue('u_retry_count'), 10) || 0) + 1;
    rec.u_retry_count = attempts;

    if (attempts >= MAX_RETRIES) {
        rec.u_result_state = 'Maxed Out';
        return;
    }

    // Exponential backoff: 15 -> 30 -> 60 minutes
    var next = new GlideDateTime();
    next.addSeconds(Math.pow(2, attempts - 1) * 15 * 60);
    rec.setValue('u_next_retry_time', next);
}

This job single-handedly removed dozens of ad-hoc retry scripts across different modules.

 

  5. Health Check Monitor

To prevent “API storms,” a health check job runs every hour.
It pings each unique endpoint recorded in u_outbound_api_log using a lightweight GET request.

If an endpoint returns consistent failures or timeouts:

  • The job updates a flag in a companion table u_integration_health_monitor.
  • The retry engine then skips retry attempts for that endpoint until it’s marked healthy again.

Admins get an alert that says:

“Retries paused for CRM API – target system unreachable.”

Once the system responds successfully, retries resume automatically.
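Here is a stripped-down sketch of how that hourly job can be put together. The u_endpoint_url, u_healthy, and u_consecutive_failures fields on the companion table, and the event name used for the admin alert, are assumptions for illustration rather than the exact implementation:

// Hourly health check sketch – companion-table field names and event name are assumed
var monitor = new GlideRecord('u_integration_health_monitor');
monitor.query();

while (monitor.next()) {
    var healthy = false;
    try {
        var ping = new sn_ws.RESTMessageV2();
        ping.setHttpMethod('GET');
        ping.setEndpoint(monitor.u_endpoint_url.toString());
        ping.setHttpTimeout(10000); // keep the probe lightweight: 10-second timeout
        var res = ping.execute();
        healthy = (res.getStatusCode() > 0 && res.getStatusCode() < 500);
    } catch (ex) {
        healthy = false;
    }

    if (healthy) {
        monitor.u_healthy = true;
        monitor.u_consecutive_failures = 0;
    } else {
        var failures = (parseInt(monitor.getValue('u_consecutive_failures'), 10) || 0) + 1;
        monitor.u_consecutive_failures = failures;
        if (failures >= 3) {
            monitor.u_healthy = false; // the retry engine checks this flag and skips the endpoint
            gs.eventQueue('outbound.api.retries_paused', monitor, monitor.u_endpoint_url, ''); // drives the admin alert
        }
    }
    monitor.update();
}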

 

  6. Manual & Bulk Retry UI

We added two UI actions on the log table:

  • Retry Now → reprocess a single failed record immediately.
  • Bulk Retry → reprocess all failed calls for a specific API or time window.

Both use the same utility functions as the scheduler, ensuring consistency between manual and automated runs.

Each manual retry is logged in an Audit Table (u_api_retry_audit) with:

  • Who triggered the retry.
  • Timestamp.
  • Result of the action.
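As a rough sketch, the server-side script behind the Retry Now UI action could look like the following. The retryRecord() utility method and the audit field names are hypothetical placeholders for the shared retry logic and audit columns described above:

// 'Retry Now' UI action (server side) – retryRecord() and audit field names are hypothetical
var outcome = new OutboundAPIUtils().retryRecord(current); // same retry logic the scheduler uses

// Record who did what, when, and how it ended
var audit = new GlideRecord('u_api_retry_audit');
audit.initialize();
audit.u_api_log = current.sys_id;                 // assumed reference back to the log record
audit.u_triggered_by = gs.getUserID();            // who triggered the retry
audit.setValue('u_triggered_at', new GlideDateTime()); // timestamp
audit.u_result = outcome ? 'Success' : 'Failed';  // result of the action
audit.insert();

gs.addInfoMessage('Retry ' + (outcome ? 'succeeded' : 'failed') + ' for ' + current.u_api_name);
action.setRedirectURL(current);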

That transparency has been a game-changer for governance and audit reviews.