AI Control Tower and Agentic AI in BSS Transformations

Alex_D · 06-09-2026

This Q&A captures a technical exchange between ServiceNow and a telecommunications operator undertaking a large-scale AI integration. The discussion covers AI capability selection, integration architecture, governance requirements, and agentic orchestration patterns.

Created by Alexander Dmytriiev, Iryna Shyshkova, Jacqueline Eggenschwiler

1. Capability Scope & Decision Framework

Q: How does ServiceNow structure its AI capabilities across different levels of autonomy, and where does each tier deliver the strongest business results in complex operational environments?

ServiceNow's AI stack for BSS transformations runs across four distinct layers. The most important distinction is between the two agentic tiers - AI Specialists (job-level autonomy) and AI Agents / Agent Studio (task-level autonomy) - followed by the GenAI assistance layer (ServiceNow Otto) and the classic ML layer (Predictive Intelligence). Each operates at a different level of autonomy and requires a different deployment decision.

Tier 1 - AI Specialists & Autonomous Workforce

AI Specialists are domain-specific digital workers that own and execute an entire job function end-to-end - not individual tasks. The key architectural distinction: an AI Specialist directs teams of AI Agents and tools to resolve issues autonomously within its discipline, with fallback to human escalation in ambiguous or high-risk cases.

What makes them distinct from general AI Agents:

They inherit 20+ years of platform operational intelligence and deep enterprise business context by default.
All permission sets, authorizations, and policies from the AI Platform are applied to AI Specialists automatically - no separate configuration step.
Each Specialist knows what to work on, when to escalate to an L2 manager, and which AI Agents to direct for sub-tasks.
They are coachable over time - compounding value as they learn from your specific environment.

Currently available Specialists: L1 ServiceDesk Specialist (ITSM). Additional Specialists across IT, HR, Customer Service, and other domains are on the near-term roadmap as the Autonomous Workforce expands.

Strong outcome use cases for BSS:

Zero-touch L1 triage and resolution for network service incidents (target: zero incidents requiring human first-touch)
Autonomous order fallout identification and remediation - Specialist coordinates SOMT agents and Integration Hub actions end-to-end
Customer complaint intake-to-resolution without handoff, with human escalation gated by configurable confidence threshold
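
The confidence-gated escalation mentioned above can be pictured as a simple routing check. This is a minimal sketch, not platform code: the `Resolution` type, the `route` function, and the 0.85 default threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resolution:
    """A proposed resolution from a Specialist (hypothetical shape)."""
    action: str
    confidence: float      # 0.0 to 1.0, as reported by the reasoning step
    high_risk: bool = False


def route(resolution: Resolution, threshold: float = 0.85) -> str:
    """Escalate when the case is high-risk or confidence is below the threshold."""
    if resolution.high_risk or resolution.confidence < threshold:
        return "escalate_to_human"
    return "auto_resolve"
```

Raising the threshold shifts more cases to human first-touch; lowering it moves the deployment closer to the zero-touch target at the cost of more autonomous decisions on uncertain cases.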

Tier 2 - AI Agents / AI Agent Studio

AI Agents operate at the task level. They are the execution layer that AI Specialists orchestrate - and can also run independently for well-defined, bounded workflows. AI Agent Studio is the build, test, evaluation and deployment environment.

Agent Studio: no-code/low-code environment for building, configuring, testing, and deploying agents; version control included; role-based access control on all agent definitions.
Agent tooling: each agent is configurable with RAG (retrieval-augmented generation), CRUD operations against Now Platform tables, Scripted REST calls, Flow Actions, GenAI skills, and MCP client connections to external tools.
Identity model: agents execute under a dedicated 'AI user' identity or a 'dynamic user' model that preserves the delegating user's identity in the audit trail - this is the mechanism that satisfies dedicated IAM identity requirements for in-platform agents.
Prebuilt agent library: growing catalog across CRM, SOM (SOMT), TSM, ITSM, and other workflows - these are OOTB starting points that can be extended in Agent Studio.
Hybrid workflow composition: agentic playbooks blend deterministic flow steps with generative reasoning steps; human-in-the-loop approval gates are configurable at any step.
MCP client support: agents can consume external MCP servers - governed through AI Gateway - without leaving the AI Platform orchestration boundary.
MCP server exposure: expose GenAI skills, Flow Actions, Knowledge Graphs, AI agents, playbooks, CRUD operations, and catalog items (H2’26) as MCP server tools for consumption by external agents.
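
To make the two identity models in the list above concrete, the sketch below shows how an audit entry might differ between them. It is an assumption-laden illustration: `AuditEntry`, `build_audit_entry`, the mode strings, and the `ai.user.fallout` account name are hypothetical and do not reflect the platform's actual audit schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    agent: str
    executed_as: str             # identity whose roles and ACLs applied
    on_behalf_of: Optional[str]  # delegating user, if any
    action: str


def build_audit_entry(agent: str, action: str, mode: str,
                      delegating_user: Optional[str] = None,
                      ai_user: str = "ai.user.fallout") -> AuditEntry:
    """Build an audit record for either identity model (hypothetical)."""
    if mode == "ai_user":
        # Dedicated AI identity executes; the delegator is still recorded.
        return AuditEntry(agent, ai_user, delegating_user, action)
    if mode == "dynamic_user":
        if delegating_user is None:
            raise ValueError("dynamic_user mode needs a delegating user")
        # The delegating user's own identity is preserved end to end.
        return AuditEntry(agent, delegating_user, delegating_user, action)
    raise ValueError(f"unknown identity mode: {mode}")
```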

Decision rule - AI Agent vs. ServiceNow Otto Skill Kit: use an AI Agent when the use case is dynamic, has ambiguous inputs or variable resolution paths, and requires reasoning. Use a custom skill (ServiceNow Otto Skill Kit) when you want to inject GenAI into a specific step of an otherwise deterministic flow and need direct control over LLM provider, output format, and deployment vector.
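
The decision rule can be written down as a small predicate. This is only a sketch of the rule as stated here; the function and parameter names are hypothetical, and real selection would weigh more factors than three booleans.

```python
def choose_build_option(dynamic: bool, ambiguous_inputs: bool,
                        variable_paths: bool, needs_reasoning: bool) -> str:
    """Return 'ai_agent' or 'custom_skill' following the rule above."""
    if dynamic and (ambiguous_inputs or variable_paths) and needs_reasoning:
        return "ai_agent"
    # Otherwise GenAI is injected into one step of a deterministic flow.
    return "custom_skill"
```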

Strong outcome use cases for BSS:

Autonomous order fallout triage: agent identifies fallout reason, queries SOMT and external OSS via MCP, proposes or executes remediation, and raises approval request for financial adjustments.
Intent-driven complaint resolution: agent reads customer intent from free text, retrieves entitlements and order history, escalates with full context or resolves autonomously.
Proactive customer-impacting incident handling: agent correlates TSOM network event with affected service inventory (TNI), drafts outbound customer notification, and routes work order to FSM.
Quote configuration acceleration: agent interprets eligibility rules, validates CPQ configuration, and flags policy violations before a human commits the quote.

Tier 3 - GenAI Skills

GenAI Skills is the generative AI assistance layer - it operates within deterministic workflows and user interfaces to reduce effort and accelerate throughput. It does not reason or plan autonomously; it generates, summarizes, and recommends within a defined step.

ITSM/TSM/HRSD: record and case summarization, resolution note generation, work-note drafting, agent-to-agent and virtual-to-agent transfer summarization, similar record surfacing.
Customer service (TSM): reply drafting for chat and email, case summarization for complex multi-touch interactions, knowledge article generation from resolved cases.
Chat options: GenAI-powered conversational experience replacing scripted topic trees; handles ambiguous intents and multi-turn resolution flows.
Creator / Build Agent: natural language to flows, UIs, and configurations; ATF test coverage auto-generated; works across ITSM, HRSD, CSM, and custom apps. Replaces what previously required professional services for basic app scaffolding.
GenAI Skill Kit: framework for building custom GenAI skills beyond OOTB. Supports custom LLM provider selection (including third-party models via GenAI Controller), output format control, and deployment to Virtual Agent or inline workflow steps.
Analytics: adoption and usage visibility - tracks which skills are being used, by whom, and with what resolution rate. Feeds the adoption metric conversation with AI skeptics.

Strong outcome use cases for BSS:

AHT reduction in customer care: summarization and recommended reply drafting cuts average handle time measurably; documented at 30–40% reduction in wrap-up effort in comparable deployments.
Self-service deflection: GenAI Virtual Agent handles ambiguous service requests - order status, eligibility questions, complaint intake - without topic scripting.
Knowledge at scale: resolved TSM and ITSM cases auto-generate knowledge articles, closing the loop between resolution and deflection.
Developer velocity: Build Agent for Creator reduces time-to-first-working-flow from days to hours for common BSS workflow patterns.

Tier 4 - Predictive Intelligence (Classic ML)

Predictive Intelligence is the platform's mature machine learning layer - distinct from both GenAI and agentic AI. It runs classification, similarity, clustering, and regression models trained on each customer's own historical data, managed through the Predictive Intelligence Workbench.

Key differentiator: models are trained per-customer-instance on your own historical records - not a shared or pre-trained model. This means accuracy improves with your data and reflects your actual routing logic, not a generic baseline.

Auto-categorization and routing: ML model predicts category, assignment group, and priority from incident or case content at creation time - reducing manual triage and misroutes.
Similarity: surfaces similar resolved records to the agent working a new ticket, accelerating resolution without requiring knowledge article creation.
Clustering: groups open tickets by semantic similarity for major incident detection and bulk resolution.
Regression: predicts numeric outcomes (e.g., estimated resolution time, SLA breach probability) for proactive management.
Workbench: no professional services required for initial configuration; solution templates with pre-selected tables and fields; can be trialed on sub-production instance before commit.
Group Action Framework (GAF) integration: Predictive Intelligence can provide optimized clustering as a fallback within GAF, combining ML clustering with AI Search when no direct match is found.
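
As an intuition for the similarity capability listed above, the toy example below ranks resolved records by word overlap with a new ticket. It uses Jaccard similarity on whitespace tokens purely for illustration; it is not the algorithm Predictive Intelligence uses, and the function names are hypothetical.

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (deliberately naive)."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def most_similar(new_ticket: str, resolved: List[str], top_k: int = 2) -> List[str]:
    """Return the top_k resolved records most similar to the new ticket."""
    ranked = sorted(resolved, key=lambda r: jaccard(new_ticket, r), reverse=True)
    return ranked[:top_k]
```

For example, a new ticket "power outage at cell site" would rank a resolved record "power outage at cell site north" above unrelated billing or hardware records.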

Strong outcome use cases for BSS:

Incident auto-categorization in network operations: eliminates manual triage for high-volume, repetitive fault types (power, access, hardware) - documented at 97% accuracy in comparable deployments.
Order case routing: predicts correct assignment group for order fallout cases at creation, reducing wrong-queue assignments and improving first-touch resolution rate.
SLA breach prediction: regression model flags tickets at risk before breach occurs, enabling proactive intervention - particularly relevant for telco SLA obligations to enterprise customers.
Major incident clustering: groups alerts by semantic similarity in TSOM, reducing alert storm noise during network events and accelerating root cause identification.

Q: What types of workflows are best suited to ServiceNow’s native AI capabilities, and under what circumstances does it make architectural sense to delegate orchestration responsibility to an external layer?

The default answer is: ServiceNow orchestrates, AI Control Tower governs, and external systems participate as consumed services. This is the lowest-complexity, lowest-governance-overhead architecture for any BSS transformation where Now Platform is the system of record for order, service, customer, and operations workflows.

The reason: orchestration that sits co-located with the consequential write-path produces one accountability boundary, one audit trail, one HITL enforcement point, and a deterministic write-path. When orchestration drifts away from the system that owns the record and absorbs the risk, you pay for that distance in audit reconstruction, identity proxying, and weaker guarantees that the right thing happened.

ServiceNow as primary orchestrator covers the following BSS domains today:

Quote-to-order with eligibility and policy automation - CPQ + SOMT orchestrates; SOMT decomposition is purpose-built to own this boundary end-to-end.
Order fallout auto-remediation - SOMT agents identify, triage, and remediate fallout; external OSS systems are consumed via MCP or Integration Hub and do not act as the orchestrator.
Intent-driven self-service and complaint resolution - TSM orchestrates the full customer journey; billing, network, and IT participate as data and action providers.
Proactive customer-impacting incident handling - TSOM/TNI orchestrates event-to-resolution; Now creates incidents, routes work orders via FSM, and triggers customer notifications.
Field dispatch with skills routing and ETA prediction - FSM orchestrates; external systems (scheduling tools, inventory) consumed as downstream actions.

AI Control Tower (AICT) is the governance plane above all of this - one place to register every agent (Now-native and external), set and enforce policy, monitor execution, and produce audit evidence. For BSS transformations operating under regulatory frameworks such as the EU AI Act, AICT is not optional: it is the mechanism that turns agent policy documents into enforced, auditable rules.

External systems do not need to be external orchestrators. Retrieval-only enrichment from OSS, billing, or legacy systems is handled via MCP client connections or Integration Hub from within a Now-orchestrated flow. A single downstream API call does not require handing orchestration out. The scope of what ServiceNow can orchestrate without breaking the single-boundary model is broader than most programs assume at the outset.

When to hand orchestration to an external layer - these are genuine exceptions, not the default:

The flow requires an authoritative write to a system of record that lives entirely outside Now Platform within the same transaction - and that system cannot be reached as a consumed service.
Two or more peer agents from different platform domains must negotiate state with each other in real time, not just exchange data - creating a genuine need for a neutral coordination layer above both.
Data residency or sovereignty requirements mandate that specific processing happen outside Now Platform's cloud boundary.
Enterprise policy explicitly requires vendor-portable orchestration logic that cannot be locked to any single platform.

In each of these cases, the recommended pattern is a clean domain boundary with event-driven handoff. ServiceNow orchestrates its domain fully; the external orchestrator owns its domain fully; they exchange at the boundary via A2A or publish/subscribe. Split governance across two overlapping control planes is the pattern to avoid.
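
The four exceptions can be read as a checklist: if none applies, orchestration stays in ServiceNow. The sketch below encodes that reading; `FlowProfile`, its field names, and the return labels are hypothetical and exist only to make the rule explicit.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FlowProfile:
    """One flag per exception listed above (hypothetical field names)."""
    external_sor_write_in_same_transaction: bool = False
    peer_agents_negotiate_state_realtime: bool = False
    residency_requires_external_processing: bool = False
    policy_requires_portable_orchestration: bool = False


def orchestration_owner(profile: FlowProfile) -> Tuple[str, List[str]]:
    """Default to ServiceNow; return the triggering exceptions otherwise."""
    reasons = [name for name, flagged in vars(profile).items() if flagged]
    if reasons:
        return ("external_with_event_handoff", reasons)
    return ("servicenow", [])
```

Returning the list of triggering reasons keeps the decision auditable: an architecture review can see which exception justified moving orchestration out.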

2. Integration Patterns & Orchestration Architecture

Q: How should the relationship between ServiceNow’s agentic layer and an enterprise-wide orchestration platform be designed, and what architectural patterns introduce governance or operational risk?

The starting point is accountability, not integration topology. The right architecture puts orchestration as close as possible to the action that bears the consequence - the system that owns the record and absorbs the risk of the write. That principle resolves most architecture disputes before a line of integration code is written.

Recommended architecture - ServiceNow + AI Control Tower as orchestrator within domain boundaries:

AI Platform orchestrates all agentic flows. AI Control Tower (AICT) is the single governance, observability, and policy enforcement plane - not a dashboard, but the architecture itself. Every agent, ServiceNow-native or external, is registered, scoped, and traceable through AICT before it goes live. External capabilities - OSS tools, billing systems, third-party AI models - are consumed via MCP connections or Integration Hub from within ServiceNow-orchestrated flows; they do not need to become co-orchestrators.

What this delivers:

One IAM boundary: agent identities, delegating user identities, and execution records all live in one control plane.
One HITL enforcement point: approval routing and hard limits are defined once in AICT, enforced consistently across every agent regardless of origin.
One audit trail: no reconstruction across two systems; every agent action produces a structured trace in a single location, exportable to SIEM.
Deterministic write-path: the agent making the decision is co-located with the system absorbing the consequence - eliminating identity proxying and write-path uncertainty.
Lowest integration complexity: external systems participate as consumed services, not peer orchestrators requiring their own governance alignment.

When a domain boundary is genuinely required - fallback architecture:

If an external system owns a domain where it is the authoritative system of record and cannot function as a consumed service (e.g., a network management platform with its own agent layer that must retain orchestration authority over element-level operations), the right pattern is clean domain separation with event-driven handoff - not a shared orchestration layer. AI Platform orchestrates its BSS domain fully; the external system orchestrates its domain fully; they exchange at the boundary via A2A or publish/subscribe events. Each side runs its own governance plane, with audit and identity signals from both exported to a common SIEM.
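
A publish/subscribe handoff between two fully separate domains might look like the in-memory sketch below. Everything here is illustrative: `EventBus`, the topic names, and the service ID are hypothetical, and a real deployment would use a message broker or A2A rather than direct function calls.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class EventBus:
    """Minimal in-memory publish/subscribe bus (illustration only)."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subs[topic]:
            handler(event)


def run_handoff_demo() -> List[Tuple[str, str, str]]:
    """Each domain orchestrates its own steps; only events cross the boundary."""
    bus = EventBus()
    audit: List[Tuple[str, str, str]] = []  # stand-in for the common SIEM export

    def network_domain(event: dict) -> None:
        audit.append(("network", "remediate", event["service_id"]))
        bus.publish("network.fault.resolved", {"service_id": event["service_id"]})

    def bss_domain(event: dict) -> None:
        audit.append(("bss", "notify_customer", event["service_id"]))

    bus.subscribe("bss.fault.reported", network_domain)
    bus.subscribe("network.fault.resolved", bss_domain)

    audit.append(("bss", "create_incident", "SVC-1001"))
    bus.publish("bss.fault.reported", {"service_id": "SVC-1001"})
    return audit
```

Neither handler calls into the other domain's internals; each reacts to an event and emits one, which is what keeps a single clear orchestrator on each side.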

What to avoid: a shared or split orchestration layer where neither platform is fully in control of any flow. This is the highest-overhead architecture - it requires composite audit reconstruction across two control planes, introduces identity proxying risk at every cross-boundary write, and weakens the human-in-the-loop guarantee because approval routing must be coordinated between two enforcement points. If a flow requires the orchestrator to change mid-execution, the boundary is in the wrong place - split the flow, not the orchestrator.

Q: When agentic workflows span multiple operational domains with different systems of record, how should orchestration ownership be determined and where should domain boundaries be drawn?

The boundary sits where the authoritative lifecycle owner of the dominant artifact lives - not where the most agents reside or the broadest tooling exists. "Who owns the artifact's lifecycle and SLA" resolves most boundary disputes.

Order management (quote → order → service → activation): Now Platform (CPQ + SOMT) orchestrates; SOMT decomposition is purpose-built for this boundary.
Customer journey (care, complaint, dispute, retention): Now Platform (TSM) orchestrates; billing, network, and IT participate via Integration Hub, A2A, or MCP.
Network operations (fault and performance): NMS/EMS or external orchestrator leads; Now participates for incident creation, work order dispatch, and customer notification.
Billing and revenue events: Billing platform leads; ServiceNow Otto participates for inquiry, dispute, and adjustment workflows.

General rule: if the orchestrator changes mid-flow, the boundary is wrong - split into two flows with an event-driven handoff, each with one clear orchestrator.
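
The general rule lends itself to a mechanical check: walk the steps of a flow and cut it wherever the orchestrating domain changes. The sketch below does this with `itertools.groupby`; the step and owner names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

Step = Tuple[str, str]  # (step name, orchestrating domain)


def split_by_orchestrator(steps: List[Step]) -> List[Tuple[str, List[str]]]:
    """Split a flow into consecutive segments with exactly one orchestrator each."""
    return [
        (owner, [name for name, _ in group])
        for owner, group in groupby(steps, key=lambda step: step[1])
    ]


flow = [
    ("create_incident", "now"),
    ("dispatch_work_order", "now"),
    ("reset_network_element", "nms"),
    ("notify_customer", "now"),
]
segments = split_by_orchestrator(flow)
```

Here `segments` contains three entries, so the original flow would become three flows joined by two event-driven handoffs.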

Q: How does ServiceNow support open agent interoperability protocols such as MCP and A2A, and what patterns should guide their use in a multi-agent enterprise architecture?

The modern agent stack has settled into two layers, and ServiceNow participates in both: MCP for agent-to-tool integration (the vertical layer) and A2A for agent-to-agent coordination (the horizontal, peer-delegation layer). ServiceNow can act as MCP client, MCP server, or A2A peer.

MCP - as a consumer of external MCP servers: a structured way to onboard, approve, and govern those connections via AI Gateway within AI Control Tower.

MCP - as a provider: flexibility to create servers and choose which tools are exposed, including GenAI skills, AI Agents, flow actions, playbooks, CRUD operations, conversational catalog items, and Search-as-a-tool (H2’26).

A2A: ServiceNow AI Agents can act as both primary and secondary agents. AI Agent Studio provides a structured way to onboard external AI agents by registering their agent cards. ServiceNow-native agents (prebuilt or custom) can be made discoverable externally via A2A.

ServiceNow also supports a "build anywhere, run natively" model for agent creation. Developers working in external environments - including Claude Code, Cursor, VS Code, or any third-party AI coding tool - can author agents and workflows using the ServiceNow SDK and Build Agent Skills directly in their IDE of choice. The resulting agents land on the AI Platform with the same governance, security, and policy enforcement as agents built natively in ServiceNow Studio: both paths produce governed output that registers in AI Control Tower and runs under the same ACL, role, and audit infrastructure. This matters for A2A architecture because it collapses a common objection - that external agent ecosystems are inherently separate from ServiceNow's control plane. An agent authored in Claude Code is not an "external agent calling in via A2A"; it is a ServiceNow-native agent that happened to be written in an external IDE. A2A is then reserved for genuinely heterogeneous agents from peer platforms (Microsoft Copilot agents, Google ADK agents, etc.) that need to delegate to ServiceNow agents at the domain boundary - not for bridging a build-time choice.

Recommended patterns:

Use MCP for passive capability and tool access; use A2A for delegating to a peer that owns its own reasoning, state, and lifecycle.
Distinguish build-time choice from runtime architecture: an agent authored in an external IDE with the ServiceNow SDK and Build Agent Skills is a ServiceNow-native agent, with the same ACLs, role model, governance, and AI Control Tower registration as one built inside ServiceNow Studio. Reserve A2A for genuinely heterogeneous agents from peer platforms that need to delegate across a domain boundary.
Do not allow any external agent to bypass ServiceNow ACLs with direct table or raw API writes - keep authorization enforcement close to the execution system.
Scope agent identities tightly at configuration time - assign only the roles the agent needs to execute its designated tasks. The AI-user and dynamic-user identity models enforce this at the platform level; the agent cannot exceed the permissions of the identity it runs under.
Keep data authoritative in the systems that own it - do not copy it up into an orchestration tier.
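
The first pattern in the list, MCP for tools and A2A for peers, reduces to one question about the remote party. A minimal sketch follows, with a hypothetical function name and inputs:

```python
def choose_protocol(owns_reasoning: bool, owns_state: bool, owns_lifecycle: bool) -> str:
    """A2A for a peer that owns its reasoning, state, and lifecycle; MCP otherwise."""
    if owns_reasoning and owns_state and owns_lifecycle:
        return "A2A"
    return "MCP"
```

An OSS inventory lookup has none of the three properties and would be consumed as an MCP tool, while a network platform's own agent layer has all three and would be addressed as an A2A peer.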

3. Governance Guardrails

Q: How does ServiceNow ensure that each AI agent operates under a distinct, traceable identity, and that actions taken on behalf of a user remain attributable throughout the audit trail?

For agents built and run within ServiceNow, this is covered natively. Every agent action is tied to the delegating user's identity or the dedicated "AI user" identity (selected at configuration time) and captured in the audit trail out of the box. AI Control Tower provides a meta-level view - showing which agents are active, what permissions they carry, and under whose authority.

For agents outside ServiceNow interacting with the platform, identity provisioning happens in the enterprise IAM layer. AICT surfaces and monitors those connections; an integration with an external IAM governance tool can close the enforcement loop.

Design-time controls (via AI Agent Studio): system prompts, allowed skills and tools, version control. Each agent has a dedicated platform identity. Data access is bound by standard Now Platform ACLs and business rules, and the agent operates as a constrained user - role, scope, and ACL design constrain the agent the same way they constrain a human user.

Q: What mechanisms does ServiceNow provide for cataloguing and governing AI agents and MCP connections before they are permitted to operate in a production environment, including any requirements for external catalog registration?

AI Control Tower is the natural home for this. It already tracks active MCP connections and agent activity, making it a practical AI Gateway - the control point through which agents must pass before going live. For organizations that also require ServiceNow agents to be registered in an external enterprise AI catalog, ServiceNow supports this via A2A agent cards. Enabling the discoverable flag on any agent in AI Agent Studio publishes a standard agent card at /.well-known/agent_json - the mechanism external orchestrators and catalog systems use to discover and invoke the agent (for security reasons, invocation and discovery use different endpoints). Federated token authentication is supported for A2A between a secondary IdP and a ServiceNow ID - each user needs a corresponding ServiceNow ID that respects roles and ACLs. AICT remains the internal registration and governance plane; the agent card is the external advertisement.
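
For readers unfamiliar with agent cards, the sketch below builds an illustrative card and checks it for a few fields an external catalog might require. The field names follow the public A2A specification as commonly described; the values, the URL, and the set of required fields are assumptions, and the card ServiceNow actually publishes may differ.

```python
import json
from typing import List, Sequence

# Illustrative card only; every value here is hypothetical.
agent_card = {
    "name": "Order Fallout Triage Agent",
    "description": "Identifies order fallout reasons and proposes remediation.",
    "url": "https://instance.example.com/a2a/order-fallout",
    "version": "1.0.0",
    "capabilities": {"streaming": False},
    "defaultInputModes": ["text/plain"],
    "defaultOutputModes": ["application/json"],
    "skills": [
        {
            "id": "triage_fallout",
            "name": "Triage order fallout",
            "description": "Classifies the fallout reason for a given order.",
            "tags": ["som", "fallout"],
        }
    ],
}


def missing_fields(card: dict,
                   required: Sequence[str] = ("name", "description", "url",
                                              "version", "skills")) -> List[str]:
    """List the required top-level fields absent from a card (assumed policy)."""
    return [field for field in required if field not in card]


serialized = json.dumps(agent_card, indent=2)
```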

Q: What observability signals does ServiceNow generate for agentic AI activity, and how can those signals be made available to external monitoring and security platforms?

AICT is the native observability plane for agent actions. It captures execution data at three levels: session (overall outcome, duration, token consumption, aggregate quality and safety scores), trace (individual reasoning turns within a session, including latency and span count), and span (granular step-level execution detail). Quality metrics - action completion, tool selection quality, task completeness, tool calling quality - are evaluated via LLM-as-judge using Galileo for third-party agents and NASK Auto Eval for ServiceNow-native agents. Safety metrics - tone, toxicity - apply to external agents. Both quality and safety scores trend over time and surface lowest-performing systems for admin investigation.
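
The session / trace / span hierarchy can be modeled as nested records whose metrics roll up from the bottom. This is a minimal sketch with hypothetical class and field names, not the AICT data schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One granular execution step, such as a tool call."""
    name: str
    latency_ms: int
    tokens: int = 0


@dataclass
class Trace:
    """One reasoning turn, made of spans."""
    spans: List[Span] = field(default_factory=list)

    @property
    def latency_ms(self) -> int:
        return sum(s.latency_ms for s in self.spans)

    @property
    def span_count(self) -> int:
        return len(self.spans)


@dataclass
class Session:
    """A full agent run, made of traces."""
    traces: List[Trace] = field(default_factory=list)

    @property
    def duration_ms(self) -> int:
        return sum(t.latency_ms for t in self.traces)

    @property
    def token_consumption(self) -> int:
        return sum(s.tokens for t in self.traces for s in t.spans)
```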

For downstream export of agentic signals specifically, the Traceloop integration extends trace collection depth.

Q: How does ServiceNow handle human oversight and hard enforcement boundaries for high-consequence agent actions, and what controls exist that agent logic cannot override?

For in-platform agents, this is a core design principle of AICT. High-impact actions route to human approval, and platform-level guardrails cannot be overridden by agent logic.

Hard limits that cannot be overridden by agent logic (architectural):

ACLs sit below the agent and evaluate on every operation - agent reasoning cannot bypass them.
Business rules enforce server-side validation independently of the caller.
Approval workflows are platform mechanisms - the agent issues a request, the platform decides; there is no path for an agent to self-approve.
Agent deviation detection flags behavioral drift from the agent's defined baseline before output is acted upon - a correctness guardrail enforced at the platform level, not configurable by the agent itself.
Output screening scans agent-generated output for PII leakage and vulnerabilities before it reaches the end user or any downstream system.
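
The key property of the hard limits above is that they sit below the agent: the check runs in the platform call, not in agent logic. The sketch below illustrates two of them, ACL evaluation and no self-approval. The ACL table, role names, and `platform_write` function are hypothetical.

```python
from typing import Optional, Set


class PermissionDenied(Exception):
    """Raised by the platform layer; agent code cannot suppress the check."""


# Hypothetical ACL: roles allowed to write to each table.
WRITE_ACL = {"order_adjustment": {"billing_admin"}}


def platform_write(identity_roles: Set[str], table: str,
                   requester: str, approver: Optional[str] = None) -> str:
    """Evaluate the ACL and approval on every write, regardless of caller."""
    allowed_roles = WRITE_ACL.get(table, set())
    if not (identity_roles & allowed_roles):
        raise PermissionDenied(f"ACL denies write to {table}")
    if approver is None or approver == requester:
        # The requester may ask, but may never approve its own request.
        raise PermissionDenied("independent approval required")
    return "written"
```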

Pre-deployment gate: Before any agent reaches production, Agentic Evaluations runs automated quality checks against three core metrics: overall task completeness, tool choice accuracy, and tool calling correctness. The system produces a deployment readiness verdict - "Ready for deployment" or "Deploy with close monitoring" - and requires human review before sign-off. This is a formal HITL gate at design time, separate from runtime controls.
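
One way to picture the gate is as a function of the three metric scores plus a human sign-off. In the sketch below the two verdict strings come from the text above, while the 0.9 threshold, the use of the minimum score, and the function name are assumptions made for illustration.

```python
from typing import Dict


def deployment_gate(scores: Dict[str, float], human_signed_off: bool,
                    threshold: float = 0.9) -> Dict[str, object]:
    """Derive a readiness verdict; promotion always needs human sign-off."""
    required = ("task_completeness", "tool_choice_accuracy", "tool_calling_correctness")
    weakest = min(scores[metric] for metric in required)
    verdict = ("Ready for deployment" if weakest >= threshold
               else "Deploy with close monitoring")
    # The verdict informs the reviewer; it never replaces the review.
    return {"verdict": verdict, "promote": human_signed_off}
```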

Runtime controls via AI Control Tower:

Per-agent kill switch (immediate suspension, H2’26)
Usage metering and cost budgets
Real-time execution telemetry
Centralized agent inventory across the instance
Approval workflow as part of the intake-to-deploy lifecycle
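
Two of the runtime controls above, the per-agent kill switch and cost budgets, can be sketched as a guard consulted before each agent step. The class and method names and the token-based budget are hypothetical; actual metering units and enforcement points are not specified here.

```python
class AgentRuntimeGuard:
    """Per-agent guard combining a kill switch with a usage budget (sketch)."""

    def __init__(self, budget_tokens: int) -> None:
        self.budget_tokens = budget_tokens
        self.used_tokens = 0
        self.suspended = False

    def kill(self) -> None:
        """Suspend the agent immediately; later steps are refused."""
        self.suspended = True

    def authorize(self, tokens: int) -> bool:
        """Allow a step only if the agent is active and within budget."""
        if self.suspended:
            return False
        if self.used_tokens + tokens > self.budget_tokens:
            return False
        self.used_tokens += tokens
        return True
```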

High-impact action handling: combine (a) pre-deployment evaluation gate with human sign-off, (b) declared approval routing at design-time, (c) ACL, business-rule, deviation detection, and output screening as hard platform prohibitions, and (d) AICT real-time monitoring with audit signals. Irreversible, financial, or customer-affecting actions can require human approval before execution - or be blocked entirely at the platform level regardless of agent intent.

For out-of-platform agents, the cleanest path is routing external agent traffic through AICT as a gateway so policies fire before execution.