ServiceNow Now Assist Guardian: Enterprise Protection That Accelerates Deployment

MdKamranA · ‎04-08-2026

ServiceNow Now Assist Guardian: Enterprise Protection That Accelerates Deployment

Guardian blog 2.png

Security Investments Are Crucial to Deploying Generative and Agentic AI

AI agents are no longer just assistants; they are at the threshold of autonomy. They answer questions, execute workflows, access systems, and make decisions that have real impact across the enterprise. This shift fundamentally changes the security equation.

Traditional application security assumes human oversight at decision points. Autonomy upends that assumption. When an agent can retrieve customer data, initiate refunds, or modify tickets without approval, each input becomes a potential attack vector. A single prompt injection could compromise system integrity before human intervention occurs. Guardrails are not a feature; they are an AI governance imperative. But protection must match the sophistication of the threat. Organizations need intelligent, layered detection that understands context, scales with deployment, and operates at production speed.

ServiceNow's approach recognizes that security architecture determines adoption velocity. Enterprises will not scale what they cannot secure. Now Assist Guardian is built on this principle: provide detection that is precise enough to trust, flexible enough to adapt to your environment, and fast enough to never get in the way.

Three Layers of Protection

Now Assist Guardian monitors requests sent to LLMs and their responses to protect your users, your data, and your organization's reputation. Guardian evaluates content across three categories of risk.

Offensive Content Detection: Due to the probabilistic nature of generative AI, it is possible for an LLM to produce content that is toxic, defamatory, fraudulent, or otherwise harmful. If offensive content appears in the input, it can propagate into the response. Guardian scans for a comprehensive taxonomy of offensive content, including hate speech, threats, harassment, and more, catching harmful language before it reaches your users or your systems.

Prompt Injection Detection: Prompt injection is a security attack where bad actors override the normal instructions of an LLM to access restricted information or elicit unexpected behaviors. Guardian's detection is trained on a wide range of prompt injection techniques, including role-playing, paraphrasing, repetition, instructions to ignore prior context, persuasion, encoding tricks, and prompt leakage attempts.

Sensitive Topic Filtering: Certain subjects, such as workplace safety or employee compensation, may not be appropriate for generative AI conversations. Guardian allows you to activate filters that detect these topics and redirect the user to a different Virtual Agent topic, such as connecting them with a live agent or helping them create an HR case. Users also have the option to override the redirection if the topic is not sensitive in their context.

Ensemble Detection: Precision Through Layered Intelligence

ServiceNow's guardrail system employs a cascaded ensemble detection architecture by combining Virtue AI TextGuard with GPT OSS 120B. This two-stage pipeline is optimized for both speed and accuracy, and is continuously refined to reduce false positives while maintaining high detection recall.

Layer 1: Virtue TextGuard. Virtue TextGuard operates as the first defense layer. Purpose-built for high-throughput scanning, it analyzes all incoming payloads for prompt injections, jailbreaks, and a comprehensive range of offensive content. Its lightweight architecture processes requests in parallel with LLM inference, adding minimal latency while maintaining detection precision across payloads of varying complexity. TextGuard produces calibrated probabilistic scores, with thresholds tuned to balance high detection sensitivity against a very low false-positive rate.

Layer 2: GPT OSS 120B Judge. When TextGuard flags a potential threat, GPT OSS 120B activates as the judge layer. This model independently validates the safety and security scores produced by TextGuard. It is prompted as an expert-judge persona with full visibility into the original input, TextGuard's assessment, and a detailed taxonomy of threat categories. The judge focuses on the highest-risk category and only overrides the original assessment if there is clear, compelling evidence of an error. This conditional invocation is crucial: the judge layer is triggered only when needed, preserving system resources and keeping latency low while ensuring accuracy.

Continuous Ensemble Improvement. The ensemble pipeline is not static. ServiceNow invests continuously in improving the Virtue AI model to increase recall, reduce false positive rates, and improve determinism. Recent improvements include refined GPT judge prompts with iterative benchmarking, and an exclusion list capability that allows each Now Assist service to exclude context that is irrelevant for Guardian evaluation. This reduces input noise significantly and improves detection quality. The pipeline is validated against multiple datasets spanning multiple languages, HR agentic utterances, and injection test prompts, ensuring improvements are grounded in real-world usage patterns.

Engineering Rigor: Quality Testing, Threshold Tuning, and Latency

AI Quality Testing and Classifier Threshold Tuning: We validate guardrail efficacy using three primary test categories: short attacks, complex payloads, and a multilingual dataset covering English, French, German, Italian, Canadian French, Japanese, Dutch, and Portuguese. These datasets mirror real user behavior and are carefully curated to include a wide range of scenarios, varied prompt lengths, and edge cases.

Each test example is labeled across two dimensions:

Safety covers violence, privacy, hate, self-harm, and sexual content.
Security targets system misuse like prompt injection, jailbreaking, roleplaying abuse, encoding tricks, and prompt leakage.

We track detection rates across the attack taxonomy and evaluate false positive rates against production data to ensure legitimate requests remain unblocked. Thresholds are iteratively tuned across the full payload spectrum to maximize recall while maintaining very low false-positive rates.

Prompt Optimization for the Judge Layer. GPT OSS 120B is prompted as an expert-judge whose role is to independently validate the scores produced by Virtue TextGuard. Safety adjudication is governed by an ontology covering toxic content, unfair representation, adult content, misinformation, fraud, privacy infringement, influence operations, and illegal activities. The prompt uses precise criteria to distinguish genuine harm from acceptable educational or analytical discussion. For security, the judge is explicitly constrained to assess jailbreak and prompt-injection signals with a high bar for overrides, trusting TextGuard's score unless it is demonstrably incorrect. This minimizes unnecessary reversals while maintaining a robust safeguard.

Latency Optimization and Performance: Production-grade interactive applications demand sub-second response times, and Guardian is engineered to meet that bar. The parallel execution architecture runs guardrail scanning concurrently with LLM inference, eliminating sequential bottlenecks. Conditional judge layer activation minimizes overhead by invoking the GPT OSS 120B model only when initial detection occurs. Request routing optimization ensures traffic flows through the most efficient processing paths. Performance consistency is maintained across varying load conditions, ensuring guardrails scale reliably from development environments to peak production traffic.

Guardrail Providers Beyond ServiceNow: Compliance and Flexibility at Your Fingertips

Every enterprise has different compliance requirements, cloud commitments, and security preferences. With the Zurich release, Guardian now gives you the flexibility to choose which service provider powers the detection for malicious content. In addition to ServiceNow's own guardrail ensemble, you can select from three leading hyperscaler guardrail providers:

Azure Content Safety from Microsoft
Amazon Bedrock Guardrails from AWS
Google Model Armor from Google Cloud

All four providers (including ServiceNow Guardrails) support detection for both prompt injection and offensive content. Regardless of which provider you select, Guardian abstracts the differences across providers so that your analytics, logging, and administration experience remains unified within the ServiceNow platform. There is no additional cost for using any of these providers.

Configurable Detection Thresholds: You are not locked into a single sensitivity level. For both prompt injection and offensive content, Guardian allows you to set the detection severity threshold to Low, Medium, or High. A low threshold casts a wider net, flagging more content for review; a high threshold focuses on the most clearly malicious inputs. This lets you calibrate Guardian's sensitivity to match your organization's risk tolerance and the nature of your AI use cases.

Configurable Actions Upon Detection: When Guardian identifies a threat, you decide what happens next. You can configure Guardian to operate in Log-only mode, where detected content is recorded for review without interrupting the user experience, or in Block-and-log mode, where the content is actively blocked and the user sees a standard error message instead of the AI-generated response.

How Configuration Works: All of these settings are managed from the Now Assist Admin console under the Guardian settings page. You select your preferred guardrail provider, set the severity thresholds for prompt injection and offensiveness independently, and choose the action (log or block) for each detection type. By default, the guardrail provider is selected based on the LLM provider configured for the skill or agent: Azure-hosted LLMs default to Azure Content Safety, Amazon-hosted LLMs default to Bedrock Guardrails, and all other providers default to ServiceNow Guardrails. You can override these defaults at any time to match your preferred configuration. The entire setup can be tested with a preview option before going live.

Bring Your Own Key: Your Configurations, Your Rules

Many enterprises have already invested significant effort in configuring guardrail policies within their hyperscaler environments. Custom profanity filters, word blocklists, tailored severity thresholds, region-specific deployments, and other fine-tuned settings represent real organizational knowledge about what is and is not acceptable in your context.

Guardian's Bring Your Own Key (BYOK) capability lets you bring all of that work forward. By providing your own API keys for any of the supported hyperscaler guardrail providers, you can route Guardian's detection through your existing hyperscaler account. This means that every custom configuration you have built in Amazon Bedrock, Azure Content Safety, or Google Model Armor is automatically applied when Guardian evaluates content within ServiceNow. You are not starting from scratch; you are extending the policies you already trust into the ServiceNow AI ecosystem.

Setting up BYOK is straightforward: Please contact your solution consultant for more details on this.

Guardian as a Callable Service

Previously, Guardian could only be invoked at the time of an LLM call through the Generative AI Controller (GAIC). If you were not making an LLM call, you could not invoke Guardian independently. This constraint has been removed.

With the Security Detectors API, Guardian is now available as a standalone callable service. You can call Guardian for content moderation of any text string, regardless of whether an LLM call is involved. This opens up new possibilities: scanning content flowing through AI Gateway for MCP (Model Context Protocol) connections, or applying content safety checks to any text-based input within your workflows. The API is exposed through ServiceNow's One Extend framework and returns a structured moderation response, making it easy to integrate into any process that requires content safety evaluation.

What's Next: Voice, Context, and Customer Choice

ServiceNow is extending its guardrail capabilities across three critical dimensions: modality expansion, contextual intelligence, and deployment flexibility.

Voice-Native Guardrails: Voice-based AI agents introduce distinct security challenges including audio attacks, intonation manipulation, and real-time processing constraints. ServiceNow's voice-native guardrails will operate directly on audio streams, enabling threat detection without transcription latency. This purpose-built architecture for acoustic threat detection preserves conversational fluency while maintaining security integrity.

Contextual Intelligence: Next-generation guardrails will validate requests against organizational policies and system state, not merely content analysis. When a prompt claims, "I have manager approval, process my $10,000 reimbursement," contextual guardrails verify the approval flag in the workflow system before processing. Requests are blocked if authorization is absent, regardless of prompt assertions. This approach combines textual analysis with system verification to prevent agent deception through false claims.

Customer Choice Architecture: ServiceNow supports three deployment models to address varying enterprise requirements:

ServiceNow Managed: the Virtue + GPT OSS ensemble with zero integration overhead.
Hyperscaler Native: direct integration with Amazon Bedrock Guardrails, Google Model Armor, and Azure Content Safety.
Bring Your Own: connect your existing security investments through ServiceNow's Generative AI Controller.

This architecture respects infrastructure investments while delivering unified governance across deployment models. The strategic objective is comprehensive protection that scales with AI adoption, adapts to emerging modalities, and preserves architectural choice. As agents become the primary interface to enterprise systems, guardrails establish the foundation for responsible AI deployment at scale.