ServiceNow Knowledge 26 kicks off today in Las Vegas and we are back at it with CreatorCon. One of the exciting flagship experiences this year is our Vibe Code Lounge, sponsored by NVIDIA. This activation highlights ServiceNow’s Build Agent that enables you to create an app in minutes on the expo floor.
I had the opportunity to sit down with Marc Cuevas, Principal Product Lead at NVIDIA, to talk about this experience and to discuss some of the other innovations they will be launching at Knowledge 26.
Mark: First off, I wanted to share a huge thank-you for sponsoring our largest activation to date at CreatorCon. We will have over 130 stations where attendees can build an app using our Build Agent.
What are you most excited about in the Vibe Code Lounge?
Marc: CreatorCon is where ServiceNow's developer community comes to life, and this year developers will be using ServiceNow Build Agent to vibe code apps in real time — which is exactly the kind of hands-on, fast-moving development environment where NVIDIA AI-Q shows its value. When a developer is building an app that needs to reason across enterprise data — incidents, CMDB relationships, knowledge articles, internal wikis, and external sources — they shouldn't have to build that research layer from scratch. AI-Q gives them a production-ready deep research foundation they can wire in and customize, the same one ServiceNow's own deep research sub-agents are built on as part of the Autonomous Workforce. Seeing developers at CreatorCon take that blueprint and run with it is a real proof point that enterprise AI is moving from concept to something developers can ship.
Mark: What are other innovations you will be highlighting at Knowledge 26?
Marc: Beyond AI-Q and the deep research agents, we're excited to showcase the full stack that makes enterprise agentic AI possible. NVIDIA OpenShell is a big part of that story — it's the secure runtime behind Project Arc, ServiceNow's autonomous desktop agent — so enterprises can give agents real access without sacrificing control or governance. We'll also be talking about our collaboration on ServiceNow's NOWAI-Bench to give developers and IT leaders models and agents grounded in enterprise workflows. Taken together, these innovations represent NVIDIA's commitment to being the trusted infrastructure layer underneath ServiceNow's AI platform — from the models and blueprints developers build with, to the runtime that makes sure those agents are safe, performant, and enterprise-ready.
Mark: Thank you again for the partnership. Our friends at NVIDIA were kind enough to provide a step-by-step guide on how to get started with Deep Research Agents. You can get started today!
Happy Building!
Mark
How to Build Deep Research Agents for Autonomous Workforces with NVIDIA AI-Q and ServiceNow
Introduction
Domain AI specialists are ready to sense, decide, act, and be overseen across the enterprise. But some jobs are investigations, not single actions. They require assembling evidence across configuration management database (CMDB) relationships, change history, knowledge articles, policy, and external signals before any workflow fires.
This is a walkthrough of how to build a deep research agent that can operate across those boundaries by connecting to custom enterprise data sources, reasoning over them, and producing traceable, governed outputs. While the example focuses on a ServiceNow Autonomous Workforce specialist, the pattern applies to any developer building agents that need to move beyond isolated tools and into real enterprise context. Built on the NVIDIA AI-Q blueprint, the deep research agent uses the best of open and frontier LLMs, is optimized by the NVIDIA NeMo Agent Toolkit, and is monitored with LangSmith. The result: a faster time-to-production for governed, long-running research that lives inside the ServiceNow trust boundary.
NVIDIA runs this pattern in production on its internal AI Factory. AI-Q reasons across ten proprietary sources through security-aware access controls, so the agent only surfaces what each user is already authorized to see. Those sources include SharePoint, Confluence, ServiceNow, NVBugs, and more. It's the same AI-Q agent that tops both Deep Research Bench I and II.
The NVIDIA AI-Q blueprint and the NeMo Agent Toolkit are both part of the broader NVIDIA Agent Toolkit, a collection of tools, models, and runtimes for building, evaluating, and optimizing safe, long-running autonomous agents.
What you'll build: A deep agent for the autonomous workforce
You will learn:
- How to deploy the NVIDIA AI-Q blueprint to power enterprise research agents, with the ServiceNow Autonomous Workforce as a reference implementation
- How to configure shallow and deep research agents using a mix of open enterprise models such as NVIDIA Nemotron and ServiceNow Apriel alongside frontier LLMs
- How to monitor agent traces and performance using LangSmith, with ServiceNow AI Control Tower as an example of enterprise-grade observability
- How to bring your own enterprise data sources securely into the NVIDIA AI-Q Deep Researcher so they sit alongside the Knowledge, Incident, Change, and CMDB context the Autonomous Workforce already provides, illustrated with a working public example against the NIST National Vulnerability Database
Setup
- NVIDIA API Key for access to open models such as Nemotron and Apriel
- OpenAI API Key for access to a frontier orchestrator such as GPT-5.2 (optional for an on-prem-only deployment)
- Tavily API Key for web search (optional; skip for air-gapped environments)
- Python
- Docker Compose
- Optional: LangSmith for monitoring and experiment tracking
- Optional: A ServiceNow developer instance (get one from developer.servicenow.com) plus a service account if you want to wire up the ServiceNow stub later in this post. You do not need this to complete the walkthrough end to end — the public NVD data source gives you a runnable tutorial today.
How to build long-running research specialists on NVIDIA and ServiceNow
Video 1. A walkthrough on building governed, long-running deep-research specialists with NVIDIA AI-Q on the ServiceNow AI Platform.
Install and run the blueprint
Clone the blueprint repository and configure your API keys. Copy the environment template first.
cp deploy/.env.example deploy/.env
Open deploy/.env and fill in the required values.
# Required
NVIDIA_API_KEY=nvapi-...
TAVILY_API_KEY=tvly-...
# Optional: raises NVD rate limits — see "Add a data source" below
NVD_API_KEY=
# Optional: enables trace monitoring (covered later in this post)
LANGSMITH_API_KEY=lsv2-...
# Optional: credentials for any external enterprise system you plug in later
# (see "Add a data source"). The Autonomous Workforce already provides
# native ServiceNow context (Knowledge, Incident, Change, CMDB) to the
# deep researcher, so nothing below is required to run the tutorial.
EXTERNAL_SOURCE_BASE_URL=
EXTERNAL_SOURCE_API_KEY=
The NVIDIA_API_KEY grants access to NVIDIA-hosted models like Nemotron. The TAVILY_API_KEY enables web search. The optional NVD_API_KEY raises rate limits on the public NIST vulnerability database used later in the tutorial. The EXTERNAL_SOURCE_* variables are placeholders for any external enterprise system you bring in later; the Autonomous Workforce already exposes native ServiceNow data to the deep researcher, so they are not required to run the tutorial as written.
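If you want to catch a missing key before the containers start, a quick preflight check can help. The sketch below is a hypothetical helper (not part of the blueprint) that reads deploy/.env with only the standard library and flags required keys that are still empty; adjust the REQUIRED list for your own deployment.

# check_env.py -- hypothetical preflight helper, not part of the blueprint.
# Reads deploy/.env and warns about required keys that are still empty.
from pathlib import Path

REQUIRED = ["NVIDIA_API_KEY", "TAVILY_API_KEY"]  # adjust for your deployment

def load_env(path: str = "deploy/.env") -> dict[str, str]:
    values: dict[str, str] = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks and comment lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

if __name__ == "__main__":
    env = load_env()
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise SystemExit(f"Missing required keys in deploy/.env: {', '.join(missing)}")
    print("deploy/.env looks complete for the required keys.")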
Next, build and start the full stack. Because multiple containers start at once, the first build can take a few minutes depending on your internet connection and hardware.
docker compose -f deploy/compose/docker-compose.yaml up --build
This launches three services:
- aiq-research-assistant: The FastAPI backend on port 8000
- postgres: PostgreSQL 16 for async job state and conversation checkpoints
- frontend: The Next.js web UI on port 3000
Once all services report healthy, open http://localhost:3000. Figure 1 shows the AI-Q Research Assistant chat interface, where you type a research query and watch the agent work in real time. In production on the Autonomous Workforce, this same API is called by an AI specialist — a Level 1 Service Desk AI Specialist, for example — instead of a human at a browser.
Figure 1. The AI-Q Research Assistant producing a citation-backed report for a ServiceNow research query.
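If you prefer to confirm the backend from a script rather than the browser, a small check like the one below works. It is a minimal sketch that assumes only the FastAPI default /openapi.json route on the aiq-research-assistant backend at port 8000.

# smoke_check.py -- minimal sketch; assumes only the FastAPI default
# /openapi.json route exposed by the aiq-research-assistant backend (port 8000).
import httpx

resp = httpx.get("http://localhost:8000/openapi.json", timeout=10)
resp.raise_for_status()
paths = sorted(resp.json().get("paths", {}))
print(f"Backend is up and exposes {len(paths)} routes, for example: {paths[:5]}")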
Customize AI-Q for the Autonomous Workforce: Workflow, tracing, and model configuration
Open configs/config_web_docker.yml. This single file controls the LLMs, tools, agents, and workflow configuration.
The llms section declares named models. The enable_thinking flag toggles chain-of-thought reasoning on or off for Nemotron, so you can route fast responses through a non-thinking configuration and reserve thinking mode for the sub-agents that need multi-step reasoning.
llms:
  nemotron_fast:
    _type: nim
    model_name: nvidia/nemotron-3-super-120b-a12b
    temperature: 0.6
    max_tokens: 8192
    chat_template_kwargs:
      enable_thinking: false
  nemotron_thinking:
    _type: nim
    model_name: nvidia/nemotron-3-super-120b-a12b
    temperature: 1.0
    max_tokens: 100000
    chat_template_kwargs:
      enable_thinking: true
  orchestrator:
    _type: openai
    model_name: 'gpt-5.2'
nemotron_fast handles tool-selection and short-form responses where chain-of-thought would only add latency. nemotron_thinking powers the planner and researcher sub-agents that benefit from a wide context window and multi-step reasoning. orchestrator drives the deep research loop and the final synthesis. To keep every inference on-premises and inside the ServiceNow trust boundary, point orchestrator at a self-hosted NIM endpoint and drop the frontier model.
The blueprint ships with both a shallow and a deep research agent. The configuration below wires them up with the built-in web search so you have a working baseline to verify before adding data sources.
functions:
  shallow_research_agent:
    _type: shallow_research_agent
    llm: nemotron_thinking
    tools:
      - web_search_tool
    max_llm_turns: 10
    max_tool_calls: 5
  deep_research_agent:
    _type: deep_research_agent
    orchestrator_llm: orchestrator
    planner_llm: nemotron_thinking
    researcher_llm: nemotron_thinking
    max_loops: 2
    tools:
      - advanced_web_search_tool
The shallow research agent runs a bounded tool-calling loop — up to 10 LLM turns and 5 tool calls — then returns a concise answer with citations. Routine questions a specialist dispatches mid-workflow ("What is our current MFA policy for contractors?") resolve in seconds. The deep research agent uses a LangChain deep agent with a ToDo list, file system, and sub-agents to produce long-form, citation-backed reports that a specialist can turn into a governed change, a root-cause note, or a policy response.
We'll extend this configuration with a working enterprise data source (the public NIST National Vulnerability Database) and a ServiceNow stub in the Add a data source section below.
Monitor the traces
To monitor AI-Q from inside ServiceNow's AI Control Tower, use the NeMo Agent Toolkit multi-exporter telemetry to fan the same span stream to two backends: LangSmith for developer-side debugging, and ServiceNow AI Control Tower via its OpenLLMetry/Traceloop ingestion path. Add your LANGSMITH_API_KEY and an OTLP endpoint to deploy/.env, then declare both exporters in the config.
general:
  telemetry:
    tracing:
      langsmith:
        _type: langsmith
        project: aiq-servicenow-k26
        api_key: ${LANGSMITH_API_KEY}
      otelcollector:
        _type: otelcollector
        endpoint: ${SNOW_OTLP_ENDPOINT}
        headers:
          Authorization: ${SNOW_OTLP_AUTH}
Each query now produces one trace in two places. LangSmith shows what the agent did: every tool call, model invocation, and sub-agent hand-off, expandable node by node, with latency and token usage charted over time. The AI Control Tower shows what it was allowed to do: the same run attached to the governed AI asset, evaluated against the policies, risk tier, and regulatory content (NIST AI RMF, EU AI Act) registered for that specialist. The two lenses together let a developer reproduce a failure and a governance reviewer audit the same run without a second round of instrumentation.
Figure 2. A LangSmith trace for a shallow research query showing multiple tool calls and a citation-backed answer.
Shallow research sample query (the kind a Level 1 Service Desk AI Specialist dispatches during triage):
What are the known causes of the "SAP portal login timeout" errors reported across incidents in the last 30 days?
Deep research sample query (a cross-source investigation a specialist escalates to):
Analyze the current state of the SAP portal degradation affecting the Finance business unit. Correlate incidents from the last 30 days against changes on the upstream authentication service in the CMDB, identify any open problems against the same service, summarize relevant knowledge articles and vendor advisories from 2025–2026, and produce a root-cause hypothesis with citations suitable for attaching to a change request.
Expand the trace to inspect each node. The tool calls to ServiceNow endpoints are especially useful for debugging — you can see exactly what query the agent sent to the Table API and what records came back. Beyond individual traces, use LangSmith to track latency, token usage, and error rates over time, and set alerts for regressions.
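If you want those numbers outside the LangSmith UI, the LangSmith SDK exposes the same run data. The sketch below pulls recent root runs for a project and summarizes latency, token usage, and errors; the project name and time window are placeholders, and the token fields may vary slightly by SDK version.

# langsmith_stats.py -- minimal sketch using the LangSmith SDK; the project
# name and time window are placeholders for your own deployment.
from datetime import datetime, timedelta, timezone
from langsmith import Client

client = Client()  # reads LANGSMITH_API_KEY from the environment
runs = list(client.list_runs(
    project_name="aiq-servicenow-k26",
    start_time=datetime.now(timezone.utc) - timedelta(days=1),
    is_root=True,  # one entry per query rather than per sub-call
))

errors = [r for r in runs if r.error]
latencies = [
    (r.end_time - r.start_time).total_seconds()
    for r in runs if r.end_time and r.start_time
]
tokens = [r.total_tokens for r in runs if r.total_tokens]

print(f"{len(runs)} runs, {len(errors)} errors in the last 24 hours")
if latencies:
    print(f"average latency: {sum(latencies) / len(latencies):.1f}s")
if tokens:
    print(f"average tokens per run: {sum(tokens) / len(tokens):.0f}")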
Optimize a deep agent
To tune the deep research agent for an AI specialist's domain, start by examining how it assembles its sub-agents. The deep research agent uses the create_deep_agent factory from LangChain's deepagents library.
from deepagents import create_deep_agent

return create_deep_agent(
    model=self.llm_provider.get(LLMRole.ORCHESTRATOR),
    system_prompt=orchestrator_prompt,
    tools=self.tools,
    subagents=self.subagents,
    middleware=custom_middleware,
    skills=self.skills,
).with_config({"recursion_limit": 1000})
The factory wires together the orchestrator LLM, the tools, and two sub-agents.
self.subagents = [
    {
        "name": "planner-agent",
        "system_prompt": render_prompt_template(
            self._prompts["planner"], tools=self.tools_info,
        ),
        "tools": self.tools,
        "model": self.llm_provider.get(LLMRole.PLANNER),
    },
    {
        "name": "researcher-agent",
        "system_prompt": render_prompt_template(
            self._prompts["researcher"], tools=self.tools_info,
        ),
        "tools": self.tools,
        "model": self.llm_provider.get(LLMRole.RESEARCHER),
    },
]
Context management is central to how deep agents work. The planner agent produces a JSON research plan. The researcher agent receives only this plan — not the orchestrator's thinking tokens or the planner's internal reasoning. By passing only a structured payload, AI-Q reduces token bloat and prevents the "lost in the middle" phenomenon, where LLMs forget critical instructions buried deep in massive context windows. This isolation keeps each sub-agent focused. The following example shows a planner output for an Autonomous Workforce query about a portal degradation:
{
  "report_title": "SAP Portal Degradation: Root-Cause Hypothesis for Finance",
  "report_toc": [
    {
      "id": "1",
      "title": "Incident Pattern in the Last 30 Days",
      "subsections": [
        {"id": "1.1", "title": "Volume and distribution by business unit"},
        {"id": "1.2", "title": "Symptoms and error codes"}
      ]
    },
    {
      "id": "2",
      "title": "Upstream Dependencies and Recent Changes",
      "subsections": [
        {"id": "2.1", "title": "CMDB dependency walk from the SAP portal"},
        {"id": "2.2", "title": "Change records on the authentication service"}
      ]
    },
    {
      "id": "3",
      "title": "Knowledge and Vendor Signals",
      "subsections": [
        {"id": "3.1", "title": "Matching knowledge articles"},
        {"id": "3.2", "title": "Relevant vendor advisories"}
      ]
    }
  ],
  "queries": [
    {
      "id": "q1",
      "query": "Incidents referencing 'SAP portal' or category 'Authentication' opened in the last 30 days ...",
      "target_sections": ["Incident Pattern in the Last 30 Days"],
      "rationale": "Establishes the recent incident footprint before pivoting to upstream causes"
    }
  ]
}
This architecture has been tuned to perform well on both Deep Research Bench and Deep Research Bench II.
To customize the agent for your ServiceNow domain, edit the prompt templates in src/aiq_aira/agents/deep_researcher/prompts/. For example, open planner.j2 and instruct the planner to keep outlines to three sections or fewer for tighter root-cause reports, or to require each section to declare its evidence type (incident, change, CMDB relationship, knowledge article, vendor advisory). You can also add debug logging to inspect intermediate state (like /planner_output.md) to see how your prompt changes affect the context passed between sub-agents.
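To see exactly what the planner will receive after a prompt edit, you can render the template outside the agent loop. The snippet below is a sketch that assumes the prompts are standard Jinja2 files and that the template accepts a tools variable shaped like the name/description pairs the agent passes in; adapt the path and variables to your checkout.

# preview_planner_prompt.py -- sketch for previewing prompt edits; assumes the
# planner prompt is a standard Jinja2 template that accepts a `tools` variable
# shaped like simple name/description pairs.
from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader("src/aiq_aira/agents/deep_researcher/prompts"))
template = env.get_template("planner.j2")

tools_info = [
    {"name": "nvd_cve_tool", "description": "Search the NIST NVD for CVEs by keyword."},
    {"name": "advanced_web_search_tool", "description": "Web search with citations."},
]

# Print the rendered system prompt so you can diff it before and after an edit.
print(template.render(tools=tools_info))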
Add a data source
On the Autonomous Workforce, the ServiceNow AI Platform already supplies the deep researcher with native context from Knowledge, Incident, Change, Problem, and the CMDB, governed by existing ACLs, audited, and registered as AI assets in the ServiceNow AI Control Tower. Sometimes, though, you need context that lives outside ServiceNow: a vendor advisory feed, a SaaS ticketing backend, an internal observability API, a custom data lake endpoint, an MCP-exposed enterprise tool. This section shows the pattern for bringing those external enterprise sources securely into the AI-Q Deep Researcher.
The blueprint implements every tool as a NeMo Agent Toolkit function, which means "adding a data source" is really "registering a function and referencing it in config." To keep the tutorial end-to-end runnable, the worked example is a business-relevant public API — the NIST National Vulnerability Database (NVD) CVE API. It complements the SAP portal authentication scenario introduced earlier: the deep researcher can pull CVEs that match vendor, product, or keyword terms and fold those findings into the root-cause report alongside the ServiceNow incidents, changes, and CMDB relationships the platform already provides. After the NVD example, a generic "Bring your own enterprise source" subsection shows the same pattern as a clearly labeled stub with TODO comments you adapt to your internal system.
Step 1: Implement the NeMo Agent Toolkit function
The following example calls the public NVD CVE API 2.0 (https://services.nvd.nist.gov/rest/json/cves/2.0) with a keywordSearch parameter. No authentication is required for light usage; an optional NVD_API_KEY raises your rate limit.
# sources/nvd_cve/src/register.py
import httpx
from pydantic import Field, SecretStr

from nat.builder.builder import Builder
from nat.builder.function_info import FunctionInfo
from nat.cli.register_workflow import register_function
from nat.data_models.function import FunctionBaseConfig


class NvdCveConfig(FunctionBaseConfig, name="nvd_cve"):
    """Search the NIST National Vulnerability Database for CVEs matching a keyword."""

    api_url: str = Field(default="https://services.nvd.nist.gov/rest/json/cves/2.0")
    api_key: SecretStr | None = Field(default=None, description="Optional NVD API key")
    max_results: int = Field(default=10)
    timeout_seconds: float = Field(default=30.0)


@register_function(config_type=NvdCveConfig)
async def nvd_cve(config: NvdCveConfig, builder: Builder):

    async def search(query: str) -> str:
        """Search the public NIST NVD for CVEs whose descriptions match a
        natural-language keyword. Returns CVE ID, severity, a short description,
        and a reference URL for each match."""
        params = {"keywordSearch": query, "resultsPerPage": config.max_results}
        headers = {}
        if config.api_key is not None:
            headers["apiKey"] = config.api_key.get_secret_value()

        async with httpx.AsyncClient(timeout=config.timeout_seconds) as client:
            resp = await client.get(config.api_url, params=params, headers=headers)
            resp.raise_for_status()
            payload = resp.json()

        lines: list[str] = []
        for item in payload.get("vulnerabilities", []):
            cve = item.get("cve", {})
            cve_id = cve.get("id", "UNKNOWN")
            descriptions = cve.get("descriptions", [])
            summary = next((d.get("value", "") for d in descriptions if d.get("lang") == "en"), "")
            metrics = cve.get("metrics", {})
            severity = "UNKNOWN"
            for key in ("cvssMetricV31", "cvssMetricV30", "cvssMetricV2"):
                if key in metrics and metrics[key]:
                    severity = metrics[key][0].get("cvssData", {}).get("baseSeverity", severity)
                    break
            url = f"https://nvd.nist.gov/vuln/detail/{cve_id}"
            lines.append(f"- {cve_id} ({severity}): {summary.strip()} — {url}")

        if not lines:
            return f"No CVEs found for query: {query}"
        return "\n".join(lines)

    yield FunctionInfo.from_fn(search, description=search.__doc__)
NeMo Agent Toolkit validates the config fields at startup, so a bad URL or missing field fails fast rather than at runtime. The agent uses the function's docstring to decide when to call the tool, so write docstrings the way you want the planner to reason about the tool, and cite CVEs by ID and link, not just keyword.
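Before wiring the tool into the agent config, it can be worth hitting the NVD endpoint directly to confirm the query shape and your rate limit. This standalone sketch uses the same URL, parameters, and optional apiKey header as the tool above; the keyword is just an example tied to the SAP portal scenario.

# nvd_sanity_check.py -- standalone sketch that calls the public NVD CVE API
# with the same parameters the tool uses, so you can inspect the raw payload.
import os
import httpx

params = {"keywordSearch": "SAP NetWeaver portal authentication", "resultsPerPage": 5}
headers = {}
if os.environ.get("NVD_API_KEY"):
    headers["apiKey"] = os.environ["NVD_API_KEY"]  # optional; raises the rate limit

resp = httpx.get(
    "https://services.nvd.nist.gov/rest/json/cves/2.0",
    params=params, headers=headers, timeout=30,
)
resp.raise_for_status()
payload = resp.json()
print(f"totalResults: {payload.get('totalResults')}")
for item in payload.get("vulnerabilities", [])[:5]:
    print("-", item.get("cve", {}).get("id"))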
Step 2: Reference the tool in the config
Declare the new tool under functions, then add it to each agent's tools list:
functions:
  nvd_cve_tool:
    _type: nvd_cve
    api_key: ${NVD_API_KEY}  # optional — raises rate limit
    max_results: 10
  shallow_research_agent:
    _type: shallow_research_agent
    llm: nemotron_thinking
    tools:
      - nvd_cve_tool
      - web_search_tool
  deep_research_agent:
    _type: deep_research_agent
    orchestrator_llm: orchestrator
    planner_llm: nemotron_thinking
    researcher_llm: nemotron_thinking
    tools:
      - nvd_cve_tool
      - advanced_web_search_tool
You don't need to change any agent code. The agent discovers the new tool's name and description automatically, and the LLM calls it when a query matches.
Bring your own enterprise source
Use the same shape, a FunctionBaseConfig plus a registered async function, for any external system you want to expose to the deep researcher. The snippet below is a stub for a generic enterprise search endpoint (think: your SIEM, your internal vendor-advisory portal, a procurement catalog, an observability API). It intentionally raises NotImplementedError so you cannot accidentally ship it unverified. The TODO comments call out the decisions you need to make on your system (endpoint, auth, rate limits, least-privilege scope, and result shape) before the tool is production-ready.
# sources/enterprise_source/src/register.py
# TODO: Verify this tool against your target enterprise system before enabling it.
# Confirm the search endpoint, authentication model, rate limits, and the
# least-privilege scope your agents should be allowed to read.
from pydantic import Field, SecretStr

from nat.builder.builder import Builder
from nat.builder.function_info import FunctionInfo
from nat.cli.register_workflow import register_function
from nat.data_models.function import FunctionBaseConfig


class EnterpriseSourceConfig(FunctionBaseConfig, name="enterprise_source"):
    """STUB: Search tool for an external enterprise system."""

    base_url: str = Field(description="Base URL of the enterprise system's search API")
    api_key: SecretStr = Field(description="API key or bearer token for the service account")
    max_results: int = Field(default=10)


@register_function(config_type=EnterpriseSourceConfig)
async def enterprise_source(config: EnterpriseSourceConfig, builder: Builder):

    async def search(query: str) -> str:
        """STUB: Search an external enterprise system and return a stable ID, title,
        short description, and reference URL for each match."""
        # TODO: Implement the HTTP call against your system's search endpoint.
        #   - Choose an auth model (bearer, OAuth 2.0 client credentials, mTLS).
        #   - Page results and respect rate limits.
        #   - Enforce least-privilege: the service account should see only what
        #     the agent is allowed to cite, and nothing more.
        #   - Return a stable record ID plus a link so the planner can cite
        #     each hit, not just summarize it.
        raise NotImplementedError(
            "Wire up the call to your enterprise search API and remove this guard "
            "once you have validated auth, scope, rate limits, and result shape."
        )

    yield FunctionInfo.from_fn(search, description=search.__doc__)
A few notes on bringing external sources in securely:
- Let ServiceNow own what it already owns. Don't write a Table API wrapper for Knowledge, Incident, Change, Problem, or CMDB. The Autonomous Workforce already exposes that context to the deep researcher and keeps it under ACLs, audit, and governance you already own. Your external tool lives next to that native context, not on top of it.
- Prefer MCP where it fits. For systems that already expose a Model Context Protocol server, point the agents at the MCP server instead of hand-rolling a function per endpoint.
- Register every new tool as a governed AI asset. The ServiceNow AI Control Tower will inventory, risk-assess, and monitor the tool against NIST AI RMF and EU AI Act content, so the specialist's reach grows in step with its governance.
Going further
By extending and building on the NVIDIA AI-Q blueprint, ServiceNow developers can bring a best-in-class deep-research capability to the Autonomous Workforce. To go further, review:
- The NVIDIA AI-Q blueprint customization guide for adding more data sources
- The Helm chart for deploying on an NVIDIA AI Factory
- The NVIDIA AI-Q blueprint evaluation guide for doing evaluation-driven development
- LangSmith for monitoring the system in production and preventing performance drift
- The ServiceNow AI Control Tower documentation for registering deep-research agents as governed AI assets and aligning them with NIST AI RMF and EU AI Act content
- The Autonomous Workforce product page for the full picture of AI specialists built on the ServiceNow AI Platform
The NVIDIA AI-Q blueprint is being integrated across the ecosystem by ServiceNow and partners including Aible, Amdocs, Cloudera, Cohesity, Dell, Distyl, H2O.ai, HPE, IBM, JFrog, LangChain, and VAST.
