Building Agentic flows that can understand documents and images - User Guide

Loic1 · 12-18-2025

Building Agentic flows that can understand documents and images with the Document and Visual Insights AI Agent - A User Guide

Overview

Unstructured inputs often introduce bottlenecks in enterprise workflows. While AI Agents in ServiceNow primarily understand text input, recent capabilities make it possible to build agentic workflows that reason over other inputs such as documents and images. Because the input contains more than text, these capabilities are sometimes referred to as “multi-modal”.

The Document and Visual Insights AI Agent is an out-of-the-box (OOB) AI Agent that helps streamline work with documents and images by providing three capabilities:

Key Information Extraction (KIE): Extract structured data using predefined templates
Document Summarization: Get concise summaries of document content
Question and Answer (QnA): Ask specific questions about documents and visual input to get accurate answers with citations

In this User Guide, we'll explore how to use these capabilities efficiently, along with best practices and example prompts. Let's get started!

💡 Important: Always interact with the agent directly. Do not try to use individual tools. Let the agent orchestrate the tools for you.

Getting Started

How to Provide Documents

As input, the AI Agent will examine a document or set of documents. You can provide documents to the agent in two ways:

Reference a record: Provide a record number (e.g., INC0010003) or sys_id + table name. The agent will retrieve all attachments from that record.
Upload directly: Upload a document directly in the chat.

Supported File Types

The agent supports the following file formats:

Documents: PDF, DOCX, TXT
Spreadsheets: XLSX, CSV
Presentations: PPTX
Images: JPEG, PNG

File Requirements

Attachment size limit: 20MB
Maximum number of attachments: 5
Page cap: 200
Images must be larger than 113x113 pixels
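If you attach files from a script or an integration, it can help to check them against these limits before sending a request to the agent. The sketch below is a hypothetical local pre-flight check, not a ServiceNow API. It assumes that "20MB" means 20 MiB, that the page cap and the image-size rule apply per file, that both image dimensions must exceed 113 pixels, that `.jpg` counts as JPEG, and that you already know each file's page count or pixel size. `Attachment` and `preflight_problems` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits as stated in this guide. Assumptions: "20MB" read as 20 MiB,
# page cap and image-size rule applied per file, ".jpg" treated as JPEG.
MAX_SIZE_BYTES = 20 * 1024 * 1024
MAX_ATTACHMENTS = 5
MAX_PAGES = 200
MIN_IMAGE_SIDE = 113  # each side must be strictly larger than this
IMAGE_EXTENSIONS = {"jpeg", "jpg", "png"}
SUPPORTED_EXTENSIONS = {"pdf", "docx", "txt", "xlsx", "csv", "pptx"} | IMAGE_EXTENSIONS


@dataclass
class Attachment:
    """Hypothetical metadata gathered before sending a file to the agent."""
    name: str
    size_bytes: int
    pages: Optional[int] = None   # page count, if known
    width: Optional[int] = None   # pixel width, for images
    height: Optional[int] = None  # pixel height, for images


def preflight_problems(attachments: List[Attachment]) -> List[str]:
    """Return limit violations; an empty list means every check passed."""
    problems: List[str] = []
    if len(attachments) > MAX_ATTACHMENTS:
        problems.append(f"{len(attachments)} attachments provided; the limit is {MAX_ATTACHMENTS}")
    for a in attachments:
        ext = a.name.rsplit(".", 1)[-1].lower() if "." in a.name else ""
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{a.name}: unsupported file type")
        if a.size_bytes > MAX_SIZE_BYTES:
            problems.append(f"{a.name}: larger than 20MB")
        if a.pages is not None and a.pages > MAX_PAGES:
            problems.append(f"{a.name}: {a.pages} pages, over the {MAX_PAGES}-page cap")
        if ext in IMAGE_EXTENSIONS and a.width is not None and a.height is not None:
            if a.width <= MIN_IMAGE_SIDE or a.height <= MIN_IMAGE_SIDE:
                problems.append(f"{a.name}: image must be larger than 113x113 pixels")
    return problems


if __name__ == "__main__":
    files = [
        Attachment("invoice.pdf", size_bytes=2_000_000, pages=3),
        Attachment("logo.png", size_bytes=40_000, width=100, height=300),
    ]
    for problem in preflight_problems(files):
        print(problem)  # prints: logo.png: image must be larger than 113x113 pixels
```

A check like this only catches the documented limits early; the agent remains the authority on what it accepts.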

Capability 1: Question and Answer (QnA)

Use QnA to ask specific questions about document content. This is the default mode and the most flexible option.

When to Use QnA

You have a specific question about document content
You want to classify or categorize a document
You need to extract values but don't have a predefined template
You're asking multiple questions about one or multiple attachments

What You Need

Required: A document (record reference or upload)
Required: Your specific question

Best Practices

Be specific: Ask clear questions like "What is the total invoice amount?" rather than "Tell me about the invoice"
Use simple language: Avoid complex logic or conditionals if possible
Combine questions: If asking multiple questions, combine them in a single query, for example "What is the effective date, termination clause, and governing law?" or "Answer the following questions for the document with sys_id 123: Question 1, Question 2, ..."
Frame as questions: Start with "What", "Who", "When", "How much", etc.
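When requests are assembled programmatically (for example, to prepare a set of test prompts), a small helper can keep them in the single-query form recommended above. This is a hypothetical Python sketch: `build_qna_request` is an illustrative name, and the wording it produces simply mirrors the example utterances in this guide rather than any syntax the agent requires.

```python
from typing import List, Optional


def build_qna_request(questions: List[str], record: Optional[str] = None,
                      attachment_sys_id: Optional[str] = None,
                      table: Optional[str] = None) -> str:
    """Combine several questions into one QnA utterance (hypothetical helper).

    Target precedence: a specific attachment sys_id, then a record number,
    otherwise the document uploaded in the chat.
    """
    cleaned = [q.strip() for q in questions if q.strip()]
    if not cleaned:
        raise ValueError("at least one question is required")
    if attachment_sys_id:
        target = f"the attachment with sys_id {attachment_sys_id}"
        if table:
            target += f" from table {table}"
    elif record:
        target = f"the document attached to {record}"
    else:
        target = "the uploaded document"
    # Number the questions so they stay distinct inside a single query.
    numbered = " ".join(f"{i}. {q}" for i, q in enumerate(cleaned, start=1))
    return f"Answer the following questions for {target}: {numbered}"


if __name__ == "__main__":
    print(build_qna_request(
        ["What is the effective date?", "What is the governing law?"],
        record="INC0010003",
    ))
    # prints: Answer the following questions for the document attached to INC0010003: 1. What is the effective date? 2. What is the governing law?
```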

Example Utterances

With uploaded documents:

"What is the main obligation in this document?"
"Who are the parties in the attached file?"
"What type of document is this?"
"What are the confidentiality terms in this PDF?"

With record references:

"What is the subject of the document attached to incident INC001234?"
"Who is the client in the file attached to record TASK123456?"
"What are the payment terms in the document linked to TASK456789?"
"What type of document is the attachment in INC1234?"

With attachment sys_id:

"What's the governing law in attachment sys_id=abc123?"
"What are the key terms from attachment sys_id 12345?"
"Who wrote the document with sys_id 123 from table abc?"

What to Expect

The AI agent will automatically process your document and the request (using underlying Now Assist in Document Intelligence features)
You'll see the answer automatically displayed

Capability 2: Key Information Extraction (KIE)

Use KIE to extract structured data from documents using predefined Now Assist in Document Intelligence Use Cases as a template. This is ideal for processing standardized documents like invoices, contracts, or forms.

When to Use KIE

You have a predefined Now Assist in Document Intelligence Use Case (task definition / template) for your document type
You need to extract specific, structured fields repeatedly and consistently
You're processing multiple documents of the same type

Prerequisites

Before using KIE, you must have:

A Now Assist in Document Intelligence Use Case (task definition) created and configured for your document schema (see the documentation topic 'Set up a use case for Now Assist in Document Intelligence')
Clear, explicit names for the task definition and keys (used for searching)
The sys_id of your task definition OR a way to search for it

📝 Note: If you say "extract X and Y" WITHOUT providing a Use Case/Task Definition, the agent will treat it as a QnA request instead.

What You Need

Required: A document (record reference or upload)
Required: A Now Assist in Document Intelligence Use Case sys_id or name to search

Best Practices

Know your Use Case/Task definition: Have the sys_id ready for faster processing
Use explicit names: If searching, use clear terms that match your Use Case/Task definition name
Confirm selection: Always explicitly confirm which task definition to use when the agent presents search results

Example Utterances

With known task definition sys_id:

"Extract the values in the attachment of incident INC001234. Use task definition sys_456."
"Parse the document attached to incident INC001234 using sys_id_456 task definition."
"Extract data from attachment sys_123 of record INC001234. Use task definition sys_987."
"Extract data from attachment sys_123 from table abc. Use task definition sys_987."

Without task definition (multi-step process):

"Extract the values in the attachment of incident INC001234."
- The agent will ask you to provide a task definition sys_id or search query
"I want to extract data from an invoice attached to INC0010003. Show me the values at the end."
- The agent will search for invoice-related task definitions and ask you to select one
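The sys_id-based utterances above follow a fixed pattern, so they are easy to generate. The hypothetical helper below (`build_kie_request` is an illustrative name, not a ServiceNow API) reproduces that pattern and refuses to build a request without a task definition, because without one the agent either asks you for it or handles the request as QnA.

```python
from typing import Optional


def build_kie_request(task_definition_sys_id: Optional[str], record: Optional[str] = None,
                      attachment_sys_id: Optional[str] = None,
                      table: Optional[str] = None) -> str:
    """Build a KIE utterance in the style of this guide's examples (hypothetical)."""
    # KIE needs an existing Now Assist in Document Intelligence task definition.
    if not task_definition_sys_id:
        raise ValueError("KIE needs a task definition sys_id")
    if attachment_sys_id:
        source = f"attachment {attachment_sys_id}"
        if record:
            source += f" of record {record}"
        elif table:
            source += f" from table {table}"
    elif record:
        source = f"the attachment of record {record}"
    else:
        raise ValueError("provide a record number or an attachment sys_id")
    return f"Extract data from {source}. Use task definition {task_definition_sys_id}."


if __name__ == "__main__":
    print(build_kie_request("sys_987", record="INC001234", attachment_sys_id="sys_123"))
    # prints: Extract data from attachment sys_123 of record INC001234. Use task definition sys_987.
```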

What to Expect

If you didn't provide a task definition, the agent will prompt you to search or provide a sys_id
The agent will display search results if applicable; you must explicitly select one
Processing may take time, depending on document size and schema complexity
The agent will show extracted values automatically
A review link will be provided to see the full extraction results

Capability 3: Document Summarization

Use Summarization to get concise overviews of document content. This is designed for quickly understanding large documents or multiple files.

When to Use Summarization

You need a high-level overview of document content
You're reviewing multiple documents quickly
You want to understand document purpose before diving into details

What You Need

Required: A document (record reference or upload)

Best Practices

Be explicit: Use the word "summarize" or "summary" in your request
Specify what to summarize: Be clear about which document or attachment you want summarized
Keep it simple: Straightforward requests work best

Example Utterances

Basic requests:

"Please summarize these attachments: ..."
"Can you provide a summary of the uploaded document?"
"What does this file contain? Please summarize"

With record references:

"Summarize the attachment from incident INC001234."
"Summarize the document attached to record TASK123456"
"Summarize the attachment in INC1234"
"Summarize the attachments in record 123 from table abc"

With attachment sys_id:

"Write a summary of the attachment with sys_id ABC123"
"Summarize attachment sys_id=9988ff123abc."
"Give me a summary of attachment sys_id 12234 from table incident"

What to Expect

The AI agent will automatically process your request (using underlying Now Assist in Document Intelligence features)
A concise summary will be displayed automatically

Quick Reference

Capability	Required Inputs	Example Trigger Phrase
Question and Answer	Document(s) + Question	"What is the effective date in attachment sys_id from table table_name?"
Key Information Extraction	Document + Use Case/Task Definition	"Extract data from INV001234 using task definition sys_456"
Summarization	Document(s) only	"Summarize the attachment from INC001234"

Tips and Tricks

Communication Style

Use natural language: Talk to the agent naturally
Be direct: Clear, straightforward requests get the best results
Avoid complex logic: Try to avoid using conditional statements like "if X then do Y"

Working with Multiple Attachments

You can limit processing to specific attachments in your request, for example: "Only use attachment sys_123, not the others"

Combining Multiple Questions

For QnA, you can ask multiple questions at once: "What is the effective date, termination date, and renewal terms in this contract?"

Default Behavior

When in doubt, the agent defaults to QnA mode as it's the most flexible option
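To pull together the routing behavior described across this guide, here is a rough rule-of-thumb model in Python for predicting which mode a request will land in. It is an assumption-laden sketch, not how the agent works: the real agent interprets requests with an LLM, and `guess_mode`, `has_task_definition`, and `names_fields` are hypothetical names. The rules it encodes are: an explicit "Use task type QnA" wins; "summarize" or "summary" triggers Summarization; a provided task definition triggers KIE; a generic extraction request without one leads the agent to ask for a task definition; everything else, including extraction of named fields without a task definition, falls back to QnA.

```python
def guess_mode(request: str, has_task_definition: bool = False,
               names_fields: bool = False) -> str:
    """Rule-of-thumb model of the routing described in this guide.

    Hypothetical: the real agent decides with an LLM, not with string matching.
    names_fields: True if the request lists specific values to pull out,
    e.g. "extract the invoice number and total".
    """
    text = request.lower()
    if "use task type qna" in text:
        return "QnA"  # explicit override (see Troubleshooting)
    if "summar" in text:
        return "Summarization"  # matches "summarize", "summary", ...
    if has_task_definition:
        return "KIE"
    if "extract" in text and not names_fields:
        return "KIE (agent asks for a task definition)"
    return "QnA"  # default: the most flexible mode


if __name__ == "__main__":
    print(guess_mode("Extract the values in the attachment of incident INC001234."))
    # prints: KIE (agent asks for a task definition)
    print(guess_mode("Extract the invoice number and total from INC0010003",
                     names_fields=True))
    # prints: QnA
```

If a prediction and the agent disagree, trust the agent and rephrase the request, for example by adding "Use task type QnA."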

Troubleshooting

Processing Takes Too Long

Cause: Document processing time correlates with file size and schema complexity
Solution: Be patient. It is normal for the inference to take a while with long documents.
Note: If processing seems stuck, check the logs for issues such as a nonexistent task definition or a wrong LLM endpoint.

Task Definition Not Found (KIE)

Cause: The specified task definition doesn't exist or wasn't found
Solution: Verify the sys_id is correct or use clearer search terms
Fallback: If your query contains extraction fields, the agent will automatically retry as QnA

Empty or Missing Results

Cause: The document may not contain the requested information
Solution: Rephrase your question or verify you're using the correct document

File Size or Type Errors

Cause: File exceeds the size limit or is an unsupported type
Solution: Compress the file or convert it to a supported format

“Extraction” keyword

Cause: You want to perform a QnA task whose query includes the word “extraction” (or similar) and the agent directs the task to KIE instead.
Solution: Include an instruction in the query such as “Use task type QnA.”

What's New in Version 6.0

Now Assist in Document Intelligence 6.0 introduces significant improvements that streamline how the agent processes documents. This version consolidates multiple tools into a unified workflow, making the agent more efficient and easier to use.

Architectural Changes

The following individual tools have been removed and integrated into a single, streamlined subflow:

Summarize attachments
Submit task to Now Assist in Document Intelligence from attachments
Get extracted values from Now Assist in Document Intelligence task
Get results for QnA or Summarization task
Document Question Answering
Pause and Wait

What This Means for You

These tools are no longer accessible individually. All functionality has been integrated into the agent's workflow through the new "Submit and Fetch Results" tool, which handles the complete processing pipeline automatically. The agent's instructions are specifically designed to pass the correct arguments to this tool and properly handle its outputs. Therefore, this tool should not be called directly; always interact with the agent, which will orchestrate everything for you.

✓ Key Takeaway: The agent is now the ONLY entry point for document processing. These deprecated tools have been completely removed from the platform. Always interact with the agent directly. It will orchestrate all processing automatically.

Benefits of the Consolidation

Simpler interaction: No need to know which tool to call; just talk to the agent
Improved reliability: Single workflow reduces complexity and potential errors
Better performance: Optimized processing pipeline handles tasks more efficiently
Optimized processing time: Results are fetched as soon as the task completes, eliminating the need for separate waiting steps such as the removed Pause and Wait tool
Consistent experience: All document tasks follow the same interaction pattern

Frequently Asked Questions

Can I use multiple documents in one request?

Yes, if a record has multiple attachments. The agent will process them according to your request. You can specify which attachments to use or process all of them.

What happens if I don't know the Use Case/Task Definition sys_id?

The agent will offer to search for relevant Use Cases/Task Definitions. Simply provide a search term (like "invoice" or "contract") and the agent will display matching results for you to select from.

Can I create or modify Use Cases/Task Definitions through the agent?

No, Use Cases/Task Definitions must be created and configured in Now Assist in Document Intelligence separately. The agent can only use existing Use Cases/Task Definitions.

What's the difference between QnA and KIE?

QnA is flexible: ask any question about document content. KIE is structured: extract predefined fields using templates. Use QnA for ad-hoc questions, KIE for consistent, repeatable extraction.

Should I interact with individual tools?

No. Always interact with the agent. The agent will orchestrate all the necessary tools for you. Do not try to call tools directly.

What about video input?

We are developing a new type of AI Agent, called Vision AI Agents, that enables customers to automate tasks relying on visual understanding of screens or real-world environments. It is rolling out in preview for early customer feedback; see this article to learn more.

anand9 · 12-18-2025

Question: It's a separate AI Agent. Is there any plan to include attachment summarization in the incident summarization skill?

Loic1 · 12-19-2025

@anand9 This is already supported. See "Customize a Now Assist for IT Service Management (ITSM) skill" in the docs, specifically the Additional data sources section.

Matthew_13 · 12-19-2025

@Loic1 - Great read; thanks for sharing!

SN Arch Guy · 01-26-2026

@Loic1 are there any other prerequisites to enable the QnA capability? I tried to run it from the Now Assist Panel and got failures like the one in the screen print below, even after trying multiple versions of the prompt. In the Activity tab of AI Agent Studio, I saw corresponding invocations of the Default VA Workflow, but to no avail. I noted that this workflow does not contain any of the document intelligence AI Agents. I did not see any direct invocations of the Document and visual insights AI agent in the Activity tab; however, when I tested the prompt in AI Studio with that specific agent, it did return the desired results.

Loic1 · 01-27-2026

@SN Arch Guy This article covers the "Document and Visual Insights AI Agent" (DVI) OOB AI Agent specifically. It does not seem that what you are invoking from the Now Assist Panel has any reference to the DVI AI Agent. I'd suggest reviewing which flow is being invoked from the Now Assist Panel. Thanks for giving it a try!

Mark B1 · 01-27-2026

@Loic1 I am trying to get the "Document and visual insights AI agent" AI agent to summarize image attachments on knowledge articles.

Prompt: "Summarize the image https://xxx.service-now.com/nav_to.do?uri=sys_attachment.do?sys_id=58caeb923322f61060b1413fad5c7b79"

It keeps looping round (around 20-30 times):

Checked on remaining steps

Used the tool "Get results for QnA or Summarization task"

Checked on remaining steps

Used the tool "Get results for QnA or Summarization task"

Checked on remaining steps

Used the tool "Get results for QnA or Summarization task"

Checked on remaining steps

Used the tool "Get results for QnA or Summarization task"

Checking on remaining steps
....

And eventually times out with:

Unfortunately, the summary for the provided image could not be generated due to a technical error (socket timeout). Please try again later or contact support if the issue persists.

Any ideas on what would cause this or how to correct it?

SN Arch Guy · 01-27-2026

@Mark B1 where were you testing your prompt? In AI Agent Studio, or in Now Assist Panel? If from within the panel, did you need to do anything to enable the invocation of that agent from the panel? And if not from within the panel, can you try the prompt there and describe the results you see? Does it invoke the desired agent or does it invoke the Default VA Workflow agent workflow like I described in my earlier post?

@Loic1 where should one enter the prompt for your QnA example use case in order to invoke the appropriate AI agent? In the Now Assist Panel, one should not have to specify the particular agent to invoke, ServiceNow should be able to figure that out based on AI Agent configurations.

SN Arch Guy · 01-27-2026

I'm getting closer to making QnA work in the Now Assist Panel. There is an existing agentic workflow, Process images for new tasks, in AI Agent Studio in the Platform AI Agents and Skills scope. This workflow does contain the Document and visual insights AI agent (see image below). However, the workflow is not enabled for the Now Assist Panel by default, so I enabled it on the Select channels and status tab in AI Agent Studio.

Now, the problem is that the Process images for new tasks workflow also includes the Image Processor Agent, and this agent's instructions require an image to be uploaded in the first step. So the Now Assist Panel asks to upload a document, even when the prompt is to analyze a task record that already has an attachment. Ideally, this agent should first check whether the provided task already has an attachment. To make this work, I will need to create a new agentic workflow with just the Document and visual insights AI agent.

Mohamed_009

@Loic1

Hope you're doing well 🙂

Could you please review the post below and share your feedback?

https://www.servicenow.com/community/now-assist-forum/can-virtual-agent-access-documents-attached-to...