8 hours ago - edited 8 hours ago
Overview
Unstructured inputs often introduce bottlenecks in enterprise workflows. While AI Agents in ServiceNow primarily understand text input, recent capabilities make it possible to build agentic workflows that reason over other inputs, such as documents and images. Because these workflows handle more than text, the capabilities are sometimes called “multi-modal”.
The Document and Visual Insights AI Agent is an out-of-the-box (OOB) AI Agent that helps streamline work with documents and images by providing three capabilities:
- Question and Answer (QnA): Ask specific questions about documents and visual input to get accurate answers with citations
- Key Information Extraction (KIE): Extract structured data using predefined templates
- Document Summarization: Get concise summaries of document content
💡 Important: Always interact with the agent directly. Do not try to use individual tools. Let the agent orchestrate the tools for you.
In this User Guide, we'll explore how to use these capabilities efficiently, along with some best practices and example prompts. Let's get started!
Getting Started
How to Provide Documents
You can provide documents to the agent in two ways:
- Reference a record: Provide a record number (e.g., INC0010003) or sys_id + table name. The agent will retrieve all attachments from that record.
- Upload directly: Upload a document directly in the chat.
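The two record-reference styles above (a record number alone, or a sys_id together with a table name) can be modeled in a few lines. This is only an illustrative Python sketch for composing prompts: `RecordRef` and `describe` are hypothetical names, not ServiceNow types or APIs, and the wording of the generated phrase is an assumption based on the example utterances in this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordRef:
    """Hypothetical holder for the two reference styles; not a ServiceNow type."""
    number: Optional[str] = None   # e.g. "INC0010003"
    sys_id: Optional[str] = None   # must be paired with `table`
    table: Optional[str] = None

    def describe(self) -> str:
        """Render the reference as a phrase to embed in a chat request."""
        if self.number:
            return f"record {self.number}"
        if self.sys_id and self.table:
            return f"record with sys_id {self.sys_id} from table {self.table}"
        # A sys_id alone is ambiguous: the guide asks for sys_id + table name.
        raise ValueError("provide a record number, or a sys_id together with a table name")
```

For example, `RecordRef(number="INC0010003").describe()` yields a phrase that can be dropped into a request such as "Summarize the attachments in ...".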
Supported File Types
The agent supports the following file formats:
- Documents: PDF, DOCX, TXT
- Spreadsheets: XLSX, CSV
- Presentations: PPTX
- Images: JPEG, PNG
File Requirements
- Attachment size limit: 20MB
- Maximum number of attachments: 5
- Page cap: 200
- Images must be larger than 113x113 pixels
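The limits above can be checked before a file is handed to the agent. The following is a minimal Python sketch, not part of the product: `Attachment` and `check_attachments` are hypothetical names, "20MB" is assumed to mean 20 * 1024 * 1024 bytes, the `.jpg` extension is assumed to count as JPEG, and the 200-page cap is assumed to apply per document (the guide does not say).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Values taken from the lists above. Treating ".jpg" as JPEG and reading
# "20MB" as 20 * 1024 * 1024 bytes are assumptions of this sketch.
SUPPORTED_EXTENSIONS = {"pdf", "docx", "txt", "xlsx", "csv", "pptx", "jpeg", "jpg", "png"}
IMAGE_EXTENSIONS = {"jpeg", "jpg", "png"}
MAX_SIZE_BYTES = 20 * 1024 * 1024
MAX_ATTACHMENTS = 5
MAX_PAGES = 200          # assumed to apply per document
MIN_IMAGE_SIDE = 113     # images must be larger than 113x113 pixels


@dataclass
class Attachment:
    """Hypothetical description of one file; not a ServiceNow type."""
    name: str
    size_bytes: int
    pages: Optional[int] = None                   # None if unknown
    image_size: Optional[Tuple[int, int]] = None  # (width, height) for images


def check_attachments(files: List[Attachment]) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if len(files) > MAX_ATTACHMENTS:
        problems.append(f"too many attachments: {len(files)} > {MAX_ATTACHMENTS}")
    for f in files:
        ext = f.name.rsplit(".", 1)[-1].lower() if "." in f.name else ""
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{f.name}: unsupported file type")
            continue  # the remaining checks are meaningless for unsupported types
        if f.size_bytes > MAX_SIZE_BYTES:
            problems.append(f"{f.name}: exceeds 20MB")
        if f.pages is not None and f.pages > MAX_PAGES:
            problems.append(f"{f.name}: more than {MAX_PAGES} pages")
        if ext in IMAGE_EXTENSIONS and f.image_size is not None:
            width, height = f.image_size
            if width <= MIN_IMAGE_SIDE or height <= MIN_IMAGE_SIDE:
                problems.append(f"{f.name}: image must be larger than 113x113 pixels")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report everything that needs fixing in a single pass.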
Capability 1: Question and Answer (QnA)
Use QnA to ask specific questions about document content. This is the default mode and the most flexible option.
When to Use QnA
- You have a specific question about document content
- You want to classify or categorize a document
- You need to extract values but don't have a predefined template
- You're asking multiple questions about one or multiple attachments
What You Need
- Required: A document (record reference or upload)
- Required: Your specific question
Best Practices
- Be specific: Ask clear questions like "What is the total invoice amount?" rather than "Tell me about the invoice"
- Use simple language: Avoid complex logic or conditionals if possible
- Combine questions: If you have multiple questions, combine them in a single query, for example "What is the effective date, termination clause, and governing law?" or "Answer the following questions for the document with sys_id 123: Question 1, Question 2, ..."
- Frame as questions: Start with "What", "Who", "When", "How much", etc.
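When requests are generated by a script rather than typed by hand, the "combine questions" practice can be applied mechanically. The helper below is a hypothetical Python sketch: `build_qna_query` is not a ServiceNow function, and the numbering format is an illustrative choice. It only produces the text you would send to the agent.

```python
from typing import List, Optional


def build_qna_query(questions: List[str], sys_id: Optional[str] = None) -> str:
    """Combine several questions into one request string.

    When `sys_id` is given, the request names the document explicitly,
    mirroring the phrasing suggested in the best practices above.
    """
    cleaned = [q.strip() for q in questions if q.strip()]
    if not cleaned:
        raise ValueError("at least one question is required")
    target = f"the document with sys_id {sys_id}" if sys_id else "the attached document"
    # Number the questions so each answer can be matched back to its question.
    numbered = " ".join(f"({i}) {q}" for i, q in enumerate(cleaned, start=1))
    return f"Answer the following questions for {target}: {numbered}"
```

Sending one combined request keeps all questions in a single turn, which is the pattern this guide recommends over asking them one by one.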
Example Utterances
With uploaded documents:
- "What is the main obligation in this document?"
- "Who are the parties in the attached file?"
- "What type of document is this?"
- "What are the confidentiality terms in this PDF?"
With record references:
- "What is the subject of the document attached to incident INC001234?"
- "Who is the client in the file attached to record TASK123456?"
- "What are the payment terms in the document linked to TASK456789?"
- "What type of document is the attachment in INC1234?"
With attachment sys_id:
- "What's the governing law in attachment sys_id=abc123?"
- "What are the key terms from attachment sys_id 12345?"
- “Who wrote the document with sys_id 123 from table abc?”
What to Expect
- The AI agent will automatically process your document (using underlying Now Assist in Document Intelligence features)
- You'll see the answer automatically displayed
Capability 2: Key Information Extraction (KIE)
Use KIE to extract structured data from documents using predefined DocIntel templates. This is ideal for processing standardized documents like invoices, contracts, or forms.
When to Use KIE
- You have a predefined Now Assist in Document Intelligence Use Case (task definition / template) for your document type
- You need to extract specific, structured fields consistently
- You're processing multiple documents of the same type
Prerequisites
Before using KIE, you must have:
- A Now Assist in Document Intelligence Use Case (task definition) created and configured for your document schema (see the docs ‘Set up a use case for Now Assist in Document Intelligence’)
- Clear, explicit names for the task definition and keys (used for searching)
- The sys_id of your task definition OR a way to search for it
📝 Note: If you say "extract X and Y" WITHOUT providing a task definition, the agent will treat it as a QnA request instead.
What You Need
- Required: A document (record reference or upload)
- Required: A Now Assist in Document Intelligence Use Case sys_id or name to search
Best Practices
- Know your task definition: Have the sys_id ready for faster processing
- Use explicit names: If searching, use clear terms that match your task definition name
- Confirm selection: Always explicitly confirm which task definition to use when the agent presents search results
Example Utterances
With known task definition sys_id:
- "Extract the values in the attachment of incident INC001234. Use task definition sys_456."
- "Parse the document attached to incident INC001234 using sys_id_456 task definition."
- "Extract data from attachment sys_123 of record INC001234. Use task definition sys_987."
- "Extract data from attachment sys_123 from table abc. Use task definition sys_987."
Without task definition (multi-step process):
- "Extract the values in the attachment of incident INC001234."
- The agent will ask you to provide a task definition sys_id or search query
- "I want to extract data from an invoice attached to INC0010003. Show me the values at the end."
- The agent will search for invoice-related task definitions and ask you to select one
What to Expect
- If you didn't provide a task definition, the agent will prompt you to search or provide a sys_id
- The agent will display search results if applicable - you must explicitly select one
- Processing may take time depending on document size and schema complexity
- The agent will show extracted values automatically
- A review link will be provided to see the full extraction results
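The multi-step KIE conversation described above can be pictured as a small decision function. This Python sketch is a simplified model for readers, not the agent's actual logic: `next_kie_step` and the step names are hypothetical, and treating an empty search result as "ask again" is an assumption.

```python
from typing import List, Optional


def next_kie_step(
    task_definition_sys_id: Optional[str],
    search_results: Optional[List[str]] = None,
    selected: Optional[str] = None,
) -> str:
    """Return the next conversational step in a KIE request (illustrative only)."""
    if task_definition_sys_id:
        return "process"                  # task definition known: run the extraction
    if search_results is None:
        return "ask_for_task_definition"  # agent asks for a sys_id or a search query
    if not search_results:
        return "ask_for_task_definition"  # assumption: nothing matched, so ask again
    if selected in search_results:
        return "process"                  # user explicitly confirmed one result
    return "ask_for_selection"            # results shown; waiting for an explicit choice
```

The model makes one point visible: without a sys_id, extraction only starts after an explicit selection, which is why having the sys_id ready shortens the exchange.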
Capability 3: Document Summarization
Use Summarization to get concise overviews of document content. This is designed for quickly understanding large documents or multiple files.
When to Use Summarization
- You need a high-level overview of document content
- You're reviewing multiple documents quickly
- You want to understand document purpose before diving into details
What You Need
- Required: A document (record reference or upload)
Best Practices
- Be explicit: Use the word "summarize" or "summary" in your request
- Specify what to summarize: Be clear about which document or attachment you want summarized
- Keep it simple: Straightforward requests work best
Example Utterances
Basic requests:
- "Please summarize these attachments: ..."
- "Can you provide a summary of the uploaded document?"
- "What does this file contain? Please summarize"
With record references:
- "Summarize the attachment from incident INC001234."
- "Summarize the document attached to record TASK123456"
- "Summarize the attachment in INC1234"
- “Summarize the attachments in record 123 from table abc”
With attachment sys_id:
- "Write a summary of the attachment with sys_id ABC123"
- "Summarize attachment sys_id=9988ff123abc."
- "Give me a summary of attachment sys_id 12234 from table incident"
What to Expect
- The AI agent will automatically process your request (using underlying Now Assist in Document Intelligence features)
- You'll receive a concise summary automatically displayed
Tips and Tricks
Communication Style
- Use natural language: Talk to the agent naturally
- Be direct: Clear, straightforward requests get the best results
- Avoid complex logic: Try to avoid conditional statements like "if X then do Y"
Working with Multiple Attachments
- You can specify in your request: "Only use attachment sys_123, not the others"
Combining Multiple Questions
- For QnA, you can ask multiple questions at once: "What is the effective date, termination date, and renewal terms in this contract?"
Default Behavior
- When in doubt, the agent defaults to QnA mode as it's the most flexible option
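As a mental model of how a request maps to a capability, the rules stated in this guide can be written down as a few checks. The sketch below is an assumption-laden simplification in Python: `guess_capability` is a hypothetical name, the real agent reasons over natural language rather than keywords, and the order of the checks is a guess.

```python
import re
from typing import Optional


def guess_capability(utterance: str, task_definition: Optional[str] = None) -> str:
    """Rough keyword model of capability selection (not the agent's real logic)."""
    text = utterance.lower()
    if "use task type qna" in text:
        return "QnA"            # explicit override described under Troubleshooting
    if task_definition:
        return "KIE"            # extraction needs a task definition
    if re.search(r"\bsummar(y|ize)", text):
        return "Summarization"  # guide advises using "summarize" or "summary"
    return "QnA"                # default mode when nothing else applies
```

Under this model, "Extract the vendor and total from this invoice" with no task definition resolves to QnA, matching the note in the KIE section.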
Troubleshooting
Processing Takes Too Long
- Cause: Document processing time correlates with file size and schema complexity
- Solution: Be patient. It is normal for the inference to take a while with long documents.
- Note: If processing seems stuck, check the logs for issues (non-existent task definition, wrong LLM endpoint, etc.)
Task Definition Not Found (KIE)
- Cause: The specified task definition doesn't exist or wasn't found
- Solution: Verify the sys_id is correct or use clearer search terms
- Fallback: If your query contains extraction fields, the agent will automatically retry as QnA
Empty or Missing Results
- Cause: The document may not contain the requested information
- Solution: Rephrase your question or verify you're using the correct document
File Size or Type Errors
- Cause: File exceeds the size limit or is an unsupported type
- Solution: Compress the file or convert to a supported format
“Extraction” keyword
- Cause: You want to perform a QnA task whose query includes the word “extraction” (or similar) and the agent directs the task to KIE instead.
- Solution: Include an instruction in the query such as “Use task type QnA.”
Quick Reference
| Capability | Required Inputs | Example Trigger Phrase |
|---|---|---|
| Question and Answer | Document + Question | "What is the effective date in attachment sys_id from table table_name?" |
| Key Information Extraction | Document + Task Definition | "Extract data from INC001234 using task definition sys_456" |
| Summarization | Document only | "Summarize the attachment from INC001234" |
What's New in Version 6.0
Now Assist in Document Intelligence 6.0 introduces significant improvements that streamline how the agent processes documents. This version consolidates multiple tools into a unified workflow, making the agent more efficient and easier to use.
Architectural Changes
The following individual tools have been removed and integrated into a single, streamlined subflow:
- Summarize attachments
- Submit task to DocIntel from attachments
- Get extracted values from DocIntel task
- Get results for QnA or Summarization task
- Document Question Answering
- Pause and Wait
What This Means for You
These tools are no longer accessible individually. All functionality has been integrated into the agent's workflow through the new "Submit and Fetch Results" tool, which handles the complete processing pipeline automatically. The agent's instructions are specifically designed to pass the correct arguments to this tool and properly handle its outputs. Therefore, this tool should not be called directly; always interact with the agent, which will orchestrate everything for you.
✓ Key Takeaway: The agent is now the ONLY entry point for document processing. The tools listed above have been completely removed from the platform. Always interact with the agent directly. It will orchestrate all processing automatically.
Benefits of the Consolidation
- Simpler interaction: No need to know which tool to call; just talk to the agent
- Improved reliability: Single workflow reduces complexity and potential errors
- Better performance: Optimized processing pipeline handles tasks more efficiently
- Optimized processing time: Results are fetched and processed instantly upon completion, eliminating the need for waiting mechanisms
- Consistent experience: All document tasks follow the same interaction pattern
Frequently Asked Questions
Can I use multiple documents in one request?
Yes, if a record has multiple attachments. The agent will process them according to your request. You can specify which attachments to use or process all of them.
What happens if I don't know the task definition sys_id?
The agent will offer to search for relevant task definitions. Simply provide a search term (like "invoice" or "contract") and the agent will display matching results for you to select from.
Can I create or modify task definitions through the agent?
No, task definitions must be created and configured in DocIntel separately. The agent can only use existing task definitions.
What's the difference between QnA and KIE?
QnA is flexible: ask any question about document content. KIE is structured: extract predefined fields using templates. Use QnA for ad-hoc questions, KIE for consistent, repeatable extraction.
Should I interact with individual tools?
No. Always interact with the agent. The agent will orchestrate all the necessary tools for you. Do not try to call tools directly.
What about video input?
We are developing a new type of AI Agent, called Vision AI Agents, that enables customers to automate tasks that rely on visual understanding of screens or real-world environments. Vision AI Agents are rolling out in preview for early customer feedback; see this article to learn more.