Advanced AI Agent Instructions Guide: ServiceNow Edition

With Semi-Formal Reasoning & ReAct V3 Parallel Execution

Author: Dan Andrews | Agentic AI Product Manager | ServiceNow

Updated: April 2026

📚 Table of Contents

  • 1. Introduction & Core Philosophy
  • 2. Critical Instruction Anchoring Framework
  • 3. Content Engineering Principles
  • 4. Framework-Specific Challenges
  • 5. Smart Tools: Optimizing Tool Output
  • 6. Verification Enforcement Framework
  • 7. Semi-Formal Reasoning Framework [NEW]
  • 8. ReAct V3 Parallel Execution Patterns [NEW]
  • 9. Implementation Patterns
  • 10. Testing & Validation
  • 11. Common Issues & Solutions
 

1. Introduction & Core Philosophy

Why This Guide Matters

Modern AI agent frameworks use agent instruction systems that transform user-written instructions into executable agent guidance. This intermediary layer fundamentally changes how we should approach instruction design, requiring a shift from traditional prompting to framework-optimized methodology.

Critical Understanding

AI Agent prompting is fundamentally different from simple GPT prompting.

Unlike basic prompt-response interactions, AI Agents require:

  • Clear, structured steps that define precise workflows
  • Proper verification gates to ensure quality control
  • Evidence-based reasoning that forces agents to prove their conclusions
  • Production-ready considerations based on current enterprise AI research
  • Framework-specific optimization techniques for reliable execution

Five Core Principles

  1. Framework Intelligence: Leverage the framework's built-in tool assignment capabilities rather than fighting them.
  2. Keyword-Based Optimization: Use action words that trigger appropriate built-in tool assignment automatically.
  3. Verification Enforcement: Transform quality gates into framework-executable analytical steps.
  4. Quality Preservation: Embed standards as actionable requirements that survive agent generation.
  5. Structured Reasoning Enforcement: [NEW] Require agents to construct explicit premises, trace logical paths, and derive formal conclusions before acting.

Why Principle 5 Matters

Recent research on semi-formal reasoning has demonstrated that forcing agents to construct logical certificates — explicitly stating premises, tracing execution paths, and deriving formal conclusions — improved accuracy from 78% to 93% on real-world tasks. Unlike unstructured chain-of-thought, semi-formal reasoning acts as a certificate: the agent cannot skip cases or make unsupported claims. This principle is woven throughout this guide as a foundational upgrade to verification gates.

 

2. Critical Instruction Anchoring Framework

What Is Critical Instruction Anchoring

Critical Instruction Anchoring is the strategic placement of essential requirements at key positions within the prompt structure, "anchoring" them so the agent behaves consistently. These anchors prevent instruction drift and keep the agent focused on essential requirements throughout its execution process.

When to Use Critical Instruction Anchoring

Critical Business Logic: When agents must follow specific business rules without deviation

Quality Standards: When output quality cannot be compromised or allowed to vary

Compliance Requirements: For regulatory and procedural adherence

Complex Workflows: During multi-step processes where focus can drift

Error Prevention: To avoid costly mistakes in production environments

Anchor Placement Strategies

Primary Anchors (Beginning)

Example:

##CRITICAL REQUIREMENT: Always validate incident priority before proceeding
##QUALITY STANDARD: Maintain professional communication throughout
##COMPLIANCE RULE: Never expose sensitive customer data

Reinforcement Anchors (Mid-prompt)

Example:

### Step 2: Data Analysis
Analyze incident data maintaining CRITICAL REQUIREMENT from above
Generate insights while preserving QUALITY STANDARD requirements

Validation Anchors (End)

Example:

### Final Validation Gate
Confirm all CRITICAL REQUIREMENTS satisfied before proceeding
Verify QUALITY STANDARDS maintained throughout process

Common Critical Instruction Anchoring Mistakes

  • Overuse: Too many anchors create cognitive overload
  • Weak Language: Using "should" instead of "must" for critical requirements
  • Inconsistent Repetition: Varying anchor language between references
  • Anchor Drift: Allowing anchored requirements to be modified by subsequent instructions
 

3. Content Engineering Principles

Clarity Engineering

Eliminating ambiguity through precise language construction:

  • Specific Action Verbs: Use "analyze" instead of "look at"
  • Quantified Requirements: "Generate minimum 3 recommendations" vs. "provide some suggestions"
  • Explicit Conditions: "If priority = High, then escalate immediately" vs. "escalate urgent issues"
  • Defined Boundaries: Clear success/failure criteria for each step

Context Engineering

Providing sufficient background without information bloat:

  • Relevant Background: Include only context that affects decision-making
  • Situational Awareness: Help agents understand their role and environment
  • Constraint Context: Explain why limitations exist
  • Success Context: Define what good outcomes look like

Cognitive Load Management

Information Chunking

Poor:

"Analyze customer data, check priority, validate permissions, generate report, format output, send notifications, update records, and log activity"

Better:
### Step 1: Data Analysis
Analyze customer data systematically

### Step 2: Validation
Check priority and validate permissions

### Step 3: Output Generation
Generate and format comprehensive report
 

4. Framework-Specific Challenges

Critical: Explicit Built-in Tool Naming Causes Execution Failures

Problematic Approach

  • Use Content Analysis tool to evaluate findings
  • Use User Output tool to display results
  • Use User Input tool to gather information

Problem: Agent searches for "Content Analysis tool" as an assigned tool and fails when it can't find it.

Framework-Optimized Solution

  • Analyze and evaluate findings systematically
  • Display comprehensive results to user
  • Gather detailed information from user

Result: Framework automatically assigns appropriate built-in tools based on action keywords.

Keyword-Based Built-in Tool Optimization

| Category | Keywords | Framework Action |
|---|---|---|
| Analysis & Evaluation | analyze, evaluate, assess | Triggers analytical tools |
| Data Processing | fetch, retrieve, filter | Activates data tools |
| Content Creation | generate, synthesize, compile | Enables creation tools |
| Validation | verify, validate, confirm | Invokes validation tools |
| User Interaction | show, display, gather | Triggers UI tools |
 

5. Smart Tools: Optimizing Tool Output for Agent Success

Understanding Smart Tools

Smart tools are agent-assigned tools that leverage platform capabilities to pre-process, analyze, and structure data before sending it to agents. Instead of passing raw data that forces agents to perform complex analysis, smart tools do the heavy lifting within the platform layer.

Core Design Principles

1. Platform-Powered Processing

Traditional approach: Send 1,000 records to the agent for analysis

Smart tool approach: Use platform scripts to analyze, score, and return the top 10 relevant records with recommendations
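The smart-tool approach can be sketched as a server-side script that scores records in the platform layer and returns only the top few with a recommendation. This is a hedged illustration, not a ServiceNow API: the `scoreRecord` weights and the record fields (`severity`, `age_days`) are assumptions made for the example.

```javascript
// Illustrative sketch: score records in the platform and return only the
// top N, so the agent receives a decision-ready subset instead of raw data.
// Severity weights and field names are assumptions for this example.
function scoreRecord(record) {
  var severityWeight = { critical: 1.0, high: 0.7, medium: 0.4, low: 0.1 };
  var base = severityWeight[record.severity] || 0;
  var recencyBoost = record.age_days <= 7 ? 0.2 : 0; // recent items score higher
  return Math.min(base + recencyBoost, 1.0);
}

function buildSmartOutput(records, topN) {
  var scored = records.map(function (r) {
    return { record: r, score: scoreRecord(r) };
  });
  scored.sort(function (a, b) { return b.score - a.score; });
  var top = scored.slice(0, topN);
  return {
    analysis_complete: true,
    total_scanned: records.length,
    top_records: top,
    recommendation: top.length && top[0].score > 0.9
      ? 'REVIEW_CRITICAL_FIRST'
      : 'PROCEED_STANDARD'
  };
}
```

The agent then reasons over a handful of scored records and an explicit recommendation rather than hundreds of raw rows.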

2. Decision-Ready Outputs

Poor Output Structure:

{
  "data": [/* hundreds of records */],
  "count": 847
}

Smart Output Structure:

{
  "analysis_complete": true,
  "recommended_action": "APPROVE_AUTOMATED",
  "confidence_score": 0.95,
  "reasoning_trace": {
    "premises": ["847 records scanned", "3 met critical threshold"],
    "logic": "Critical items scored >0.9 on severity index",
    "conclusion": "Automated approval safe for non-critical subset"
  },
  "key_findings": {
    "critical_items": 3,
    "requires_attention": ["ITEM_123", "ITEM_456"],
    "safe_to_ignore": 844
  },
  "next_steps": "Process critical items in order shown"
}

NEW: Decision Provenance via reasoning_trace

Recent research on semi-formal reasoning shows that agents perform significantly better when they can trace why a recommendation was made, not just what the recommendation is. Adding a reasoning_trace object to your smart tool outputs gives the agent the premises it needs to either trust or challenge the recommendation. This prevents blind acceptance of tool outputs and enables the agent to catch edge cases.
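A lightweight guard for this can be sketched in JavaScript: before acting on a recommendation, check that the tool output carries a complete `reasoning_trace`. The field names mirror the example structure above; the function itself is illustrative, not a platform API.

```javascript
// Illustrative sketch: verify a smart-tool output includes a complete
// reasoning_trace (premises, logic, conclusion) before the agent acts on it.
function hasCompleteReasoningTrace(output) {
  var trace = output && output.reasoning_trace;
  if (!trace) return false;
  var hasPremises = Array.isArray(trace.premises) && trace.premises.length > 0;
  var hasLogic = typeof trace.logic === 'string' && trace.logic.length > 0;
  var hasConclusion = typeof trace.conclusion === 'string' && trace.conclusion.length > 0;
  return hasPremises && hasLogic && hasConclusion;
}
```

A recommendation that fails this check should be treated as unexplained and routed to a fallback path rather than accepted blindly.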

3. Implementation Pattern: Threshold-Based Intelligence

// In your ServiceNow Flow/Script
if (total_records > 100) {
  output = {
    summary_mode: true,
    total_count: total_records,
    critical_subset: analyzeCriticalRecords(records),
    reasoning_trace: {
      premises: ['Total records: ' + total_records],
      logic: 'Threshold exceeded, switching to summary mode',
      conclusion: 'Focus on critical subset only'
    },
    recommendation: 'FOCUS_ON_CRITICAL'
  };
} else {
  output = {
    summary_mode: false,
    detailed_records: records,
    recommendation: 'REVIEW_ALL'
  };
}

Best Practices Summary

  1. Do the hard work in the platform — Complex calculations, scoring, filtering
  2. Provide clear next actions — Never leave the agent guessing
  3. Include confidence indicators — Help agents know when to escalate
  4. Include reasoning traces [NEW] — Give agents the premises behind every recommendation
  5. Structure for scannability — Key information immediately visible
  6. Design for failure — Always include fallback recommendations
 

6. Verification Enforcement Framework

The Problem with Traditional Verification

Traditional Approach (Often Overlooked)

☐ All criteria met
☐ Quality standards achieved
☐ Ready to proceed

Problem: LLMs frequently overlook these checkbox elements during instruction processing.

Framework-Optimized Approach (Preserved)

Step 1a: Quality Validation Gate
Analyze completion against established criteria:
• All requirements satisfied
• Quality standards met
• Readiness confirmed
Generate validation report and proceed only when criteria pass

Result: Framework converts to executable analytical steps that LLMs reliably process.

Evolving from Completion Gates to Evidence Gates [NEW]

Standard verification gates ask "did you do it?" but never "how did you get there?" Recent research on semi-formal reasoning demonstrates that when agents are required to document the evidence behind their conclusions, not just confirm completion, accuracy improves by 10–15 percentage points.

Completion Gate (Good, but limited):

### Step 2: Validation Gate
Analyze completion against criteria:
• All requirements satisfied
• Quality standards met
Generate validation report

Evidence Gate (Better — forces proof): [NEW]

### Step 2: Evidence Validation Gate (INTERNAL - DO NOT DISPLAY TO USER)
Construct a reasoning certificate in your internal reasoning.
This validation is for decision integrity and must NOT be
presented to the user:

Premises Gathered:
• Identify all data points retrieved in the prior step
• State each premise explicitly with its source reference

Execution Trace:
• Trace the logical path from premises to your determination
• Account for each conditional branch encountered
• Document any edge cases evaluated and their outcomes

Formal Conclusion:
• Derive conclusion solely from documented premises and trace
• State conclusion with explicit linkage to supporting evidence
• Flag any premises that could not be verified

This reasoning is internal only. Proceed only when all
premises are supported and no logical gaps remain

When to Use Evidence Gates vs. Completion Gates

Use Evidence Gates for: High-stakes decisions (escalations, approvals, compliance determinations), any step where the agent must choose between multiple paths, and any step where a wrong conclusion has downstream consequences.

Use Completion Gates for: Data retrieval confirmations, simple format validations, and steps where the outcome is binary (data returned or not).

Platform Compatibility Tip: Make Evidence Gates Actionable

The ServiceNow ReAct engine classifies steps that do not invoke a tool as "non-actionable workflow steps" and may collapse them into the agent's internal thought — potentially compressing or skipping your structured reasoning certificate. To prevent this, anchor your evidence gates to an analytical action. For example, instead of a pure reasoning step, phrase it as "Analyze the classification decision by organizing the premises, execution trace, and conclusion internally." This triggers a built-in analytical tool, gives the engine an action to execute, and keeps the output internal since built-in analytical tool outputs are not displayed to the user.

 

7. Semi-Formal Reasoning Framework [NEW]

Research Foundation

This section is based on recent research into semi-formal reasoning as a structured prompting methodology (2026). The technique improved accuracy from 78% to 93% on real-world tasks, requires no model training, and is purely a prompt engineering enhancement — meaning it works inside ServiceNow AI Agent Builder today.

What Is Semi-Formal Reasoning

Semi-formal reasoning is a structured prompting technique that requires agents to construct explicit logical certificates before making determinations. Unlike standard chain-of-thought (which lets agents narrate freely), semi-formal reasoning enforces a three-part structure:

Premises → Execution Trace → Formal Conclusion

The key insight: the agent cannot skip cases or make unsupported claims because the template structurally requires evidence at each stage.

The Three Components

1. Premises

Explicit statements of fact that the agent has gathered or been given. Each premise must reference a verifiable source — a tool output, a record field, a user statement, or a platform value.

Premises Gathered:
• Incident INC0012345 has priority = P1 (from incident record)
• Assignment group 'Network Ops' has 3 open P1s (from workload query)
• Caller reported 'complete outage' (from user input)

2. Execution Trace

A step-by-step logical path that connects premises to a determination. The agent must trace each conditional branch it evaluated, including branches it did NOT take and why.

Execution Trace:
• P1 + 'complete outage' → Major Incident criteria met
• Network Ops has 3 open P1s → workload exceeds threshold (>2)
• Evaluated: reassign to backup group? NO — no backup group configured
• Evaluated: auto-resolve via KB? NO — major incidents require human review
• Determined path: escalate to Major Incident Management

3. Formal Conclusion

A determination that is derived solely from the documented premises and trace. No new information may appear in the conclusion that was not established in the premises.

Formal Conclusion:
• Based on P1 priority, complete outage report, and workload threshold
  exceeded, this incident meets Major Incident criteria
• Action: Escalate to Major Incident Management process
• Unverified premises: None
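For audit logging, a certificate can also be captured as structured data and mechanically checked for the most common defects (unsourced premises, phantom citations). This is one possible representation, sketched in JavaScript; the object shape and the validation rules are assumptions layered on the Premises → Execution Trace → Formal Conclusion structure, not a platform feature.

```javascript
// Illustrative sketch: check a structured reasoning certificate for the
// defects this guide warns about. Every premise needs a source, and the
// conclusion may only cite premises that were actually documented.
function checkCertificate(cert) {
  var issues = [];
  if (!cert.premises || cert.premises.length === 0) issues.push('no premises');
  (cert.premises || []).forEach(function (p, i) {
    if (!p.source) issues.push('premise ' + i + ' has no source reference');
  });
  if (!cert.trace || cert.trace.length === 0) issues.push('no execution trace');
  if (!cert.conclusion) issues.push('no conclusion');
  // Phantom-premise check: cited premise indices must exist
  ((cert.conclusion && cert.conclusion.cites) || []).forEach(function (idx) {
    if (!cert.premises || idx >= cert.premises.length) {
      issues.push('conclusion cites undocumented premise ' + idx);
    }
  });
  return { valid: issues.length === 0, issues: issues };
}
```

Logging the `issues` array alongside the agent's decision makes reasoning audits (Section 10) straightforward to automate.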

Applying Semi-Formal Reasoning in ServiceNow Agent Prompts

Critical: Evidence Gates Are Internal Operations

Evidence certificates are Thought-cycle operations. They must never be surfaced to the user. The agent's Thought field is internal reasoning that is not displayed. Your evidence gates should be anchored as INTERNAL in the step header and reinforced with "This reasoning is internal only" at the close. Always follow an evidence gate with a separate user-facing step that defines exactly what output the user should see.

Platform Compatibility Tip: Thought Field Token Budget

The ServiceNow ReAct engine constrains the agent's Thought field to a concise summary (typically 4–5 sentences) and actively discourages detailed reasoning in the output. A full Premises → Execution Trace → Formal Conclusion certificate exceeds this budget. The reasoning may still occur in the LLM's internal chain-of-thought, but the engine will compress it in the externalized Thought. To mitigate this, keep your evidence gate prompts focused on the critical decision points rather than exhaustive documentation — and use the actionable anchoring technique described in Section 6 to ensure the gate executes as a real step rather than getting folded into a summary.

Pattern: Evidence Certificate After Analysis Steps

### Step 3: Incident Classification
Analyze the incident details to determine the appropriate category
and subcategory using the following classification criteria:
• Match reported symptoms against known category definitions
• Evaluate CI relationships for infrastructure-based classification
• Assess business impact for severity-based routing

### Step 4: Classification Evidence Gate (INTERNAL - DO NOT DISPLAY TO USER)
Construct a reasoning certificate in your internal reasoning.
This validation is for decision integrity and must NOT be
presented to the user:

Premises Gathered:
• List the specific symptoms, CI data, and impact indicators
  retrieved in Step 3
• Reference each data point to its source (tool output, user input,
  record field)

Execution Trace:
• Trace which classification criteria each premise satisfies
• Document any ambiguous indicators and how they were resolved
• Account for alternative classifications considered and why they
  were ruled out

Formal Conclusion:
• State the selected category and subcategory with explicit linkage
  to the premises that determined each
• Flag any classification confidence below 80% for human review

This reasoning is internal only. Proceed only when all premises
are sourced and the execution trace contains no logical gaps

### Step 5: Present Classification to User
Display ONLY the following to the user:
• The determined category and subcategory
• The recommended next action
• A brief plain-language explanation (1-2 sentences)

DO NOT include internal reasoning details, premises,
execution traces, or evidence certificates in user-facing output

Pattern: Conditional Branch Accountability

When your prompt includes conditional logic, require the agent to account for every branch:

### Step 5: Resolution Path Selection
Based on the analysis results, determine the resolution approach.

For EACH of the following conditions, explicitly state whether it
is met or not met and cite the supporting evidence:

• IF category = Hardware AND warranty active: Initiate RMA
• IF category = Software AND KB match found: Present KB resolution
• IF category = Network AND P1/P2: Escalate to Network Ops
• IF none of the above: Route to general support queue

Document which conditions evaluated TRUE, which evaluated FALSE,
and the specific data points that determined each evaluation.
Execute ALL applicable resolution paths simultaneously.

Platform Compatibility Tip: Frame Branches as Mandatory Evaluation Steps

The ServiceNow ReAct engine may optimize away conditional branches if it determines the condition is not met by the current mission — skipping the evaluation entirely without documenting why. To prevent this, do not frame conditions as optional paths the engine can skip. Instead, frame them as mandatory evaluation requirements: "For EACH condition, determine and document whether it is met. This evaluation is a required step regardless of outcome." This ensures the engine treats the evaluation itself as the action, not the conditional execution that follows.

Anti-Patterns to Avoid

  • Ungrounded assertions: The agent claims "this is a hardware issue" without citing which premise led to that conclusion
  • Skipped branches: The agent selects a path without documenting why other paths were eliminated
  • Phantom premises: The conclusion references data that was never retrieved or stated in the premises
  • Circular reasoning: The conclusion restates a premise as evidence for itself
  • Reasoning leakage: [NEW] The agent surfaces its evidence certificate, premises, or execution trace to the user instead of keeping them in the internal Thought cycle
 

8. ReAct V3 Parallel Execution Patterns [NEW]

Platform Context

Every published V3 prompt across all model providers includes <best_practices_for_parallelisation>. Some model variants are more feature-rich than others, with exclusive sections for conditional branching, reasonable inference handling, and retry/escalation handling.

Parallel Execution Rules

What Gets Batched Together

  • Independent autopilot tools — No shared inputs, no shared write targets, no dependency chain
  • Up to 4 actions per batch — Even if 8 are independent, only 4 run at once

What Forces Serialization (Single-Action Batch)

  • Copilot tools — Any tool requiring user permission runs alone
  • FALLBACK actions — show_output_to_user, collect_input_from_user always run alone
  • Finish action — Always alone, never batched
  • Non-parallelizable tools — If a tool description says 'non parallelizable'
  • Dependent tools — If Tool B needs the output of Tool A
  • Same-resource writers — Two tools writing to the same field/entity serialize

V3 Dependency Check

| Check | Result |
|---|---|
| Does any required input come from a tool that hasn't run yet? | Serialize |
| Do two tools write to the same field on the same record? | Serialize |
| Is the tool marked non-parallelizable? | Isolate |
| Is this a copilot tool? | Isolate |
| None of the above? | Batch it |
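The dependency checks above can be sketched as a small batching planner in JavaScript. The tool descriptor fields (`dependsOn`, `produces`, `writesTo`, `copilot`, `nonParallelizable`) are illustrative assumptions, not the real V3 engine's data model; the sketch only shows how the serialize/isolate/batch rules and the 4-action cap compose.

```javascript
// Hypothetical batching planner applying the V3 dependency-check rules:
// unmet inputs and same-target writers serialize, copilot and
// non-parallelizable tools run alone, and batches cap at 4 actions.
function planBatches(tools) {
  var batches = [];
  var current = [];
  var available = {};       // outputs produced by already-flushed batches
  var pendingOutputs = {};  // outputs the current batch will produce
  var writeTargets = {};    // write targets claimed by the current batch

  function flush() {
    if (current.length === 0) return;
    batches.push(current.map(function (t) { return t.name; }));
    Object.keys(pendingOutputs).forEach(function (k) { available[k] = true; });
    current = []; pendingOutputs = {}; writeTargets = {};
  }

  tools.forEach(function (tool) {
    var isolate = tool.copilot || tool.nonParallelizable;
    var unmetDep = (tool.dependsOn || []).some(function (d) { return !available[d]; });
    var writeClash = (tool.writesTo || []).some(function (w) { return writeTargets[w]; });
    if (isolate || unmetDep || writeClash || current.length >= 4) flush();
    current.push(tool);
    (tool.produces || []).forEach(function (o) { pendingOutputs[o] = true; });
    (tool.writesTo || []).forEach(function (w) { writeTargets[w] = true; });
    if (isolate) flush(); // copilot / non-parallelizable tools run alone
  });
  flush();
  return batches;
}
```

Running four independent fetches, a dependent synthesis step, a copilot tool, and two same-field writers through this planner yields five batches: one parallel fan-out followed by four serialized single-action batches.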

Prompt Patterns for Parallel Execution

Pattern 1: Explicit Independence Declaration

### Step 3: Parallel Data Collection
Retrieve ALL of the following simultaneously - these operations
are independent and share no dependencies:
• Fetch the caller's incident history for the last 90 days
• Fetch the caller's asset assignments from CMDB
• Fetch the caller's open change requests
• Fetch the caller's service entitlements

### Step 4: Data Collection Validation Gate
Analyze all retrieved data for completeness:
• All four data sources returned successfully
• No critical data gaps identified
Generate data collection assessment before proceeding

Pattern 2: Gating Action to Parallel Fan-Out

### Step 1: Incident Identification
Retrieve the incident record using the provided incident number

### Step 2: Identification Validation Gate
Analyze the retrieved incident for completeness:
• Valid incident sys_id obtained
• Incident state and category confirmed
Generate identification assessment

### Step 3: Parallel Incident Analysis
Using the incident record from Step 1, perform ALL of the
following independently - each operation reads from the incident
but writes to different targets:
• Analyze the incident assignment group workload
• Retrieve related knowledge articles matching the category
• Check for duplicate or parent incidents in the last 30 days

Pattern 3: Conditional Parallel Branches

### Step 5: Conditional Resolution Actions
Based on the analysis results, execute ALL applicable paths
simultaneously. For EACH condition, state whether it is met and
cite the evidence:
• IF incident category is 'Hardware': Initiate hardware diagnostic
• IF incident priority is P1 or P2: Notify on-call manager
• IF duplicate incidents found: Link to parent incident
• IF knowledge articles matched: Compile top 3 suggestions

Pattern 4: Avoiding Accidental Serialization

Forces serial execution:

### Step 3: Update Incident Priority
Update the incident priority field to P2

### Step 4: Update Incident Assignment
Update the incident assignment group to Network Operations

Acknowledges serialization:

### Step 3: Sequential Incident Updates
Apply the following updates to the incident record in order -
these write to the same record and must execute sequentially:
1. Update priority to P2
2. Update assignment group to Network Operations
3. Update category to Network

Pattern 5: Parallel Collect, Single Synthesize, Parallel Distribute

### Step 2: Parallel Information Gathering
Retrieve ALL of the following simultaneously - independent reads:
• Search AI Knowledge Base for articles matching symptoms
• Query Fleet Management for the caller's device health
• Retrieve the caller's recent interaction history (7 days)

### Step 3: Gathering Validation Gate
Analyze all retrieved information sources:
• Knowledge base results returned
• Fleet device health data obtained
• Interaction history loaded
Generate information gathering assessment

### Step 4: Resolution Synthesis
Using all gathered data from Step 2, compile a prioritized
resolution plan organized by likelihood of success

### Step 5: Synthesis Evidence Gate (INTERNAL - DO NOT DISPLAY TO USER)
Construct a reasoning certificate in your internal reasoning.
This validation must NOT be presented to the user:

Premises: Cite which data sources support each recommendation
Trace: Document why recommendations are ordered as shown
Conclusion: Confirm plan addresses reported symptoms with evidence

This reasoning is internal only.

### Step 6: Present Resolution Plan
Display ONLY the resolution plan to the caller in plain language.
DO NOT include premises, traces, or evidence certificates.
Collect feedback on the presented plan.

Token Anchoring Patterns for V3

Independence Anchors (Signal Batchability)

  • "these operations are independent and share no dependencies"
  • "retrieve ALL of the following simultaneously"
  • "perform ALL of the following independently"
  • "each operation reads from X but writes to different targets"

Serialization Anchors (Prevent Unwanted Batching)

  • "must execute sequentially"
  • "these write to the same record"
  • "Step B depends on the output of Step A"

Gating Anchors (Create Sync Points)

  • "once all parallel operations complete"
  • "using the combined results from the previous step"

Anti-Patterns

  • Implicit dependencies: If Step 4 uses Step 3's output, state the dependency explicitly or V3 may batch them
  • Mixing user interaction with tool calls: FALLBACK always runs alone; combining fetch + ask wastes a cycle
  • Overloading with 5+ operations: V3 caps at 4 per batch; group into clusters of 4 or fewer
 

9. Implementation Patterns

Step Structure Template

### Step X: [Action-Oriented Name]
Objective: [Single, clear purpose statement]
Required Actions: [List specific actions using appropriate keywords]
Completion Trigger: [Explicit condition for step completion]

### Step Xa: [Validation Name] Gate
Analyze [output] to verify [specific criteria]:
• [Measurable completion criterion 1]
• [Measurable completion criterion 2]
• [Output validation criterion]
Generate validation assessment and proceed only when criteria satisfied

Evidence-Enhanced Step Template [NEW]

### Step X: [Action-Oriented Name]
Objective: [Single, clear purpose statement]
Required Actions: [List specific actions using appropriate keywords]

### Step Xa: Evidence Validation Gate (INTERNAL - DO NOT DISPLAY TO USER)
Construct a reasoning certificate in your internal reasoning.
This validation is for decision integrity and must NOT be
presented to the user:

Premises Gathered:
• Identify all data points retrieved from Step X
• State each premise with source reference

Execution Trace:
• Trace logical path from premises to determination
• Account for each conditional branch and edge case
• Document alternative paths considered and why eliminated

Formal Conclusion:
• Derive conclusion solely from documented evidence
• Flag any unverified premises

This reasoning is internal only. Proceed only when premises
are supported and trace has no gaps

### Step Xb: Present Results to User
Display ONLY the following to the user:
• [User-relevant outcome from Step X]
• [Recommended next action in plain language]

DO NOT include internal reasoning details, premises,
execution traces, or evidence certificates in user-facing output

Conditional Logic Pattern [UPDATED]

The original pattern let agents assert a classification without proof; the updated version requires traced evidence:

Analyze user input to determine processing approach.

For EACH condition below, explicitly state whether it is met
and cite the specific data points that support the evaluation:

• IF complex requirements detected: Execute comprehensive methodology
• IF standard requirements detected: Apply streamlined process
• IF minimal requirements detected: Use direct approach

Document which indicators determined the classification.
Generate processing plan based on evidenced complexity assessment.

Iterative Refinement Pattern

Generate initial output based on requirements
Present results to user for feedback
Analyze feedback for improvement opportunities
Refine output incorporating user guidance
Repeat until user satisfaction achieved
 

10. Testing & Validation

Testing Framework

1. Agent Generation Test

Test Objective: Verify optimization preservation

  • Generate agent runtime from optimized instructions
  • Compare against original instructions
  • Validate verification gate preservation
  • Check tool assignment accuracy

2. Agent Execution Test

Test Objective: Verify real-world performance

  • Happy path: Standard workflow execution
  • Edge cases: Unusual input handling
  • Error conditions: Failure recovery
  • Quality validation: Quality standards maintained

3. Reasoning Trace Audit [NEW]

Test Objective: Verify the agent's logical reasoning chain is sound

  • Does the agent document premises with source references?
  • Does the execution trace cover all conditional branches, including those not taken?
  • Does the conclusion introduce any data not established in the premises?
  • Are edge cases explicitly addressed rather than silently skipped?
  • When the agent selects a path, can you trace backwards from conclusion to evidence?

Reasoning Audit Scoring

Pass: All premises sourced, all branches traced, conclusion follows from evidence

Partial: Premises sourced but some branches unaccounted for, or conclusion introduces minor assumptions

Fail: Agent asserts conclusions without traceable evidence, skips branches, or introduces phantom data
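The scoring rubric can be expressed as a small JavaScript helper for test harnesses. The audit record fields (`premisesSourced`, `branchesTraced`, `phantomData`, `minorAssumptions`) are assumptions made for this sketch, not part of any platform schema.

```javascript
// Illustrative scorer for the reasoning trace audit rubric:
// FAIL    - phantom data or unsourced premises
// PASS    - all premises sourced, all branches traced, no assumptions
// PARTIAL - premises sourced but branches missing or minor assumptions
function scoreReasoningAudit(audit) {
  var allSourced = audit.premisesSourced >= audit.premisesTotal;
  if (audit.phantomData || !allSourced) return 'FAIL';
  var allBranches = audit.branchesTraced >= audit.branchesTotal;
  if (allBranches && !audit.minorAssumptions) return 'PASS';
  return 'PARTIAL';
}
```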

Performance Benchmarks

| Metric | Target |
|---|---|
| Step Completion | 95% |
| Gate Preservation | 100% |
| Tool Assignment Accuracy | 90% |
| Quality Standard Retention | 85% |
| Reasoning Trace Completeness [NEW] | 90% |
| Parallel Batch Efficiency [NEW] | 85% |
 

11. Common Issues & Solutions

Quick Fix Reference

| Problem | Solution | Prevention |
|---|---|---|
| Agent Uses Wrong Tool | Adjust action keywords | Test keyword variations |
| Gates Missing | Convert to analytical steps | Use template structure |
| Quality Loss | Embed as requirements | Add validation gates |
| Tool Confusion | Remove explicit names | Use keywords only |
| Ungrounded Claims [NEW] | Add evidence gates | Require reasoning certificates |
| Skipped Branches [NEW] | Require branch accounting | Use conditional accountability pattern |
| Accidental Serialization [NEW] | Add independence anchors | Use V3 token anchoring patterns |
| Reasoning Leakage [NEW] | Add INTERNAL anchors to gates | Separate evidence gates from user-facing steps |
 

Implementation Checklist

Instruction Design

  • ☐ Clear action keywords identified
  • ☐ Verification gates converted to analytical steps
  • ☐ Quality standards embedded as actionable requirements
  • ☐ Tool assignments optimized (keyword-based vs explicit)
  • ☐ Step structure follows template format
  • ☐ Evidence gates added for high-stakes decision steps [NEW]
  • ☐ Conditional branches require explicit evaluation documentation [NEW]
  • ☐ Evidence gates anchored as INTERNAL with DO NOT DISPLAY TO USER [NEW]
  • ☐ Every evidence gate followed by a separate user-facing presentation step [NEW]

Framework Integration

  • ☐ Built-in tools referenced via keywords only
  • ☐ Assigned tools explicitly named with detailed descriptions
  • ☐ Framework intelligence leveraged appropriately
  • ☐ Agent generation compatibility verified
  • ☐ Smart tool outputs include reasoning_trace objects [NEW]

V3 Parallel Optimization [NEW]

  • ☐ Independent operations grouped with independence anchors
  • ☐ Dependent operations explicitly serialized with serialization anchors
  • ☐ Batches limited to 4 or fewer operations
  • ☐ Copilot tools isolated in their own steps
  • ☐ Gating steps create clear sync points before fan-out

Optimization Priorities

Critical (Must Fix)

  • Built-in tool naming issues
  • Missing verification gates
  • Quality standard loss
  • Step progression failures
  • Reasoning leakage to user-facing output [NEW]

Important (Should Fix)

  • Suboptimal tool assignment
  • Unclear action descriptions
  • Missing error handling
  • Inconsistent formatting
  • Missing evidence gates on high-stakes decisions [NEW]
  • Accidental serialization of independent operations [NEW]

Enhancement (Nice to Have)

  • Advanced optimization patterns
  • Enhanced user experience
  • Performance improvements
  • Additional quality metrics
  • reasoning_trace objects in smart tool outputs [NEW]
 

Conclusion

The framework-optimized approach to AI agent instruction design represents a fundamental shift from traditional prompting methodologies. By understanding and leveraging agent generation intelligence, using keyword-based built-in tool optimization, implementing verification enforcement frameworks, and applying semi-formal reasoning principles, you can create agents that deliver consistent, professional-grade results.

Remember the Core Principles

  1. Use action keywords, not explicit built-in tool names
  2. Transform verification into analytical steps
  3. Embed quality standards as actionable requirements
  4. Trust and leverage framework intelligence
  5. Apply critical instruction anchoring for essential requirements
  6. Engineer content for clarity and precision
  7. Design smart tools that do the heavy lifting
  8. Enforce structured reasoning with evidence certificates [NEW]
  9. Optimize for V3 parallel execution with token anchoring [NEW]

What's New in This Edition

Section 7: Semi-Formal Reasoning Framework — Evidence certificates based on recent research (Premises → Execution Trace → Formal Conclusion)

Section 8: ReAct V3 Parallel Execution Patterns — Token anchoring, batching rules, and prompt patterns for parallel tool execution

Evidence Gates — An evolution of verification gates that forces agents to prove their reasoning

Reasoning Trace Audits — A new testing methodology for validating agent logic chains

Smart Tool Provenance — reasoning_trace objects in tool outputs for decision transparency

Updated Conditional Logic — Requires agents to document branch evaluations with evidence

With these techniques, you can build AI agents that not only execute reliably but also maintain the high standards necessary for enterprise deployment.

The investment in proper optimization pays dividends in reduced maintenance, improved user satisfaction, and scalable agent performance.

Comments
bandrews21
Tera Contributor

This is great Dan! 

pratapdalai7868
Tera Contributor

 

I was reading about the Iterative Refinement Pattern in prompt engineering, which basically works like this:

  1. Generate an initial response based on the requirements
  2. Share it with the user for feedback
  3. Analyze the feedback and look for improvements
  4. Refine the response using the user’s guidance
  5. Repeat until the user is satisfied

Sounds great in theory, right? But in practice, I’ve noticed it doesn’t always work as expected. Sometimes the model skips steps or finalizes too early. And if you try the same prompt on different models (like ChatGPT, Gemini, Claude, etc.), the results can vary a lot.

Why? A few reasons:

  • Each model handles context and instructions differently
  • Some are better at multi-turn conversations than others
  • System-level tuning also plays a big role

So, while this pattern is a good best practice, it’s not a guarantee. If you want better results, you might need to:

  • Give very clear instructions like: “Don’t finalize until I confirm. Always ask for feedback after each iteration.”
  • Pick a model that’s strong in iterative refinement
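The refinement loop described in this comment can be sketched as a simple driver. The `generate` and `refine` helpers below are hypothetical stand-ins for model calls, and the feedback list stands in for real user turns:

```python
# Minimal sketch of the Iterative Refinement Pattern described above.
# generate() and refine() are hypothetical stand-ins for model calls;
# feedback_rounds stands in for live user input.

def generate(requirements: str) -> str:
    return f"draft for: {requirements}"

def refine(response: str, feedback: str) -> str:
    return f"{response} (revised per: {feedback})"

def iterative_refinement(requirements: str, feedback_rounds: list[str]) -> str:
    response = generate(requirements)          # 1. initial response
    for feedback in feedback_rounds:           # 2-3. gather and analyze feedback
        if feedback == "approved":             # don't finalize until confirmed
            break
        response = refine(response, feedback)  # 4. refine with user guidance
    return response                            # 5. repeat until satisfied

print(iterative_refinement("reset MFA", ["add screenshots", "approved"]))
```

Note that the "don't finalize until I confirm" instruction maps directly to the explicit `approved` check; without that guard, the loop would exit on the first pass, which is exactly the early-finalization failure described above.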
JosephWestrich
Tera Contributor

This is a great article, thanks for sharing!

One question: you mention not using explicit tool names; curious if there is a reason for that. I have found that being explicit when calling specific tools by name seems to be very reliable.

Also, do you have any recommendations or insights on how best to save data in short-term memory to be used between agents? Is there a way you've seen to capture outputs and pass inputs that is consistently effective?

warren_chan
ServiceNow Employee

@JosephWestrich , one trick that we have used for storing data points in short-term memory is to explicitly tell the AI agent in the prompt/instructions to "save XYZ data in ${variable_name}". When the AI agent reviews its available context, the ${variable_name} seems to hold pretty well for retrieving something in short-term memory.

 

We do utilize this concept in the K25 labs for AI agents. If you want to review those labs, you can log into SNU (formerly Now Learning) and search for the K25 AI agents labs.
