4 weeks ago
Can I view the actual usage of Request token amount, Response token amount, and Buffer token amount?
I want to estimate token usage. I want to conduct tests in the development environment and base the estimate on the test results.
Regarding Response Token Amount
For example, if an AI agent's script (tool) retrieves 20,000 records from the incident table, will this be counted toward the response token amount?
Solved!
4 weeks ago
Hi Buddy,
Right now ServiceNow doesn’t expose a way to see the actual request, response, or buffer token amounts per AI interaction. There’s no table, log, or UI that shows “this call used X tokens.” Token usage is tracked internally and usually only visible in aggregate at a subscription or instance level.
So the only practical way to estimate usage is to test in DEV and control what you send to and return from the model.
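For DEV testing, a rough character-based heuristic can give you a ballpark before you send anything. The ~4 characters per token ratio below is a common approximation for English text, not ServiceNow's actual tokenizer, so treat the result as order-of-magnitude only:

```javascript
// Rough token estimate: ~4 characters per token is a common
// heuristic for English text. This is NOT ServiceNow's real
// tokenizer; use it only for order-of-magnitude estimates.
function estimateTokens(text) {
  if (!text) return 0;
  return Math.ceil(text.length / 4);
}

// Estimate the cost of a tool payload before it reaches the model.
var payload = JSON.stringify({ shortSummary: "27 open P1 incidents", count: 27 });
var approxTokens = estimateTokens(payload);
```

Comparing estimates like this against aggregate usage reported for your instance is about as close as you can get today.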
On your question about response tokens and large queries:
If your agent’s script queries 20,000 incidents but does all the work server-side (counts, grouping, summarizing, filtering) and only returns a small result to the LLM, then only that small result counts toward response tokens. The raw query itself does not.
If, however, your script returns large amounts of data — for example full incident records, long descriptions, comments, or big arrays of fields — and that data is passed back to the model, then yes, that output will be tokenized and counted as response tokens. In practice, that will usually hit buffer limits or get truncated anyway.
The general rule of thumb is:
Only what the LLM actually sees counts as tokens.
That’s why the best pattern is never to send large datasets to the model. Use the script/tool to do the heavy lifting, and return only what the model needs to reason over.
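Here is a minimal sketch of that pattern in plain JavaScript, assuming the records are already in an array. In a real ServiceNow tool script you would do this aggregation server-side (for example with GlideAggregate) so the 20,000 rows never reach the model at all; only the compact summary object would be tokenized:

```javascript
// Collapse many records into a compact summary so only the
// summary is tokenized, not the raw rows. Illustrative only;
// in an instance, prefer server-side aggregation so the full
// result set never leaves the database layer.
function summarizeByPriority(incidents) {
  var counts = {};
  for (var i = 0; i < incidents.length; i++) {
    var p = incidents[i].priority;
    counts[p] = (counts[p] || 0) + 1;
  }
  return { total: incidents.length, byPriority: counts };
}

var summary = summarizeByPriority([
  { priority: "1" }, { priority: "1" }, { priority: "3" }
]);
// summary stays small regardless of how many rows were queried,
// e.g. { total: 3, byPriority: { "1": 2, "3": 1 } }
```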
@m7777 - Please mark as Accepted Solution or give a Thumbs Up if you found this helpful 🙂
4 weeks ago
Hi @Matthew_13
Thanks for your answer; that makes sense. I have a follow-up question.
Regarding idling while waiting for a user response, I understand the following points to check:
- Check the com.glide.cs.conversation_idle_timeout system property
- Check the Conversation Idle Timeout for the Now Assist panel channel in the sys_cs_channel table
Could you also tell me about timeouts during agentic workflow or AI agent execution?
4 weeks ago
Hi Buddy,
Good question. This is where it helps to separate conversation timeouts from execution limits.
You already covered the right places for idle conversation timeouts (the system property and the channel-level setting). Those control how long the session waits when a user doesn’t respond.
For timeouts during agentic workflow or AI agent execution, there isn’t a single “agent timeout” knob, but a few different limits depending on what’s happening under the hood:
- Agent/tool execution limits: There are guardrails that cap continuous or repeated tool execution so an agent doesn’t loop indefinitely. Hitting these can look like a timeout even though it’s really a governor stopping execution.
- Flow / action execution timeouts: If the agent is calling Flow Designer or IntegrationHub actions, those actions have their own execution time limits. Long-running actions or slow integrations are the most common cause of agent runs stopping mid-execution.
- Flow engine runtime limits: If a flow or subflow triggered by the agent runs too long server-side, it can hit the FlowAPI execution ceiling and terminate.
- Outbound REST / integration timeouts: In many cases the “agent timeout” is actually an HTTP or socket timeout from an external system the agent called via an action.
General rule of thumb:
- If the agent stops responding while working, look at flow/action execution and integration timeouts.
- If it waits and then closes, look at conversation idle timeouts.
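That rule of thumb can be sketched as a tiny triage helper. The symptom strings and return values below are purely illustrative (this is not any ServiceNow API), but they capture which side of the system to investigate first:

```javascript
// Toy triage helper for the rule of thumb above. Symptom labels
// and return values are illustrative only, not a ServiceNow API.
function whichTimeoutToCheck(symptom) {
  if (symptom === "stops mid-execution") {
    // The agent died while working: look at execution-side limits.
    return ["flow/action execution timeouts", "integration (HTTP/socket) timeouts"];
  }
  if (symptom === "waits then closes") {
    // The session expired while idle: look at conversation settings.
    return ["com.glide.cs.conversation_idle_timeout", "channel-level Conversation Idle Timeout"];
  }
  return [];
}
```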
@m7777 - Please mark as Accepted Solution or give a Thumbs Up if you found this helpful
