Chat streaming responses
In Now Assist LLM chat conversations, synthesized responses stream in real time instead of appearing only after the entire message has rendered. Streaming sends responses from the LLM to the chat immediately, which helps improve performance when replying to requests.
Synthesized responses stream to end users as they're generated instead of appearing all at once. Responses stream either one letter or one word at a time. After an end user enters a question or request into the chat, the synthesized response begins to stream and stops streaming after the full message has been delivered.
While the response is generating, various loading messages (such as Generating an answer) appear. For more information and examples of loading messages, see Latency feedback in Virtual Agent. An animated sparkle icon also appears while the response is generating, and it changes to the static branded Virtual Agent icon after the response has fully loaded.
For standard chat, the Show more link appears after six lines of text are streamed. Selecting Show more streams the remainder of the message until the full message is delivered.
For enhanced chat, streaming applies to the synthesized response wherever it appears. The synthesized response can stream in the chat window, on the portal's search results page, and on the enhanced chat's Now Assist tab. When streaming completes, the entire synthesized response displays in the chat window or enhanced chat rather than a truncated response.
To enable streaming, configure your AI assistants to stream responses during the Now Assist in Virtual Agent guided setup. For more information, see Configuring assistants overview.