Activate offensiveness protection for generative AI

Release version: Australia

Updated March 12, 2026

Activate offensiveness detection to log or block offensive content generated by Now Assist skills and workflows.

Before you begin

Role required: sn_generative_ai.nsa_admin

About this task

Generative AI output is probabilistic, which means that the same input can produce different outputs. Some AI-generated content may be offensive, including toxic, sexist, or otherwise harmful language. Now Assist Guardian detects offensive content in both inputs and outputs and logs an event each time it is detected. You can also configure it to block offensive material so that users see a standard error message instead of the generated response.

Note:

Offensiveness detection applies only to specific Now Assist skills and workflows. It is not available for all Now Assist applications. For the list of skills that support offensiveness detection, see Now Assist Guardian.

You can export logs for review. For more information, see Export Now Assist Guardian logs.

Procedure

Navigate to All > Now Assist Admin > Settings.
In the side panel, select the Now Assist Guardian > Offensiveness tab.
Go to the Available for you tab to see which workflows you can choose from.

Offensiveness guardrails that are already activated appear in the Active tab.
Select Activate for the workflow on which you want to enable offensiveness detection.
In the Choose an action when offensive content is detected section, select one of the following options.
- To record the event when offensive content is detected while keeping the content visible to the user, select Log the output.
- To record the event and prevent the content from being shown to the user, select Block the response and log the output. The user sees a standard error message instead.
In the Select content severity level to check for offensiveness section, select one of the following options.
- To flag even the slightest hints of offensive content, select Low.
- To flag clearly or moderately offensive content, select Medium.
- To flag only highly offensive content, select High.
Select Save and activate.
Select Save.

Result

The offensiveness detection guardrail is enabled on your instance for the selected workflow. An event is logged whenever offensive content is detected in an input or a generated output.
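
The way the selected action and severity level combine can be sketched in a few lines of Python. This is an illustrative model only, not the Now Assist Guardian implementation or API: the names `Action`, `Severity`, `Outcome`, `apply_guardrail`, and `ERROR_MESSAGE` are hypothetical, and the sketch assumes that the detector assigns each piece of content a severity that can be compared with the configured level.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    # Mirrors the two options in the procedure (labels only, hypothetical type).
    LOG = "Log the output"
    BLOCK_AND_LOG = "Block the response and log the output"


class Severity(Enum):
    # Ordered so that a higher value means more offensive content.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Outcome:
    logged: bool        # whether an event is recorded
    shown_to_user: str  # what the user ends up seeing


# Placeholder text; the actual wording of the product's error message is not specified here.
ERROR_MESSAGE = "standard error message"


def apply_guardrail(
    response: str,
    detected: Optional[Severity],  # assumed detector result; None means nothing was detected
    configured_level: Severity,
    action: Action,
) -> Outcome:
    """Decide what is logged and shown for one generated response."""
    # Low flags even slight hints; High flags only highly offensive content.
    flagged = detected is not None and detected.value >= configured_level.value
    if not flagged:
        return Outcome(logged=False, shown_to_user=response)
    if action is Action.BLOCK_AND_LOG:
        return Outcome(logged=True, shown_to_user=ERROR_MESSAGE)
    # Log only: the event is recorded and the content stays visible.
    return Outcome(logged=True, shown_to_user=response)
```

Under these assumptions, a response rated Medium passes through unlogged when the level is set to High, but is logged (and replaced with the error message if blocking is selected) when the level is set to Low or Medium.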

What to do next

You can enable offensiveness detection separately for each supported Now Assist application and workflow. Repeat this task for each workflow on which you want offensiveness protection enabled.

To change the detection impact for an active workflow, select the more options icon in the list of active workflows and then select Edit.

To deactivate offensiveness protection for a workflow, select the more options icon in the list of active workflows and then select Deactivate.