Activate offensiveness detection to log or block offensive content generated by Now Assist skills and workflows.
Before you begin
Role required: sn_generative_ai.nsa_admin
About this task
Generative AI output is probabilistic, which means that the same input can produce different outputs. Some AI-generated content may be offensive, including toxic, sexist, or otherwise harmful language.
Now Assist Guardian detects offensive content in both inputs and outputs and logs each detection event. You can also configure it to block offensive material so that users see a standard error message instead of the generated response.
Note: Offensiveness detection applies only to specific Now Assist skills and workflows. It is not available for all Now Assist applications. For the list of skills that support offensiveness detection, see Now Assist Guardian.
You can export logs for review. For more information, see Export Now Assist Guardian logs.
Procedure
-
Navigate to .
-
In the side panel, select the tab.
-
Go to the Available for you tab to see which workflows you can choose from.
Offensiveness guardrails that are already activated appear in the Active tab.
-
Select Activate for the workflow on which you want to enable offensiveness detection.
-
In the Choose an action when offensive content is detected section, select one of the following options.
- To record the event when offensive content is detected while keeping the content visible to the user, select Log the output.
- To record the event and prevent the content from being shown to the user, select Block the response and log the output. The user sees a standard error message instead.
-
In the Select content severity level to check for offensiveness section, select one of the following options.
- To flag even the slightest hints of offensive content, select Low.
- To flag clearly or moderately offensive content, select Medium.
- To flag only highly offensive content, select High.
-
Select Save and activate.
-
Select Save.
Result
The offensiveness detection guardrail is enabled on your instance for the selected workflow. Events are logged when offensive content is detected in inputs or generated outputs.
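The following Python sketch illustrates how the two settings chosen in the procedure interact. It is only a conceptual model, not the Now Assist Guardian implementation or API: the numeric score, the threshold values, the error message text, and the names apply_guardrail, Action, and Outcome are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Labels mirror the two options in the procedure.
    LOG = "Log the output"
    BLOCK_AND_LOG = "Block the response and log the output"


# Hypothetical minimum offensiveness score (0.0 to 1.0) that triggers a flag
# at each severity level. The numbers are invented for illustration.
THRESHOLDS = {"Low": 0.2, "Medium": 0.5, "High": 0.8}

# Placeholder text; the actual standard error message is not given here.
ERROR_MESSAGE = "The response could not be displayed."


@dataclass
class Outcome:
    shown_to_user: str  # what the user ends up seeing
    logged: bool        # whether a detection event is recorded


def apply_guardrail(text: str, score: float, level: str, action: Action) -> Outcome:
    """Decide what the user sees and whether the event is logged."""
    flagged = score >= THRESHOLDS[level]
    if not flagged:
        # Nothing detected: content passes through and no event is logged.
        return Outcome(shown_to_user=text, logged=False)
    if action is Action.BLOCK_AND_LOG:
        # Detected and blocked: the user sees the error message instead.
        return Outcome(shown_to_user=ERROR_MESSAGE, logged=True)
    # Detected with log-only action: content stays visible, event is logged.
    return Outcome(shown_to_user=text, logged=True)
```

In this model, Low uses the smallest threshold and therefore flags the most content, while High flags only content with the highest scores. The selected action affects only what the user sees after a detection; the event is logged in both cases.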
What to do next
You can enable offensiveness detection separately for each supported Now Assist application and workflow. Repeat this task for each workflow on which you want offensiveness protection enabled.
To change the detection impact for an active workflow, select the more options icon in the list of active workflows and then select Edit.
To deactivate offensiveness protection for a workflow, select the more options icon in the list of active workflows and then select Deactivate.