
Creating alert remediation flows is easy, especially if you know where to look. When evaluating alert remediation flows, we recommend updating the Event Management Connectors store app, because it includes remediation flows.
The out-of-the-box (OOB) remediation flows follow good design principles, which makes them great candidates to use as templates for further development.
The flows all follow good naming conventions, which makes it easy to identify and understand their purpose. They are designed to be grouped into bundles; the alert remediation offering is essentially such a bundle, which keeps it simple. Each flow needs as few input parameters as possible, and the inputs are identical across flows. The primary variable that tells a flow which resource to remediate always has the same name: "remediation_action_resource". The name makes sense because the flows manage resources such as Windows or Linux services and processes.
We can use these flows as templates to create alert remediation flows for the ServiceNow Agent Client Collector (ACC), the single agent we use for Discovery, remediation, endpoint audit and query, as well as for gathering metrics, events and logs. Alternatively, you can use the OOB flows as they are; they work with PowerShell and SSH.
To see the OOB flows, open Flow Designer and filter subflows by the application named "Event Management Connectors".
To create an alert remediation subflow that uses ACC, open one of the subflows and copy it to your application scope. I'm using Global, but you will likely want your own application scope.
For this example, we will start a Linux service that has stopped. I am using the CrowdStrike agent because a Linux root user can stop it (at least on my machine), even though the agent should always be running. In my environment, an ACC policy checks whether the CrowdStrike agent is running and generates an alert if it is not. Once the policy is active, all you have to do is stop the CrowdStrike agent and an event will be created. If you open the event from the event table, you will see a button named Create Event Rule.
First, let's make sure that ACC is monitoring the process named falcond. We can use the OOB ACC check named "os.linux.check-process": add that check to the ACC policy and make a small configuration change so that the check knows which process to monitor.
You can see the check is associated with the policy here:
Open the check and change the Check Parameters so that an event is created when fewer than one falcond process is running.
Clearing events are automatically created when the process starts running again.
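The check's behavior can be modeled in a few lines to make the raise/clear cycle concrete. This is a plain-JavaScript sketch, not the ACC check itself: the function name `evaluateProcessCheck`, the list-of-names input and the `critical`/`clear` severity strings are assumptions for illustration. It returns a critical event when fewer than `minCount` matching processes are running, and a clearing event only when a previously failing check recovers.

```javascript
// Hypothetical model of a "process count below threshold" check.
// Not the ACC implementation; names and severities are illustrative.
function evaluateProcessCheck(runningProcesses, target, minCount, previousState) {
  // Count running processes whose name matches the target exactly
  const count = runningProcesses.filter((name) => name === target).length;

  if (count < minCount) {
    return {
      state: 'failing',
      event: { severity: 'critical', message: `${count} ${target} process(es) running, expected at least ${minCount}` },
    };
  }
  // Emit a clearing event only when recovering from a failure
  if (previousState === 'failing') {
    return { state: 'ok', event: { severity: 'clear', message: `${target} is running again` } };
  }
  return { state: 'ok', event: null };
}

// Example: falcond is stopped, then started again
let state = 'ok';
let run = evaluateProcessCheck(['systemd', 'sshd'], 'falcond', 1, state);
console.log(run.event.severity); // critical
state = run.state;
run = evaluateProcessCheck(['systemd', 'sshd', 'falcond'], 'falcond', 1, state);
console.log(run.event.severity); // clear
```

Tracking the previous state is what keeps the sketch from emitting a clearing event on every healthy run.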
For alert remediation to run in this test case, we need to compose an alert parameter that tells the flow which service to start. Alert composition can be done in Event Rules, so I created an Event Rule for it. The rule also uses an Event Filter so that it only executes for the CrowdStrike process monitoring use case.
Now it is time for alert composition. We need to populate a variable that the flow will ultimately use. I have deviated slightly from the OOB variable name "remediation_action_resource" and use "t_remediation_action_resource" instead. The reason is to take advantage of the Alert Tagging capability in the new Express List, which lets us decorate and filter alerts as needed. At the bottom of the picture below, you will see a manual attribute named "t_remediation_action_resource" with a value of "falcon-sensor".
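As a rough model of what this composition step does, the sketch below takes an event's additional info as JSON, applies a filter so that only the CrowdStrike process check is affected, and adds the `t_remediation_action_resource` attribute. The `check` and `process` keys inside the additional info and the function name `composeAlertInfo` are assumptions for illustration; in the instance this is configured in the Event Rule form, not written as code.

```javascript
// Hypothetical model of the Event Rule: event filter + manual attribute.
// The "check" and "process" keys in additional_info are assumptions.
const SERVICE_FOR_PROCESS = { falcond: 'falcon-sensor' };

function composeAlertInfo(additionalInfoJson) {
  const info = JSON.parse(additionalInfoJson || '{}');

  // Event filter: only touch the CrowdStrike process-monitoring events
  const matches = info.check === 'os.linux.check-process' && info.process === 'falcond';
  if (!matches) {
    return info;
  }

  // Manual attribute later read by the flow and shown as an alert tag
  return { ...info, t_remediation_action_resource: SERVICE_FOR_PROCESS[info.process] };
}

const composed = composeAlertInfo('{"check":"os.linux.check-process","process":"falcond"}');
console.log(composed.t_remediation_action_resource); // falcon-sensor
```

Events that do not match the filter pass through unchanged, so other process checks never receive a remediation attribute.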
Back to Flow Designer. The OOB remediation flow was copied and modified to work with ACC. The alert record is passed to the flow at runtime, and a simple script identifies the agent from the alert:
```javascript
// Build the agent name from the alert's node field
var node = fd_data.subflow_inputs.ah_alertgr.node.toString();
return 'Agent_' + node;
```
We also use a short script to build the command to run in the "Run Command on Agent" action item:
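```javascript
// additional_info holds a JSON string; convert the field to a string before parsing
var addInfo = JSON.parse(fd_data.subflow_inputs.ah_alertgr.additional_info.toString());
// Start the service named by the composed alert attribute
return 'sudo systemctl start ' + addInfo.t_remediation_action_resource;
```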
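Because `fd_data` only exists inside Flow Designer, the two scripts above are hard to exercise on their own. The sketch below reproduces their logic as plain JavaScript functions over a mock alert object so they can be tested outside the instance. The function names `agentNameFor` and `startCommandFor` and the mock alert are hypothetical. The unit-name check is an extra precaution added here, not part of the OOB flows: since the value is concatenated into a `sudo` command, rejecting anything outside typical systemd unit-name characters keeps a malformed attribute from injecting extra shell syntax.

```javascript
// Plain-JS stand-ins for the two Flow Designer inline scripts.
// "alert" is a mock with the fields the scripts read, not a GlideRecord.

// Conservative allow-list for systemd unit names: no spaces, no shell
// metacharacters, and no leading "-" that systemctl could read as an option.
const UNIT_NAME = /^[A-Za-z0-9][A-Za-z0-9@._:-]*$/;

function agentNameFor(alert) {
  return 'Agent_' + String(alert.node);
}

function startCommandFor(alert) {
  const addInfo = JSON.parse(String(alert.additional_info));
  const unit = addInfo.t_remediation_action_resource;
  if (typeof unit !== 'string' || !UNIT_NAME.test(unit)) {
    throw new Error('Missing or invalid t_remediation_action_resource: ' + unit);
  }
  return 'sudo systemctl start ' + unit;
}

const mockAlert = {
  node: 'linux-host-01',
  additional_info: '{"t_remediation_action_resource":"falcon-sensor"}',
};
console.log(agentNameFor(mockAlert));    // Agent_linux-host-01
console.log(startCommandFor(mockAlert)); // sudo systemctl start falcon-sensor
```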
Side note: additional action items ship with the agent, and several of them use osquery, which lets a single action work on many operating systems. Osquery is shipped as a binary with ACC; we do not run it as a daemon. Besides the OOB use cases, you can easily add your own actions that run SQL queries through osquery, which is great for audit and troubleshooting use cases.
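To illustrate the "one action, many operating systems" point, the sketch below builds an osquery SQL statement against osquery's `processes` table, which is available on Linux, macOS and Windows. How the statement is passed to an ACC action is not covered in this post, so the helper `buildProcessQuery` is hypothetical; it only produces the SQL text, escaping single quotes by doubling them as SQLite (the SQL dialect osquery uses) expects.

```javascript
// Hypothetical helper: builds an osquery query that finds processes by name.
// osquery uses SQLite syntax, where a single quote is escaped by doubling it.
function buildProcessQuery(processName) {
  const escaped = String(processName).replace(/'/g, "''");
  return `SELECT pid, name, path FROM processes WHERE name = '${escaped}';`;
}

console.log(buildProcessQuery('falcond'));
// SELECT pid, name, path FROM processes WHERE name = 'falcond';
```

The same query text works on each platform; on Windows the `name` column typically includes the `.exe` extension, so the value you pass would differ.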
An Alert Management rule is needed to make the flow available as a playbook in Event Management. Its Alert Filter selects alerts that come from the ITOM Agent and have "t_remediation_action_resource" in the additional info alert attribute.
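The Alert Filter condition amounts to a simple predicate. The sketch below expresses it in plain JavaScript over mock alerts, assuming the alert's `source` value is the string "ITOM Agent" and that `additional_info` holds a JSON object; the function name `isRemediationCandidate` and the sample alerts are hypothetical, and the real condition is built in the Alert Management rule form rather than in code. The same test also describes the Express List filter on the `t_remediation_action_resource` tag.

```javascript
// Hypothetical predicate mirroring the Alert Filter of the Alert Management rule.
function isRemediationCandidate(alert) {
  if (alert.source !== 'ITOM Agent') {
    return false;
  }
  let info;
  try {
    info = JSON.parse(alert.additional_info || '{}');
  } catch (e) {
    return false; // malformed additional_info never matches
  }
  return info !== null && typeof info === 'object' &&
    Object.prototype.hasOwnProperty.call(info, 't_remediation_action_resource');
}

const alerts = [
  { id: 'alert-1', source: 'ITOM Agent', additional_info: '{"t_remediation_action_resource":"falcon-sensor"}' },
  { id: 'alert-2', source: 'ITOM Agent', additional_info: '{}' },
  { id: 'alert-3', source: 'Other', additional_info: '{"t_remediation_action_resource":"httpd"}' },
];
console.log(alerts.filter(isRemediationCandidate).map((a) => a.id).join(', ')); // alert-1
```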
The final step is to make the flow available as an action. You can have the flow execute manually or automatically, or choose the option named "both" to make it available for both manual and automatic execution.
This is the Express List that we ship as part of the AIOps Experience. It is a feature-rich live list of alerts that updates automatically, with an option to pause updates when you find something interesting to work on. Filters are on the left, alerts in the middle and flow actions at the top; selecting an alert opens a panel on the right. Impacted services and the alert tags we use for decoration are easy to see. Because we can filter by alert tag, I chose to filter by the tag named "t_remediation_action_resource", which is an easy way to find alerts that are already marked for remediation via a flow.
You can select the flow for execution from the dropdown list box.
The alert work notes are updated automatically, and we of course track the execution of remediation flows in relation to alerts. That tracking in turn leverages the ServiceNow Common Service Data Model (CSDM), so if you want to know which Service Owner has the most remediation flow executions for the services they own, you can find out.
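As an illustration of that kind of question, the sketch below counts remediation flow executions per service owner and picks the owner with the most. The record shape (`service`, `owner`) and the sample data are invented for the example; in the instance the equivalent data would come from the execution tracking records related to CSDM services, whose exact tables are not covered here.

```javascript
// Hypothetical execution records: one entry per remediation flow run,
// already associated with the owning service and its Service Owner.
const executions = [
  { service: 'Payments', owner: 'alice' },
  { service: 'Payments', owner: 'alice' },
  { service: 'Search', owner: 'bob' },
  { service: 'Checkout', owner: 'alice' },
  { service: 'Search', owner: 'bob' },
  { service: 'Reports', owner: 'carol' },
];

// Count executions per owner
function countByOwner(records) {
  const counts = new Map();
  for (const { owner } of records) {
    counts.set(owner, (counts.get(owner) || 0) + 1);
  }
  return counts;
}

// Owner with the most executions (the first one seen wins on ties)
function topOwner(counts) {
  let best = null;
  for (const [owner, n] of counts) {
    if (best === null || n > best.count) {
      best = { owner, count: n };
    }
  }
  return best;
}

console.log(topOwner(countByOwner(executions))); // { owner: 'alice', count: 3 }
```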
Many Integration Hub spokes and Cloud Actions are available, which gives you a large amount of OOB content to leverage in alert remediation flows.
Unleash your inner developer 🙂 Developing these kinds of workflows is fun, and it is easy to trace business value back to your work on alert remediation flows.