Events and Working With Thresholds

amoss98 · 03-02-2022

Hello all,

I have recently begun trying to configure Event Rules that include thresholds but I am stumbling across something that is stumping me. I am seeing situations where event management seems to get stuck in a loop of sorts. For some context, the events I am working with come from a Nagios XI instance.

In my example, I am trying to configure a threshold so that an alert is created only after a set number of memory events arrive within a period of time. This part of the rule works fine: I can see the events come in, match the rule, and create an alert when applicable. However, it seems Nagios does not always send an "OK" event once the condition has recovered. That should not be a problem, since I can set the Close Operator to 'Idle' with a reasonable period such as an hour. The property evt_mgmt.active_interval is set to 14400 seconds (4 hours), so I would expect any new event arriving within that window to reopen the existing alert. What I am actually seeing is that the existing events keep getting updated and the alert gets stuck in a loop of opening and closing.

Example:

Time of Event: 2022-03-01 17:37:25

Updated: 2022-03-02 14:58:44

I am at a loss as to why the event is being updated long after it is initially processed. Any ideas are appreciated!

Raj_Esh · 03-02-2022

Hi Amoss,

Have you configured the message_key? Which field is unique each time an alert is generated from Nagios?

It looks like the message key is the same every time, which is why each new event from Nagios is treated as belonging to the old alert and is grouped under it instead of creating a new one.
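To illustrate the grouping, here is a minimal JavaScript sketch of the idea. It is not ServiceNow's actual implementation: the function groupByMessageKey, the field names, and the sample values are all made up for illustration. It only shows that events sharing a message key end up on one alert.

```javascript
// Hypothetical model of message-key grouping (NOT ServiceNow's real code).
// Each distinct message_key maps to one alert; repeat events update it.
function groupByMessageKey(events) {
  const alerts = new Map();
  for (const evt of events) {
    let alert = alerts.get(evt.message_key);
    if (!alert) {
      alert = { message_key: evt.message_key, eventCount: 0, lastEventTime: null };
      alerts.set(evt.message_key, alert);
    }
    alert.eventCount += 1;
    alert.lastEventTime = evt.time_of_event;
  }
  return alerts;
}

// Made-up sample events for illustration only.
const sampleEvents = [
  { message_key: "web01_Memory", time_of_event: "2022-03-01 10:00:00" },
  { message_key: "web01_Memory", time_of_event: "2022-03-01 10:05:00" },
  { message_key: "db01_Memory", time_of_event: "2022-03-01 10:06:00" },
];

const alerts = groupByMessageKey(sampleEvents);
console.log(alerts.size); // 2 alerts for 3 events
console.log(alerts.get("web01_Memory").eventCount); // 2
```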

Another point: regarding "It seems Nagios does not always send an "OK" event once the condition has recovered", Nagios can be configured to send the OK event under Services --> Notifications.

Thanks,

Raj


amoss98 · 03-02-2022

OK, thank you Raj. The info regarding the message key makes sense to me. Do you know if there is a way to add a timestamp to the message key? That would be an easy way to make it unique via an event rule. Right now the message key is simply: node_MetricName

Raj_Esh · 03-02-2022

Hi Amoss,

I'm not sure the timestamp would work. Every event arrives with a new timestamp, so each one would be treated as a new event and would create its own alert.
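A quick sketch of why, in plain JavaScript. The helper names and sample events are hypothetical, and the key format simply follows the node_MetricName pattern you mentioned. Counting the distinct keys shows that a timestamp turns every event into its own group.

```javascript
// Hypothetical helpers; field names are illustrative, not ServiceNow's schema.
function keyWithoutTimestamp(evt) {
  return evt.node + "_" + evt.metric_name;
}

function keyWithTimestamp(evt) {
  return evt.node + "_" + evt.metric_name + "_" + evt.time_of_event;
}

// Count how many distinct message keys a set of events would produce.
function countDistinctKeys(events, keyFn) {
  return new Set(events.map(keyFn)).size;
}

// Made-up events: the same node and metric reported three times.
const memoryEvents = [
  { node: "web01", metric_name: "Memory", time_of_event: "2022-03-01 10:00:00" },
  { node: "web01", metric_name: "Memory", time_of_event: "2022-03-01 10:05:00" },
  { node: "web01", metric_name: "Memory", time_of_event: "2022-03-01 10:10:00" },
];

console.log(countDistinctKeys(memoryEvents, keyWithoutTimestamp)); // 1
console.log(countDistinctKeys(memoryEvents, keyWithTimestamp)); // 3
```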

In general, I usually use the ServiceNow-recommended event rules, and if the message_key is empty it is generated automatically by the OOTB business rule "Add message key if missing":

// Build the message key from source, node, type name, and resource.
addMessageKey();
function addMessageKey(){
	current.message_key = current.source + "_" + current.node + "_" + current.type.name + "_" + current.resource;
}

Maybe you can build your own message key in a business rule using syntax like the above.

Example: ${source}_${node}_${description} or some other combination.
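If you want to try out key combinations outside the instance first, a plain JavaScript sketch like this might help. buildMessageKey and the sample event are hypothetical; inside a business rule you would read the fields from current, as in the snippet above.

```javascript
// Hypothetical standalone helper; mirrors the concatenation in the
// business rule above but works on a plain object for easy testing.
function buildMessageKey(evt, fields) {
  // Keep an existing key, since the rule only fills in a missing one.
  if (evt.message_key) {
    return evt.message_key;
  }
  // Join the chosen fields with "_", using "" for any that are absent.
  return fields
    .map(function (name) {
      const value = evt[name];
      return value === undefined || value === null ? "" : String(value);
    })
    .join("_");
}

// Made-up event for illustration.
const sampleEvent = { source: "NagiosXI", node: "web01", description: "Memory usage high" };

console.log(buildMessageKey(sampleEvent, ["source", "node", "description"]));
// NagiosXI_web01_Memory usage high
```

Passing the field list as a parameter makes it easy to compare a few combinations before settling on one.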

Hope it helps.

Thanks,

Raj


amoss98 · 03-04-2022

Thanks for your assistance, Raj. I did some more research and testing, and when I checked the Nagios XI logs I found that the timestamp in the API requests coming from ServiceNow is not incrementing. I am now investigating why that is happening.