ITOM EM Alert correlation per time and amount

Community Alums
Not applicable

Hello,

We have an integration between New Relic and ServiceNow and we receive alerts each time one of the Agents running in our servers stops reporting data.

When there is, for example, a network issue, we might receive tens or hundreds of these alerts within a couple of minutes. I want to be able to correlate them when this happens, basically by checking the number of alerts received in a given period of time.

 

I have the following Alert Correlation rule in place, which will correlate these alerts if I receive at least 5 of them in a period of 30 minutes:

 

(function findCorrelatedAlerts(currentAlert){
    /* CONFIG:
    1. exactMatchGroupingFields
    List of field names by which alerts are grouped if their values have an exact match
    Example: ['metric_name', 'severity', 'message_key'];

    2. timeDifferenceInMinutes
    Time difference between alerts - in minutes.
    Example: 60 (equals 60 min, 1 hour)

    3. minimumNumberOfRecordsToGroup
    Number of matching alerts that must be exceeded before any grouping happens.
    */
    var exactMatchGroupingFields = ['metric_name','resource'];
    var timeDifferenceInMinutes = 30; // 30 minutes between the first alert and the alerts that follow
    var minimumNumberOfRecordsToGroup = 5; // Records are grouped only if more than this many are found
    /* End of configuration */

    // Initialization
    var result = {};

    // User input validation - Fields must exist in order to continue
    if (!exactMatchGroupingFields.length) {
        return result;
    }

    // Get the alert that may be the parent of the current alert by answering the conditions.
    // If the alert is primary - make the current alert secondary, else - do nothing.

    // Prepare the start of the time window for the query
    var timeDifferenceBetweenAlerts = new GlideDateTime(currentAlert.getValue('initial_remote_time'));
    var timeDifferenceInMilliSeconds = Number(timeDifferenceInMinutes) * 1000 * 60;
    timeDifferenceBetweenAlerts.subtract(timeDifferenceInMilliSeconds);

    var gr = new GlideRecord('em_alert');
    // Add query to search alerts for matching values of predefined fields
    for (var i = 0; i < exactMatchGroupingFields.length; i++) {
        var field = exactMatchGroupingFields[i];
        gr.addQuery(field, currentAlert.getValue(field));
    }
    // Search for alerts that are not closed, that are potential parents (0 or 1), and that fall
    // within the time window (timeDifferenceInMinutes), ordered by time of creation
    gr.addQuery('state', 'NOT IN', 'Closed');
    gr.addQuery('correlation_rule_group', 'IN', '0,1'); // 0 = None (potential parent) | 1 = Primary alert (parent) | 2 = Secondary
    gr.addQuery('initial_remote_time', '>=', timeDifferenceBetweenAlerts);
    gr.orderBy('initial_remote_time');

    gr.query();

    var amountOfRecordsFound = gr.getRowCount();

    // Group only once the number of matching alerts exceeds the configured minimum
    if (amountOfRecordsFound > minimumNumberOfRecordsToGroup) {
        if (gr._next()) {
            // Set the primary and secondary alerts by SysIds if parent was found
            // The VALUES for BOTH keys (PRIMARY and SECONDARY) must be an ARRAY of ALERTS SYS_IDS, e.g. SECONDARY: [SYS_ID1, SYS_ID2...],
            // while the value for primary can contain only 1 sys_id
            result = {
                'PRIMARY': [gr.getUniqueValue()], // getUniqueValue() retrieves sys_id, then put in an array (value MUST be put in an array)
                'SECONDARY': [currentAlert.sys_id] // Retrieve sys_id, then put in an array (value MUST be put in an array)
            };
        }
    }
    return JSON.stringify(result);

})(currentAlert);

 

 

This works, but it has a major flaw: the first 4 alerts that are evaluated do not get correlated, because fewer than 5 alerts exist at the time each of them is evaluated.
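To make the flaw concrete, the counting logic can be replayed outside ServiceNow with plain arrays. This is only a rough model, not the platform's behaviour: `replayRule` and the `{ id, time }` alert shape are hypothetical, the threshold is treated as "at least N", and the current alert is assumed to be included in the count.

```javascript
// Plain-JavaScript replay of the rule's counting logic (no ServiceNow APIs).
// Hypothetical model: alerts arrive oldest first, time is in minutes.
// group: 0 = None, 1 = Primary, 2 = Secondary (same codes as correlation_rule_group).
function replayRule(alerts, windowMinutes, minimumToGroup) {
    var seen = [];
    alerts.forEach(function (alert) {
        var current = { id: alert.id, time: alert.time, group: 0 };
        seen.push(current); // assumption: the current alert is counted too

        // Same filter as the rule: not secondary, inside the time window
        var candidates = seen.filter(function (a) {
            return a.group !== 2 && a.time >= current.time - windowMinutes;
        });

        // Only the CURRENT alert is attached to the oldest candidate
        if (candidates.length >= minimumToGroup && candidates[0] !== current) {
            candidates[0].group = 1;
            current.group = 2;
        }
    });
    // Alerts that were never grouped
    return seen.filter(function (a) { return a.group === 0; })
               .map(function (a) { return a.id; });
}

// Seven alerts, one per minute
var burst = [0, 1, 2, 3, 4, 5, 6].map(function (minute, i) {
    return { id: 'a' + (i + 1), time: minute };
});
console.log(replayRule(burst, 30, 5)); // [ 'a2', 'a3', 'a4' ]
```

In this model a1 ends up as the primary, a5 to a7 become secondaries, and a2 to a4 are left on their own because each was evaluated before the threshold was reached.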

 

Any idea on how to avoid this? I'm open to other approaches.

 

Thanks.

Regards,

Joseba Ulibarri

3 REPLIES

Rahul Priyadars
Giga Sage
I would not call it a flaw, as it is working as per your base requirement of 30 minutes and 5 alerts or more. What are your criteria for deciding that the 5th alert is also related to the previous 4? Regards, RP

Hi Rahul,

 

I'm calling my current solution flawed because I need ALL alerts to be correlated, not only the ones from the Nth alert onward.

This is just an example from development, where the requirements are 30 minutes and 5 alerts. In a production environment, I will probably set this to 10 minutes and 30 alerts or something similar, as I only want this to be triggered when tens of alerts arrive.

So what I'm actually looking for is ideas to achieve this...

I know that I could do this with a flow or subflow+alert management rule that groups the other alerts, but I'm hoping not to do something that complicated.

 

Thanks.
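One idea that stays inside the correlation rule: the script's own comments say SECONDARY takes an array of sys_ids, so once the threshold is reached the rule could return every earlier alert from the window as a secondary, not just the current one. Below is a minimal plain-JavaScript sketch of that result-building step. `buildCorrelationResult` and the `{ id }` alert shape are hypothetical, and whether Event Management actually applies the grouping to alerts other than the one being evaluated is an assumption that would need checking on a sub-production instance.

```javascript
// Hypothetical helper (plain JavaScript, no ServiceNow APIs).
// alertsInWindow: matching alerts inside the time window, oldest first.
// Returns the same { PRIMARY: [...], SECONDARY: [...] } shape the rule uses.
function buildCorrelationResult(alertsInWindow, currentAlertId, minimumToGroup) {
    var ids = alertsInWindow.map(function (a) { return a.id; });

    // Make sure the alert being evaluated is part of the group
    if (ids.indexOf(currentAlertId) === -1) {
        ids.push(currentAlertId);
    }

    // Below the threshold: group nothing yet
    if (ids.length < minimumToGroup) {
        return {};
    }

    // Oldest alert becomes the parent, every other alert becomes a secondary
    return { PRIMARY: [ids[0]], SECONDARY: ids.slice(1) };
}

var earlier = [{ id: 'a1' }, { id: 'a2' }, { id: 'a3' }, { id: 'a4' }];
console.log(JSON.stringify(buildCorrelationResult(earlier, 'a5', 5)));
// {"PRIMARY":["a1"],"SECONDARY":["a2","a3","a4","a5"]}
```

In the real rule, the ids would come from looping over the GlideRecord with next() and collecting getUniqueValue() for each row, in place of the single next() call that only reads the oldest alert.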
