Auto-Closing Incidents Triggered by the LogicMonitor Integration

lvenna
Tera Contributor

Hello All,

We are experiencing an issue with Incidents created through the LogicMonitor integration. These Incidents are generated for collector failovers, and the failover details are included in the Incident description.

When the failover is restored, LogicMonitor sends a notification, which is added as a work note to the Incident (e.g., indicating that the collector is back online after the failover). However, the Incident does not close automatically based on this update.

Currently, users close these Incidents manually, and when the same collector goes down again, the Incident reopens as expected. Any auto-close mechanism I implement must not interfere with this reopen behavior, which has to keep working even when the Incident is in a closed state.

Is there a recommended approach to configure these Incidents to auto-close when a specific work note is added by the LogicMonitor integration (for example, when the work note states that the failover has been resolved and the collector is back online)?

Any guidance or best practices would be greatly appreciated. Thank you!

 

1 ACCEPTED SOLUTION

@lvenna 

 

Please see below for the LogicMonitor documentation that explains how collectors are monitored and how alerts are generated and cleared. I am not sure why your LM team is saying alerts will not be auto-cleared, when that should be the case. You can ask them to check with the LogicMonitor support team to clarify this if needed.

[Screenshot: LogicMonitor documentation on how collector alerts are generated and cleared]

There are multiple scenarios when it comes to collector failover, failback and the related alerting configuration. I have worked extensively on Monitoring & Event Management tools and their integration with ITSM systems for different vendors. Collectors are remote agents that collect data from the underlying infrastructure components at scheduled intervals (5 minutes, 15 minutes, etc.), and the collected data is compared against the thresholds defined for the monitored attributes. When a threshold is breached, an alert is generated in the monitoring system; this can be integrated with the ITSM system to create incidents automatically, so that teams can act on them and restore the affected services and monitored components.
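To make the threshold step concrete, here is a minimal Python sketch of one collection cycle being checked against thresholds. The attribute names, limits and the `Threshold`/`breached` helpers are hypothetical and only illustrate the idea; real monitoring tools typically layer more on top (for example severity levels, or requiring several consecutive breaches before alerting).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Threshold:
    # Hypothetical threshold on one monitored attribute (names/limits are illustrative).
    attribute: str
    limit: float


def breached(sample: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return the attributes in one collection cycle whose value exceeds its threshold.

    Each breached attribute is what would raise an alert in the monitoring tool
    (and, through the ITSM integration, an automated incident).
    """
    return [t.attribute for t in thresholds
            if t.attribute in sample and sample[t.attribute] > t.limit]


thresholds = [Threshold("cpu_pct", 90.0), Threshold("disk_used_pct", 85.0)]
# One sample collected on a scheduled interval (e.g. every 5 minutes)
print(breached({"cpu_pct": 96.5, "disk_used_pct": 40.0}, thresholds))  # ['cpu_pct']
```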

 

In production environments, collectors are typically configured for High Availability (HA), and sometimes in an HA + DR setup. When the Primary collector goes down, the Secondary collector takes over its data collection jobs. In this scenario there will typically be 2 alerts: one for the Primary collector going down and another for data collection being impacted. Once the Secondary collector picks up the jobs (based on your failover configuration), data collection resumes. By this time, the data-collection-impacted alert should be auto-cleared, while the Primary-collector-down alert remains open. When the Primary collector is back online, its alert is cleared and, depending on your failback configuration, data collection continues from either the Primary or the Secondary collector. At a high level, this mechanism is the same across vendors for Monitoring and Event/Alert Management.
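The two-alert sequence above can be modelled as a toy Python sketch. It assumes a single Primary/Secondary pair and treats "data collection impacted" as open only while no collector is running the jobs; the alert names are hypothetical, and this is a simplification for illustration, not how LogicMonitor actually computes alert state.

```python
from __future__ import annotations


def open_alerts(primary_up: bool, secondary_collecting: bool) -> set[str]:
    """Return the alerts expected to be open in a simplified HA collector pair.

    Simplifying assumption: data collection is impacted only while the Primary
    collector is down AND the Secondary has not yet taken over the jobs.
    """
    alerts: set[str] = set()
    if not primary_up:
        alerts.add("primary_collector_down")
        if not secondary_collecting:
            alerts.add("data_collection_impacted")
    return alerts


# Walk through the failover / failback sequence described above.
timeline = [
    ("Primary goes down", False, False),
    ("Secondary picks up collection jobs", False, True),
    ("Primary back online (failback)", True, True),
]
for step, primary_up, secondary_collecting in timeline:
    print(f"{step}: {sorted(open_alerts(primary_up, secondary_collecting))}")
# Primary goes down: ['data_collection_impacted', 'primary_collector_down']
# Secondary picks up collection jobs: ['primary_collector_down']
# Primary back online (failback): []
```

In this clean case the Primary-down alert clears on failback, and that clear is what would normally drive the incident closure through the integration.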

 

The LogicMonitor - ServiceNow integration uses an import set and a transform map.

 

https://store.servicenow.com/store/app/acdbabea1b246a50a85b16db234bcb15

 


If the LogicMonitor product team confirms this is the expected behavior and your team has configured the collector failover alerts correctly, you can try the approach below:

 

Identify the Transform Event Script that carries out the incident updates and add a condition that checks for both of the following: the alert category (or another unique filter condition) that identifies collector failover alerts, and a work note stating that the collector is back online. These fields will be part of the import set table. When both match, update the incident state as per your requirements. This ensures you are not introducing an additional script or Flow Designer action outside your integration; the closure is handled as part of the existing configuration.
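A minimal sketch of the decision logic for that condition, written in Python purely for illustration: an actual transform map event script would be JavaScript working against the import set row and the target incident record. The category value `collector_failover`, the phrases "failed back"/"back online", the state labels and the `should_auto_close` helper are hypothetical placeholders; use whatever your import set table actually contains.

```python
# Hypothetical placeholders: replace with the alert category / filter value and the
# exact failback wording that your LogicMonitor import set rows actually carry.
FAILOVER_CATEGORY = "collector_failover"
FAILBACK_PHRASES = ("failed back", "back online")
CLOSED_STATES = {"Resolved", "Closed"}


def should_auto_close(alert_category: str, work_note: str, incident_state: str) -> bool:
    """Return True when an integration update should close the incident.

    Only collector failover alerts whose note reports a failback qualify.
    Incidents already resolved/closed are skipped, so this rule does not
    interfere with the existing reopen path for a new "collector down" update.
    """
    if alert_category != FAILOVER_CATEGORY:
        return False  # not a collector failover alert: leave existing logic alone
    if incident_state in CLOSED_STATES:
        return False  # nothing to close
    note = work_note.lower()
    return any(phrase in note for phrase in FAILBACK_PHRASES)


# Example updates (values are illustrative only)
print(should_auto_close("collector_failover", "Collector 12 failed back to primary", "In Progress"))  # True
print(should_auto_close("collector_failover", "Collector 12 is down, failing over", "Closed"))  # False
print(should_auto_close("cpu_threshold", "Host back online", "In Progress"))  # False
```

Checking a structured field such as the alert category first keeps the rule narrow. Free-text matching on the note is fragile (a note like "not back online" would also match), so prefer a status field from the payload if one exists.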

 

I hope you appreciate the effort to provide you with detailed information. As per community guidelines, you can accept more than one answer as an accepted solution. If my responses helped guide you or answer your query, please mark them helpful & accept the solution.

 

Thanks,

Bhuvan


17 REPLIES

lvenna
Tera Contributor

@Bhuvan  

 

Thank you for the information. 

 

The LogicMonitor team said: "For collector down, there is a chance that the resolution state may be a different value, IF a collector fail-back for any reason is not considered a clear."

@lvenna 

 

I would recommend analysing the collector failover alert from LogicMonitor further and checking its state change behavior in ServiceNow.

 

I do not see customizing closure via a Business Rule (BR) as the right option here, since this should be handled by out-of-box behavior or via the mapping configuration. Customizing with BRs or scripts creates technical debt and is not a best practice.

 

I believe this integration is handled via Flow. Check how the incident state transition happens and why the collector failover alert does not update the corresponding incident, then work with the LM team to fix the cause.

 

https://www.logicmonitor.com/support/collector-failover-and-failback

 

If this helped to answer your query, please mark it helpful & accept the solution. 

 

Thanks,

Bhuvan

lvenna
Tera Contributor

@Bhuvan  

 

Some of our incidents close automatically when we receive the cleared notification (e.g., “CLEARED”). In these cases, the incident workflow functions as expected and the tickets close without manual intervention.

However, for incidents generated by failover scenarios, we do not receive a “cleared” notification. Instead, the work notes only include messages such as “failed back.” The LogicMonitor team has suggested that we use this “failed back” message as a trigger to automatically close such incidents.

I explained that, ideally, the alert should close within LogicMonitor so that a cleared notification is sent, keeping our incident closure process consistent. Their response was that failover incidents are not designed to clear in LogicMonitor.

Could you please advise on the best way to handle these failover incidents? Specifically, how can we implement an automated closure mechanism based on the “failed back” message while ensuring it does not conflict with our existing incident lifecycle?

 

Thank you,

Laxma 

@lvenna 

I already responded with the approach and answered your subsequent question.

Did you get a chance to check that?

I believe I provided enough information on the approach there.

If my response helped, please mark it correct and close the thread so that it benefits future readers.

Regards,
Ankur
Certified Technical Architect  ||  9x ServiceNow MVP  ||  ServiceNow Community Leader
