‎05-02-2022 01:24 PM
We had set up notifications in our system to let us know when the MID Server status is down or up. The problem is that in almost every case we see the MID Server go down and come back up within the same second. For example, the server goes down at 10:42:13 and then comes back up at 10:42:13 as well. We talked to HI about this and sent them our logs, and they confirmed that they weren't seeing any issues in the logs.
So we got to thinking that a better approach would be to check whether the MID Server has been down for at least 15 minutes, and only then send a notification so that we can check on it. I found the article below that briefly talks about this:
The issue I'm running into is that I'm not exactly sure how to set up a scheduled job to call the function in that script. Has anyone out there done this before and would be willing to share their setup? Or, if there is a better way to do this than a scheduled job, I'm open to other options for getting legitimate downtime notifications for the MID Server.
‎05-04-2022 02:40 PM
Because this is baffling me, I'd like to share what I had set up yesterday. When the mid_server.down event occurs, my script action is called.
The script action triggers the newly created event called MIDServerDownNotification and schedules the event for 1 minute into the future (for testing purposes).
When the MIDServerDownNotification event finally fires, the notification checks that the status is not Up (this way if it's anything but Up, we'll get notified), and sends the notification.
This method works every single time, tested with the MID Server in both an Up and a Down status (it doesn't send and sends, respectively). If I leave the notification the same but change the code in the script action to what you provided, the notification isn't sent. If I remove the condition and add the advanced condition you provided, the notification doesn't send. But if I remove all conditions, the notification will send. That's the part that doesn't make sense to me. I don't get what the code in the script action has to do with the notification once the MIDServerDownNotification event is triggered. Maybe you have some thoughts on that?
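For anyone following along, the flow described in this post (down event, delayed custom event, status re-check when it fires) can be modeled outside ServiceNow. This is only a rough sketch in plain JavaScript, not Glide code: the timeline format and the names `statusAt` and `notificationsFor` are made up for illustration, and the server is assumed to start in an Up state.

```javascript
// Hypothetical model: status changes as {at: seconds, status: 'Up' | 'Down'},
// sorted by time. The server is assumed to be 'Up' before the first entry.
function statusAt(timeline, t) {
    var status = 'Up';
    for (var i = 0; i < timeline.length; i++) {
        if (timeline[i].at <= t) {
            status = timeline[i].status;
        } else {
            break;
        }
    }
    return status;
}

// For every Down transition, schedule a re-check delaySeconds later and
// "notify" only if the status at that moment is anything but Up.
function notificationsFor(timeline, delaySeconds) {
    var fired = [];
    timeline.forEach(function (change) {
        if (change.status !== 'Down') {
            return;
        }
        var checkAt = change.at + delaySeconds;
        if (statusAt(timeline, checkAt) !== 'Up') {
            fired.push(checkAt);
        }
    });
    return fired;
}

// A flap within the same second versus an outage that outlasts the delay.
var flap = [{ at: 0, status: 'Down' }, { at: 0, status: 'Up' }];
var outage = [{ at: 100, status: 'Down' }, { at: 2000, status: 'Up' }];
var flapAlerts = notificationsFor(flap, 900);
var outageAlerts = notificationsFor(outage, 900);
```

Note that this model, like the event-based setup, only looks at the status at the moment the delayed check runs. It does not prove the server was down for the whole interval.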
‎05-04-2022 03:52 PM
I love a good challenge!
So to start, my Script Action had one flaw. I messed up my math. 8000 seconds is about 133 minutes, not 15. Needless to say, that notification will not be executing any time soon.
Here is what I have now for the Script Action:
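// Name of the MID Server to watch (sandbox value; change for your instance).
var mid_name = "MIDServer1Sandbox";
mid_server_down(mid_name);

function mid_server_down(name) {
    // Time at which the delayed event should fire: now plus the delay.
    // 100 seconds is a test value; use 900 for 15 minutes.
    var fifteen_minutes = new GlideDateTime();
    fifteen_minutes.addSeconds(100);
    //fifteen_minutes.addSeconds(1);

    // Find the named MID Server, but only if it is currently Down.
    var gr = new GlideRecord("ecc_agent");
    gr.addQuery("name", name);
    gr.addQuery("status", "Down");
    gr.query();
    if (gr.next()) {
        // Queue the custom event against this record for the scheduled time.
        gs.eventQueueScheduled('MIDServerDownNotification', gr, '', '', fifteen_minutes);
    }
}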
I used 100 seconds for my test, but change it to 900 for 15 minutes. I double checked that number this time too 🙂
I then modified the Notification's Advanced Condition to use a self-invoking function. Now it does work:
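Since hand-converted delays are easy to get wrong, one option is to let a small helper do the conversion. A minimal sketch in plain JavaScript; `minutesToSeconds` is a hypothetical helper name, not a ServiceNow API, and its result is what would be passed to `addSeconds()`.

```javascript
// Hypothetical helper: convert a delay in minutes to whole seconds.
function minutesToSeconds(minutes) {
    if (typeof minutes !== 'number' || !isFinite(minutes) || minutes < 0) {
        throw new Error('minutes must be a non-negative number');
    }
    return Math.round(minutes * 60);
}

// 15 minutes expressed in seconds, e.g. for fifteen_minutes.addSeconds(delaySeconds).
var delaySeconds = minutesToSeconds(15);
```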
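(function () {
    // Re-read the MID Server record so the status is current at send time.
    var gr_mid = new GlideRecord("ecc_agent");
    if (gr_mid.get(current.sys_id)) {
        // Send only if the server is still Down.
        if (gr_mid.status == "Down") {
            return true;
        }
    }
    return false;
})();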
By using the Advanced Condition, I think we get a real-time lookup at the moment the notification is processed, whereas I think the regular Conditions are evaluated against whatever the passed-in object holds.
I am not sure whether the notification's advanced condition can invoke a named function. Perhaps that's why the self-invoking function works.
Can you try the above and see if it now works for you?
Edit: In case it's not clear, this is how the self-invoking function looks:
‎05-06-2022 11:34 AM
That fixed it!
Everything is working the way we expect it to. I actually added some code to the notification so that when the server is still down (and the notification is being sent) an incident is also created. Going back to the beginning, it's much better that we are triggering this off of an event rather than a scheduled job, because it allows us to create an incident without having to worry about additional ones being created every time the scheduled job runs. We only do it when the server goes down and the notification is triggered.
Thank you for all of the assistance on this!!!
‎05-06-2022 01:00 PM
Please keep in mind that a MID Server being "down" doesn't necessarily mean the service has stopped or the host is down. If anything prevents a response to the heartbeat probe (which runs every 5 minutes), the instance will report that the MID Server is "down".
Also, as another possible thing to rule out: check out KB1116171 if your MIDs are running Discovery, as there are some known memory leaks that are leading to crashed MIDs.