How should we monitor MID Server?

Yuki1 · ‎08-21-2018

Hi

We installed a MID Server for Orchestration.

What kinds of services (processes) run on the MID Server?

We would like to know which services should be monitored, because we need to detect any issue on the MID Server.

(To monitor the MID Server, we use TeMIP, which is our internal monitoring system.)

Here is the MID Server's information:

OS is Windows 2012

Used only for Orchestration (SSH/REST/PowerShell)

Regards,

Yuki

AshishKM · ‎08-21-2018

Yuki, since you are already using a monitoring tool, you can additionally configure an email notification on MID Server status, so that the monitoring team gets an email whenever the server is down.

Valor1 · ‎08-21-2018

The MID Server has its own "heartbeat" and corresponding "MID Server Down" notification. OOB, the notification doesn't have any recipients, but it's there!

Per DOCS, the heartbeat is every 5 minutes.

To configure the DOWN notification, this link should take you right there (assuming you specify your instance name):

https://YOURINTSTANCE.service-now.com/nav_to.do?uri=sysevent_email_action.do?sys_id=f5e95ebadb4697804201f3d51d96197c

If that's not enough for you and you also want MID Server *host* monitoring, you'll want to make sure that the host is "up" (ignore CPU monitoring) and that the java.exe process is OK; the MID Server is just a Java application. Secondly, you'll want to make sure the host can make outbound calls to *.service-now.com:443.
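The outbound check can be sketched in a few lines of Python. This is only a minimal illustration, assuming the host has a direct route to the instance (no proxy) and that a successful TCP connect on port 443 is a good enough signal; the function name `can_reach` is made up for this example.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens the socket in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_reach("YOURINSTANCE.service-now.com")` from the MID Server host on a schedule, with your own instance name substituted, gives the monitoring tool a simple true/false value to alert on.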

johnnyjava · ‎08-22-2018

The main issue I see with MID Server stability is related to Windows patching. Make sure your Windows admins restart the machine after they patch and you can avoid that.

Otherwise, you could monitor for things like SEVERE ERROR in the agent/logs/agent0.log.0 file, but they aren't always all that SEVERE. MID Servers in general have been fairly stable (in my experience) since at least Kingston.
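A rough sketch of that log check in Python, assuming the log is plain text and that matching on the word SEVERE is enough; the helper name `severe_lines` is hypothetical, and the path you pass in depends on where the MID Server is installed.

```python
from pathlib import Path


def severe_lines(log_path, marker="SEVERE"):
    """Return the lines of the log file that contain the marker text."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    return [line for line in text.splitlines() if marker in line]
```

Since not every match is a real problem, it is probably better to review the returned lines, or alert on a sudden jump in their count, than to page someone on every hit.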

tim_broberg · ‎08-22-2018

Yes, that heartbeat should catch any cases where the MID Server is totally dead and mark the MID as down.

The failure mode I've seen where the heartbeat continues but the MID is useless is this: threads survive in the priority 0 worker thread pool that the heartbeat uses, but every thread in the priority 2 worker thread pool, which gets used for discovery and such, is stuck somewhere.

Primarily, I've seen that happen because of j2ssh bugs, and switching to sncssh helps a bunch.

I did try customizing the heartbeat script include and business rule to send heartbeats down all priorities. It's messy, and it generates a blizzard of heartbeat commands, but it did help catch this class of failures until sncssh made them go away.

An alternative would be to watch the agent log for the periodic "LogStatusMonitor.60" messages, which tell you how much traffic there is in which queues. You can figure out a lot about what's going on in the MID Server from those.
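One way to watch for those messages, sketched in Python under some assumptions: the log is a plain text file that is appended to and occasionally rotated, and matching on the literal text "LogStatusMonitor.60" is enough. The class name `MarkerWatcher` is made up for this example, and nothing here parses the message contents.

```python
import os


class MarkerWatcher:
    """Track a log file and report marker lines appended since the last poll."""

    def __init__(self, log_path, marker="LogStatusMonitor.60"):
        self.log_path = log_path
        self.marker = marker
        self.offset = 0  # byte position up to which the file has been processed

    def poll(self):
        """Return the complete marker lines appended since the previous call."""
        if os.path.getsize(self.log_path) < self.offset:
            # The file shrank, so it was probably rotated; start from the top.
            self.offset = 0
        with open(self.log_path, "rb") as fh:
            fh.seek(self.offset)
            data = fh.read()
        # Only consume up to the last newline so a half-written line
        # is picked up whole on the next poll.
        end = data.rfind(b"\n") + 1
        self.offset += end
        lines = data[:end].decode("utf-8", errors="replace").splitlines()
        return [line for line in lines if self.marker in line]
```

If `poll()` keeps coming back empty for noticeably longer than the interval at which the messages normally show up in your own log, that is the condition worth alerting on, even while the heartbeat still looks healthy.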

- Tim.