Why is the MID Server showing down while the service is still running?

RK25
Giga Guru

Hello - Several times now, one of our MID Servers has shown as down in the instance: last refreshed is not being updated and the queue is not being processed. But when I log in to the server and check the service, it is actually running. To clear the issue, I stopped and restarted the service, which automatically brought the MID Server back up in the instance. Has anyone seen this behavior before? Thanks!

10 REPLIES

Thanks for the reply, Ian. I even checked the agent0 log, but nothing in it looks concerning.

If the logs don't show anything of concern, then I would lean towards the probe not responding in time and giving a "false" down alert.

Exactly as Ian said. You will see heartbeat outputs in the ecc_queue. Are you seeing heartbeat inputs come back within 5 minutes? If not, it will get marked down.
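To make the timing check concrete, here is a minimal Python sketch of the comparison described above. The function name and arguments are hypothetical, the two timestamps are assumed to be the created times of the heartbeat output and its matching input as read from the ecc_queue, and the 5-minute window is simply the figure quoted in this reply. It is not how the instance itself evaluates MID Server status.

```python
from datetime import datetime, timedelta

# Window quoted in the reply above; treated here as an assumption.
HEARTBEAT_WINDOW = timedelta(minutes=5)

def heartbeat_answered_in_time(output_created, input_created,
                               window=HEARTBEAT_WINDOW):
    """Return True if the heartbeat input arrived within the window.

    output_created: datetime the heartbeat output was queued.
    input_created: datetime the matching input came back, or None
                   if no input was ever received.
    """
    if input_created is None:
        return False  # no reply at all counts as a miss
    delay = input_created - output_created
    return timedelta(0) <= delay <= window
```

Calling it with an output time and None for the input models the case where the heartbeat is sent but nothing comes back, which is the situation that would get the MID Server marked down.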

The next step in troubleshooting is generally to identify the sys_id of the ecc_queue output record for the heartbeat, and search for that sys_id in the MID Server logs (the agent0.log.* files) to find out what the MID Server did with this probe. You can then tell whether it ever got started at the MID Server, whether it started late due to congestion, or whether it completed and is sitting in the queue waiting to come back.
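As a rough illustration of that search, the sketch below scans the agent0.log.* files in a log directory for a given sys_id. The function name, the directory argument, and the encoding handling are assumptions made for the example; the only things taken from this thread are the file name pattern and the idea of searching for the sys_id of the ecc_queue output record.

```python
from pathlib import Path

def find_probe_lines(log_dir, sys_id):
    """Yield (file name, line number, text) for each log line containing sys_id.

    log_dir: path to the MID Server logs directory (assumed location).
    sys_id: sys_id of the ecc_queue output record to look for.
    """
    for path in sorted(Path(log_dir).glob("agent0.log.*")):
        # errors="replace" keeps the scan going past undecodable bytes
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if sys_id in line:
                    yield path.name, number, line.rstrip("\n")

# Example usage (paths and sys_id are placeholders):
# for name, number, text in find_probe_lines("agent/logs", "<sys_id>"):
#     print(f"{name}:{number}: {text}")
```

If the search returns no lines at all, that suggests the probe never got started at the MID Server; a first hit well after the output's created time would point toward congestion.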

RK25
Giga Guru

Thanks, Tim, for the response. I checked the log around the time the first heartbeat probe was sent out, but there was no input after that. There is something unusual going on for sure. I appreciate any insight.

05/20/22 05:02:18 (549) Worker-Standard:PowershellProbe-49e4b98e87270150d7d58409dabb35f4 WARNING *** WARNING *** PowerConsole already terminated : PowerConsole session was lost while executing command: exit
05/20/22 05:04:59 (501) Worker-Standard:JavascriptProbe-50e4798e87270150d7d58409dabb3567 Slow execution (6202ms) of script: probe:check_priv_command
05/20/22 05:05:45 (740) Worker-Standard:PowershellProbe-f4e4b98e87270150d7d58409dabb35c6 SEVERE *** ERROR *** Exception during executeCommand, keeping PowerConsole in busy state
com.snc.automation_common.integration.exceptions.CommandTimeoutException: Command [$env:PSModulePath = $env:PSModulePath + ";D:\ServiceNow MID Server Prod Discovery\agent/scripts/Powershell/WinRMAPI"] timed out after PT2M
	at com.service_now.mid.win.powershell.api.PowerConsole.executeCurrentCommand(PowerConsole.java:272)
	at com.service_now.mid.win.powershell.api.PowerConsole.executeCommand(PowerConsole.java:218)
	at com.service_now.mid.win.powershell.api.PowerConsole.execute(PowerConsole.java:190)
	at com.service_now.mid.win.powershell.api.PowerConsole.init(PowerConsole.java:458)
	at com.service_now.mid.win.powershell.api.PowerConsole.<init>(PowerConsole.java:150)
	at com.service_now.mid.win.powershell.api.PowerConsole.<init>(PowerConsole.java:92)
	at com.service_now.mid.win.powershell.api.PowerConsole.<init>(PowerConsole.java:88)
	at com.service_now.mid.win.powershell.api.APowershellSession.<init>(APowershellSession.java:100)
	at com.service_now.mid.win.powershell.api.LocalPowerShellSession.<init>(LocalPowerShellSession.java:21)
	at com.service_now.mid.win.powershell.api.PowerShellConnectionFactory.getUnauthorizedConnection(PowerShellConnectionFactory.java:50)
	at com.service_now.mid.win.powershell.api.PowerShellConnectionFactory.getUnauthorizedConnection(PowerShellConnectionFactory.java:24)
	at com.snc.core_automation_common.util.AKeyedConnectionFactory.getConnection(AKeyedConnectionFactory.java:152)
	at com.snc.core_automation_common.util.AKeyedConnectionFactory.getConnection(AKeyedConnectionFactory.java:145)
	at com.snc.core_automation_common.util.AKeyedConnectionFactory.getConnection(AKeyedConnectionFactory.java:133)
	at com.service_now.mid.win.powershell.api.PowerShellConnectionPoolFactory.createConnection(PowerShellConnectionPoolFactory.java:50)
	at com.service_now.mid.win.powershell.api.PowerShellConnectionPoolFactory.createConnection(PowerShellConnectionPoolFactory.java:18)
	at com.snc.core_automation_common.util.AConnectionPoolFactory.create(AConnectionPoolFactory.java:27)
	at com.snc.core_automation_common.util.AConnectionPoolFactory.create(AConnectionPoolFactory.java:14)
	at org.apache.commons.pool2.BaseKeyedPooledObjectFactory.makeObject(BaseKeyedPooledObjectFactory.java:62)
	at org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:1041)
	at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:357)
	at com.snc.core_automation_common.util.AKeyedConnectionPool.borrowConnection(AKeyedConnectionPool.java:51)
	at com.service_now.mid.probe.AWmiFetchData.probe(AWmiFetchData.java:67)
	at com.service_now.mid.probe.AProbe.process(AProbe.java:106)
	at com.service_now.mid.queue_worker.AWorker.runWorker(AWorker.java:129)
	at com.service_now.mid.probe.PowershellProbe.runWorker(PowershellProbe.java:77)
	at com.service_now.mid.queue_worker.AWorkerThread.run(AWorkerThread.java:20)
	at com.service_now.mid.threadpool.ResourceUserQueue$RunnableProxy.run(ResourceUserQueue.java:649)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

05/20/22 05:09:46 (781) Worker-Standard:MultiProbe-f3c4f94e476b0d10bc72b37c346d436a SEVERE *** ERROR *** Exception during executeCommand, keeping PowerConsole in busy state
com.snc.automation_common.integration.exceptions.CommandTimeoutException: Command [mode con lines=1 cols=9999] timed out after PT2M
	at com.service_now.mid.win.powershell.api.PowerConsole.executeCurrentCommand(PowerConsole.java:272)
	at com.service_now.mid.win.powershell.api.PowerConsole.executeCommand(PowerConsole.java:218)
	at com.service_now.mid.win.powershell.api.PowerConsole.execute(PowerConsole.java:190)
	at com.service_now.mid.win.powershell.api.PowerConsole.init(PowerConsole.java:458)
	at com.service_now.mid.win.powershell.api.PowerConsole.<init>(PowerConsole.java:150)
	at com.service_now.mid.win.powershell.api.PowerConsole.<init>(PowerConsole.java:92)
	at com.service_now.mid.win.powershell.api.PowerConsole.<init>(PowerConsole.java:88)
	at com.service_now.mid.win.powershell.api.APowershellSession.<init>(APowershellSession.java:100)
	at com.service_now.mid.win.powershell.api.LocalPowerShellSession.<init>(LocalPowerShellSession.java:21)
	at com.service_now.mid.win.powershell.api.PowerShellConnectionFactory.getUnauthorizedConnection(PowerShellConnectionFactory.java:50)
	at com.service_now.mid.win.powershell.api.PowerShellConnectionFactory.getUnauthorizedConnection(PowerShellConnectionFactory.java:24)
	at com.snc.core_automation_common.util.AKeyedConnectionFactory.getConnection(AKeyedConnectionFactory.java:152)
	at com.snc.core_automation_common.util.AKeyedConnectionFactory.getConnection(AKeyedConnectionFactory.java:145)
	at com.snc.core_automation_common.util.AKeyedConnectionFactory.getConnection(AKeyedConnectionFactory.java:133)
	at com.service_now.mid.win.powershell.api.PowerShellConnectionPoolFactory.createConnection(PowerShellConnectionPoolFactory.java:50)
	at com.service_now.mid.win.powershell.api.PowerShellConnectionPoolFactory.createConnection(PowerShellConnectionPoolFactory.java:18)
	at com.snc.core_automation_common.util.AConnectionPoolFactory.create(AConnectionPoolFactory.java:27)
	at com.snc.core_automation_common.util.AConnectionPoolFactory.create(AConnectionPoolFactory.java:14)
	at org.apache.commons.pool2.BaseKeyedPooledObjectFactory.makeObject(BaseKeyedPooledObjectFactory.java:62)
	at org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:1041)
	at org.apache.commons.pool2.impl.GenericKeyedObjectPool.reuseCapacity(GenericKeyedObjectPool.java:833)
	at org.apache.commons.pool2.impl.GenericKeyedObjectPool.returnObject(GenericKeyedObjectPool.java:562)
	at com.snc.core_automation_common.util.AKeyedConnectionPool.returnConnection(AKeyedConnectionPool.java:78)
	at com.service_now.mid.probe.AWmiFetchData.probe(AWmiFetchData.java:91)
	at com.service_now.mid.probe.AProbe.process(AProbe.java:106)
	at com.service_now.mid.queue_worker.AWorker.runWorker(AWorker.java:129)
	at com.service_now.mid.probe.PowershellProbe.runWorker(PowershellProbe.java:77)
	at com.service_now.mid.queue_worker.AWorkerThread.run(AWorkerThread.java:20)
	at com.service_now.mid.message_executors.SubProbeTask.call(SubProbeTask.java:27)
	at com.service_now.mid.probe.MultiProbe.probe(MultiProbe.java:79)
	at com.service_now.mid.probe.AProbe.process(AProbe.java:106)
	at com.service_now.mid.queue_worker.AWorker.runWorker(AWorker.java:129)
	at com.service_now.mid.queue_worker.AWorkerThread.run(AWorkerThread.java:20)
	at com.service_now.mid.threadpool.ResourceUserQueue$RunnableProxy.run(ResourceUserQueue.java:649)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

05/20/22 05:09:04 (962) LogStatusMonitor.60 stats threads: 260, memory max: 910.0mb, allocated: 887.0mb, used: 772.0mb, standard.queued: 0 probes, standard.processing: 53 probes, expedited.queued: 0 probes, expedited.processing: 0 probes, interactive.queued: 0 probes, interactive.processing: 0 probes
05/20/22 05:08:53 (274) ProbeReaper Probe reaper interrupted following thread: Worker-Standard:MultiProbe-09d4f18e87270150d7d58409dabb35f9 id: 93
05/20/22 05:08:43 (132) Worker-Standard:JavascriptProbe-50e4798e87270150d7d58409dabb3567 Slow execution (441451ms) of script: probe:SSHTerminalInteractiveCommand
05/20/22 05:07:11 (983) Worker-Standard:PowershellProbe-2eb4b50a97af491082fbff7c1253af24 SEVERE *** ERROR *** Exception during executeCommand, keeping PowerConsole in busy state
com.snc.automation_common.integration.exceptions.CommandTimeoutException: Command [Get-ChildItem "D:\ServiceNow MID Server Prod Discovery\agent\scripts\PowerShell\WMIFetch.psm1" -Filter *.psm1  | ForEach-Object { If (-Not (Get-Module -Name $_.BaseName)) { Import-Module -Name $_.FullName -Verbose -NoClobber } }] timed out after PT5M
	at com.service_now.mid.win.powershell.api.PowerConsole.executeCurrentCommand(PowerConsole.java:272)
	at com.service_now.mid.win.powershell.api.PowerConsole.executeCommand(PowerConsole.java:218)
	at com.service_now.mid.win.powershell.api.PowerConsole.execute(PowerConsole.java:190)
	at com.service_now.mid.win.powershell.api.APowershellSession.executeWithoutResultExtraction(APowershellSession.java:372)
	at com.service_now.mid.win.powershell.api.APowershellSession.execute(APowershellSession.java:352)
	at com.service_now.mid.win.powershell.api.APowershellSession.importModule(APowershellSession.java:258)
	at com.service_now.mid.win.powershell.api.APowershellSession.importModule(APowershellSession.java:240)
	at com.service_now.mid.probe.AWmiFetchData.probe(AWmiFetchData.java:76)
	at com.service_now.mid.probe.AProbe.process(AProbe.java:106)
	at com.service_now.mid.queue_worker.AWorker.runWorker(AWorker.java:129)
	at com.service_now.mid.probe.PowershellProbe.runWorker(PowershellProbe.java:77)
	at com.service_now.mid.queue_worker.AWorkerThread.run(AWorkerThread.java:20)
	at com.service_now.mid.threadpool.ResourceUserQueue$RunnableProxy.run(ResourceUserQueue.java:649)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

05/20/22 05:07:04 (310) SSHClientEngine WARNING *** WARNING *** Unexpectedly high SSHClientEngine select polling period: 107.0s.

@tim.broberg any insight after looking into the log? Appreciate the response!