Antony_Alldis
ServiceNow Employee

Recently, several customers were affected by an underlying issue that caused MID Servers to crash and appear "down" on the instance. For example, in the thread "Mid Servers Disconnecting since Fuji upgrade", the user reports that the services on their servers are still running but are reported as down, and that they reconnect only after a restart. You could be affected by the MID Server crashing issue if you are on certain patches in Eureka, Fuji, or Geneva.

If you review the <path_to_MIDserver>\agent\logs\agent0.log.0 file, you will see errors like this:

12/17/15 07:16:38 (014) StartupSequencer WARNING *** WARNING *** Socket timeout

12/17/15 07:16:38 (014) StartupSequencer WARNING *** WARNING *** Update failed (Socket timeout)

12/17/15 07:16:38 (170) LogStatusMonitor.60 stats threads: 26, memory max: 508.0mb, allocated: 90.0mb, used: 29.0mb, queued: 0 probes, processing: 0 probes

12/17/15 07:16:38 (311) RefreshMonitor.65 WARNING *** WARNING *** Method failed: (https://instance.service-now.com/ecc_agent_property.do?SOAP&displayvalue=all&redirectSupported=true)HTTP/1.1 202 Accepted with code: 202

12/17/15 07:16:38 (311) RefreshMonitor.65 SEVERE *** ERROR *** getRecords failed (Method failed: (https://instance.service-now.com/ecc_agent_property.do?SOAP&displayvalue=all&redirectSupported=true)HTTP/1.1 202 Accepted with code: 202)

12/17/15 07:16:38 (311) RefreshMonitor.65 SEVERE *** ERROR *** Failed to load remote properties: Method failed: (https://instance.service-now.com/ecc_agent_property.do?SOAP&displayvalue=all&redirectSupported=true)HTTP/1.1 202 Accepted with code: 202

12/17/15 07:16:38 (358) ECCQueueMonitor.15 WARNING *** WARNING *** Method failed: (https://instance.service-now.com/ecc_queue.do?SOAP&displayvalue=all&redirectSupported=true)HTTP/1.1 202 Accepted with code: 202
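
To check quickly whether an agent log shows this pattern, a small script can count the recurring message texts. The following is a minimal sketch, not a ServiceNow tool: the function names `count_signatures` and `scan_log` and the signature labels are hypothetical, and the patterns are simply taken from the excerpt above.

```python
import re
from collections import Counter

# Message texts taken from the agent log excerpt above. Thread ids such as
# "(311)" and monitor names vary, so only the stable text is matched.
SIGNATURES = {
    "socket_timeout": re.compile(r"Socket timeout"),
    "get_records_failed": re.compile(r"SEVERE \*\*\* ERROR \*\*\* getRecords failed"),
    "http_202": re.compile(r"Method failed: .*HTTP/1\.1 202 Accepted"),
}


def count_signatures(lines):
    """Count how many lines match each signature (a line may match several)."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_log(path):
    """Hypothetical helper: count signatures in the log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_signatures(handle)
```

Calling `scan_log` with the path to your own agent0.log.0 returns a counter per signature. A non-zero `get_records_failed` count alongside socket timeouts suggests the log resembles the excerpt, though it does not by itself confirm the root cause.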

On the instance, the Application Logs show accompanying messages like these:

2015-12-17 07:53:27 (349) http-44 WARNING *** WARNING *** GlideRequestManager: Request: /ecc_agent.do, run time: 175669, waiters: 0
2015-12-17 07:53:27 (349) http-23 WARNING *** WARNING *** GlideRequestManager: Request: /ecc_agent.do, run time: 175669, waiters: 0
2015-12-17 07:53:27 (355) http-23 SYSTEM WARNING *** WARNING *** GlideRequestManager: Request ignored: /ecc_queue.doSOAP&displayvalue=all&redirectSupported=true
2015-12-17 07:53:27 (355) http-44 SYSTEM WARNING *** WARNING *** GlideRequestManager: Request ignored: /ecc_mi.doSOAP&displayvalue=all&redirectSupported=true
2015-12-17 07:53:27 (416) http-21 WARNING *** WARNING *** GlideRequestManager: Request: /ecc_agent.do, run time: 175736, waiters: 0
2015-12-17 07:53:27 (421) http-21 SYSTEM WARNING *** WARNING *** GlideRequestManager: Request ignored: /ecc_agent_property.doSOAP&displayvalue=all&redirectSupported=true

Upgrading the version of Tomcat (Apache Tomcat 7.0.64, Orbit 7.2.0-2) has occasionally reduced the number of occurrences, but it is not foolproof and has not worked in the majority of cases, because the underlying issue still exists in the core code. Please note: ServiceNow will no longer upgrade Apache Tomcat to attempt to resolve this issue.

Due to the nature of this issue, ServiceNow recommends upgrading to one of the releases in which it is fixed, as listed in ServiceNow KB: MID Server stops communicating to the instance and continuously throws socket timeout...

  • Fuji Patch 13
  • Geneva Patch 7
  • Helsinki Patch 1

For more MID Server solutions, troubleshooting demos, and implementation documentation, see ServiceNow KB: Discovery and MID Server Resources Page (KB0540193).

Comments
Kilo Guru

Thank you for this - my logs are full of these messages and my MID Servers have been flaky. GP3-HF2


Kilo Guru

For various reasons we are unable to upgrade immediately to a stable version to avoid this issue. I was able to leverage my Java logging skills to devise a means for the MID Instances to self-heal from it.



Edit agent/properties/glide.properties and modify these lines:



glide.log.handlers=java.util.logging.FileHandler, java.util.logging.ConsoleHandler


java.util.logging.FileHandler.level=SEVERE


java.util.logging.ConsoleHandler.level=SEVERE



Edit agent/conf/wrapper-override.conf and add these lines:



wrapper.filter.trigger.1=SEVERE *** ERROR *** getRecords failed


wrapper.filter.action.1=RESTART


wrapper.filter.message.1=PRB646966: MID Server has stopped communicating to the instance. Restarting it with wrapper.



Stop and start the service to pick up the changes.



Now the wrapper program will watch the child process output for the getRecords error and issue a restart when it appears. A restart is the only workaround listed in the problem record referenced above.
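
Before relying on the filter, it can help to check which of your own log lines would trip it. The sketch below imitates the match in Python under one assumption: that wildcards are not enabled for this filter, so the trigger is compared as plain text and the asterisks are literal. Verify that against the wrapper documentation for your version. The function name `should_restart` is hypothetical.

```python
# Trigger text copied from the wrapper-override.conf lines above.
TRIGGER = "SEVERE *** ERROR *** getRecords failed"


def should_restart(output_line, trigger=TRIGGER):
    """Return True if a line of JVM output contains the trigger text.

    Assumption: the filter does a plain substring comparison, so the
    asterisks in the trigger are literal characters, not patterns.
    """
    return trigger in output_line
```

Under that assumption, only the getRecords line matches. The "Failed to load remote properties" line and the WARNING-level socket timeout lines do not, so they would not cause a restart by themselves.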



When the trigger fires, something along these lines will appear in your wrapper.log:



2016/06/22 19:18:42 | 06/22/16 19:18:42 (311) RefreshMonitor.65 SEVERE *** ERROR *** getRecords failed (Socket timeout)


2016/06/22 19:18:42 | PRB646966: MID Server has stopped communicating to the instance. Restarting it with wrapper.   Restarting JVM.


2016/06/22 19:18:42 | 06/22/16 19:18:42 (311) RefreshMonitor.65 SEVERE *** ERROR *** Failed to load remote properties: Socket timeout


2016/06/22 19:18:42 | 06/22/16 19:18:42 (467) WrapperListener_stop_runner Running under Java version: 1.8.0_60, java PID: 9412, args: stop


2016/06/22 19:18:42 | 06/22/16 19:18:42 (467) WrapperListener_stop_runner Stopping MID server


2016/06/22 19:18:42 | 06/22/16 19:18:42 (467) WrapperListener_stop_runner Destroying injector...


2016/06/22 19:18:42 | 06/22/16 19:18:42 (467) WrapperListener_stop_runner Closing com.service_now.monitor.PriorityThreadPoolProvider


2016/06/22 19:18:42 | 06/22/16 19:18:42 (467) WrapperListener_stop_runner Shutting down ThreadPool-Interactive


2016/06/22 19:18:42 | 06/22/16 19:18:42 (467) WrapperListener_stop_runner Shutting down ThreadPool-Standard


2016/06/22 19:18:42 | 06/22/16 19:18:42 (467) WrapperListener_stop_runner Shutting down ThreadPool-Expedited


2016/06/22 19:18:42 | 06/22/16 19:18:42 (467) WrapperListener_stop_runner ThreadPool-Interactive terminated


2016/06/22 19:19:19 | Shutdown failed: Timed out waiting for signal from JVM.


2016/06/22 19:19:20 | JVM did not exit on request, terminated


2016/06/22 19:19:25 | Launching a JVM...
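
To see how often the band-aid is firing, the wrapper.log can be scanned for the filter message. This is a rough sketch that assumes the line layout shown in the excerpt above (a timestamp, then " | ", then the message). The name `restart_events` is hypothetical, and the default marker is the PRB number used in the filter message.

```python
import re

# Wrapper log lines in the excerpt start with "YYYY/MM/DD HH:MM:SS | ".
LINE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \| (.*)$")


def restart_events(lines, marker="PRB646966"):
    """Return the timestamps of wrapper lines that contain the marker text."""
    events = []
    for line in lines:
        match = LINE.match(line.strip())
        if match and marker in match.group(2):
            events.append(match.group(1))
    return events
```

Passing the lines of wrapper.log to `restart_events` gives one timestamp per filter-triggered restart. A steadily growing list is a good argument for scheduling the upgrade sooner.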



Obviously patching / upgrading is more beneficial, but the wrapper program bundled with the MID Instances is very powerful and can be used as a band-aid until you can schedule your update.