Strange issue regarding Oracle Data Source and Mid Server

dante665
Kilo Contributor

Hi all,

As the topic says, we have some Data Sources that use JDBC to connect to Oracle databases via a MID Server. Sometimes the connection fails for no apparent reason, so we test-load records manually. Sometimes the connection fails five times in a row, across different Data Sources. A few minutes later, once ONE connection succeeds, all the other Data Sources connect as well.

Is anyone else seeing symptoms like this?

Our instances: Geneva Patch 6 Hotfix 2

Mid Server Logfile entry: 08/11/16 13:12:10 (582) Worker-Standard:JDBCProbeError Worker completed: JDBCProbeError source: Did not get a response from the MID server time: 0:00:00.002


jake_mckenna
ServiceNow Employee

I have seen something in the past that sounds like this. Other things to check for are errors such as:

ECCQueueMonitor.1 WARNING *** WARNING *** No probe registered to cancel for a given agent correlator

Or

ECCQueueMonitor.1 SEVERE *** ERROR *** java.lang.NullPointerException
java.lang.NullPointerException

If you experience this again, check whether any CancelProbe jobs are stuck in a ready state in the ECC Queue. If there are, I would suggest changing the state from "ready" to "error." After that you should start to see the probes being picked up right away.
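
For reference, the same change can be scripted. Here is a rough sketch of a background script (System Definition > Scripts - Background) that finds the stuck records and flips them to error; it assumes the records sit in the ecc_queue table with topic CancelProbe and state ready, so review them in the list view before updating anything:

// Flip stuck CancelProbe output messages from "ready" to "error"
var gr = new GlideRecord('ecc_queue');
gr.addQuery('topic', 'CancelProbe');
gr.addQuery('state', 'ready');
gr.addQuery('queue', 'output'); // CancelProbe is sent from the instance to the MID Server
gr.query();
while (gr.next()) {
    gs.print('Setting stuck CancelProbe to error for agent ' + gr.getValue('agent') + ' (' + gr.getUniqueValue() + ')');
    gr.setValue('state', 'error');
    gr.update();
}

The same filter (topic=CancelProbe, state=ready) also works as a list view condition on ecc_queue if you would rather change the state by hand.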



If the problem continues, I would also check in with support to see if they can help you find a root cause.


Hi Jake,



thanks for your response. I'll check this today.


I have also identified this failure on the Helsinki release.



Within the MID server, I am seeing an exception as follows:



09/30/16 18:12:24 (726) ECCQueueMonitor.1 WARNING *** WARNING *** No probe registered to cancel for a given agent correlator 741881bbdb06ea4068c0d5f0cf961970
09/30/16 18:12:24 (727) ECCQueueMonitor.1 SEVERE *** ERROR *** java.lang.NullPointerException
java.lang.NullPointerException
  at com.glideapp.ecc.ECCMessage.getNewRecord(ECCMessage.java:148)
  at com.glideapp.ecc.ECCMessage.update(ECCMessage.java:117)
  at com.service_now.mid.message_executors.CancelProbeExecutor.execute(CancelProbeExecutor.java:49)
  at com.service_now.monitor.ECCQueueMonitor.executeMessage(ECCQueueMonitor.java:273)
  at com.service_now.monitor.ECCQueueMonitor.processMessage(ECCQueueMonitor.java:197)
  at com.service_now.monitor.ECCQueueMonitor.processMessages(ECCQueueMonitor.java:217)
  at com.service_now.monitor.ECCQueueMonitor.run(ECCQueueMonitor.java:171)
  at com.service_now.monitor.AMonitor.runit(AMonitor.java:145)
  at com.service_now.monitor.AMonitor.access$200(AMonitor.java:39)
  at com.service_now.monitor.AMonitor$MonitorTask.runMonitor(AMonitor.java:135)
  at com.service_now.monitor.AMonitor$MonitorTask.run(AMonitor.java:115)
  at java.util.TimerThread.mainLoop(Timer.java:555)
  at java.util.TimerThread.run(Timer.java:505)

Once this error occurs, the MID server seems to generate the same message a large number of times (roughly once per second for the next minute or so). The MID server then seems to accept heartbeats but not additional JDBC probe connections.



Checking the ECC queue shows that the cancel operation stays in ready (as it is never processed). As already identified, setting the state to error or deleting the queue item (and potentially restarting the MID server) seems to clear the roadblock and allow further JDBC operations to proceed.
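
For what it is worth, a quick way to see which MID Server is blocked is to count the stuck CancelProbe records per agent. A sketch along these lines, run as a background script with the same assumptions about the ecc_queue fields as above:

// Count stuck CancelProbe messages per MID Server agent
var ga = new GlideAggregate('ecc_queue');
ga.addQuery('topic', 'CancelProbe');
ga.addQuery('state', 'ready');
ga.addAggregate('COUNT');
ga.groupBy('agent');
ga.query();
while (ga.next()) {
    gs.print(ga.getValue('agent') + ': ' + ga.getAggregate('COUNT') + ' stuck CancelProbe record(s)');
}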



It would appear that the MID server does not handle long-running or incomplete JDBC operations correctly, which eventually creates a condition where the MID server is no longer contactable and you get the 'MID server didn't respond' message.



Additional Notes: we are running Helsinki (and will try upgrading to Helsinki Patch 5 to see if there is any improvement). The JDBCProbe request is part of a "private" data source request not related to Discovery.


We also experienced this issue in our Helsinki Test instance.


Build name: Helsinki


Build date: 08-18-2016_0855


Build tag: glide-helsinki-03-16-2016__patch3-hotfix2-08-17-2016



The corrective procedure provided a successful workaround.


The workaround did require a MID server restart (successfully triggered from the instance).


The MID server was not part of a cluster.