Strange issue regarding Oracle Data Source and Mid Server

dante665
Kilo Contributor

Hi all,

As the topic says, we have some Data Sources that use JDBC to connect to Oracle databases via a MID Server. Sometimes the connection fails for no apparent reason, so we test-load records manually. Sometimes the connection fails five times in a row, across different Data Sources. A few minutes later, once ONE connection succeeds, all the other Data Sources connect as well.

Is anyone else seeing symptoms like this?

Our instances: Geneva Patch 6 Hotfix 2

Mid Server Logfile entry: 08/11/16 13:12:10 (582) Worker-Standard:JDBCProbeError Worker completed: JDBCProbeError source: Did not get a response from the MID server time: 0:00:00.002


jake_mckenna
ServiceNow Employee

I have seen something in the past that sounds like this. Other things to check for are errors such as:

ECCQueueMonitor.1 WARNING *** WARNING *** No probe registered to cancel for a given agent correlator

Or

ECCQueueMonitor.1 SEVERE *** ERROR *** java.lang.NullPointerException
java.lang.NullPointerException

If you experience this again, check whether any CancelProbe jobs are stuck in a ready state in the ECC Queue. If there are, I would suggest changing the state from "ready" to "error." After that you should start to see the probes being picked up right away.
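
For reference, the same change can be scripted. Here is a rough sketch of a background script (System Definition > Scripts - Background) that finds the stuck records and flips them to error; it assumes the records sit in the ecc_queue table with topic CancelProbe and state ready, so review them in the list view before updating anything:

// Flip stuck CancelProbe output messages from "ready" to "error"
var gr = new GlideRecord('ecc_queue');
gr.addQuery('topic', 'CancelProbe');
gr.addQuery('state', 'ready');
gr.addQuery('queue', 'output'); // CancelProbe is sent from the instance to the MID Server
gr.query();
while (gr.next()) {
    gs.print('Setting stuck CancelProbe to error for agent ' + gr.getValue('agent') + ' (' + gr.getUniqueValue() + ')');
    gr.setValue('state', 'error');
    gr.update();
}

The same filter (topic=CancelProbe, state=ready) also works as a list view condition on ecc_queue if you would rather change the state by hand.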



If the problem continues, I would also check in with support to see if they can help you find a root cause.


Hi Jake,



thanks for your response. I'll check this today.


I have also identified this failure on the Helsinki release.



Within the MID server, I am seeing an exception as follows:



09/30/16 18:12:24 (726) ECCQueueMonitor.1 WARNING *** WARNING *** No probe registered to cancel for a given agent correlator 741881bbdb06ea4068c0d5f0cf961970
09/30/16 18:12:24 (727) ECCQueueMonitor.1 SEVERE *** ERROR *** java.lang.NullPointerException
java.lang.NullPointerException
  at com.glideapp.ecc.ECCMessage.getNewRecord(ECCMessage.java:148)
  at com.glideapp.ecc.ECCMessage.update(ECCMessage.java:117)
  at com.service_now.mid.message_executors.CancelProbeExecutor.execute(CancelProbeExecutor.java:49)
  at com.service_now.monitor.ECCQueueMonitor.executeMessage(ECCQueueMonitor.java:273)
  at com.service_now.monitor.ECCQueueMonitor.processMessage(ECCQueueMonitor.java:197)
  at com.service_now.monitor.ECCQueueMonitor.processMessages(ECCQueueMonitor.java:217)
  at com.service_now.monitor.ECCQueueMonitor.run(ECCQueueMonitor.java:171)
  at com.service_now.monitor.AMonitor.runit(AMonitor.java:145)
  at com.service_now.monitor.AMonitor.access$200(AMonitor.java:39)
  at com.service_now.monitor.AMonitor$MonitorTask.runMonitor(AMonitor.java:135)
  at com.service_now.monitor.AMonitor$MonitorTask.run(AMonitor.java:115)
  at java.util.TimerThread.mainLoop(Timer.java:555)
  at java.util.TimerThread.run(Timer.java:505)

Once this error occurs, the MID server seems to generate the same message a large number of times (roughly once per second for the next minute or so). The MID server then seems to accept heartbeats but not additional JDBC probe connections.



Checking the ECC queue shows that the cancel operation stays in ready (as it is never processed). As already identified, setting the state to error or deleting the queue item (and potentially restarting the MID server) seems to clear the roadblock and allow further JDBC operations to proceed.
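
For what it is worth, a quick way to see which MID Server is blocked is to count the stuck CancelProbe records per agent. A sketch along these lines, run as a background script with the same assumptions about the ecc_queue fields as above:

// Count stuck CancelProbe messages per MID Server agent
var ga = new GlideAggregate('ecc_queue');
ga.addQuery('topic', 'CancelProbe');
ga.addQuery('state', 'ready');
ga.addAggregate('COUNT');
ga.groupBy('agent');
ga.query();
while (ga.next()) {
    gs.print(ga.getValue('agent') + ': ' + ga.getAggregate('COUNT') + ' stuck CancelProbe record(s)');
}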



It would appear that the MID server does not handle long-running or incomplete JDBC operations correctly, which eventually creates a condition where the MID server is no longer contactable and you get the 'MID server didn't respond' message.



Additional Notes: we are running Helsinki (and will try upgrading to Helsinki Patch 5 to see if there is any improvement). The JDBCProbe request is part of a "private" data source request not related to Discovery.


We also experienced this issue in our Helsinki Test instance.


Build name: Helsinki


Build date: 08-18-2016_0855


Build tag: glide-helsinki-03-16-2016__patch3-hotfix2-08-17-2016



The corrective procedure provided a successful workaround.


The workaround did require a MID server restart (successfully triggered from the instance).


The MID server was not part of a cluster.