Shazzam probe being stuck

Paul125 · ‎05-15-2018

Hello, we have weekly discovery schedules set up in place and have been working for years. All of sudden these schedules being canceled it self due to time out. after digging further, we found that 'shazzam' probe being sent out but 'processed' state is showing processed but timestamp showing empty and 'updated on' showing discovery end time like below. Has anyone experienced before? Thanks

Paul125 · ‎07-11-2018

Restarting the MID solved my issue.

View solution in original post

DaveHertel · ‎05-17-2018

Ah, good to know it was a mid server issue vs. an external issue (like a network change, etc.). But without knowing what the root cause of the issue, it'd be difficult to offer advice on how to prevent an unclear problem.

I'd consider using the Discovery Dashboard to keep an eye on job time duration. I don't know what version your instance is, but in Jakarta & Kingston the disco dashboard could give some insight into long-running jobs. .. not a perfect solution -- but lends some visibility ....

Dave

Paul125 · ‎05-17-2018

Thanks Dave.

tim_broberg · ‎05-17-2018

I've had a few issues with mids getting stuck where they respond only to SystemCommands and heartbeats.

This is my tool to figure out what's going on without having to squint at heap dumps all day.

This mid server script include captures the state of the relevant variables.

var MidState = Class.create();

(function() {
MidState.prototype = Object.extendsObject(AProbe, {
    initialize : function(probe) {
        this.probe = probe;
        this.output = "";
    },

	run : function() {
		var mid = Packages.com.service_now.mid.MIDServer.get();
		var validityChecker = Packages.com.service_now.mid.security.MIDValidityChecker.get();
		this.output = 'Paused: ' + mid.isPaused() + ', Operational State: ' + mid.getOperationalState() + ', Valid: ' + validityChecker.getIsValid();
		this.createOutputResult(this.output);
	},

	type: "MidState"
});

})();

I have a Probe named MidState with topic = JavascriptProbe and ECC queue name = MidState.

I run Test probe on the probe with the mid server I need to investigate and it reports back what the internal state variables of the mid are.

Known behaviors I'm currently watching for in our environment:

Normal: Paused: false, Operational State: UP, Valid: true
PRB1250701: Paused: false, Operational State: UP, Valid: false
PRB1247167: Paused: true, Operational State: UPGRADE_CHECK, Valid: true

PRB1250701 will clear up if you pause and unpause the mid server. I think the other one might also.

PRB750509 is the other big one, where network glitches during credential reload cause it to stick forever. This is fixed in K / JP3 / IP9. A credential reload will clear it up, which you can do by touching any credential.

Hope it helps,

- Tim.

Paul125 · ‎05-17-2018

Great! Thanks Tim. I hope you know about this.. issue has been resolved after restarting the MID servers from the host.

tim_broberg · ‎05-17-2018

Yeah, I wrote this thing the last time I just paused and unpaused one to make it go because I was too lazy to root-cause it.

Naturally, I haven't had any such problems since I wrote the tool. x^D