SlightlyLoony

As I heard the story, an administrator at one of our customers got a phone call from one of her network security folks. There was yelling involved, and she had to calm him down before she could figure out what he was so excited about. It seems this fellow had been poring over the HTTP proxy logs (how bored would you have to be to do that for fun?), and he'd spotted a lot of SOAP transactions that he thought were suspicious. So he tracked 'em down and figured out that they were queries from MID servers (which this organization has dozens of) back to the ServiceNow instance. Now, this company had carefully scheduled their discoveries to run in the middle of the night, so what the heck were the MID servers doing talking to the ServiceNow instance in the middle of the day? Especially at a rate of three transactions per second? So he called our (much calmer) administrator, and she, in turn, called me. Why were we seeing all those transactions, she wanted to know. And could we stop them?

This question turned into a nice new feature for our MID servers.

Before I introduce the new feature, here's how things work without it: every 15 seconds, each MID server issues a small SOAP request to the instance, checking to see if the instance has any work for it to do. This polling for work is the technique we use to avoid the need for the ServiceNow instance to punch a hole in your firewall so that it can "push" work out to the MID server. It doesn't have to push, 'cause it knows that the MID server is going to look for work all by its lonesome.
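To make the pull model concrete, here is a minimal Python sketch of a fixed-interval polling loop. It is not the MID server's actual code: `poll_for_work`, `check_for_work`, and `handle` are hypothetical names, and `check_for_work` merely stands in for the SOAP query to the instance. The `sleep` function is injectable so the loop can be exercised without really waiting 15 seconds per cycle.

```python
import time
from typing import Callable, List

POLL_INTERVAL_SECONDS = 15  # the fixed interval described above


def poll_for_work(
    check_for_work: Callable[[], List[str]],
    handle: Callable[[str], None],
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Hypothetical pull-style loop: the MID server asks, the instance never pushes.

    check_for_work stands in for the SOAP query to the instance and returns
    a (possibly empty) list of pending jobs. A real poller would loop forever;
    `cycles` bounds the loop so the sketch can be demonstrated.
    """
    for _ in range(cycles):
        for job in check_for_work():
            handle(job)  # do whatever the instance asked for
        sleep(POLL_INTERVAL_SECONDS)  # then wait before asking again


# Usage with fakes: one job on the first poll, nothing after that.
pending = [["job-1"], [], []]
done: List[str] = []
waits: List[float] = []
poll_for_work(lambda: pending.pop(0), done.append, cycles=3, sleep=waits.append)
print(done)   # ['job-1']
print(waits)  # [15, 15, 15]
```

Note that the instance never initiates a connection here: every exchange starts with the poller calling out, which is why no inbound firewall hole is needed.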

When there is no work, this is a very lightweight query with almost no impact on the instance (and even when there is work for the MID server to do, it's still a very lightweight query). But if your organization has, say, 50 MID servers, those queries add up pretty fast: each server polls 4 times per minute, so you'd see 200 queries per minute, or more than 3 per second. For most large enterprises this isn't a particularly large amount of traffic, especially since those MID servers are generally located in geographically diverse datacenters and reach the Internet through different edge routers. And the ServiceNow instance isn't bothered by this load. But certain network security folks might be offended, and if all 50 MID servers happened to reach the Internet through a single fractional T1 in someplace like Myanmar, the queries could consume a significant share of that bandwidth (and uselessly, when the MID servers aren't doing any work).
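The arithmetic behind those numbers fits in a couple of lines. This is just a back-of-the-envelope helper (the function names are made up for illustration), assuming every MID server polls at the same fixed interval.

```python
def queries_per_minute(mid_servers: int, interval_seconds: float) -> float:
    """Total polling queries per minute across all MID servers (hypothetical helper)."""
    return mid_servers * 60 / interval_seconds


def queries_per_second(mid_servers: int, interval_seconds: float) -> float:
    """The same load expressed per second."""
    return mid_servers / interval_seconds


# 50 MID servers polling every 15 seconds, as in the example above.
print(queries_per_minute(50, 15))            # 200.0
print(round(queries_per_second(50, 15), 2))  # 3.33
```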

So we've added a new feature we call "query backoff". If enabled, it does something very simple: when a given MID server hasn't had any work to do for a while, it starts waiting longer and longer between queries for work. Eventually it slows down to one query every four minutes, or 1/16th the usual rate. If the MID server then picks up some work, it immediately switches back to querying every 15 seconds. The only visible consequence is that once querying has fully backed off to a four-minute interval, starting a new discovery can take up to four minutes (instead of the usual 15 seconds). Well, that and 1/16th as many queries. In the case I was called about, the queries went from more than 3 per second to about 1 every 5 seconds, which made the excitable network security guy quite happy...
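The post doesn't spell out exactly how the interval grows, so the following Python sketch assumes one plausible rule: double the wait after every poll that finds no work, cap it at four minutes (240 seconds, 16 times the 15-second base), and drop straight back to 15 seconds as soon as work shows up. The doubling schedule and the names `next_interval` and `backoff_schedule` are assumptions for illustration, not the MID server's actual implementation.

```python
from typing import List

BASE_INTERVAL = 15   # seconds: the normal polling interval
MAX_INTERVAL = 240   # seconds: four minutes, i.e. 16x the base interval


def next_interval(current: int, found_work: bool) -> int:
    """Assumed backoff rule (hypothetical): reset on work, else double up to the cap.

    The real feature only starts backing off after "a while" without work;
    this sketch starts after the first empty poll for simplicity.
    """
    if found_work:
        return BASE_INTERVAL
    return min(current * 2, MAX_INTERVAL)


def backoff_schedule(poll_results: List[bool]) -> List[int]:
    """Return the wait chosen after each poll, given whether that poll found work."""
    interval = BASE_INTERVAL
    waits = []
    for found_work in poll_results:
        interval = next_interval(interval, found_work)
        waits.append(interval)
    return waits


print(backoff_schedule([False] * 6))           # [30, 60, 120, 240, 240, 240]
print(backoff_schedule([False, False, True]))  # [30, 60, 15]
```

With all 50 MID servers fully backed off, the load is 50 / 240 ≈ 0.21 queries per second, or about one query every 4.8 seconds, which matches the "1 every 5 seconds" figure above.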

To enable query backoff, just navigate to any MID server (Discovery → MID Servers, then click the one you want to modify) and show the Configuration Parameters related list. Look in the Parameter name column for query_backoff, and add it if it isn't there. Then set its Value to true. That's all there is to it!
