SlightlyLoony

Recently Elaine, a system administrator at a company that uses our Discovery product, got to wondering about SSH connections. She noticed that while Discovery sends quite a few probes to her UNIX and Linux systems, usually there was only one login to each system in a Discovery run. Reviewing her system logs, she could see that the MID server really was making different requests at different times, corresponding to the different probes. Usually these probes ran over the course of a few minutes, not all one right after the other.

Yet somehow the MID server was doing it all with just one login to her systems. Every once in a while, though, she noticed that the MID server would log into her systems twice or even three times during the course of a single Discovery run. While this all worked, Elaine didn't like the fact that it was inconsistent — and that she just plain didn't understand the logic. So she called me and found out about the MID server's SSH connection caching — and now she's all smiles...

Without SSH connection caching (which you can turn off with the "MID Server SSH connection cache" MID server configuration parameter), Discovery behaves exactly as Elaine had initially expected. If you're not familiar with MID server configuration parameters, you can find them by navigating to MID Servers → Servers, opening a MID server record, and looking at its Configuration Parameters related list. What Elaine expected was that the MID server would log into the target UNIX or Linux system separately for each probe it ran. This works just fine (in fact, it was the only option in Discovery for years), but it had a couple of consequences that occasionally gave people trouble:

  1. All those logins meant extra work for the MID server and the target system (potentially including some CPU-intensive cryptography) and extra "round trips" on the network.
  2. It was possible for a MID server to run multiple probes against the same target concurrently (in different threads), which could open more concurrent connections between the MID server and the target system than the target's sshd configuration allows. When that happened, one or more probes would fail for mysterious reasons.
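To make the second problem concrete, here is a minimal Python sketch of the uncached behavior: every probe logs in when it starts and logs out when it finishes, so overlapping probes each hold their own connection. The function name `uncached_connection_stats`, the probe intervals, and the limit of 3 are illustrative assumptions, not anything the MID server itself exposes.

```python
from typing import List, Tuple

# Hypothetical model: each probe is a (start, end) interval in seconds.
Interval = Tuple[float, float]


def uncached_connection_stats(probes: List[Interval]) -> Tuple[int, int]:
    """Return (logins, peak_concurrent_connections) when every probe
    opens its own SSH connection and closes it when it finishes."""
    events = []
    for start, end in probes:
        events.append((start, 1))   # login: a connection is opened
        events.append((end, -1))    # logout: the connection is closed
    # At equal timestamps (-1) sorts before (+1), so a probe that starts
    # exactly when another ends is not counted as overlapping it.
    events.sort()
    open_now = peak = 0
    for _, delta in events:
        open_now += delta
        peak = max(peak, open_now)
    return len(probes), peak


if __name__ == "__main__":
    probes = [(0, 5), (1, 6), (2, 7), (3, 8)]   # four overlapping probes
    sshd_limit = 3                              # hypothetical target limit
    logins, peak = uncached_connection_stats(probes)
    print(logins, peak, peak > sshd_limit)      # 4 4 True
```

Four probes cost four logins, and because they overlap, the peak of four simultaneous connections exceeds the assumed limit of three.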

SSH connection caching solves these problems. Here's how it works:
  • The first time a MID server gets an SSH probe to a given target system, it logs in normally.
  • That first probe executes normally — but it doesn't log out or close the connection.
  • The MID server then keeps the details of that still-open connection in memory (this is the caching part) for up to two minutes. If no other SSH probe for that target comes along within that period, the MID server closes the connection and "forgets" it.
  • The next SSH probe for that same target system first checks the cache for a remembered connection that isn't being used by a probe in another thread. If it finds one, it claims that connection, runs the probe over it, and resets the two-minute timer. If there is no cached connection, the MID server logs in and establishes a new connection to use. If there is a cached connection but another thread is using it, then either (a) the probe waits for the other thread to finish its work and then claims that connection, or (b) the MID server logs in and establishes a new connection to use.
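The steps above can be sketched roughly as follows. This is a hypothetical Python model, not the MID server's actual implementation: the class `SSHConnectionCache`, its `acquire`/`release` methods, and the `open_connection`/`close_connection` callbacks (which stand in for a real SSH login and logout) are all invented for illustration. It also makes three assumptions: the two-minute timer restarts when a probe releases a connection, expired connections are closed lazily on the next `acquire` rather than by a background timer, and the choice between (a) and (b) is made by logging in again while under a per-host cap and waiting once the cap is reached.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

IDLE_TIMEOUT_S = 120.0  # keep an idle cached connection for up to two minutes


@dataclass
class CachedConnection:
    host: str
    handle: object           # stand-in for a real SSH session (hypothetical)
    busy: bool = False       # True while a probe thread is using it
    idle_since: float = 0.0  # when the last probe released it


class SSHConnectionCache:
    """Hypothetical per-host SSH connection cache (illustrative only)."""

    def __init__(self, open_connection: Callable[[str], object],
                 close_connection: Callable[[object], None],
                 max_per_host: int = 3,
                 clock: Callable[[], float] = time.monotonic):
        self._open = open_connection    # performs a login (assumed callback)
        self._close = close_connection  # performs a logout (assumed callback)
        self._max = max_per_host
        self._clock = clock
        self._conns: Dict[str, List[CachedConnection]] = {}
        self._cond = threading.Condition()

    def _purge_expired(self, host: str) -> None:
        # Lazily close and forget idle connections older than the timeout.
        now = self._clock()
        kept = []
        for c in self._conns.get(host, []):
            if not c.busy and now - c.idle_since > IDLE_TIMEOUT_S:
                self._close(c.handle)
            else:
                kept.append(c)
        self._conns[host] = kept

    def acquire(self, host: str) -> CachedConnection:
        with self._cond:
            while True:
                self._purge_expired(host)
                conns = self._conns[host]
                for c in conns:
                    if not c.busy:          # idle cached connection: claim it
                        c.busy = True
                        return c
                if len(conns) < self._max:  # under the cap: log in again
                    # Simplification: logging in while holding the lock
                    # serializes all logins.
                    c = CachedConnection(host, self._open(host), busy=True)
                    conns.append(c)
                    return c
                self._cond.wait()           # at the cap: wait for a release

    def release(self, conn: CachedConnection) -> None:
        with self._cond:
            conn.busy = False
            conn.idle_since = self._clock()  # restart the two-minute timer
            self._cond.notify_all()


if __name__ == "__main__":
    logins = []
    now = [0.0]  # fake clock, in seconds

    def fake_login(host):
        logins.append(host)
        return f"session-{len(logins)}"

    cache = SSHConnectionCache(fake_login, lambda handle: None,
                               clock=lambda: now[0])
    for t in (0.0, 60.0, 300.0):  # three probes to one host
        now[0] = t
        cache.release(cache.acquire("db01"))
    print(len(logins))  # 2
```

In the usage example, the probe at 60 s reuses the cached connection, while the probe at 300 s arrives after 240 s of idleness and has to log in again.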

One more MID server configuration parameter controls that last choice: the "MID Server SSH connections per host" parameter. By default it is set to 3, which means the MID server may have up to 3 concurrent connections to the same target system. We picked 3 because it's the smallest limit we've ever seen in the field in our customers' sshd configurations. You can raise or lower it to suit your environment. In general, a higher setting gives somewhat better Discovery performance, though the effect is usually quite small. Most Linux systems allow 10 concurrent connections by default, but we've seen Solaris configurations ranging from 3 to 20.
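A per-host cap like this is commonly enforced with a counting semaphore. The sketch below is a hypothetical illustration of that general technique, not how the MID server implements the parameter: `PerHostLimiter`, `MAX_PER_HOST`, and the sleeping "probe" are invented for the example. Once three probes to the same host are running, a fourth blocks until one of them finishes.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

MAX_PER_HOST = 3  # mirrors the default described above (assumed name)


class PerHostLimiter:
    """Caps concurrent connections per target host (hypothetical sketch)."""

    def __init__(self, max_per_host: int = MAX_PER_HOST):
        self._max = max_per_host
        self._sems = defaultdict(lambda: threading.BoundedSemaphore(self._max))
        self._lock = threading.Lock()
        self._active = defaultdict(int)
        self.peak = defaultdict(int)  # highest concurrency seen per host

    def run(self, host, probe):
        with self._lock:
            sem = self._sems[host]    # create the semaphore under the lock
        with sem:                     # blocks while max_per_host probes run
            with self._lock:
                self._active[host] += 1
                self.peak[host] = max(self.peak[host], self._active[host])
            try:
                return probe()
            finally:
                with self._lock:
                    self._active[host] -= 1


if __name__ == "__main__":
    limiter = PerHostLimiter()
    with ThreadPoolExecutor(max_workers=10) as pool:
        for _ in range(10):
            pool.submit(limiter.run, "sol01", lambda: time.sleep(0.05))
    print(limiter.peak["sol01"])  # never exceeds 3
```

Even with ten worker threads, the semaphore keeps concurrency to the host at or below the cap; raising `max_per_host` would let more probes run in parallel, at the cost of more connections on the target.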

What puzzled Elaine was why Discovery didn't behave the same way with every target system. What she hadn't noticed is that it can also behave differently from one Discovery run to the next. Both kinds of difference come down to the happenstance of probe timing. As long as each probe for a particular target system starts within two minutes of the previous one finishing, the cached connection is reused and no further login is required. If, for whatever reason, more than two minutes elapse between two consecutive probes, the MID server logs in again. That simple fact explained every one of her observations. Mystery solved!
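Elaine's observations can be reproduced with a little arithmetic. The hypothetical `count_logins` function below assumes that the probes for one target run one after another (no overlap) and applies the two-minute rule to the idle gap between each probe's end and the next probe's start. The probe timings are made up for illustration.

```python
from typing import List, Tuple

IDLE_TIMEOUT_S = 120.0  # the two-minute cache window


def count_logins(probes: List[Tuple[float, float]],
                 idle_timeout: float = IDLE_TIMEOUT_S) -> int:
    """Count SSH logins to one target for probes that run one at a time.

    Each probe is a (start, end) pair in seconds. A login is needed for
    the first probe and whenever the idle gap since the previous probe
    finished exceeds the cache timeout.
    """
    logins = 0
    last_end = None
    for start, end in sorted(probes):
        if last_end is None or start - last_end > idle_timeout:
            logins += 1
        last_end = end
    return logins


if __name__ == "__main__":
    run_a = [(0, 10), (40, 55), (150, 170)]  # every gap under two minutes
    run_b = [(0, 10), (40, 55), (200, 210)]  # 145 s gap before the last probe
    print(count_logins(run_a), count_logins(run_b))  # 1 2
```

The same three probes cost one login in the first run and two in the second, purely because of when the last probe happened to arrive.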