SSHCommand: Cannot connect, status is TCP_CONNECTION_FAILURE.

mohitrao2007 · ‎10-02-2018

Hi All,

I am trying to run an SSH command (hostname) using a custom SSH activity. Only the credential tag, command, and target are set in the activity. When I click Test Inputs, I get the error below.

"errorMessages": "SSHCommand: Cannot connect, status is TCP_CONNECTION_FAILURE. Failed to connect TCP: Connection timed out: no further information: \n",

I have tested the credentials against the same MID Server (I have only one MID Server, and it is the default server for Orchestration), the same target server, and port 22, and the credentials validate successfully. I have also run the same command through an snc probe using the same credential tag, and it works fine. Only the Orchestration activity shows the error.

Can anyone please help?

tim_broberg · ‎10-02-2018

Well, that's just really odd.

There is a problem where we just randomly get TCP connect issues (PRB1297648), but it's fairly rare and really sporadic. This doesn't sound sporadic at all.

I have seen a case where a workflow triggered discovery, the port scan of discovery triggered a security tool (OSSEC) which put up an iptables rule to block the mid server for 10 minutes, then the workflow couldn't get in anymore. If there's a discovery in your workflow, this is probably it. If not...

Some experiments to try:

1. Try try again - Wait 10 minutes from your previous activity run. Go find the ecc_queue output with the probe from your activity. Rerun it. Does the ecc_queue input reproduce the problem? If not, it just might be OSSEC-like activity that kept you out before. In that case, try running the activity again, watch it fail, and immediately try to connect to the target from your usual ssh client on the mid. If that fails, go look at the iptables rules on the target for specific persecution of your mid server.
2. Scissors - Squint (menacingly) at the ecc_queue record from your activity and the one from your SncProbe experiment. What's different? Tweak the SncProbe one to be more like the activity one in the most obvious difference and see if it starts failing. Keep making them more similar until the one that works fails or the one that fails works. (If you get them to be the same and the one that works still works and the one that fails still fails, you are entitled to an immediate long lunch break with beverage of your choice.) Once you identify what the discriminant is between the two, think really hard about why that would matter.
3. Wall of text - Set mid.ssh.debug = true on your mid server. Reproduce the failure. Go to agent/logs/agent0.log.0, search for "Using SNC SSH" until you get to a time stamp that matches when you reran your probe, and wade in. How far do we get before the TCP error? One would think not far at all, but if we start getting protocol strings and kexinit, then it's weirder than we think.
4. Reload - Reload the credentials. We're getting into desperation measures by now. Touch any credential on the instance or send a SystemCommand probe to the mid with source = credentials_reload.
5. Restart - Just restart the mid. If you're feeling kind, collect a JVM heap dump first. If this resolves the problem, we may be able to figure out what it was after the fact if we have a dump to work from. At that point, I would think it would be something like session cache corruption of some kind, but who knows.
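
For the Scissors step, a short script can do some of the squinting for you. This is only a rough sketch in Python, assuming you have copied the payload text of both ecc_queue records into two strings. The function name and the sample lines in the usage note are made up for illustration and are not real ecc_queue content.

```python
import difflib


def changed_lines(working: str, failing: str) -> list[str]:
    """Return only the lines that differ between two payload texts.

    Lines prefixed with '-' appear only in the working payload,
    lines prefixed with '+' appear only in the failing one.
    """
    diff = list(
        difflib.unified_diff(
            working.splitlines(),
            failing.splitlines(),
            fromfile="sncprobe_payload",   # hypothetical label
            tofile="activity_payload",     # hypothetical label
            lineterm="",
        )
    )
    # The first two diff lines are the '---' / '+++' file headers; skip them,
    # then keep only added/removed lines (drop '@@' hunk markers and context).
    return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

Calling changed_lines("port=22\ntag=a", "port=22\ntag=b") on those made-up strings gives back the "-tag=a" and "+tag=b" lines. An empty result means the two payloads are textually identical, which is the long-lunch case above.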

If all else fails, just open an incident and we'll sort it out one way or another.

Best of luck, and we're here for you,
- Tim.

mohitrao2007 · ‎10-05-2018

I have tried points 1 and 2, with no luck. For point 2, I tried the same payload as the one that works, and it still fails.

tim_broberg · ‎10-06-2018

Well, since it is a TCP connection issue, maybe we can at least figure out who's shunning whom by watching with tcpdump or wireshark and see how far they're getting in the handshake?
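
If tcpdump is not handy, a rough first cut is to probe the port from the MID host itself and see whether the connection times out, is refused, or gets as far as an SSH banner. Below is a sketch in Python using only the standard socket module. The function name, the status labels, and the interpretation in the comments are my own assumptions, not anything the MID server does.

```python
import socket


def probe_tcp(host: str, port: int = 22, timeout: float = 5.0) -> tuple[str, str]:
    """Try a plain TCP connect and report how far it got.

    Returns (status, detail), where status is one of:
    'open', 'timeout', 'refused', 'error'.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            try:
                # An SSH server normally sends its version banner first.
                banner = sock.recv(256).decode("ascii", errors="replace").strip()
            except socket.timeout:
                return ("open", "connected, but no banner arrived in time")
            return ("open", f"banner: {banner!r}")
    except socket.timeout:
        # No answer to the connect attempt; often a firewall silently dropping packets.
        return ("timeout", "no response to the connect attempt")
    except ConnectionRefusedError:
        # The host answered, but nothing is listening (or a rule rejects the connect).
        return ("refused", "connection actively refused")
    except OSError as exc:
        return ("error", str(exc))


# Example (hypothetical host name):
#   print(probe_tcp("target.example.com", 22))
```

A 'timeout' right after a failed activity, followed by 'open' ten minutes later, would point toward the temporary block described earlier in the thread.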

What version are you running?

At this point, I think it would be instructive to set mid.ssh.debug = true on your mid, run a probe that works, run a probe that fails, collect agent/logs/agent0.log.0 (and agent0.log.1 if you have to to get the whole experiment into the logs), and then we squint at the logs to try to figure out what's different between the two.
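
To make the squinting easier, the log can be cut into one chunk per SSH attempt first. This is a minimal sketch in Python, assuming each attempt starts at a line containing "Using SNC SSH" as mentioned above. The function name is made up, and nothing else about the log format is assumed.

```python
def sections_after_marker(lines, marker="Using SNC SSH"):
    """Group log lines into sections, each starting at a line containing marker.

    Lines before the first marker are ignored.
    """
    sections = []
    current = None
    for line in lines:
        if marker in line:
            # A new SSH attempt begins here; start a fresh section.
            current = [line]
            sections.append(current)
        elif current is not None:
            current.append(line)
    return sections


# Example usage, reading the log path given in the thread:
#   with open("agent/logs/agent0.log.0", encoding="utf-8", errors="replace") as f:
#       for section in sections_after_marker(f.read().splitlines()):
#           print(len(section), section[0])
```

Once the working attempt and the failing attempt are in separate chunks, it is much easier to see the last thing each one logged before they part ways.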

If you're not feeling heroic, create an incident, attach the log, and we'll parse them and figure out what's going on.

If you are feeling heroic, by all means, go for it, and we can talk about what you find.
- Tim.

Rachna S · ‎11-12-2020

Hi @mohitrao2007

I know this is really old, but we have been seeing the same error on our side, though in Flow Designer. Can you please tell me how you managed to resolve it and what the root cause was?