ServiceNow Discovery: SSH credentials work initially, then stop working later

MarxA
Tera Contributor

ServiceNow Discovery: SSH credentials work initially, then stop working later (credentials appear to be cleared/overwritten)

Hi everyone,
I’m looking for outside ideas on an intermittent Discovery issue.

Context:

  • Environment includes Avaya/telephony devices and Linux servers.
  • Discovery works at first, and targets are discovered successfully.
  • After some time, Discovery can no longer authenticate over SSH.
  • In some cases, the SSH credential mapping/reference appears to be unusable (for example, an empty credential reference at runtime).
  • SNMP may still work, but the SSH authentication path fails.

Observed behavior:

  • Re-adding the SSH key or recreating the discovery user restores discovery temporarily.
  • Later, the problem comes back.
  • This suggests something is changing after initial success (automation/policy/sync/rotation?).

What we already checked:

  • SSH port is reachable.
  • A manual check of the server-side key files and permissions found them correct at the time of the check.
  • No obvious manual deletion process identified on our team side.
  • We suspect an automated process may be overwriting/removing credential data over time.

Questions:

  1. Has anyone seen Discovery credentials (especially SSH key-based) become invalid after initial successful runs?
  2. What are the most common root causes in ServiceNow for this pattern?
  3. Which logs/tables are best to prove what changed the credential reference (audit, scheduled jobs, integrations, MID activity)?

Any pointers, known defects, or troubleshooting checklists would be appreciated.

Thanks!

 

6 REPLIES

ayushraj7012933
Mega Guru

Hi @MarxA,

This behavior is typically caused by credential changes after initial success (rotation, overwrite, or access issues), not Discovery itself.

Below is a structured step-by-step solution aligned with ServiceNow best practices to identify and fix the issue.

Step-by-Step Solution

 Step 1: Identify the Failing Credential

  1. Go to Discovery → Status

  2. Open a failed Discovery run

  3. Check:

    • Which SSH credential was used

    • Error message (authentication / permission)

 Step 2: Validate Credential Record

  1. Navigate to:

    • Discovery → Credentials

  2. Open the SSH credential

  3. Verify:

    • Username

    • Private key / password

    • Active = true

  4. Check the Updated and Updated by fields to see when, and by which user, the record last changed

 Step 3: Check if Credential is Being Modified

  1. Enable auditing (if not already):

    • Table: discovery_credentials / ssh_private_key_credentials

  2. Review:

    • History → Audit

 Look for:

  • Unexpected updates

  • System or integration user changes
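To pull the audit trail outside the UI, the rows can be read from the sys_audit table through the REST Table API. The snippet below is only a sketch that builds the request URL. The instance name, the table value, and the sys_id are placeholders, and it assumes auditing is already enabled on the credential table and that the account you query with can read sys_audit.

```python
from urllib.parse import urlencode


def audit_query_url(instance, table, sys_id, limit=50):
    """Build a Table API URL that lists audit rows for one credential record.

    instance, table and sys_id are placeholders supplied by the caller.
    """
    # Encoded query: rows for this table and record, newest first.
    query = f"tablename={table}^documentkey={sys_id}^ORDERBYDESCsys_created_on"
    params = {
        "sysparm_query": query,
        "sysparm_fields": "sys_created_on,user,fieldname,oldvalue,newvalue",
        "sysparm_limit": str(limit),
    }
    return (
        f"https://{instance}.service-now.com/api/now/table/sys_audit?"
        + urlencode(params)
    )


if __name__ == "__main__":
    # Hypothetical instance name and sys_id, for illustration only.
    print(audit_query_url("example", "discovery_credentials", "abc123"))
```

Open the printed URL in a browser session that is logged in to the instance, or call it with your preferred HTTP client. The user column is the quickest way to tell a person from an integration or system account.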

 Step 4: Check for External Credential Rotation

  • Verify if credentials are managed by:

    • CyberArk / Vault / any external tool

  • Confirm:

    • Whether SSH keys/passwords are rotated periodically

 If yes:

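  • Make sure the credential record in ServiceNow is updated whenever the key or password rotates, or have Discovery fetch it from the vault through an external credential storage integration

  • Avoid keys that were pasted in once and never refreshed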

 Step 5: Validate MID Server Behavior

  1. Go to:

    • MID Server → Servers

  2. Check:

    • Status = Up

    • Validated

  3. Restart the MID Server (as a test)

 This forces the MID Server to reload its cached credentials

 Step 6: Check ECC Queue

  1. Navigate to ECC → Queue

  2. Filter:

    • Topic contains SSH / Discovery

  3. Review:

    • Input/output payload

    • Errors related to authentication

 Step 7: Validate Target Server (Linux)

On target machine:

  • Check:

    • authorized_keys file

    • File permissions (authorized_keys 600, ~/.ssh directory 700)

  • Confirm:

    • SSH key still exists

    • User is not locked/expired
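The target-side checks can be scripted so the same test runs on a working and a failing host. This is a minimal sketch, assuming a standard OpenSSH layout under the user's home directory. The function name and the way the key is passed in (the base64 portion of the public key) are my own choices for illustration.

```python
import stat
from pathlib import Path


def check_ssh_setup(home, key_blob):
    """Return a list of problems under <home>/.ssh; an empty list means none found."""
    problems = []
    ssh_dir = Path(home) / ".ssh"
    auth_file = ssh_dir / "authorized_keys"

    if not ssh_dir.is_dir():
        return [f"{ssh_dir} does not exist"]
    dir_mode = stat.S_IMODE(ssh_dir.stat().st_mode)
    if dir_mode & 0o077:  # any group/other permission bit set
        problems.append(f"{ssh_dir} mode is {dir_mode:o}, expected 700")

    if not auth_file.is_file():
        problems.append(f"{auth_file} does not exist")
        return problems
    file_mode = stat.S_IMODE(auth_file.stat().st_mode)
    if file_mode & 0o077:
        problems.append(f"{auth_file} mode is {file_mode:o}, expected 600")

    # Look for the base64 key body as a whole token on any non-comment line.
    found = False
    for line in auth_file.read_text().splitlines():
        if line.lstrip().startswith("#"):
            continue
        if key_blob in line.split():
            found = True
            break
    if not found:
        problems.append(f"key not found in {auth_file}")
    return problems
```

Run it as a user that can read the discovery account's home directory, for example `check_ssh_setup("/home/sn_discovery", "<base64 key body>")`. It does not check account lock or expiry state.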

Step 8: Review Scheduled Jobs / Integrations

  1. Go to:

    • System Scheduler → Scheduled Jobs

  2. Check for:

    • Jobs updating credentials

    • Import sets / sync processes

Disable any suspect job temporarily (for testing) and see whether the credential survives

MarxA
Tera Contributor

Thanks, this is very helpful.

We’ve already started checking these steps, and I want to clarify one key point: this issue is not global across our Discovery environment. It is only happening on a specific subset of Avaya/telephony devices and related servers.

Other device groups using the same Discovery framework are stable, which suggests this may be tied to Avaya-specific behavior (account/key handling, sync/provisioning, or policy on those systems) rather than a general ServiceNow Discovery problem.

If anyone has seen this specifically with Avaya devices/servers, I’d appreciate targeted guidance on:

  1. Avaya-side processes that may overwrite/remove SSH credentials after initial success
  2. Known interactions between Avaya management/sync tools and SSH key persistence
  3. Best way to keep Discovery credentials persistent for this device family

ayushraj7012933
Mega Guru

Hi @MarxA,

Thanks for the clarification. This is a key observation: since the issue is isolated to Avaya/telephony devices while other device groups in ServiceNow Discovery work fine, it points strongly toward device-side behavior rather than a Discovery framework issue.

Based on similar scenarios, here are some Avaya-specific areas to validate:

1. Check if SSH Keys Are Being Overwritten

On Avaya systems, user environments are often managed by provisioning or sync tools (e.g., System Manager / LDAP). These processes can:

  • Recreate user profiles

  • Reset .ssh/authorized_keys

  • Remove previously added keys

After a failure, verify whether the SSH key used by Discovery is still present in authorized_keys, and whether the file's modification time changed without anyone on your team touching it.
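One way to catch an overwrite is to record a hash and modification time of authorized_keys right after a successful run and compare again after a failure. The sketch below assumes you can run Python on the target (or copy the file off for comparison). The function names are illustrative only.

```python
import hashlib
import os
from datetime import datetime, timezone


def snapshot(path):
    """Return (sha256 hex digest, mtime as ISO 8601 UTC), or None if the file is missing."""
    try:
        with open(path, "rb") as fh:
            digest = hashlib.sha256(fh.read()).hexdigest()
        mtime = datetime.fromtimestamp(
            os.path.getmtime(path), tz=timezone.utc
        ).isoformat()
    except FileNotFoundError:
        return None
    return digest, mtime


def describe_change(before, after):
    """Summarise the difference between two snapshots of the same file."""
    if before == after:
        return "unchanged"
    if after is None:
        return "file removed"
    if before is None:
        return "file created"
    if before[0] != after[0]:
        return "content changed"
    return "timestamp changed, content identical"
```

Store the first snapshot somewhere outside the account's home directory. A "content changed" or "file removed" result, together with the new mtime, gives you a time to look for in the provisioning or sync logs.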

2. Validate SSH Key Persistence & Permissions

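If the Avaya host runs OpenSSH with StrictModes enabled (the OpenSSH default), sshd ignores the key when the home directory, ~/.ssh, or authorized_keys is writable by group or others. The usual safe values are:

  • ~/.ssh: 700

  • authorized_keys: 600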

Incorrect permissions can silently break authentication even if the key exists.

3. Review Avaya Security / Hardening Policies

Check SSH configuration (sshd_config) for:

  • PubkeyAuthentication

  • AuthorizedKeysFile

  • Any restrictions enforcing password-only access

Some Avaya builds may disable or override key-based authentication when a hardening policy is applied, so compare the file on a working host and a failing host.
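To compare sshd_config between a working and a failing host, a small parser is enough. This is a rough sketch: it reads only the global section (it stops at the first Match block), ignores Include files, and assumes the "Keyword value" form. sshd uses the first value it finds for each keyword, which the code mirrors. Running `sshd -T` on the host prints the effective configuration and is the more reliable check when you have root access.

```python
def effective_settings(
    config_text,
    keywords=("pubkeyauthentication", "authorizedkeysfile", "passwordauthentication"),
):
    """Return the first value seen for each keyword in the global section."""
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()  # sshd keywords are case-insensitive
        if key == "match":
            break  # conditional blocks are out of scope for this sketch
        if key in keywords and key not in found and len(parts) == 2:
            found[key] = parts[1].strip()
    return found
```

A keyword missing from the result means the file does not set it in the global section, so the OpenSSH built-in default applies.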

4. Check Provisioning / Sync Jobs (Most Common Cause)

This is typically the root cause in such cases.

👉 Validate if:

  • The user is managed via LDAP / Avaya System Manager

  • There are scheduled sync jobs resetting user configurations

If yes, these processes may remove or overwrite Discovery credentials after initial success.

5. Use a Dedicated Discovery User (Recommended)

Instead of shared/system accounts, create a dedicated user (e.g., sn_discovery) and:

  • Add SSH key manually

  • Exclude it from Avaya provisioning/sync

This helps ensure credential persistence.

6. Validate Shell Access

Some Avaya users are configured with restricted shells:

  • /sbin/nologin

  • Limited CLI environments

Ensure the user has a valid shell (e.g., /bin/bash) required for Discovery commands.
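A quick way to check the shell is to read the account's passwd entry. The sketch below parses passwd-format text; for LDAP-managed users the entry will not be in /etc/passwd, so feed it the output of `getent passwd` instead. The list of non-interactive shells is my assumption and will not recognise an Avaya-specific restricted CLI.

```python
# Shells that refuse command execution (assumed list; extend as needed).
NON_INTERACTIVE = {"/sbin/nologin", "/usr/sbin/nologin", "/bin/false"}


def login_shell(passwd_text, user):
    """Return the login shell for `user` from passwd-format text, or None if absent."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # passwd entries have 7 colon-separated fields; the last is the shell.
        if len(fields) == 7 and fields[0] == user:
            return fields[6]
    return None


def shell_allows_commands(shell):
    """True if the account exists and its shell is not a known non-interactive one.

    An empty shell field counts as allowed because the system falls back to /bin/sh.
    """
    return shell is not None and shell not in NON_INTERACTIVE
```

For example, `shell_allows_commands(login_shell(open("/etc/passwd").read(), "sn_discovery"))` returns False when the account is set to /sbin/nologin or does not exist locally.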

7. Check Account Expiry / Lock Policies

Avaya systems may enforce:

  • Password expiry

  • Account lock/disable policies

Even if initial authentication succeeds, the account may later become unusable.

8. Correlate with Logs

In ServiceNow:

  • Check ECC Queue for SSH/authentication errors

  • Correlate timestamps with any Avaya-side sync or policy jobs
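The correlation itself is simple once both sets of timestamps are exported: for each Discovery failure, look for a sync or policy job that ran shortly before it. A minimal sketch, assuming both lists are already parsed into datetime objects in the same time zone. The one-hour default window is arbitrary and should be widened to at least your Discovery schedule interval.

```python
from datetime import datetime, timedelta


def failures_after_jobs(job_times, failure_times, window=timedelta(hours=1)):
    """Pair each failure with the most recent job that ran within `window` before it."""
    jobs = sorted(job_times)
    pairs = []
    for failure in sorted(failure_times):
        # Jobs that ran at or before the failure, no earlier than `window` ago.
        preceding = [j for j in jobs if timedelta(0) <= failure - j <= window]
        if preceding:
            pairs.append((preceding[-1], failure))
    return pairs


if __name__ == "__main__":
    # Made-up timestamps, for illustration only.
    jobs = [datetime(2024, 1, 1, 2, 0)]
    failures = [datetime(2024, 1, 1, 2, 30), datetime(2024, 1, 1, 8, 0)]
    for job, failure in failures_after_jobs(jobs, failures):
        print(f"failure at {failure} follows job at {job}")
```

If the same job shows up ahead of most failures, that job is the first candidate to exclude the discovery account from.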

MarxA
Tera Contributor

Thank you! This is very helpful. We’ll apply these checks on our Avaya/telephony subset and validate them step by step.

We’ll compare a working vs failing host, correlate with provisioning/sync timing, and test a dedicated discovery account excluded from sync.

I appreciate the guidance! I'll let you know how it works out!