Different outcomes when Scheduling Discovery vs running schedule On Demand

Steve Phayre · ‎07-17-2023

Hi all,

We're on a Utah instance (only recently upgraded; I was seeing similar behavior on Tokyo) doing IP-based server discovery.

Recently, we set up a discovery schedule using new MID Servers to scan a newly deployed environment (on-prem, mostly VMware VMs and Cisco physical hardware). The schedule was set to run daily and was scanning about 600 devices, and about 70% of the servers were being discovered as expected. The remaining 30% were getting past Shazzam with open ports properly identified but were failing on Classify (all 4 Windows credentials available to the MID Server were attempted, and none succeeded).

While troubleshooting the servers that failed, I scanned each one using the "Discover Now" action, and every one of the servers that had failed each day on the scheduled run scanned completely successfully! So then I went into the Discovery Schedule and ran the schedule from the UI action, and it completed the scan and successfully picked up the servers that had failed Classify on all of the scheduled attempts.

Has anyone seen this type of thing before? I'm pretty sure I had this happen years and years ago, but I'm not recalling what solved it last time.

I'm relatively sure that the recent upgrade isn't to blame, because the upgrade happened in the last few days and there were failed scheduled attempts both before and after it. These servers (and IP addresses) had never been successfully discovered before, so there are no credential affinity records or anything like that. I thought it might have something to do with the behavior that is selected when you click "Discover Now" on the CI, but as I understand it, if you do "Discover Now" on the Discovery Schedule, it uses the behavior/functionality selected in the schedule, the same way it does when it executes on the schedule.

Has anyone else seen failures in scheduled discoveries that work fine if you kick them off manually?

pratiksha5 · ‎07-17-2023

I have faced a similar issue and raised an HI ticket for it. After a lot of debugging, including checking the traffic between the MID Server and the target devices, we found that the devices were not responding when discovery ran from the schedule. However, if I give the schedule an IP list (not a range), they respond. ServiceNow said it is due to latency in the network. The workaround is to create a schedule with an IP address list. The only drawback is that you have to keep adding addresses as new IPs come into the picture. Many customers are facing a similar issue.
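
If you try the IP-list workaround, typing hundreds of addresses by hand is error-prone. Below is a minimal sketch, assuming you want to generate the list outside the instance and paste it into the schedule yourself. It uses only Python's standard `ipaddress` module; the function names `expand_cidr` and `expand_span` are made up for illustration and are not part of any ServiceNow API.

```python
import ipaddress
from typing import List


def expand_cidr(cidr: str) -> List[str]:
    """Return every usable host address in a CIDR block, e.g. '10.0.0.0/24'."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(host) for host in network.hosts()]


def expand_span(first: str, last: str) -> List[str]:
    """Return every address from first to last, inclusive."""
    start = int(ipaddress.ip_address(first))
    end = int(ipaddress.ip_address(last))
    if end < start:
        raise ValueError("last address must not be lower than first address")
    return [str(ipaddress.ip_address(n)) for n in range(start, end + 1)]


# Example (hypothetical range): one address per line, ready to copy.
# print("\n".join(expand_span("10.20.30.10", "10.20.30.40")))
```

You would still need to regenerate and update the list whenever new addresses are added, so this only reduces the manual effort; it does not remove the drawback.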

Steve Phayre · ‎07-17-2023

Thanks for the reply. I wondered if it was related to scanning only one IP at a time, but it doesn't make sense that it would work on an IP list and not on a network range, and it does work with the network range so long as the Discovery is kicked off manually.

I'm going to test out a few more configurations and will report back here.

orons · ‎07-18-2023

Hey!

I believe the best way to prove these things is by running Wireshark or a similar app on the MID Server, as mentioned above.
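
Short of a full packet capture, a quick first check is to time plain TCP connections from the MID Server host to one of the failing targets, once during the scheduled window and once when running on demand. This is only a sketch under assumptions: it uses Python's standard `socket` module, the helper name `probe` is made up, and ports 135 (RPC endpoint mapper, used by WMI) and 5985 (WinRM over HTTP) are my guess at what matters for Windows classification in your setup. It does not test credentials, only reachability and connect latency.

```python
import socket
import time
from typing import Optional


def probe(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connection fails."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


# Example with a placeholder address; replace it with a server that fails Classify.
# for port in (135, 5985):
#     print(port, probe("192.0.2.10", port))
```

If the connect times are consistently much higher, or connections fail, only during the scheduled run, that points at the network path or load rather than at the credentials.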

Another thing that I feel is important to note is that people configure schedules with ranges that contain multiple IPs for the same device, so the device is hit multiple times in parallel [at least with Shazzam and Classify; the rest depends on what should run], which can potentially cause slowness on its side. For that case, most organizations have management network(s)/segment(s) or out-of-band management, which can be used for Discovery purposes [with the approval of the org's network team, of course].
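
To check whether a range really does contain several IPs for the same device, something like the following can help. It is a rough sketch that assumes you can export (hostname, IP) pairs from DNS, IPAM, or a similar source; the function name `devices_with_multiple_ips` and the input format are hypothetical, and only the standard library is used.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def devices_with_multiple_ips(
    inventory: Iterable[Tuple[str, str]], ip_range: str
) -> Dict[str, List[str]]:
    """Map each hostname to its IPs inside ip_range, keeping only hosts with 2+ IPs."""
    network = ipaddress.ip_network(ip_range, strict=False)
    by_device = defaultdict(list)
    for hostname, ip in inventory:
        if ipaddress.ip_address(ip) in network:
            # Hostnames are compared case-insensitively.
            by_device[hostname.lower()].append(ip)
    return {
        host: sorted(ips, key=ipaddress.ip_address)
        for host, ips in by_device.items()
        if len(ips) > 1
    }
```

Any hostname this returns is being probed once per address in the same schedule, which makes it a candidate for moving to a single management IP.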

Another thing is that most of the time the traffic to the device goes through a firewall of some sort, which may also block traffic or actually respond on behalf of devices [again, a very, very simplified summary of things I have seen over the years], and that has its own implications.

Steve Phayre · ‎07-18-2023

Thanks for the input. Basically, the Discovery Schedule behaves differently when I run the whole schedule on demand than when it simply runs as scheduled. I moved the timing around, etc., but it still failed on 30% of the schedule. Then I ran it around the same time, but this time I went to the Discovery Schedule and hit Discover Now, and it was far more successful. I was able to repeat this a number of times.

In the end it might just be coincidental, but I remember experiencing something similar many years ago.