Issues with Data Discovery Jobs when running a Full Scan Type (for Privacy/Classification) in Yokohama

Heather White
Giga Guru

When creating a data discovery job with a scan type of "Full", we are running into two issues.

 

1. When running a scan on a single table, the scan is not covering all records in the table. It only processes a small number, 3 or 4.

2. When running larger jobs with multiple tables, the jobs get hung up and never complete; they just keep running. I have not been able to determine the maximum number of tables these jobs support, if there is one.

 

Any assistance appreciated!

 

Thank you,

Heather

1 ACCEPTED SOLUTION

Thank you for your answer! 

We discovered our main issue was this: 

In Yokohama, the "Full" scan operates with incremental behavior. Each time a Full discovery job runs, it records the sys_updated_on timestamp of the last scanned record for each table and stores it as scan history. On subsequent runs, the job scans only records that were created or modified after this timestamp. This explains why fewer records are being scanned in the change_request table.

In Zurich, the Full scan behavior has been enhanced to ignore scan history and scan the entire table on every execution. The earlier incremental behavior has been introduced as a separate scan type called "Incremental."
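The difference between the two releases can be sketched with a small stand-alone example. This is plain JavaScript, not the ServiceNow API; the record shape and timestamps are invented purely for illustration:

```javascript
// Contrast of the two "Full" scan behaviors described above.
// Plain JavaScript sketch, not the ServiceNow API; the record shape
// and timestamps are invented for illustration.
const records = [
  { sys_id: 'a1', sys_updated_on: '2025-01-01 10:00:00' },
  { sys_id: 'b2', sys_updated_on: '2025-01-02 09:30:00' },
  { sys_id: 'c3', sys_updated_on: '2025-01-03 14:15:00' },
];

// Yokohama: "Full" keeps a per-table watermark (the newest
// sys_updated_on it has seen) and only picks newer records.
let scanHistory = null;

function yokohamaFullScan(table) {
  const picked = table.filter(
    (r) => scanHistory === null || r.sys_updated_on > scanHistory
  );
  if (picked.length > 0) {
    scanHistory = picked[picked.length - 1].sys_updated_on;
  }
  return picked;
}

// Zurich: "Full" ignores scan history and rescans everything;
// the watermark behavior moved to the separate "Incremental" type.
function zurichFullScan(table) {
  return table.slice();
}

const firstRun = yokohamaFullScan(records);  // picks all 3 records
const secondRun = yokohamaFullScan(records); // nothing newer -> 0 records
const zurichRun = zurichFullScan(records);   // always all 3 records
console.log(firstRun.length, secondRun.length, zurichRun.length); // 3 0 3
```

This is why a second "Full" run in Yokohama can scan almost nothing if few records changed since the previous run.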

In the meantime, we have adjusted our job sizes and enabled a job property that allows parallel job runs, so once we upgrade we should be all set.

 

Thanks for your assistance!


2 REPLIES

ayushraj7012933
Mega Guru

Hi @Heather White ,

We’ve seen similar behavior with Data Discovery (Privacy/Classification) in Yokohama, and it usually comes down to platform limits and the execution model rather than the scan type itself.

Best Practice Approach

1. Check and Tune System Limits

Review system properties related to Data Discovery / Classification:

  • Max records per table

  • Batch size

  • Worker threads

A “Full” scan does not always bypass these limits unless they are tuned.
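As a toy illustration of how an untuned per-table cap can make a "Full" scan look partial (the limit value and names below are placeholders, not actual ServiceNow properties):

```javascript
// Illustration only: a per-table record cap silently truncates a scan.
// `maxRecordsPerTable` is a placeholder value, not a real platform property.
const maxRecordsPerTable = 4;

// Stand-in for a large table
const tableRows = Array.from({ length: 1000 }, (_, i) => ({ sys_id: 'rec' + i }));

function scanWithLimit(rows, limit) {
  // Only the first `limit` rows are ever examined
  return rows.slice(0, limit);
}

const scanned = scanWithLimit(tableRows, maxRecordsPerTable);
console.log(scanned.length + ' of ' + tableRows.length + ' records scanned');
// -> "4 of 1000 records scanned"
```

A cap like this would match the symptom of only 3 or 4 records being scanned, which is why checking these properties is worth doing before digging deeper.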

2. Avoid Large Multi-Table Jobs

  • Do not include too many tables in a single job

  • Recommended:

    • 5–10 tables per job max

This significantly improves the completion rate.

3. Run Jobs in Controlled Batches

  • Schedule jobs in sequence, not parallel

  • Avoid overlap with:

    • Discovery

    • Imports

    • Other heavy background jobs

4. Validate Table Size & Performance

  • Large tables without proper indexing can slow down scans

  • Check execution details to confirm:

    • Records picked vs processed

5. Monitor Execution

  • Use:

    • System Diagnostics → Stats

    • Job execution logs

Look for long-running or stuck workers.

If this helps, please mark it as Helpful and Accept as Solution.

Thanks!
