CMDB Duplicate CIs Across Multiple Classes – Difficulty Identifying True Duplicates vs Valid Records

KumarTejaM · 05-18-2026

Hi Team,

We are currently analyzing duplicate CI records in CMDB and need guidance on the best approach to handle large-scale and mixed duplicate scenarios.

We observed duplicate CI patterns across multiple categories:

✅ Scope:

Cloud Classes
- Azure Datacenter (~2463)
- Cloud Mgmt Network Interface (~390)
- Cloud Subnet (~76)
- Load Balancer (~20)
- Compute Security Group (~16)
Server Classes
- Microsoft IIS Web Server (~316)
- Virtual Machine Instance (~66)
- Website (~384)
- Load Balancer Service (~25)
Database Classes
- MSSQL Database (~6879)
- MongoDB (~8)
- MySQL (~4)

✅ Observations:

Not all duplicate groups are true duplicates
- Many records share same Name + Class
- But differ in:
  - Object ID / Resource ID
  - IP Address
  - Relationships
    → These appear to be valid cloud or DB instances
Some records are clearly invalid
- No discovery source
- No relationships
- Empty attributes
  → Likely orphan or manually created records
Missing authoritative CI in some cases
- Multiple records exist but:
  - All are incomplete
  - No clear primary CI available
    → Makes merge strategy unclear
Volume challenge
- Classes like MSSQL DB (~6879 records) make manual validation impossible

✅ Challenge:

Difficult to distinguish:
- True duplicates vs valid multi-instance CIs
No clear consistent rule across all classes
CMDB best practices recommend merge, but:
- No primary CI exists in some cases
- Many records contain no usable data
Large volume requires automated approach, but:
- Risk of deleting valid records

✅ Questions:

What is the recommended approach to differentiate true duplicates vs valid CIs across different classes?
In cases where no authoritative record exists, should records be deleted or retained?
Is there any out-of-the-box or recommended approach for bulk CMDB deduplication at scale?
How should we handle cloud-based classes where naming is same but identity differs?
Any best practice for building safe deduplication logic using identifiers (object_id, MAC, etc.)?

Vijaya_Mnpram · 05-18-2026

@KumarTejaM You need to work through the duplicates class by class instead of trying to solve them all at once.

What is the recommended approach to differentiate true duplicates vs valid CIs across different classes? [VM] Ideally, treat the record that has a discovery source as the more trustworthy one, since it can be rediscovered and verified.
In cases where no authoritative record exists, should records be deleted or retained? [VM] If the same CI is also discovered, merge the attributes into the discovered record and delete the non-discovered one. If the CI itself is not discovered at all, check with the relevant team whether they still need it.
Is there any out-of-the-box or recommended approach for bulk CMDB deduplication at scale? [VM] Yes. ServiceNow ships some OOB templates that come pre-loaded. Go to the 'Duplicate Dashboard' from the CMDB Workspace; you can find the template library there. You can also create your own template to clear duplicates in bulk.
How should we handle cloud-based classes where naming is same but identity differs? [VM] Again, this depends on the team that owns the CI. If a Cloud VM is also identified as a Windows Server, the team can decide which record is the better one to keep.
Any best practice for building safe deduplication logic using identifiers (object_id, MAC, etc.)? [VM] This depends on the class. The default OOB identifiers should cover most cases; otherwise, check which attribute discovery is failing to populate, since that is what causes these duplicates.
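
To make the discovered vs non-discovered triage above concrete, here is a rough Python sketch that could run offline against an export of one class. It is only an illustration: the dictionary keys (ci_class, name, discovery_source) and the action labels are assumptions made for this example, not ServiceNow APIs, and it suggests actions rather than changing anything.

```python
from collections import defaultdict

def triage_by_discovery(records):
    """Group exported records by (class, name) and suggest one action per group.

    Each record is a plain dict; the keys used here are illustrative stand-ins
    for whatever columns your export actually contains.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["ci_class"], rec["name"])].append(rec)

    plan = {}
    for key, members in groups.items():
        if len(members) < 2:
            continue  # a single record is not a duplicate candidate
        discovered = [m for m in members if m.get("discovery_source")]
        manual = [m for m in members if not m.get("discovery_source")]
        if discovered and manual:
            # Same CI seen by discovery and also created by hand
            plan[key] = "merge_manual_into_discovered"
        elif discovered:
            # All discovered: the name alone cannot decide, compare identifiers
            plan[key] = "compare_identifiers"
        else:
            # Nothing discovered: ask the owning team before touching anything
            plan[key] = "ask_owning_team"
    return plan
```

Running this per class keeps the output small enough to review with the owning team before any template or bulk action is applied.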

IanCox · 2 weeks ago

Start with the identification layer, not the cleanup. In almost every "duplicates at scale" case I see, the root cause is ungoverned identification rules and dependent-source precedence — a process and skills gap, not a tooling one. Fix intake first and the backlog stops growing while you remediate.

On your five questions:

True duplicate vs. valid CI is an identifier question, not a name one. Duplicates match on the same criterion attributes under a class's Identification Rule; valid instances differ on at least one. Analyze per identifier set, per class — not per display name.
No authoritative record: don't hard-delete. Set the record to Retired/Absent so you preserve relationships and audit trail, and only remove once you've confirmed nothing references it.
At scale, use the out-of-box Duplicate CIs remediation — the CMDB Health "Duplicates" KPI surfaces them and generates de-duplication tasks that merge through IRE. Skip ad-hoc delete scripts; they destroy relationships.
Cloud classes with the same name but different identity are exactly why those classes should key on object_id/correlation_id, not name. Verify your identification rules use the cloud-native identifier as the criterion attribute.
Order identifiers by reliability — object_id/MAC/serial as criterion attributes, name only as a weak qualifier — and enforce it through dependent-source precedence.
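
As a rough offline illustration of the identifier-first analysis in the points above, the Python sketch below groups exported records by their strongest populated identifier instead of by name. The IDENTIFIER_ORDER table, the attribute names, and the case-insensitive comparison are all assumptions made for the example; the real criterion attributes are whatever each class's identification rule defines.

```python
from collections import defaultdict

# Hypothetical per-class identifier order, strongest first. These attribute
# names are illustrative; check the actual identification rule for each class.
IDENTIFIER_ORDER = {
    "Virtual Machine Instance": ["object_id", "mac_address", "serial_number"],
    "MSSQL Database": ["correlation_id"],
}

def identity_key(record):
    """Return (attribute, normalized value) for the strongest populated identifier."""
    for attr in IDENTIFIER_ORDER.get(record["ci_class"], []):
        value = record.get(attr)
        if value:
            # Assumption: identifiers compare case-insensitively, ignoring padding
            return (attr, value.strip().lower())
    return None  # name alone is never treated as identity

def find_true_duplicates(records):
    """Split records into identifier-matched duplicate groups and unidentified ones."""
    groups = defaultdict(list)
    unidentified = []
    for rec in records:
        key = identity_key(rec)
        if key is None:
            unidentified.append(rec["sys_id"])
        else:
            groups[(rec["ci_class"],) + key].append(rec["sys_id"])
    duplicates = {k: ids for k, ids in groups.items() if len(ids) > 1}
    return duplicates, unidentified
```

Records that share a name but differ on the identifier fall into separate groups and are left alone, while records with no usable identifier are listed separately for review. One limitation of this sketch: a record that only has a MAC will not be matched to a record for the same machine that has an object_id, so treat the output as a starting point for review.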

Then lock intake behind IRE and data certification so this doesn't recur.
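
The "don't hard-delete" point above is essentially a two-step rule, which can be sketched in a few lines of Python. The status key, its "retired" value, and the returned labels are hypothetical names for this example; how you count inbound references (relationships, tasks, and so on) depends on your instance.

```python
def disposition(record, inbound_references):
    """Suggest the next step for a record with no authoritative counterpart.

    `record` is a plain dict and `inbound_references` is a count you computed
    elsewhere; both are illustrative inputs, not ServiceNow objects.
    """
    if record.get("status") != "retired":
        return "retire"  # first pass: retire, never hard-delete
    if inbound_references > 0:
        return "keep_retired"  # something still points at it
    return "eligible_for_removal"  # retired and confirmed unreferenced
```

The point of the split is that removal only ever happens to a record that has already been retired and has been checked for references afterwards.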

If you want a structured way to tackle this, we do exactly this at Four Dragons. Happy to chat.

jrbockrath · 2 weeks ago

I'd add one more governance question before deciding what to merge or delete: why did these records make it into the CMDB in the first place?

In my experience, duplicate remediation often becomes an endless cleanup exercise because the same intake decisions keep getting replayed. Before touching the backlog, I'd define for each class:

What establishes identity?
Which source is authoritative for that identity and for each key attribute?
Under what conditions is a record complete enough to become a CI?
Which records should be quarantined for review instead of imported?

Once those decisions are explicit, the duplicate backlog becomes a finite remediation project instead of a recurring operational problem.
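
One way to make those four decisions explicit is to write them down as data per class and check incoming payloads against them. The Python sketch below is a minimal illustration under stated assumptions: ClassPolicy, POLICIES, the attribute names, and the source name "Cloud Discovery" are invented for the example and do not correspond to a ServiceNow feature.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClassPolicy:
    identity_attrs: tuple        # what establishes identity
    authoritative_source: str    # which source is authoritative for identity
    required_attrs: tuple = ()   # minimum completeness to become a CI

# Hypothetical policy table; fill in one entry per class you govern.
POLICIES = {
    "Cloud Subnet": ClassPolicy(("object_id",), "Cloud Discovery", ("name",)),
}

def intake_decision(payload, source):
    """Return 'accept' or a quarantine/review reason for an incoming payload."""
    policy = POLICIES.get(payload.get("ci_class"))
    if policy is None:
        return "quarantine: no policy for class"
    if not all(payload.get(a) for a in policy.identity_attrs):
        return "quarantine: missing identity"
    if not all(payload.get(a) for a in policy.required_attrs):
        return "quarantine: incomplete"
    if source != policy.authoritative_source:
        return "review: non-authoritative source"
    return "accept"
```

Anything that comes back as quarantine or review goes to a holding list for a person to look at instead of becoming a CI, which is what keeps the backlog from refilling.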