CMDB Duplicate CIs Across Multiple Classes – Difficulty Identifying True Duplicates vs Valid Records

KumarTejaM

Hi Team,

We are currently analyzing duplicate CI records in CMDB and need guidance on the best approach to handle large-scale and mixed duplicate scenarios.

We observed duplicate CI patterns across multiple categories:


Scope:

  • Cloud Classes

    • Azure Datacenter (~2463)
    • Cloud Mgmt Network Interface (~390)
    • Cloud Subnet (~76)
    • Load Balancer (~20)
    • Compute Security Group (~16)
  • Server Classes

    • Microsoft IIS Web Server (~316)
    • Virtual Machine Instance (~66)
    • Website (~384)
    • Load Balancer Service (~25)
  • Database Classes

    • MSSQL Database (~6879)
    • MongoDB (~8)
    • MySQL (~4)

Observations:

  1. Not all duplicate groups are true duplicates

    • Many records share the same Name + Class
    • But differ in:
      • Object ID / Resource ID
      • IP Address
      • Relationships
        → These appear to be valid cloud or DB instances
  2. Some records are clearly invalid

    • No discovery source
    • No relationships
    • Empty attributes
      → Likely orphan or manually created records
  3. Missing authoritative CI in some cases

    • Multiple records exist but:
      • All are incomplete
      • No clear primary CI available
        → This makes the merge strategy unclear
  4. Volume challenge

    • Classes like MSSQL Database (~6879 records) make manual validation impractical
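A minimal sketch of the checks behind observations 1 and 2, assuming the CIs have been exported into a list of dicts with hypothetical keys (sys_id, class, name, object_id, ip_address, discovery_source, relationship_count). It only reports and changes nothing. Within each Name + Class collision it separates object IDs shared by several records (true duplicate candidates) from distinct object IDs (likely valid instances), and lists orphan candidates.

```python
from collections import Counter, defaultdict

# Hypothetical export columns: sys_id, class, name, object_id, ip_address,
# discovery_source, relationship_count. Rename to match your own export.
IDENTITY_ATTRIBUTES = ("object_id", "ip_address")


def is_orphan_candidate(ci: dict) -> bool:
    """Observation 2: no discovery source, no relationships, empty identity attributes."""
    return (
        not ci.get("discovery_source")
        and ci.get("relationship_count", 0) == 0
        and all(not ci.get(attr) for attr in IDENTITY_ATTRIBUTES)
    )


def classify_name_class_groups(cis: list[dict]) -> dict:
    """Report, per (class, name) collision, what kind of 'duplicate' it really is."""
    groups = defaultdict(list)
    for ci in cis:
        groups[(ci["class"], ci["name"])].append(ci)

    report = {}
    for key, members in groups.items():
        if len(members) < 2:
            continue  # name is unique within its class: nothing to review
        id_counts = Counter(ci["object_id"] for ci in members if ci.get("object_id"))
        report[key] = {
            # Same Name + Class AND same object ID: true duplicate candidates
            "shared_object_ids": sorted(oid for oid, n in id_counts.items() if n > 1),
            # Same Name + Class but different object IDs: likely valid instances
            "distinct_object_ids": len(id_counts),
            # Records carrying no evidence that the CI actually exists
            "orphan_candidates": [ci["sys_id"] for ci in members if is_orphan_candidate(ci)],
        }
    return report


if __name__ == "__main__":
    sample = [
        {"sys_id": "a1", "class": "Cloud Subnet", "name": "subnet-app", "object_id": "sub-001",
         "discovery_source": "ServiceNow", "relationship_count": 2},
        {"sys_id": "a2", "class": "Cloud Subnet", "name": "subnet-app", "object_id": "sub-002",
         "discovery_source": "ServiceNow", "relationship_count": 1},
        {"sys_id": "a3", "class": "Cloud Subnet", "name": "subnet-app", "object_id": "",
         "discovery_source": "", "relationship_count": 0},
    ]
    print(classify_name_class_groups(sample))
```

For the sample, the single group reports no shared object IDs, two distinct object IDs, and a3 as the only orphan candidate. Treating object_id as the identity attribute is an assumption that suits the cloud classes; database classes would need a different attribute.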

Challenge:

  • Difficult to distinguish:
    • True duplicates vs valid multi-instance CIs
  • No single rule applies consistently across all classes
  • CMDB best practices recommend merging, but:
    • No primary CI exists in some cases
    • Many records contain no usable data
  • The large volume requires an automated approach, but:
    • Automation risks deleting valid records
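When a group is confirmed as a true duplicate but no authoritative record exists, one cautious option is to generate a plan instead of deleting anything. The sketch below assumes the same hypothetical export keys plus a hypothetical SCORED_ATTRIBUTES list. It prefers a record that has a discovery source, otherwise falls back to the most complete record, and flags groups with no discovered record for owner review.

```python
# Hypothetical attributes used to score completeness; choose the fields that
# matter for the class under review.
SCORED_ATTRIBUTES = ("object_id", "ip_address", "serial_number", "fqdn")


def completeness(ci: dict) -> int:
    """Number of populated scored attributes plus the record's relationship count."""
    filled = sum(1 for attr in SCORED_ATTRIBUTES if ci.get(attr))
    return filled + ci.get("relationship_count", 0)


def plan_merge(group: list[dict]) -> dict:
    """Dry-run plan for ONE confirmed duplicate group. Nothing is deleted here."""
    discovered = [ci for ci in group if ci.get("discovery_source")]
    # Prefer discovered records; fall back to the whole group if none exist.
    candidates = discovered or group
    # max() keeps the first record on ties, so the result is deterministic.
    primary = max(candidates, key=completeness)
    return {
        "keep": primary["sys_id"],
        "merge_from": [ci["sys_id"] for ci in group if ci is not primary],
        # No discovered record means no authoritative CI: route to the owning team.
        "needs_owner_review": not discovered,
    }


if __name__ == "__main__":
    group = [
        {"sys_id": "m1", "ip_address": "10.0.0.5", "relationship_count": 0},
        {"sys_id": "m2", "ip_address": "10.0.0.5", "object_id": "db-01", "relationship_count": 1},
    ]
    print(plan_merge(group))
```

Here neither record is discovered, so m2 (the more complete one) is proposed as the primary and the group is flagged for review. The completeness score is only a heuristic; weighting it per class is likely necessary.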

Questions:

  1. What is the recommended approach to differentiate true duplicates vs valid CIs across different classes?
  2. In cases where no authoritative record exists, should records be deleted or retained?
  3. Is there any out-of-the-box or recommended approach for bulk CMDB deduplication at scale?
  4. How should we handle cloud-based classes where the name is the same but the identity differs?
  5. Any best practice for building safe deduplication logic using identifiers (object_id, MAC, etc.)?

Vijaya_Mnpram

@KumarTejaM You need to resolve the duplicates class by class instead of trying to solve them all at once.

  1. What is the recommended approach to differentiate true duplicates vs valid CIs across different classes? [VM] Ideally, treat the records that have a discovery source as the legitimate ones, since they are actually being discovered.
  2. In cases where no authoritative record exists, should records be deleted or retained? [VM] If the same CI also exists as a discovered record, merge the attributes into it and delete the non-discovered CI. If the CI is not discovered at all, check with the relevant team whether they still need it.
  3. Is there any out-of-the-box or recommended approach for bulk CMDB deduplication at scale? [VM] Yes. ServiceNow provides some OOB de-duplication templates, and they are pre-loaded. Go to the 'Duplicate Dashboard' in the CMDB Workspace, where you can find the template library. You can also create your own template to clear the duplicates in bulk.
  4. How should we handle cloud-based classes where the name is the same but the identity differs? [VM] Again, this depends on the team that owns the CIs. If a cloud VM is also identified as a Windows Server, that team can decide which record is better to keep.
  5. Any best practice for building safe deduplication logic using identifiers (object_id, MAC, etc.)? [VM] This depends on the class. The OOB default identifiers should cover most cases; otherwise, check which attribute is missing from discovery and causing these duplicates.
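To illustrate point 5, here is a sketch that groups records by a per-class identifier tuple rather than by name. The IDENTIFIERS_BY_CLASS mapping and the instance_name column are hypothetical and are not the OOB identification rules; in practice the criteria should mirror the identifier configured for each class. Records with an empty identifier value are reported separately, since that gap is often what causes the duplicates, and classes without a mapping are skipped rather than guessed.

```python
from collections import defaultdict

# Hypothetical per-class identifier attributes, for illustration only. Real
# matching criteria should mirror the identification rule of each class.
IDENTIFIERS_BY_CLASS = {
    "Virtual Machine Instance": ("object_id",),
    "Cloud Mgmt Network Interface": ("object_id",),
    "MSSQL Database": ("name", "instance_name"),  # hypothetical composite key
}


def find_identifier_duplicates(cis: list[dict]) -> dict:
    """Group CIs by class-specific identifier values instead of by name."""
    groups = defaultdict(list)
    missing_identifier = []
    unmapped_classes = set()
    for ci in cis:
        attrs = IDENTIFIERS_BY_CLASS.get(ci["class"])
        if attrs is None:
            unmapped_classes.add(ci["class"])  # no rule defined: do not guess
            continue
        key = tuple(ci.get(attr) for attr in attrs)
        if not all(key):
            # An empty identifier is often the discovery gap behind the duplicate;
            # report it rather than falling back to a weaker match such as name.
            missing_identifier.append(ci["sys_id"])
            continue
        groups[(ci["class"], key)].append(ci["sys_id"])
    return {
        "duplicates": {k: ids for k, ids in groups.items() if len(ids) > 1},
        "missing_identifier": missing_identifier,
        "unmapped_classes": sorted(unmapped_classes),
    }
```

With this approach, two VMs named web01 but carrying different object IDs land in separate groups, which matches observation 1 in the question: same name, different identity, so not a duplicate.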