How to remove duplicate records (massive volume) from the Product Models [cmdb_model] table
04-17-2023 02:43 AM
Hi All,
We have the following requirement:
1] We have 4 million records in the Product Models [cmdb_model] table, categorized by the Model Number field.
2] There are 634 model numbers that each have 1000+ duplicate records, and we would like to keep a single healthy record out of each set of duplicates.
e.g. For Model Number 'ABC1234' we have 100 records, so we need to keep a single record for that model number.
3] We need to do the same for all 634 model numbers. Is there any way to deal with massive duplicate records in the Product Models [cmdb_model] table?
@Ankur Bawiskar , @AnveshKumar M , @Prince Arora , @Sandeep Dutta , @CMDB Whisperer , @SatyakiBose ,@Amit Gujarathi
Any and all help greatly appreciated!
Thank you.

04-17-2023 03:15 AM
Hi,
Yes, there are several ways to handle massive duplicate records in the Product Models [cmdb_model] table. Here are a few options:
Deduplication Script: You can write a script to remove duplicate records based on the model number field. The script can loop through all 634 model numbers and find and remove the duplicates. This method can be time-consuming as it involves scanning all 4 million records in the table.
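To illustrate the scripted approach, here is a minimal sketch of the core grouping logic in plain JavaScript. On the platform you would iterate with GlideAggregate/GlideRecord in a background script or scheduled job and process in batches; this snippet only shows how duplicates can be grouped by model number and a single survivor chosen. The sample records and the "healthiest record" rule (most populated fields) are hypothetical assumptions — adjust them to your own data and criteria:

```javascript
// Group records by model number and keep one "survivor" per group.
// "Healthiest" is assumed here to mean the record with the most
// non-empty fields; swap in your own scoring rule as needed.
function score(rec) {
  return Object.keys(rec).filter(function (k) {
    return rec[k] !== null && rec[k] !== '';
  }).length;
}

function findDuplicatesToDelete(records) {
  var byModel = {};
  records.forEach(function (rec) {
    var key = rec.model_number;
    (byModel[key] = byModel[key] || []).push(rec);
  });

  var toDelete = [];
  Object.keys(byModel).forEach(function (key) {
    var group = byModel[key];
    if (group.length < 2) { return; } // no duplicates for this model
    // Best-scoring record first; keep it, mark the rest for deletion.
    group.sort(function (a, b) { return score(b) - score(a); });
    toDelete = toDelete.concat(group.slice(1));
  });
  return toDelete;
}

// Hypothetical sample data standing in for cmdb_model rows.
var records = [
  { sys_id: '1', model_number: 'ABC1234', name: 'Laptop', manufacturer: '' },
  { sys_id: '2', model_number: 'ABC1234', name: 'Laptop', manufacturer: 'Acme' },
  { sys_id: '3', model_number: 'XYZ9', name: 'Switch', manufacturer: 'Acme' }
];
var dupes = findDuplicatesToDelete(records);
// dupes contains only the record with sys_id '1' (fewer populated fields)
```

With 4 million rows, do not load everything at once: query one model number at a time (you already have the list of 634) and delete in small chunks so the transaction stays within platform limits.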
De-duplication Tool: You can use a third-party de-duplication tool such as DemandTools or Duplicate Check to scan and identify the duplicate records based on the model number field. These tools can perform batch operations to remove the duplicates and keep a single healthy record. The advantage of using a third-party tool is that it can handle large datasets more efficiently.
Import Sets: You can use an import set to bring in the data from the cmdb_model table into a staging table. You can then use a transform map to identify and merge the duplicate records based on the model number field. This method requires setting up the import set and transform map, but it can be useful if you need to merge data from multiple sources.
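The key to the import set approach is coalescing on the Model Number field: incoming rows that share a model number update one existing target record instead of inserting new ones. The sketch below simulates that coalesce semantics in plain JavaScript (field names are hypothetical; in ServiceNow you would configure this on the transform map field mapping rather than write it by hand):

```javascript
// Simulate transform-map coalesce: rows sharing the coalesce field's
// value collapse into one target record. The first row inserts; later
// rows update it, with non-empty incoming values overwriting.
function transform(stagingRows, coalesceField) {
  var target = {}; // target records keyed by the coalesce value
  stagingRows.forEach(function (row) {
    var key = row[coalesceField];
    if (target[key]) {
      // Update: merge non-empty incoming values over the existing record.
      Object.keys(row).forEach(function (f) {
        if (row[f] !== '' && row[f] != null) { target[key][f] = row[f]; }
      });
    } else {
      // Insert: first row for this model number becomes the record.
      target[key] = Object.assign({}, row);
    }
  });
  return Object.keys(target).map(function (k) { return target[k]; });
}

// Hypothetical staging rows: two duplicates of ABC1234, one XYZ9.
var staging = [
  { model_number: 'ABC1234', name: 'Laptop', manufacturer: '' },
  { model_number: 'ABC1234', name: '', manufacturer: 'Acme' },
  { model_number: 'XYZ9', name: 'Switch', manufacturer: 'Acme' }
];
var result = transform(staging, 'model_number');
// result holds 2 records; the ABC1234 record ends up with both
// name and manufacturer filled in from the two staging rows
```

Note this merges future imports cleanly but does not by itself remove the duplicates already sitting in cmdb_model; you would still need to clean up the existing rows first (or re-import into a fresh table).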
Regardless of the approach you choose, it is recommended to perform a backup before removing any data to avoid data loss. Also, it is important to consider the impact on any related data or processes that may be affected by the removal of duplicate records.
Thanks,
Rahul Kumar