Merge Duplicate Records

mballinger
Mega Guru

Hello,

I am trying to write a fix/background script to find all duplicate records and then merge them. I am not sure how to really do this. Records will be uniquely identified using 3 fields from the table and the most recently updated record will be the primary record. 

Here is what I have so far:

var count = new GlideAggregate('cmdb_ci_computer');
count.addAggregate('count', 'name');
//count.addAggregate('count', 'serial_number'); //This doesn't work 
//count.addAggregate('count', 'model_id'); //This doesn't work 
//count.groupBy('serial_number'); //Doesn't print out the name of the record
count.addHaving('count', '>', 1);
count.query();
while(count.next()) {
    var name = count.name;
    var record_count = count.getAggregate('count', 'name');
    gs.info("Duplicates: " + name + ", Count: " + record_count;
}

The code above returns the duplicate record plus the total count. I am also only able to uniquely identify duplicate records by using only 1 field. I need to use name, serial_number, and model_id to uniquely identify records.

The next step would then need to be merging the records by most recently updated/created. I have no idea how to attempt that.

Thanks!

@Ankur Bawiskar @Brad Bowman 

1 ACCEPTED SOLUTION

Hi,

As per your requirement, I have implemented logic below.

var gaComputer = new GlideAggregate('cmdb_ci_computer');
gaComputer.groupBy("name");
gaComputer.groupBy("serial_number");
gaComputer.groupBy("model_id");
gaComputer.addAggregate("COUNT", "name");
gaComputer.query();

while (gaComputer.next()) {
    
    var duplicateCount = gaComputer.getAggregate("COUNT", "name");// This will return count of records where Name, Model ID and Serial Number is unique
    if (duplicateCount > 1) {

        var grComputer = new GlideRecord("cmdb_ci_computer");
        grComputer.addEncodedQuery("name=" + gaComputer.name + "^serial_number=" + gaComputer.serial_number + "^model_id=" + gaComputer.model_id);
        grComputer.orderByDesc("sys_updated_on"); // This will make sure to give most recently updated/created record first.
		grComputer.setLimit(1);// we are getting only primary record here (we will perform operation with secondary records in next while loop)
        grComputer.query(); // This will return number of records with same Name, Serial Number and Model ID

            if(grComputer.next()) {
                
               if(grComputer.getValue("size")=="" || grComputer.getValue("storage")=="" || grComputer.getValue("screen")==""){
				   
					var grComputer2=new GlideRecord("cmdb_ci_computer");
					grComputer2.addEncodedQuery("name=" + gaComputer.name + "^serial_number=" + gaComputer2.serial_number + "^model_id=" + gaComputer.model_id);
					grComputer2.orderByDesc("sys_updated_on"); // this will make sure that we are updating primary record with recent updated/created secondary records if there are more than 2 duplicate reocrds
					grComputer2.chooseWindow(1,duplicateCount); // this will provide us all duplicate records by excluding primary record as initial window starts with 1
					grComputer2.query();
					
					while(grComputer2.next()){
						
					if(grComputer.getValue("size")=="" && grComputer2.getValue("size")!=""){
					grComputer.setValue("size",grComputer2.getValue("size"));
					}
					if(grComputer.getValue("storage")=="" && grComputer2.getValue("storage")!=""){
					grComputer.setValue("size",grComputer2.getValue("storage"));
					}
					if(grComputer.getValue("screen")=="" && grComputer2.getValue("screen")!=""){
					grComputer.setValue("size",grComputer2.getValue("screen"));
					}
					grComputer.update();
				}
            }
        }
}

Please mark this as Correct or helpful if it helps.

Thanks and Regards,
Abhijit

By marking my response as correct or helpful, you contribute to helping future readers with similar issues.
Regards,
Abhijit
ServiceNow MVP

View solution in original post

16 REPLIES 16

bammar
Kilo Sage
Kilo Sage

You may have to make some nested querys ( though there may be a more elegant way to do this) Lets think about this a Psuedocode...

 

Start by Looking for the first field - a Name and then getting an Aggregate for Name 

IF the aggregate count is greater than 1 .... it is Possible but not necessarilly a duplicate - So you would make a While count>1 and enter that - inside that you would Aggregate the 2nd field the serial number and if you got >1 then you would finally do a while and go to the 3rd- at this point in that query you would order them by date and maybe try a counter and delete the ones that are the oldest. This is really hard to test without messing up so if its a one time deal you may be better of having the script write to a text field on the dups or check a true false box - then later run another query to Delete/merge them

 

 

Also I would take steps in the database to never allow inserts of any new records where these 3 fields match another 

 

@bammar - Thanks for your response! That makes a lot of sense to nest the other lookups. I guess my next biggest dilemma would be merging the records. How would I merge the records? There are several fields from the dupe records that would need to append to the Primary record if the field values are not filled out on the Primary. Then the dupes would need to be deleted afterwards.

@bammar - Can you please review my code under Allen A. I've nested a few Glide Aggregates. I don't think I did it correct because I am not returning any count. I've confirmed and verified that I have duplicate records. Thanks!

Allen Andreas
Administrator
Administrator

Hello,

For the merging of the records, that's more of a business process question than technical as you'd need to decide what exactly you are merging. Are you merging anything that has a value on the "duplicate" record (so non-primary) that the now "original" record has no value for?

You would need to decide that, once you decide that, then as @bammar mentioned, you'd nest the queries and then set the fields like:

if (duplicate.field_name) {
primary.setValue('field_name', duplicate.getValue('field_name'));
}

The above is an example and after you've retrieve both records via GlideRecord or whatever query method you decide to go with.

Please mark reply as Helpful/Correct, if applicable. Thanks!


Please consider marking my reply as Helpful and/or Accept Solution, if applicable. Thanks!