‎09-14-2016 08:12 AM
I have a really strange phenomenon involving import sets. This is on the Helsinki release. It seems that if I have two transform maps for the same import set, somehow the import set table produces a duplicate of all the data rows from the original import set!
A very simple CSV file with 9 rows containing machine names and some hardware data is loaded into a table named 'test_import'.
It looks like this:
Asset Tag,Appliance,Platform,Module,Component,Manufacturer,Model,PN,Man PN,Man SN
13752,ilauctsa01a,Dion,ilauctsa01a,Chassis,,1-u Intel D-Generation Edge Cache/SAS,,2602-556242,VM15AS001990
13752,ilauctsa01a,Dion,ilauctsa01a,Cpu/Processor1,Intel,Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz,,,
13752,ilauctsa01a,Dion,ilauctsa01a,Cpu/Processor2,Intel,Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz,,,
13752,ilauctsa01a,Dion,ilauctsa01a,Memory1,Samsung,8GB DDR4-1866 SDRAM DIMM Module,,M393A1G40DB0-CPB,418A74A0
etc...
I created two test transform maps that run scripts. They are called Test Import One and Test Import Two. They both look like this:
(function transformRow(source, target, map, log, isUpdate) {
    // Skip this row entirely; nothing is written to the target table
    ignore = true;
})(source, target, map, log, action === "update");
After loading the data, if I look at the import set table named 'u_test_import' there are 9 rows.
Then I run the do-nothing transforms. Afterwards there are twice as many rows in the IMPORT set table as before.
So the "transforms" I had did nothing (ignoring every row), and yet my INPUT table has doubled in size. As you can imagine, this is not a good thing with massive data sets (which is unfortunately how I discovered this).
Can anyone explain what is going on here? Is there something I'm doing to cause this that I can work around?
Thanks in advance for any help!
Solved!
Labels: Integrations, Scripting and Coding


‎09-14-2016 08:46 AM
Hi Gregory,
You have selected two transform maps, so both are going to run on that import set. You are effectively running this import twice: the engine takes each pending row, runs it through transform 1, and then again through transform 2. Hence you get two rows. I'd be interested to see the Created dates on those duplicate records. I think you'll find half of them coincide with when the import was initially done and the other half with when you ran the transform.
The typical use case for different transform maps is to import the same data to two target tables (e.g. users and groups). You'll want to know which of your users imported and which of your groups imported (and similarly, which had issues).
http://wiki.servicenow.com/index.php?title=Import_Sets
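The behavior described above can be sketched as a small model in plain JavaScript. This is an assumption about the engine's effective behavior, not actual platform code: each selected map takes every pending staging row through its own pass, and each pass leaves its own row (with its own transform state) in the staging table.

```javascript
// Nine staged rows, as in the original CSV load.
const importSet = Array.from({ length: 9 }, (_, i) => ({
  row: i + 1,
  state: "pending",
}));

// The two do-nothing maps from the question.
const maps = ["Test Import One", "Test Import Two"];

// Model of the multi-map run: every pending row is processed once per
// selected map, and each pass records its outcome on a copy of the row.
function runTransforms(stagingTable, transformMaps) {
  const result = [];
  for (const map of transformMaps) {
    for (const row of stagingTable) {
      // The script sets ignore = true, so the row's state becomes
      // "ignored", but a staging row per map still exists.
      result.push({ ...row, state: "ignored", map });
    }
  }
  return result;
}

const stagingRows = runTransforms(importSet, maps);
console.log(stagingRows.length); // 9 rows x 2 maps = 18 staging rows
```

Under this model, running each map separately (one map per run) processes the same 9 pending rows without doubling them, which matches what Greg observes below.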
‎09-14-2016 08:55 AM
Thanks Chuck! But wow - I still find it rather bizarre that my input data is being replicated! If I had ONE transform map, I wouldn't expect my source data to be duplicated - so why is it duplicated with two? Note that if I run each of these transforms separately, I do not see the duplication. Is the issue perhaps that the transformer stores the transformation results with the rows and thus needs a location for each result, so it duplicates the entire row? That's kind of insane. Imagine I have an import set of 25,000 records. Do I have to import it twice if I want to - say - distribute the data to two different tables? Yikes.

‎09-14-2016 09:03 AM
While I don't have all the platform-level details, I can certainly understand why it duplicates when you run the maps at the same time vs. separately.
Running them separately gives you the opportunity to analyze the results after each run. Run Map 1, check results. Yup, users imported fine. Run Map 2 after it manually, check results. "Hmmm, some issues." If you run these together (map 1 for users and map 2 for groups), how can you tell which worked and which didn't, when you are sending to two different target tables? From a status perspective, you need two import set records when running two maps on the same source data.
The other alternative would be to treat the imported data as "sacred" and give each import record a related list holding the output status from each map that was run. In 99.9% of cases (or more) this would be a related list of one record. That would require the engineers to rebuild some bits of the import set engine, which you can recommend.
I invite you to open an enhancement request! Our product managers DO listen.
Enhancement requests: Tell us how you would improve the ServiceNow product
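The related-list alternative described above can be sketched as follows. This is a hypothetical design, not an existing platform feature: the staging table keeps exactly one row per source record, and each map's outcome lives in a separate status list keyed by row and map.

```javascript
// One staging row per source record - the data itself is never duplicated.
const importRows = Array.from({ length: 9 }, (_, i) => ({
  sysId: "row_" + (i + 1),
}));

const maps = ["Test Import One", "Test Import Two"];

// Related status list: one status record per (row, map) pair.
const transformStatus = [];
for (const map of maps) {
  for (const row of importRows) {
    transformStatus.push({ rowId: row.sysId, map: map, state: "ignored" });
  }
}

// Staging table stays at 9 rows; the 18 per-map outcomes live in the
// related list instead of in duplicated staging rows.
console.log(importRows.length, transformStatus.length); // 9 18
```

In the common single-map case this related list would hold exactly one status record per row, so the overhead only appears when multiple maps run against the same import set.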
‎09-14-2016 09:10 AM
Hi Chuck,
Thanks again for confirming my suspicions and explaining the operation! Yes, for large datasets (especially one with many columns) I do think a related table would be preferable.
Greg