How large can an array be?

Chris Bui
Giga Expert

Hello!

In an LDAP transform, I'm trying to disable groups that have been removed from AD by using an On Complete script. I create an array of all groups in sys_user_group and an array of everything in the import set, then I use ArrayUtil to give me the difference and disable those groups. It seems to work, but I wonder what the limits of arrays are.

In this group scenario, there are a little over 6,000 groups. If we try to apply this solution to user records, we're looking at over 40,000.

Has anyone worked with arrays of this size?

Here's a sample of the On Complete script, written as a background script so you can test it yourself:

var arrayUtil = new ArrayUtil();
var importSet = [];
var groups = [];

// get the records from this import set
var grImport = new GlideRecord('u_ldap_folder_permissions');
grImport.addQuery('sys_import_set', '6466aae66fc6324016ce511bbb3ee490');
grImport.query();
while (grImport.next()) {
    importSet.push(grImport.getValue('u_samaccountname'));
}

// get all the group records
var grGroups = new GlideRecord('sys_user_group');
grGroups.query();
while (grGroups.next()) {
    groups.push(grGroups.getValue('name'));
}

gs.print('importSet length: ' + importSet.length);
gs.print('groups length: ' + groups.length);

// get the difference
var toBeDisabled = arrayUtil.diff(groups, importSet);
gs.print('toBeDisabled length: ' + toBeDisabled.length);

for (var i = 0; i < toBeDisabled.length; i++) {
    gs.print(toBeDisabled[i]);
}


1 ACCEPTED SOLUTION

Mwatkins
ServiceNow Employee

Hi Chris,

Thanks for asking this question. It has prompted me to do some writing that I have been meaning to do for a while. I agree with Rob. The main thing you want to watch out for is performance as opposed to the max array size.

  • However, if I understand your question correctly, you have 40,000 users and you want to test that against the contents of an import set from your Active Directory. I guess these import sets have 40,000 or fewer records in them. This would mean two arrays with 40,000 strings in each of them. ArrayUtil.diff() creates a third array for the results, so let's assume that one is smaller, maybe 10,000 strings. That gives us roughly 90,000 strings stored in memory, which is no problem at all - you might use 1 or 2 MB at the most out of the 500-700 MB that most systems have free at any given time.
  • The algorithm for ArrayUtil.diff() does two loops, one nested in the other, so for two 40,000-element arrays that would be 40,000 x 40,000 = 1,600,000,000 iterations. That might be a problem. Even though each iteration executes quickly, a loop like that would potentially have an impact on CPU usage, and the total of all the iterations together might add up in terms of execution time. You are still going to be fine if you are just running this once a day for a couple of minutes, but you might want to think about a different way to accomplish the goal if you end up scaling to a larger user base (see the lookup-table sketch after this list, and my recommendations below).
  • Sometimes I see people get into trouble when they are running the same code multiple times simultaneously on different threads. You will not have that problem because transform maps run in serial by default. No two imports can run transforms on the same source/target table combination at the same time [side note: transforms can be made to run in parallel with the glide.import_set_insert_serialized... properties. Use with caution].
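As an aside, here is a minimal sketch of one way to avoid the nested loop if you do stay in script: index one of the arrays in a plain object so each membership test is a property lookup instead of an inner loop. The function name is just for illustration.

// Sketch: an object used as a lookup table replaces the nested loop
// with one pass over each array (roughly 80,000 iterations instead of 1.6 billion).
function diffWithLookup(a, b) {
    var seen = {};
    for (var i = 0; i < b.length; i++) {
        seen[b[i]] = true;
    }
    var result = [];
    for (var j = 0; j < a.length; j++) {
        if (!seen[a[j]]) {
            result.push(a[j]); // in a but not in b
        }
    }
    return result;
}
// e.g. var toBeDisabled = diffWithLookup(groups, importSet);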

What is the goal here? You want to disable any user/group in the system that wasn't in the imported list of users, right? I feel like there is another way to accomplish this...

Other options

  1. Maybe use a negation query against the database, e.g. query all users where the name is not in the array of u_samaccountname values (a sketch follows this list).
  2. Or perhaps you could add a field to the user table and update it with some unique value during transformation - perhaps based on the import set name. Any record not matching the unique value would be deactivated. That query would be pretty simple: select sys_user where u_transform_id != X.
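A minimal sketch of option 1 against the group table, assuming the importSet array of u_samaccountname values from the onComplete script is still in scope; name and active are the stock sys_user_group fields, and 'NOT IN' is the standard GlideRecord operator. Note that joining on commas assumes group names never contain commas.

// Sketch of option 1: let the database do the comparison instead of ArrayUtil.diff().
var grStale = new GlideRecord('sys_user_group');
grStale.addQuery('name', 'NOT IN', importSet.join(','));
grStale.addQuery('active', true); // only touch groups that are still active
grStale.query();
while (grStale.next()) {
    grStale.active = false;
    grStale.update();
}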

Here are some things that can cause performance issues when working with arrays (or ServiceNow Javascript code in general):

Creating unnecessarily large arrays or strings

The main problem here is memory usage. The Java Virtual Machine that ServiceNow runs in has 2GB of memory total. At any given time it may have a garbage collection floor between 20% and 80%; a GC floor above 80% is generally considered dangerously high. Arrays store their data in memory and so retain Java heap space for the duration of the array's life cycle (see de-referencing unneeded objects below). To avoid running your instance out of memory, you want to make sure that your arrays do not pull too much data into memory.

One idea that I've had - but never tried - is to write a gs.sleep() statement at the very end of whatever script I want to test for memory. Then check memory before, during and after executing the script by looking at /stats.do. The part that says "Free percentage: 58.0" tells you how much available memory you have at any given time. If you did this on a sub-prod with little else running at the same time you ought to be able to get a pretty good idea of how much memory is being retained by your code.
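A minimal sketch of that idea as a background script; gs.sleep() takes milliseconds, and the prints are just markers - the actual "Free percentage" reading comes from refreshing /stats.do while the script sleeps.

// Sketch: build the array, then sleep so you can read "Free percentage"
// on /stats.do while the data is still being retained.
var big = [];
var gr = new GlideRecord('sys_user_group');
gr.query();
while (gr.next()) {
    big.push(gr.getValue('name'));
}
gs.print('Array built with ' + big.length + ' entries - check /stats.do now');
gs.sleep(60000); // hold the array in memory for 60 seconds
big = null;      // de-reference so it can be garbage collected
gs.print('Done - check /stats.do again');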

There is an interesting thread on Stack Overflow about how to test object sizes: memory - JavaScript object size - Stack Overflow

Keeping GlideRecord/GlideElement records in memory

One specific example that I see happening often is when someone stores an object reference in an array without realizing it, or when they really only need to store a primitive data type. For example, the following code stores a GlideElement object that takes hundreds of times the memory needed for a simple sys_id String. JavaScript strings are stored as UTF-16, so a good estimate for the size of a string is total character length x 2 bytes. Assuming the average name has about 15 characters, 40,000 names would be about 1 MB.

var arrIncident = [];
var myIncident = new GlideRecord("incident");
myIncident.query();
while (myIncident._next()) {
    var caller = new GlideRecord("sys_user");
    caller.get(myIncident.caller_id);
    arrIncident.push(caller.sys_id); // <-- this is passing a GlideElement, not a string
}

Using the getValue() method, like you have, is a good way to avoid this since it returns a simple JavaScript String instead of a complex object.

arrIncident.push(caller.getValue("sys_id")); // much better

Failing to de-reference unneeded objects

If you have a very long script that builds multiple large arrays, it might be worth it to de-reference any arrays as soon as they are no longer needed.

var o = { a: 1, b: 2 }; // memory retained in object o
var oa = o.a;
o = null;  // the only reference to the object originally stored in o is gone, but we have
           // stored a reference to o.a, one of the properties, so we can't garbage collect yet
oa = null; // now we can garbage collect

See Reference Counting Garbage Collection

Infinite (or Very Deep) looping while traversing a recursive data structure

This can happen a lot in the CMDB and task tables when you want to use the parent/child relationships to check the whole ancestry of a record. When doing this type of operation you run the risk of getting into a very deep or infinite loop - record A has child B has child C has child A (uh oh!). Lots has been written about how to avoid this type of thing, so I won't go into depth here, but here are some ideas (combined in the sketch after this list):

  • Make some type of counter that will not let the recursion go beyond a certain number of layers deep.
  • Make a max size check that will not let the array get above a certain size
  • Keep a list of items that have already been seen, breaking when recursion is detected
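A minimal sketch combining those ideas for walking child CIs; cmdb_rel_ci with its parent and child fields is the stock relationship table, and the depth and size limits are arbitrary placeholders.

// Sketch: walk child CIs with a visited list, a depth limit and a max-size check.
var MAX_DEPTH = 10;     // arbitrary recursion limit
var MAX_RESULTS = 5000; // arbitrary fail-safe on array size
var visited = {};       // sys_ids we have already seen

function collectChildren(parentSysId, depth, results) {
    if (depth > MAX_DEPTH || results.length >= MAX_RESULTS) {
        return; // too deep or too many results - stop here
    }
    var rel = new GlideRecord('cmdb_rel_ci');
    rel.addQuery('parent', parentSysId);
    rel.query();
    while (rel.next()) {
        var childId = rel.getValue('child');
        if (visited[childId]) {
            continue; // already seen - breaks A -> B -> C -> A cycles
        }
        visited[childId] = true;
        results.push(childId); // store the sys_id string, not a GlideElement
        collectChildren(childId, depth + 1, results);
    }
}

var impacted = [];
collectChildren('<sys_id of the starting CI>', 0, impacted);
gs.print('Impacted CIs: ' + impacted.length);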

Creating huge memory objects while attempting to avoid infinite loops while traversing a recursive data structure

Yes, that long title is intentionally humorous. Basically this problem happens when you are trying to avoid the previous problem by keeping a list of items that have already been seen. If that list gets too big you could be out of the frying pan and into the fire! What I mean is: suppose you are traversing the related CI hierarchy (cmdb_rel_ci) and trying to find all the child CIs that would be impacted by a Change. As you traverse the cmdb_rel_ci table there is the possibility of getting into a loop because A depends on B depends on C depends on A. In order to avoid this, you make an array, and whenever an affected CI is identified you put it in the array. Then, if you encounter any CI that is already in the array, you stop looping and go back one level. The only problem is that you might end up with a really big array. So, when using this technique, be sure to also employ the techniques mentioned above about not keeping GlideRecord objects in memory (store sys_id strings instead). Also, you might create a fail-safe that errors out if the array grows past a certain point - as determined by your approximation of how much memory is consumed by each object in the array.


8 REPLIES

Wow Matthew!

Thanks for taking the time to answer as you did.

I have changed my approach by following your first suggestion to use a negation query. The performance gained with this is night and day: even with the limited group scenario, the original way took about 20-30 seconds to process, and now it's pretty much instantaneous.

I'll keep an extra special eye on this if we apply this solution to our user sync, since I'm still a little nervous about using .addQuery('user.sys_id', 'NOT IN', <array of 40,000 things>);

If I could throw it out there: if GlideRecord had an outer join option, the negation query would be inherent by definition and we wouldn't have to pass an array of 40,000 things. I haven't done SQL in a while, but it would be something along the lines of...



SELECT sys_user.sys_id
FROM sys_user
    LEFT JOIN sys_import_set
        ON sys_import_set = '6466aae66fc6324016ce511bbb3ee490'
        AND sys_user.u_objectGUID = sys_import_set.u_objectGUID
WHERE sys_import_set.u_objectGUID IS NULL



Thanks again for such a great explanation!

Chris


Brian Dailey1
Kilo Sage

Hi Chris,



A few things come to mind...

First, you don't have to create the first GlideRecord (grImport) and loop through it to build the array list in your onComplete script. The Transform is already looping once through all the import set records; all you would need to do is add a line or two of code to the main Transform script to build out your list:



e.g. importSet.push(source.u_samaccountname.toString());



Second, I think I would rather employ the database to do the work instead of mashing a comparison of arrays every time:


  1. Add a flag of some sort to your User Group table (e.g., 'u_disable').
  2. Add an "on Start" transform script to your Transform map to reset it before the import:
    var grGroups = new GlideRecord('sys_user_group');
    grGroups.setValue('u_disable', true);
    grGroups.updateMultiple();
  3. Add a line in your main Transform Map script to switch this flag to false for every group record processed during the transform:
    target.u_disable = false;
  4. Then just query for groups where the flag is still true; these are the ones you will disable (a minimal sketch follows this list)...
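For step 4, a minimal sketch of the onComplete query, assuming the hypothetical u_disable flag from step 1:

// onComplete sketch for step 4: anything still flagged after the transform gets disabled
var grDisable = new GlideRecord('sys_user_group');
grDisable.addQuery('u_disable', true); // the flag added in step 1
grDisable.query();
while (grDisable.next()) {
    grDisable.active = false;
    grDisable.update();
}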



This still seems a bit redundant; you could possibly use the existing Active flag instead of creating a new one. Then you would just be reactivating the valid groups through the import process. But then again, you may not want the valid groups to be inactive even for a moment.




Secret option (C): you could use the updated timestamp as an indicator of which records were included in the import set. At the beginning of the transform, in an "on Start" transform script, record the current date/time in a variable, and use that as exclusion criteria in your "on Complete" script to query for the records that were not included (a fuller sketch follows the caveat below).



var grDisable = new GlideRecord('sys_user_group');
grDisable.addEncodedQuery("sys_updated_on<=javascript:gs.dateGenerate('2017-04-14','19:25:05')");
grDisable.query();
...



Just be aware that this is a sloppier solution... there is always the possibility that some other user or process updates a non-valid group elsewhere during the import, and it would not get disabled during this run.
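A minimal sketch of that approach, assuming the current date/time is captured in an "on Start" script in a variable the "on Complete" script can see, and that addQuery accepts a GlideDateTime for the comparison:

// onStart transform script: capture the start time in a variable declared
// outside the function so it is visible to the onComplete script
var importStart = new GlideDateTime();

// onComplete transform script: anything not touched since the start of the
// import was not in the import set, so disable it
var grDisable = new GlideRecord('sys_user_group');
grDisable.addQuery('active', true);
grDisable.addQuery('sys_updated_on', '<', importStart);
grDisable.query();
while (grDisable.next()) {
    grDisable.active = false;
    grDisable.update();
}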




Thanks,
-Brian



Brian:



Your suggestion to build the array during the transform is really good! I created an onStart script and declared a variable outside the scope of the default function ServiceNow starts you off with.



// create importSet variable to be used as we process the set
var importSet = [];

(function runTransformScript(source, map, log, target /*undefined onStart*/ ) {

    // Add your code here

})(source, map, log, target);



BTW, I'm trying to avoid updating all the records with each import because I think there will be issues if we turn the LDAP listener on. Now that I think about it, this onComplete script will probably also be an issue, since the listener creates mini import sets.



Thanks!


poyntzj
Kilo Sage

Came across this while looking for something else.

A few other ideas for you:

Option 1

For every group you have processed, add a setForceUpdate and set the last import time.
Periodically check for any that have not been updated in the last x days and disable them (see the sketch below).
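A minimal sketch of that periodic check, assuming a hypothetical u_last_import date/time field stamped on each group during the transform and an arbitrary 30-day window:

// Scheduled job sketch: disable groups whose (hypothetical) u_last_import
// field has not been stamped in the last 30 days.
var cutoff = new GlideDateTime();
cutoff.addDaysUTC(-30); // arbitrary window
var grStale = new GlideRecord('sys_user_group');
grStale.addQuery('active', true);
grStale.addQuery('u_last_import', '<', cutoff);
grStale.query();
while (grStale.next()) {
    grStale.active = false;
    grStale.update();
}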

 

Option 2

Create an array at the beginning and add each group that is updated to the array - so you may end up with a few thousand entries.

From here, you can do a few things (2 and 3 are similar, just done in different ways):

  1. Create a GlideRecord query where the name is not in the array.
    Whatever you get, update to disabled.
    This may freak out at the size of the array.
  2. Once you have your array of updated groups, create another array of all of the groups.
    This can be done by checking against your updated groups as you go, so your new array is just the ones that did not update.
    Query against those and disable.
  3. Once you have your array of updated groups, create another array of all of the groups.
    Once you have this array, you can then REMOVE your array of processed groups from all the groups and leave yourself with a small array.
    Query against those and disable.