04-14-2017 01:02 PM
Hello!
In an LDAP transform, I'm trying to disable groups that have been removed from AD by using an On Complete script. I build an array of all groups in sys_user_group and an array of everything in the import set, then use ArrayUtil.diff() to get the difference and disable those groups. It seems to work, but I'm wondering what the practical limits of arrays are.
In this group scenario, there are a little over 6,000 groups. If we apply this solution to user records, we're looking at over 40,000.
Has anyone worked with arrays of this size?
Here's a sample of the On Complete script, written as a background script so you can test it yourself:
var arrayUtil = new ArrayUtil();
var importSet = [];
var groups = [];

// Get the records from this import set
var grImport = new GlideRecord('u_ldap_folder_permissions');
grImport.addQuery('sys_import_set', '6466aae66fc6324016ce511bbb3ee490');
grImport.query();
while (grImport.next()) {
    importSet.push(grImport.getValue('u_samaccountname'));
}

// Get all the group records
var grGroups = new GlideRecord('sys_user_group');
grGroups.query();
while (grGroups.next()) {
    groups.push(grGroups.getValue('name'));
}

gs.print('importSet length: ' + importSet.length);
gs.print('groups length: ' + groups.length);

// Get the difference
var toBeDisabled = arrayUtil.diff(groups, importSet);
gs.print('toBeDisabled length: ' + toBeDisabled.length);
for (var i = 0; i < toBeDisabled.length; i++) {
    gs.print(toBeDisabled[i]);
}
Solved! Go to Solution.
04-14-2017 02:56 PM
Hi Chris,
Thanks for asking this question. It has prompted me to do some writing that I have been meaning to do for a while. I agree with Rob. The main thing you want to watch out for is performance as opposed to the max array size.
- However, if I understand your question correctly, you have 40,000 users and want to test them against the contents of an import set from Active Directory. I assume those import sets have 40,000 or fewer records in them, which would mean two arrays of 40,000 strings each. ArrayUtil.diff() creates a third array for the results; let's assume that one is smaller, maybe 10,000 strings. That puts roughly 90,000 strings in memory, which is no problem at all - you might use 1 or 2 MB at most out of the 500-700 MB that most systems have free at any given time.
- The algorithm behind ArrayUtil.diff() uses two loops, one nested inside the other, so for two 40,000-element arrays that is 40,000 x 40,000 = 1,600,000,000 iterations. That might be a problem. Even though each iteration executes quickly, a loop that size can have an impact on CPU usage, and the total execution time can add up. You are still going to be fine if you are just running this once a day for a couple of minutes, but you might want to think about a different way to accomplish the goal if you end up scaling to a larger user base. See my recommendations below.
- Sometimes I see people get into trouble when they run the same code multiple times simultaneously on different threads. You will not have that problem here because transform maps run serially by default: no two transforms can run against the same source/target table combination at the same time. [Side note: transforms can be made to run in parallel with the glide.import_set_insert_serialized... properties. Use with caution.]
What is the goal here? You want to disable any user/group in the system that wasn't in the imported list, right? I feel like there is another way to accomplish this...
Other options
- Maybe use a negation query against the database, e.g. query all users whose name is not in the array of u_samaccountname values (see the sketch after this list).
- Or you could add a field to the user table and update it with a unique value during transformation - perhaps based on the import set name. Any record not matching that value would be deactivated. The query would be pretty simple: select sys_user where u_transform_id != X.
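To illustrate the first option using the group scenario from the original post, here is a minimal, untested sketch. It assumes the same staging table and field names from the script above and lets the database do the comparison via a NOT IN query instead of ArrayUtil.diff():
var importSet = [];
var grImport = new GlideRecord('u_ldap_folder_permissions');
grImport.addQuery('sys_import_set', '6466aae66fc6324016ce511bbb3ee490'); // the current import set
grImport.query();
while (grImport.next()) {
    importSet.push(grImport.getValue('u_samaccountname'));
}

// Let the database find active groups that were not in the import
var grGroups = new GlideRecord('sys_user_group');
grGroups.addQuery('name', 'NOT IN', importSet.join(','));
grGroups.addQuery('active', true);
grGroups.query();
while (grGroups.next()) {
    grGroups.setValue('active', false);
    grGroups.update();
}
One caveat: a NOT IN list with tens of thousands of values makes for a very long encoded query, so the second option (stamping a field during the transform and deactivating anything that doesn't match) probably scales better.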
Here are some things that can cause performance issues when working with arrays (or ServiceNow JavaScript code in general):
Creating unnecessarily large arrays or strings
The main problem here is memory usage. The Java Virtual Machine that ServiceNow runs in has 2GB of memory total. At any given time it may have a garbage collection floor between 20% and 80%. GC floor above 80% is generally considered dangerously high. Arrays store their data in memory and so retain Java heap space for the duration of the array's life cycle (see de-referencing unneeded objects below). To avoid running your instance out of memory you want to make sure that your arrays do not pull too much data into memory.
One idea that I've had - but never tried - is to put a gs.sleep() statement at the very end of whatever script I want to profile for memory, then check memory before, during and after execution by looking at /stats.do. The part that says "Free percentage: 58.0" tells you how much memory is available at any given time. If you did this on a sub-prod instance with little else running at the same time, you ought to get a pretty good idea of how much memory is being retained by your code.
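A rough, untested sketch of that idea, written as a background script (gs.sleep() takes milliseconds, and as far as I know it is only available in global scripts):
// Build the large array whose memory footprint we want to observe
var bigArray = [];
var gr = new GlideRecord('sys_user');
gr.query();
while (gr.next()) {
    bigArray.push(gr.getValue('sys_id'));
}
gs.print('Array built with ' + bigArray.length + ' elements');

// Hold the array in memory for 60 seconds so /stats.do can be
// checked in another browser tab while the script is still running
gs.sleep(60000);

// De-reference so the memory can be reclaimed, then compare /stats.do again
bigArray = null;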
There is an interesting thread on Stack Overflow about how to test object sizes: memory - JavaScript object size - Stack Overflow
Keeping GlideRecord/GlideElement records in memory
One specific example that I see happening often is when someone stores an object reference in an array without realizing it, or when they really only need to store a primitive data type. For example, the following code stores a GlideElement object that takes hundreds of times the memory needed for a simple sys_id String. JavaScript strings are stored as UTF-16, so a good size estimate for a string is its character length x 2 bytes. Assuming the average name is about 15 characters, 40,000 names would be roughly 40,000 x 30 bytes, or a bit over 1 MB.
var arrIncident = [];
var myIncident = new GlideRecord("incident");
myIncident.query();
while (myIncident._next()) {
    var caller = new GlideRecord("sys_user");
    caller.get(myIncident.caller_id);
    arrIncident.push(caller.sys_id); // <-- This pushes a GlideElement object
}
Using the getValue() method, like you have, is a good way to avoid this since it returns the simple JavaScript String type instead of a complex object.
arrIncident.push(caller.getValue("sys_id")); // <-- Much better
Failing to de-reference unneeded objects
If you have a very long script that builds multiple large arrays, it might be worth it to de-reference any arrays as soon as they are no longer needed.
var o = { a: [ /* imagine a large array here */ ], b: 2 }; // memory retained by the object in o
var oa = o.a; // oa now also references the large array
o = null;     // the object originally stored in o can be collected, but the large array is still reachable through oa
oa = null;    // now the array can be garbage collected as well
See Reference Counting Garbage Collection
Infinite (or Very Deep) looping while traversing a recursive data structure
This can happen a lot in the CMDB and task tables when you use parent/child relationships to check the whole ancestry of a record. When doing this type of operation you run the risk of getting into a very deep or infinite loop - record A has child B, which has child C, which has child A (uh oh!). A lot has been written about how to avoid this, so I won't go into depth here, but here are some ideas:
- Make some type of counter that will not let the recursion go beyond a certain number of layers deep.
- Make a max size check that will not let the array get above a certain size
- Keep a list of items that have already been seen, breaking out of the loop when recursion is detected (see the combined sketch at the end of this post)
Creating huge memory objects while attempting to avoid infinite loops while traversing a recursive data structure
Yes, that long title is intentionally humorous. Basically, this problem happens when you try to avoid the previous problem by keeping a list of items that have already been seen. If that list gets too big, you could be out of the frying pan and into the fire! Suppose you are traversing the related CI hierarchy (cmdb_rel_ci) and trying to find all the child CIs that would be impacted by a Change. As you traverse the cmdb_rel_ci table there is the possibility of getting into a loop because A depends on B, which depends on C, which depends on A. To avoid that, you make an array and push each affected CI into it as it is identified; if you encounter a CI that is already in the array, you stop looping and go back one level. The only problem is that you might end up with a really big array. So, when using this technique, be sure to also employ the techniques mentioned above about keeping GlideRecord objects in memory. You might also create a fail-safe that errors out if the array grows past a certain point, as determined by your approximation of how much memory each element of the array consumes.
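Here is a rough, untested sketch that combines those safeguards while walking cmdb_rel_ci children: only sys_id strings (not GlideElements) go into the visited list, recursion stops at a depth limit, and a maximum list size acts as a fail-safe. The limits and the starting sys_id are placeholders, not recommended values.
var MAX_DEPTH = 10;      // arbitrary example depth limit
var MAX_VISITED = 50000; // arbitrary fail-safe for the visited list
var visited = [];        // sys_id strings only, to keep memory usage small

function collectChildren(parentSysId, depth) {
    if (depth > MAX_DEPTH) {
        return; // too deep - stop recursing
    }
    var rel = new GlideRecord('cmdb_rel_ci');
    rel.addQuery('parent', parentSysId);
    rel.query();
    while (rel.next()) {
        var childId = rel.getValue('child');
        if (visited.indexOf(childId) > -1) {
            continue; // already seen - avoid the infinite loop
        }
        if (visited.length >= MAX_VISITED) {
            gs.print('Visited list exceeded ' + MAX_VISITED + ' items; aborting');
            return;
        }
        visited.push(childId);
        collectChildren(childId, depth + 1);
    }
}

collectChildren('<starting ci sys_id>', 0); // placeholder sys_id
gs.print('Impacted CIs found: ' + visited.length);
For very large hierarchies, an object used as a lookup map (e.g. seen[sysId] = true) would be faster than indexOf on an array, at a similar memory cost.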
04-14-2017 01:20 PM
Chris,
You should be fine with that. A JavaScript array maxes out at around 4.29 billion elements (2^32 - 1), so you're well below that.
Ultimately, it's more of a performance issue than a max array size issue; just keep an eye on it. There may be a more efficient way of doing this, but that shouldn't be necessary unless your LDAP transform starts taking a long time to run (like not being done by the next time it needs to run).
-Rob
04-14-2017 07:32 PM
Sorry, Matthew. I didn't see some of your reply before drafting mine. Don't mean to step on toes.
-Brian
04-17-2017 09:11 AM
Great minds think alike, right?