Best practices for big data processing jobs - Memory vs Queries?

robby3
Tera Contributor

I am making a nightly job that does a lot of data processing.

There are two ways to build this job.  

1) Do one massive query at the beginning to get all my data, then loop through it to do the processing. This approach uses much more memory, but offloads a lot of work from the DB server.
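Roughly what I have in mind for option 1 (table and field names below are just placeholders, not my real ones):

// Option 1: one big query up front; everything is held in memory while processing
var allRows = [];
var gr = new GlideRecord('u_item');              // placeholder table name
gr.addQuery('active', true);
gr.query();
while (gr.next()) {
    // copy the fields I need into a plain array so no further DB calls are needed later
    allRows.push({
        sys_id: gr.getUniqueValue(),
        payload: gr.getValue('u_payload')        // placeholder field name
    });
}
// ...then loop over allRows and do all the processing purely in memory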

2) Load only the data I need, when I need it. Much easier to code. Start with one master query that fetches only the items, then, as I loop through each item, make subsequent queries to get the item-specific data to process, and discard data I no longer need. This has a much lower memory footprint, but makes many relatively simple DB queries.
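And roughly what I have in mind for option 2 (same placeholder names):

// Option 2: one master query for the items, then a small query per item
var item = new GlideRecord('u_item');                  // placeholder table name
item.addQuery('active', true);
item.query();
while (item.next()) {
    // fetch only this item's related data, process it, then let it go
    var detail = new GlideRecord('u_item_detail');     // placeholder child table
    detail.addQuery('u_item', item.getUniqueValue());  // reference back to the item
    detail.query();
    while (detail.next()) {
        // process detail.getValue('u_payload') here
    }
    // nothing is kept after this iteration, so the memory footprint stays small
}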

Does ServiceNow have best practices for this type of work? How is the system optimized: for memory or for query usage? Any recommendations?

Thanks

-Robby

1 REPLY

sergiu_panaite
ServiceNow Employee

Hi Robby,



Based on your two ways to build the job:



1. If the processing is done on the server side once the result set has been retrieved from the database, the load will be on the application node. If the scripts that process the result set are complex (building huge arrays, running regular expressions on large fields), they can eventually cause the application node to run out of memory. Users logged in to that application node would see this as an outage. Remember that each application node is just a JVM with a maximum of 2 GB of memory allocated, so this approach would need testing before such a job is run in production.
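For example, if the per-row work allows it, a pattern along these lines (just a sketch with made-up names, not specific to your tables) keeps the working set on the node small, because each row is processed inside the loop instead of being collected into a large array first:

var gr = new GlideRecord('u_item');      // made-up table name for illustration
gr.addQuery('active', true);
gr.query();
var processed = 0;                       // keep only small counters/aggregates in memory
while (gr.next()) {
    // do the per-row work here (processRow is a hypothetical helper you would write)
    processRow(gr);
    processed++;                         // nothing large is retained between iterations
}
gs.info('Nightly job processed ' + processed + ' rows');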



2. This approach might work well; even if you run more database queries, as long as they are fast (proper indexes should be verified beforehand), the overall impact will probably be lower.



In reality, both approaches have to be tested properly before such a job is deployed on a production instance.



Regards,


Sergiu