Query a date range in weeks

zschneider
Kilo Expert

I have a table with years' worth of data that I want to organize into another table. I'm trying to find a way to run a scheduled job that loops through each of the past 51 weeks and runs a query against that week's data.

Here's what I've tried (which sadly doesn't work):


for (var weekCount = 1; weekCount < 52; weekCount++) {
    var variable = retrieveCount(weekCount);
}

function retrieveCount(weekCount) {
    var tuc = new GlideRecord('table');
    tuc.addQuery('sys_created_on', '>=', gs.weeksAgoStart(weekCount));
    tuc.addQuery('sys_created_on', '<', gs.weeksAgoEnd(weekCount));
    tuc.query();
    while (tuc.next()) {
        // do stuff
    }
}

Does anyone have any idea what I could use instead of weeksAgoStart / weeksAgoEnd to cycle through 52 weeks?
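
One possible substitution, sketched with the day-based helpers gs.daysAgoStart() and gs.daysAgoEnd(), which do exist on GlideSystem (untested; note these give rolling 7-day blocks counted back from today, rather than calendar weeks):

// weekCount = 1 is the 7-day window ending yesterday, weekCount = 2 the window before it, etc.
for (var weekCount = 1; weekCount < 52; weekCount++) {
    var tuc = new GlideRecord('table');
    tuc.addQuery('sys_created_on', '>=', gs.daysAgoStart(weekCount * 7));
    tuc.addQuery('sys_created_on', '<=', gs.daysAgoEnd(weekCount * 7 - 6));
    tuc.query();
    while (tuc.next()) {
        // do stuff
    }
}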

Thanks!
6 REPLIES

James_Neale
Mega Guru

First, I would recommend reconsidering how you code this script. You are currently looking at running 52 separate queries where you could perform one query, add an orderBy statement, then perform your loop and determine the week of each record in script. The exception would be if you are handling absolutely huge record sets, where performing separate queries may be better for server memory purposes.

This is just an example of what I would look at doing; it has not been tested:

// get all the data for the past year (assuming you don't have an obscene number of records)
var gr = new GlideRecord('table');
gr.addQuery('sys_created_on', '>=', gs.yearsAgo(1));
gr.orderBy('sys_created_on');
gr.query();

// loop through each record
while (gr.next()) {
    // build a GlideDateTime from the created date and determine the week
    var date = new GlideDateTime(gr.getValue('sys_created_on'));
    var week = date.getWeekOfYearUTC();

    // work
}


zschneider
Kilo Expert

Each individual query will return about 10,000 table entries, so I'm not sure if that should be limited for memory or not. If I did all 52 in one scan, you're looking at a little over half a million individual entries.

I was splitting it up because it's easier for me to parse date ranges, but if you think that would be more efficient I could definitely change it.

The other problem I have with this script is that after I run it, I want to parse by user and remove duplicates. I've tried multiple scripts to remove duplicates from an array, but the array is so large that duplicates still slip through (we're talking about 50,000 entries in an array), so I ended up having to query each user individually to get a count of how many entries they have in the table.
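
A sketch of an object-keyed approach that stays reliable at this size, since membership checks don't scan the array the way indexOf() does (allUsers is a placeholder name for the raw array of user values):

// de-duplicate by using an object as a set; lookups are constant-time
// even with 50,000 entries
var seen = {};
var uniqueUsers = [];
for (var i = 0; i < allUsers.length; i++) {
    var id = allUsers[i];
    if (!seen[id]) {
        seen[id] = true;
        uniqueUsers.push(id);
    }
}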

In all, this one script would probably cause about 30,000 queries to be run on the server.


James_Neale
Mega Guru

This is some query you are trying to perform! It's a shame that there isn't (that I know of at least) a function on the GlideRecord object that allows you to perform an SQL DISTINCT to save some of that processing.
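
That said, the separate GlideAggregate class can group and count database-side, which covers much of the same ground. A minimal sketch, assuming the user who created each record is what you need (sys_created_by):

// let the database do the per-user counting for the last 7 days
var ga = new GlideAggregate('table');
ga.addQuery('sys_created_on', '>=', gs.daysAgoStart(7));
ga.addAggregate('COUNT');
ga.groupBy('sys_created_by'); // assumed user field
ga.query();
while (ga.next()) {
    var user = ga.getValue('sys_created_by');
    var count = ga.getAggregate('COUNT');
    // one row per distinct user, with its count
}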

I think 10,000 records at a time, doing what you want to do, is probably a high enough number to justify running separate queries given your context. The only way you'll tell for sure is by writing and running both solutions and checking the transaction history to see which one is more efficient. You will have to be very careful with memory if you want to handle half a million records at once, though; it may be better to avoid that altogether.

One issue here is that I'm not sure exactly what you were trying to achieve - is this for some kind of report, a data transformation? Also:
- I'm not sure what you mean by 'parse by user' - I assume you are trying to organise the data based on the user association?
- How do you end up with 50,000 items in an array (certainly a call for efficiency)?
- And, how does all this lead to performing 30,000 queries?

If you are doing this often (or even if you're not), it would be worth ensuring that the fields you are querying on can be, and are, indexed. Speak to SN support to sort this out if necessary.

Hopefully this is of some help.
Cheers


zschneider
Kilo Expert

I think the way I'm going about this is really inefficient and kind of dumb; I just haven't been able to find a more efficient approach.

We have a custom table that acts very much like the sys_logs table. Every time a user does an insert or update on the task table (or its child tables), a record is created on this table. I personally would have preferred updating one record for the specific span of time we want to record, rather than making a record for every entry, but this was created before I started and we need the previous data.

Anyway, what I plan to do is organize this data into another table that counts how many entries there are for each user and records it weekly, so that we can get stats on table use for each sub-task table.

My first idea was to query all records made within a given week (the timespan we want to record) and then use JavaScript to parse the number of entries for each user with the itil role (755 users). Sadly, because of the sheer number of entries, arrays are acting weird and not behaving the way they do at much smaller sizes. So now my idea is to loop over all itil role users and query each user individually: 755 (users) * 51 (retro weeks) = over 30k queries.

I don't want to have to manually run this query 51 times, so my hope was to find a loop that could query each week's start/end dates and run automatically.

If you have any idea of a more efficient way to accomplish this task, I'd be all for it. There are slightly over 1,000,000 entries on the table I'm attempting to organize, and it will run weekly after this first batch, so sorting by indvidual users (returns a small enough query to manipulate with javascript) by week (the measure we're organizing by) seems to make the most sense.