drjohnchun
Tera Guru

Last time, we saw how statistical sampling can be used with Control Test Definitions (Indicators in Helsinki) in the GRC application; although we had to calculate the sample size manually, we were able to leverage the built-in random sampling feature. This time, we'll take a look at how we can build a tool that calculates a sample size for us and retrieves sample data using a ServiceNow feature called Scripted Filter (NOTE: this requires admin role).

SCRIPT INCLUDE statSample()

First, we need to create a Script Include with a function that performs statistical sampling; let's call the function statSample, short for statistical sampling. At a high level, here's what statSample() does:

  1. Takes the table name and encoded query string needed for querying.
  2. Optionally, it also takes margin of error and confidence level; if omitted, they default to 5% margin of error and 95% confidence level.
  3. Query the table and get the population size using .getRowCount().
  4. Determine the sample size from the population size, margin of error and confidence level.
  5. Pick a random row from the record set and save the sys_id; repeat until the sample size number of unique records are collected.
  6. Return the saved sys_ids in an array.

To create a new Script Include,

  1. Log in with admin role.
  2. Navigate to System Definition > Script Includes.
  3. Click on New button to create a new Script Include.
  4. Fill the form as in the screenshot below:
    find_real_file.png
  5. This can be either Global or Scoped; if Scoped, make sure to jot down the API Name to be used later in Scripted Filter.
  6. Since we want to use this from other applications, set Accessible from to All application scopes.
  7. For this to be used as a Scripted Filter, Client callable must be checked.
  8. In the Script field, paste the below script (also attached below as a file).
  9. Finally, click on Submit.

/**

* Performs statistical sampling against a filtered list of GlideRecords.

* Takes table name and encoded query string then returns an array of sys_id of sample records.

*

* Version 0.1.2 2016-12-06

*

* SCRIPTED FILTER EXAMPLES

*

* EXAMPLE 1: SHOW DEBUG DATA

* [Sys ID] [is] [javascript:statSample('incident', '', 3, 99, true)]

* returns Sys ID = {"confidenceLevel":99,"encodedQuery":"","marginOfError":3,"population":7255,"sampleSize":1474,"tableName":"incident","zValue":2.58}

*

* EXAMPLE 2: GET STATISTICAL SAMPLE USING ENCODED QUERY STRING, AT DEFAULT 95% CONFIDENCE LEVEL AND 5% MARGIN OF ERROR

* [Sys ID] [is] [javascript:statSample('incident', 'active=1')]

* returns 382 sample records from the population size of 54,939 at 95% confidence level with 5% margin of error

*

* NOTE: Scripted Filter runs in rhino.sandbox context so not all classes/objects are available for scripting

* NOTE: The function is run twice in Scripted Filter somehow, so use randomSampleRecords to run only once.

*

* MIT License http://tldrlegal.com/l/mit

* Copyright (c) 2016 John.Chun @snowaid.com

*

* @param {string} tableName - name of table for GlideRecord

* @param {string} encodedQuery - encoded query string

* @param {int} marginOfError - margin of error in percent (for example, 5 for 5% margin of error)

* @param {int} confidenceLevel - confidence level in percent (for example, 95 for 95% confidence level)

* @param {boolean} debug - if true, returns debug data (input and derived values)

* @return {string[]} array of sys_id of statistical sample records

*/

var statSampleRecords = [];   // this is in rhino.sandbox context in Scripted Filter; otherwise in global

function statSample(tableName, encodedQuery, marginOfError, confidenceLevel, debug) {

  if (statSampleRecords.length) return statSampleRecords;   // in Scripted Filter, force to run only once

  try {

      //var gr = new GlideRecordSecure(tableName);   // enforce ACL; GlideRecordSecure undefined in Scripted Filter in Helsinki

      var isScriptedFilter = !this.GlideRecordSecure;   // use the fact that GlideRecordSecure is undefined in Scripted Filter

      var gr = new GlideRecord(tableName);

      if (!gr.isValid()) throw 'Invalid table name "' + tableName + '".';

      if (!gr.canRead()) throw 'No permission to read from "' + tableName + '".';   // test ACL for table

      // get population

      if (encodedQuery) gr.addQuery(encodedQuery);

      gr.query();   // to getRowCount()

      var population = gr.getRowCount();

      if (!population) return [];

      // get sample size

      marginOfError = marginOfError || 5;   // default to 5%

      confidenceLevel = confidenceLevel || 95;   // default to 95%

      var zValues = { 80: 1.28, 85: 1.44, 90: 1.65, 95: 1.96, 99: 2.58 };

      var zValue = zValues[confidenceLevel] || 1.96;   // default to 1.96 == confidenceLevel of 95%

      var sampleSize = Math.ceil(1 / (1 / population + Math.pow(0.02 * marginOfError / zValue, 2)));

      // show debug data (input and derived values)

      if (debug) return JSON.stringify({

          tableName: tableName,

          encodedQuery: gr.getEncodedQuery(),

          marginOfError: marginOfError,

          confidenceLevel: confidenceLevel,

          zValue: zValue,

          population: population,

          sampleSize: sampleSize,

      });

      // throw dice and get a random sample

      var records = [], offsets = [];

      while (records.length < sampleSize && offsets.length < population) {

          var offset = Math.floor(Math.random() * population);   // 0 <= offset < population

          if (indexOf(offsets, offset) >= 0) continue;   // dupe offset, so rethrow dice

          offsets.push(offset);

          gr.chooseWindow(offset, offset + 1);   // works in global & scoped

          gr.query();

          if (gr.next()) records.push(gr.sys_id.toString());

      }

      if (isScriptedFilter) statSampleRecords = records;   // in Scripted Filter, save statSampleRecords

      return records;

  }

  catch (e) {

      return 'ERROR: ' + e;   // return error message

  }

  // emulates Array.prototype.indexOf() in older JavaScript

  function indexOf(arr, val) { for (var i = 0; i < arr.length; i++) if (arr[i] == val) return i; return -1; }

}

SCRIPTED FILTER IN CONTROL TEST DEFINITION

Scripted Filters are used in Condition Builders to add custom conditions. They can be used wherever Condition Builders are used, such as list views, related lists, reports, and some forms, like the Control Test Definition form we saw last time. They run Script Includes that return an array of sys_ids, which is then used to retrieve the corresponding records.

Let's go back to the same Control Test Definition we used in Part 4: GRC and modify a few fields to use statSample(). As shown in the screenshot below, we only need to

  1. Set Sample size to 0, that is, all records resulting from Control test condition.
  2. Set Control test condition to [Sys ID] [is] [javascript:statSample('cmdb_ci')] (the value field is obscured in the screenshot; in case you defined statSample() in a scoped app, make sure to prefix statSample() with the API name, like x_12345_myapp.statSample). We're only passing the table name in statSample(); in most cases, you'll also pass the second parameter, encoded query string, to refine the query.

find_real_file.png

When you click on Execute Now, it'll add a new Control Test with the sample as Supporting Data. So now, you don't have to manually calculate the sample size and enter into the Sample size field.

THE GOOD, THE BAD, AND THE UGLY

By utilizing a Scripted Filter, now we can use statistical sampling wherever Condition Builders are used (and there are quite a few places), without having to manually calculate the sample size. Now, that's great. The bad part is that it's not very efficient, since it has to evaluate the query twice; first time in statSample() to select the sys_ids of the sample records, then the same query has to run again to collect the same records outside of statSample(). The random selection algorithm inside statSample() is not very efficient either; there are more efficient ways to do the same using pure SQL, but we are confined to using GlideRecord here. You'll notice the slow performance when the population size is large. Luckily, statistical sampling is not something we use often so the performance should be acceptable in most cases.

Now to the ugly part; when we call statSample(), we have to pass table name and encoded query string as parameters. If you look at the above screenshot, the table name "cmdb_ci" is already on the form, in the Table field. However, statSample(), being a Script Include, is unaware of it, so we have to explicitly pass it using the raw name of the table "cmdb_ci" and not the more friendly "Configuration Item". It gets worse with the encoded query string; we can't use the Condition Builder field Control test condition to add more conditions (technically we can, but it may alter the sample, invalidating the results). Any encoded query to be applied to the table must be passed to statSample() as a string parameter. Again, statSample() has no knowledge of the Condition Builder on the form. So you need to define the encoded query in a List View and use Copy query context menu to copy the string and then pass it on to statSample() as the second parameter; not ideal. Luckily, in most cases, these queries won't change often once defined.

So while not the most elegant solution, it'll be useful in many cases where statistical sampling can be used. If you know of better ways of doing this, I'd like to know.

ROW-LEVEL ACL (ACCESS CONTROL LIST)

While we're on this topic, I'd like to briefly touch upon row-level ACLs that may keep some records from displaying if the user doesn't have the permission to see them. statSample() uses GlideRecord, which doesn't respect ACLs; however, once the sys_ids are returned and used to retrieve the records the second time, ACLs are typically enforced so some rows may not display. This impairs sampling (unless it is done purposefully to find out how many "hidden" rows are present in a population). Although these cases should be rare, it's something to be aware of. My recommendation is to apply filters appropriately so the entire result set (population) is readable and available for sampling.

OTHER USE CASES

Because statSample() is a Script Include that can be called from the server as well as the client, it can be used beyond Scripted Filter. It returns only an array of sys_ids, but it can be easily modified to return an array of GlideRecods by changing Line 75 to

          if (gr.next()) records.push(gr);

so the entire GlideRecord is added to the records array. This may be more useful in some cases.

Next time, we'll continue looking at other related features from ServiceNow.

UPDATES

2016-12-07 updated script to Version 0.1.2

2016-12-05 revised script to run only once when called from Scripted Filters; for reasons unknown, the function is run twice. Also updated the attached script file. This improved the performance by about two folds. See And the winner is... Randomly Selecting GlideRecords for more details.

Please feel free to connect, follow, post feedback / questions / comments, share, like, bookmark, endorse.

John Chun, PhD PMP see John's LinkedIn profile

visit snowaid

ServiceNow Advocate
1 Comment