drjohnchun
Tera Guru

Last time, we looked at the GQL syntax and how to handle raw and display values of columns. This time, we'll look at the parser, which is closely tied to the GQL syntax, and see how GQL statements can be parsed into various parts for execution. We're not skipping the processor; we'll come back to it later.

PARSER

Let's go step by step and see how we can parse a GQL statement. The below example we saw last time may help you follow the steps:

SELECT   -- sample GQL statement

  number [Number]

  , short_description [Title]

  , dv_state [State]

  , caller_id.email [Caller Email]

  , caller_id.manager.email [Manager Email]

  , dv_cmdb_ci

FROM incident

WHERE active=true^priority<2^EQ^ORDERBYpriority^ORDERBYDESCnumber

LIMIT 5,10

A. Remove comments

A comment starts with two dashes "--" and ends with the end-of-line; all comments and their zero or more preceding white spaces will be removed first before further processing using

var no_comments = query.replace(/\s*--.*/g,'');

This is not foolproof in case there are double dashes elsewhere not meant to be for comments (e.g., column headings), but let's keep it simple for now.

B. Breakdown into parts

A GQL statement has the following four main parts in the following order we defined last time:

  1. SELECT - mandatory; always starts with this.
  2. FROM - mandatory
  3. WHERE - optional (also contains ORDERBY from encoded query strings)
  4. LIMIT - optional

Let's try to pick out these four parts using a regular expression. Regular expression may not be the best tool for parsing, but it may work adequately for our simple logic.

// parse "SELECT select_list FROM table WHERE encoded_query_string LIMIT [offset,] row_count" where WHERE and LIMIT are optional

var regexp = /SELECT\s*([^]+?)\s*FROM\s*(.+?)\s*(?:WHERE\s*([^]+?(?=\s*LIMIT)|[^]+?)\s*)?(?:LIMIT\s*([,\d\s]+?)\s*)?$/

Here's what this does in more detail:

var regexp = /

SELECT\s*([^]+?)\s*   // look for SELECT and capture what follows ([^]+?) including white spaces [^], but ignore zero or more surrounding white spaces \s*

FROM\s*(.+?)\s*           // look for FROM and capture what follows except white spaces (table names don't have spaces)

(?:WHERE\s*([^]+?(?=\s*LIMIT)|[^]+?)\s*)?   // look for optional WHERE clause; if found, capture what follows until LIMIT is encountered or to end of line

(?:LIMIT\s*([,\d\s]+?)\s*)?   // look for optional LIMIT clause and capture numbers separated by a comma

$/   // continue to the end of the last line

Let me clarify a few things since regular expressions, while compact and powerful, are not always self explanatory.

In Line 2, "[^]" represents ALL characters, both white spaces and the rest, so the SELECT list can be written in single or multiple lines. This could also be written as "[\s\S]" but "[^]" is a shorthand (technically, the first caret inside square brackets indicates "exclude the following characters"; since there's no following characters in this case, it excludes none, leading to represent all characters). This leads us to use the lazy quantifier "?" in "([^]+?)" to tell it to "match as few times as possible" and not to capture trailing white spaces; without "?", it gets greedy and keeps going until it sees "FROM" since "[^]" means all characters.

Line 4 is a non-capture group, indicated by "(?:" meaning "match what follows but don't capture." However, what's in the inner parentheses "([^]+?(?=\s*LIMIT)|[^]+)" will still be captured and it'll capture the encoded query string that follows WHERE. The inner parentheses could've been simply "([^]+?)" if we mandated that encoded query strings must be on a single line. Since we're not mandating it, we have to use the positive lookahead "(?=\s*LIMIT)" to tell it to "stop when you see LIMIT". The second part in the inner parentheses "|[^]+?" means to "go all the way to the end if you don't see LIMIT". Line 4 ends with "?" to match 0 or 1 time since the WHERE clause is optional.

Line 6 ends with an anchor, the end of last line "$", to tell lazy quantifiers not to be too lazy and go all the way to the end.

This works well for both single or multiple lines, and whether white spaces are removed or not. We'll add only a simple syntax checker later. A good way to test if a regular expression works as intended is to use an online tool like hifi RegExp Tool; it shows the capture groups in different colors, which is useful for testing. Also, you can view a saved copy of this regex at regex101.com

When this regular expression is executed, it returns four capture groups corresponding to the four main parts of a GQL statement.

C. SELECT

The SELECT part consists of a list of one or more column names, with optional column headings enclosed in square brackets, separated by commas. We'll use

// split column list "column_1, column_2 [column_2_heading], column_3"

var columns = select.split(/\s*,\s*/);

to get an array of columns. For each column, we'll check for column heading and, if it exists, parse it out using:

// parse out "column_name [column_heading]" where "[column_heading]" is optional

var regexp = /([.\w]+)\s*(?:\[(.+)\])?/

D. FROM

The FROM part simply contains the table name, which we can pass on to either of

var gr = new GlideRecord(table);

var gr = new GlideRecordSecure(table)   // enforce ACL

from Scoped GlideRecord.

E. WHERE

The optional WHERE part contains an encoded query string with filter and/or order conditions. We can pass this straight to

gr.addEncodedQuery(encodedQuery);

from Scoped GlideRecord.

F. LIMIT

The optional LIMIT part gives the offset, which is optional, and row_count. After parsing the integers using parseInt(), we can pass these on to either of

gr.chooseWindow(offset, offset + row_count);

gr.setLimit(row_count);

from Scoped GlideRecord.

Next time, we'll continue with the rest of the design.

Please feel free to connect, follow, post feedback / questions / comments, share, like, bookmark, endorse.

John Chun, PhD PMP see John's LinkedIn profile

visit snowaid

1 Comment