Since Fuji, administrators can add indexes to their own instances as needed. Fellow support agent Bill Tang wrote in detail about how you can "Improve your ServiceNow instance performance by creating database indexes via the User Interface." To expand on that function, I want to focus on what you can do if you have a slow query in the slow query log and want to know how to make that specific query faster.
A note of caution, though: adding too many indexes, even ones that work well for a particular query, can hurt the table's overall performance. Always feel free to consult with your support engineers to determine whether a new index will be beneficial.
Example of a slow query log

First, you need to analyze the Slow Queries log under System Diagnostics, and to do that you need to be able to interpret all the information given in every log entry. I have used my instance as an example of how, and in which order, to read the slow query information.

First, I sort my list descending by Total execution time - this is where I can make the biggest difference, because these slow queries are executed most often. I usually don't worry about queries with an average execution time in the single or double digits, but everything higher than that I analyze more closely. Once I have finished my analysis of those findings, I move on to the queries with the highest average execution time.

Let us look at a single query that needs improvement and analyze it.

Slow Queries Example:

```sql
SELECT ... FROM (sys_dictionary sys_dictionary0
  INNER JOIN sys_metadata sys_metadata0
  ON sys_dictionary0.`sys_id` = sys_metadata0.`sys_id`)
WHERE sys_dictionary0.`choice_table` IS NOT NULL
```

Total execution time: 50 minutes
Execution count: 40,091
Average execution time (ms): 75.72

Example stack trace:

```
glide.scheduler.worker.0
com.glide.db.query_stats.QueryPattern.setStackTrace(QueryPattern.java:59)
com.glide.db.query_stats.QueryStats.handleMetaData(QueryStats.java:111)
com.glide.db.query_stats.QueryStats.addToNursery(QueryStats.java:86)
com.glide.db.query_stats.QueryStats.recordQuery(QueryStats.java:74)
com.glide.db.query_stats.QueryStats.recordQuery(QueryStats.java:63)
com.glide.db.DBI.saveStats(DBI.java:1035)
com.glide.db.DBI.sqlVerbosity(DBI.java:1029)
com.glide.db.DBI.executeStatement(DBI.java:785)
com.glide.db.DBI.executeStatement(DBI.java:766)
com.glide.db.DBQuery.executeAsResultSet0(DBQuery.java:290)
com.glide.db.DBQuery.executeAsResultSet0(DBQuery.java:265)
com.glide.db.DBQuery.executeAndReturnTable(DBQuery.java:231)
com.glide.db.DBAction.executeNormal(DBAction.java:197)
com.glide.db.DBAction.executeAndReturnException(DBAction.java:166)
com.glide.db.RDBMSQueryContext.executeQuery(RDBMSQueryContext.java:46)
com.glide.db.DBQuery.execute(DBQuery.java:1772)
com.glide.choice.ChoiceDependency.<init>(ChoiceDependency.java:39)
com.glide.choice.ChoiceDependency.get(ChoiceDependency.java:28)
com.glide.choice.ChoiceChangeListener.expandMe(ChoiceChangeListener.java:49)
com.glide.choice.ChoiceChangeListener.flushChanges(ChoiceChangeListener.java:37)
com.glide.db.AChangeListener.onExecute(AChangeListener.java:43)
com.glide.db.DBAction.processListeners(DBAction.java:143)
com.glide.db.DBAction.executeAndReturnException(DBAction.java:171)
com.glide.script.GlideRecordITable.delete(GlideRecordITable.java:177)
com.glide.script.GlideRecord.delete(GlideRecord.java:4913)
com.glide.script.system.GlideSystemUtilDB.tableClearByTable(GlideSystemUtilDB.java:523)
com.glide.script.system.GlideSystemUtilDB.cleanupTableReferences(GlideSystemUtilDB.java:504)
com.glide.script.system.GlideSystemUtilDB.dropTable(GlideSystemUtilDB.java:741)
com.glide.script.system.GlideSystemUtilDB.js_dropTable(GlideSystemUtilDB.java:428)
...
org.mozilla.javascript.gen.c5175.call(sys_script_include.77e30b080a00052638e7274157b9b97a:30)
org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1227)
org.mozilla.javascript.gen.c3087.call(sys_script_include.bb1996e80a0a0b0f00499e4a748ddfb7:58)
org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1227)
org.mozilla.javascript.gen.c3088.call(sys_script_include.bb1996e80a0a0b0f00499e4a748ddfb7:67)
org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1227)
org.mozilla.javascript.gen.c5173.call(sysevent_script_action.2227b84a0a0a0b950004868f9ece6509:3)
org.mozilla.javascript.gen.c5173.exec(sysevent_script_action.2227b84a0a0a0b950004868f9ece6509)
com.glide.script.ScriptEvaluator.execute(ScriptEvaluator.java:233)
...
com.glide.script.fencing.GlideScopedEvaluator.evaluateScript(GlideScopedEvaluator.java:192)
com.glide.policy.ScriptActionHandler.executeScriptInScope(ScriptActionHandler.java:144)
com.glide.policy.ScriptActionHandler.process0(ScriptActionHandler.java:62)
com.glide.policy.ScriptActionHandler.process(ScriptActionHandler.java:39)
com.glide.policy.EventProcessor.processEventDuringNormalOperation(EventProcessor.java:164)
...
com.glide.policy.EventManager._process(EventManager.java:160)
com.glide.policy.EventManager.process(EventManager.java:142)
...
org.mozilla.javascript.FunctionObject.doInvoke(FunctionObject.java:597)
org.mozilla.javascript.FunctionObject.call(FunctionObject.java:504)
org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1227)
...
com.glide.schedule.JobExecutor.execute(JobExecutor.java:79)
com.glide.schedule.GlideScheduleWorker.executeJob(GlideScheduleWorker.java:177)
com.glide.schedule.GlideScheduleWorker.process(GlideScheduleWorker.java:124)
com.glide.schedule.GlideScheduleWorker.run(GlideScheduleWorker.java:56)
```

Example URL: events process
First sighting: 2015-09-22 17:10:33

The explain plan:

| Order | Select Type | Table | Type | Possible Keys | Key | Key len | Ref | Rows | Extra | ID |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SIMPLE | sys_dictionary0 | ALL | PRIMARY | | | | 25,629 | Using where | 1 |
| 1 | SIMPLE | sys_metadata0 | eq_ref | PRIMARY | PRIMARY | 96 | empigeist2_2.sys_dictionary0.sys_id | 1 | | 1 |
This is all the information available to you in the Slow Queries log. I will explain it in the order that makes the most sense to me: from where the query was issued to the additional information that helps us understand what is going on.
Interpreting the slow query log information
Query Source
Through the example URL we know that this query was issued by the events process. Another common example URL might be /home.do - meaning the slow query is on a user's home page. The stack trace confirms that this query comes from a worker: com.glide.schedule.GlideScheduleWorker.run(GlideScheduleWorker.java:56).
Following the stack further, I see org.mozilla.javascript.gen.c5175.call(sys_script_include.77e30b080a00052638e7274157b9b97a:30), which refers to a script include called TableDrop. Line 30 of that script calls gs.dropTable(tableName), which, of course, includes querying the dictionary record to determine which table to drop.
Query run times
Now that we know the background of this query: how often did it run, and how long did it take? The answers can be found here:
Total execution time: 50 minutes
Execution count: 40,091
Average execution time (ms): 75.72
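These three numbers hang together: the total is simply the count times the average. A quick back-of-the-envelope check in Python, using the figures from the log above:

```python
# Back-of-the-envelope check: execution count times average duration
# should roughly reproduce the reported total execution time.
execution_count = 40_091
avg_ms = 75.72

total_minutes = execution_count * avg_ms / 1000 / 60
print(f"{total_minutes:.1f} minutes")  # close to the reported 50 minutes
```

This kind of sanity check is also useful the other way around: if the total looks alarming but the count is huge, each individual execution may already be fast, and the fix may lie in calling the query less often rather than indexing it.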
Then the question becomes: why does such a simple query take so long?
Query analysis
The query is:
```sql
SELECT ... FROM (sys_dictionary sys_dictionary0
  INNER JOIN sys_metadata sys_metadata0
  ON sys_dictionary0.`sys_id` = sys_metadata0.`sys_id`)
WHERE sys_dictionary0.`choice_table` IS NOT NULL
```
What does this query do?
It first queries the sys_dictionary table for all records where choice_table is not null. Then it joins the result of that with the information in the sys_metadata table for all records where the sys_ids match.
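As a minimal, runnable sketch of that shape of query, here it is against SQLite stand-ins for the two tables. The column set, sample sys_ids, and values are invented for illustration; the real ServiceNow tables are far wider:

```python
import sqlite3

# Minimal SQLite stand-ins for sys_dictionary and sys_metadata.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sys_dictionary (sys_id TEXT PRIMARY KEY, choice_table TEXT);
    CREATE TABLE sys_metadata   (sys_id TEXT PRIMARY KEY, sys_name TEXT);
    INSERT INTO sys_dictionary VALUES ('a1', 'question_choice'), ('a2', NULL);
    INSERT INTO sys_metadata   VALUES ('a1', 'incident.state'), ('a2', 'incident.priority');
""")

# Same shape as the slow query: filter sys_dictionary on choice_table,
# then join sys_metadata on matching sys_ids.
rows = con.execute("""
    SELECT d.sys_id, d.choice_table, m.sys_name
    FROM sys_dictionary d
    INNER JOIN sys_metadata m ON d.sys_id = m.sys_id
    WHERE d.choice_table IS NOT NULL
""").fetchall()
print(rows)  # only the row whose choice_table is not null survives
```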
Interpreting an explain plan
Why does this take so long? To find that out, we look at the explain plan:
| Order | Select Type | Table | Type | Possible Keys | Key | Key len | Ref | Rows | Extra | ID |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SIMPLE | sys_dictionary0 | ALL | PRIMARY | | | | 25,629 | Using where | 1 |
| 1 | SIMPLE | sys_metadata0 | eq_ref | PRIMARY | PRIMARY | 96 | empigeist2_2.sys_dictionary0.sys_id | 1 | | 1 |
This is the really interesting part - what does all of this mean?
The more common Select Types are:
- SIMPLE; no UNION or subquery
- PRIMARY; outermost SELECT
- UNION; second or later SELECT in a UNION
- SUBQUERY; first SELECT in subquery
- DERIVED; derived table (subquery in the FROM clause)
The most commonly seen types are - in order of efficiency:
| Name | Description |
|---|---|
| const | At most one matching row, treated as a constant, e.g. `active = 1`. |
| eq_ref | Joining or looking up unique index values. The JOIN uses a unique index or key prefix, e.g. joining two tables on the PRIMARY key. |
| ref | Joining or looking up non-unique index values. The JOIN uses a non-unique index or key prefix; indexed fields are compared with `=` or `!=`. The best data-access strategy for non-unique values. |
| index_merge | Several index searches using different keys of the same table are performed, and the results are merged. |
| range | Range index scan. The key column is compared to a constant using operators like BETWEEN, IN, `>`, `>=`, e.g. `sys_created_on > "2015-09-23 00:00:00"`. |
| index | Full index scan. The entire index tree is scanned. |
| ALL | Full table scan. The entire table is scanned. |
Now that we have all this information, what does our explain plan mean?
| Order | Select Type | Table | Type | Possible Keys | Key | Key len | Ref | Rows | Extra | ID |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SIMPLE | sys_dictionary0 | ALL | PRIMARY | | | | 25,629 | Using where | 1 |
| 1 | SIMPLE | sys_metadata0 | eq_ref | PRIMARY | PRIMARY | 96 | empigeist2_2.sys_dictionary0.sys_id | 1 | | 1 |
We are querying two tables, sys_dictionary first and then sys_metadata. On sys_dictionary we perform a full table scan (type ALL): the primary key (sys_id) is listed as a possible key, but no key is actually used, so we scan all 25,629 records and filter them with the WHERE clause. On sys_metadata, we join the results of the first step record by record via the primary key, sys_id.
Why did the dictionary query end up as a full table scan? The answer is simple - there is no index on choice_table, so the optimizer had no matching index to use. If we were to add an index (non-unique, since we do not create unique indexes for query tuning) on choice_table, our explain plan would look something like this:
| Order | Select Type | Table | Type | Possible Keys | Key | Key len | Ref | Rows | Extra | ID |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SIMPLE | sys_dictionary0 | ref | PRIMARY, choice_table | choice_table | 40 | | 5,427 | Using index; Using where | 1 |
| 1 | SIMPLE | sys_metadata0 | eq_ref | PRIMARY | PRIMARY | 96 | empigeist2_2.sys_dictionary0.sys_id | 1 | | 1 |
As you can tell, the Extra column now tells us that an index was found and used. The number of rows we have to comb through to get to our records is significantly smaller, and the query returns far faster than before.
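You can reproduce this before-and-after effect on any database that exposes its query plan. Here is a sketch using SQLite's EXPLAIN QUERY PLAN (its wording differs from MySQL's, and I use an equality predicate for the demonstration, but the principle is the same: full scan without an index, indexed lookup with one):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sys_dictionary (sys_id TEXT PRIMARY KEY, choice_table TEXT)")

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN output is the detail text.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT sys_id FROM sys_dictionary WHERE choice_table = 'question_choice'"

before = plan(query)  # no index yet: full table scan (SCAN)
con.execute("CREATE INDEX idx_choice_table ON sys_dictionary (choice_table)")
after = plan(query)   # with the index: indexed lookup (SEARCH ... USING INDEX)

print(before)
print(after)
```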
Speaking of the Extra column, here is a sampling of the values it can contain:
- Using filesort; filtered rows are gone through again for sorting
- Using index; all data is retrieved from a single index (covering index)
- Using index; using where; index is also used for all key lookups
- Using temporary; a temporary table is required to process the query
- Using where; a WHERE clause is used
- Using join buffer; rows are re-used from the join buffer
Finding the best index for your query
How do you know whether an index will be good and actually used? Looking at the query is the first step, but there is also this thing called "cardinality." At some point, the database decides that using the index has no benefit and reverts to a full table scan. You will look at the explain plan, very confused, and ask, "I gave it an index, why would it not take it?"
Imagine the following situation: you are searching for all users named Jim who work in the Finance department. Your company has a total of 30,000 users; of those, 500 are named Jim, there are nearly a thousand different first names in your list of users, and you have 15 departments in your company (meaning about 2,000 users per department).
The story of cardinality is: how can I narrow down my results fastest, so I don't have to search through too much information? In our search for Jim in Finance, are we better off searching for Jim first, or for the Finance department?

If we search (and index) on the name first, we go through an index tree that has nearly a thousand sorted nodes (all the different first names). If we search (and index) on the department first, we go through only 15 nodes. But look at how many records each search leaves behind to inspect: about 500 for the name versus around 2,000 for the department. Searching and indexing on the name first narrows our results down fastest and furthest.
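The arithmetic behind that reasoning, using the numbers from the example above:

```python
# Which predicate narrows the result set down further decides which
# column makes the better leading index column.
total_users = 30_000
jims = 500
departments = 15
users_per_department = total_users // departments  # about 2,000

# Selectivity: the fraction of the table each predicate leaves to inspect.
name_selectivity = jims / total_users                  # ~1.7% of rows remain
dept_selectivity = users_per_department / total_users  # ~6.7% of rows remain

print(f"name: {name_selectivity:.1%}, department: {dept_selectivity:.1%}")
```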
A different situation: you want to find all closed incidents in your system, and 99% of all your incidents are closed. Will an index on state help? Most likely not - there are so many closed incidents relative to the complete data set that the optimizer will very likely revert to a full table scan.
Sometimes you need to change the query, not the index
And of course there are situations where an index cannot be used at all. We all love the "contains" searches that ServiceNow offers. In SQL terms, they translate to LIKE '%xxx%', a double wildcard. Databases cannot use an index for that type of query at all, so it always ends in a full table scan (unless another part of the query can be indexed and comes to the rescue).
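The same SQLite sketch shows the effect: an index on the column does not help a double-wildcard LIKE, while an exact match on the same column can use it. The table and column names here are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE task (number TEXT, short_description TEXT)")
con.execute("CREATE INDEX idx_short_desc ON task (short_description)")

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN output is the detail text.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# A double wildcard defeats the index: the whole table is scanned.
like_plan = plan("SELECT * FROM task WHERE short_description LIKE '%email%'")
# An exact match on the same indexed column can use the index.
eq_plan = plan("SELECT * FROM task WHERE short_description = 'email outage'")

print(like_plan)  # contains SCAN (full table scan)
print(eq_plan)    # contains SEARCH ... USING INDEX idx_short_desc
```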
Another example is an OR query across different fields: WHERE first_name = "Jim" OR last_name = "Smith". In this case, the database has to find all Jims, then all Smiths, and merge the results together - it cannot use one index for both conditions. We generally recommend running those as two separate queries and merging the results through other means.
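One way to express that "two queries, merged" approach directly in SQL is a UNION, where each branch can use its own index. A sketch with invented sample data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sys_user (first_name TEXT, last_name TEXT)")
con.execute("CREATE INDEX idx_first ON sys_user (first_name)")
con.execute("CREATE INDEX idx_last ON sys_user (last_name)")
con.executemany("INSERT INTO sys_user VALUES (?, ?)",
                [("Jim", "Jones"), ("Anna", "Smith"),
                 ("Jim", "Smith"), ("Bob", "Lee")])

# Instead of WHERE first_name = 'Jim' OR last_name = 'Smith', run one
# indexed lookup per condition and let UNION merge (and de-duplicate)
# the results.
rows = con.execute("""
    SELECT first_name, last_name FROM sys_user WHERE first_name = 'Jim'
    UNION
    SELECT first_name, last_name FROM sys_user WHERE last_name = 'Smith'
""").fetchall()
print(sorted(rows))  # [('Anna', 'Smith'), ('Jim', 'Jones'), ('Jim', 'Smith')]
```

Note that UNION (as opposed to UNION ALL) removes duplicates, so a row matching both conditions, like Jim Smith here, appears only once.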
One more common example: if you group by one column and order by another, the same dilemma occurs - which index should the optimizer take? Often it takes neither: GROUP BY first_name, ORDER BY last_name.
---
Special thanks to our fantastic Performance Engineer Scott Nemes at ServiceNow who reviewed and provided some of those great tables that I included in here. Scott, you are the BEST!