Zing Text Search - Automatic Stop Words

Rick Mann
Tera Expert

I'm looking at the wiki article for Administering Zing Text Search and have a question on Automatic Stop Words. Is there a recommended "Auto Threshold" that should be set? I'm looking to set this on our Task table. Thanks.

 

5.3 Automatic Stop Words

To configure automatic stop words for a table:

  1. Navigate to System Definition > Text Indexes.
  2. Open the text index entry for the table.
  3. Select the Auto stop check box.
  4. Enter the maximum number of occurrences for a non-stop word in the Auto threshold field.
  5. Click Update.
    When the number of occurrences of a word exceeds the threshold, the word is automatically added as a stop word for the table, with the Stop mode "Index but do not Query" and a Comment indicating that the stop word was generated automatically. Automatic stop words are compiled nightly by the TS Index Stats scheduled job.
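To make the threshold rule concrete, here is a minimal Python sketch of the idea. It is not Zing's implementation: the tokenizer, the function name `auto_stop_words`, and the sample records are all made up for illustration. It follows the wiki wording, so a word is stopped only when its count strictly exceeds the threshold; the exact boundary behavior is an assumption.

```python
import re
from collections import Counter


def auto_stop_words(documents, threshold):
    """Return {word: count} for words occurring more than `threshold` times.

    Hypothetical helper: counts every occurrence across all documents,
    using a simple lowercase alphanumeric tokenizer (an assumption).
    """
    counts = Counter()
    for text in documents:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return {word: n for word, n in counts.items() if n > threshold}


docs = ["the printer is offline", "the VPN is down", "reset the password"]
print(auto_stop_words(docs, threshold=2))  # {'the': 3}
```

With a threshold of 2, "the" (3 occurrences) is stopped, while "is" (2 occurrences) is not, because it does not exceed the threshold.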

 

Administering Zing Text Search - ServiceNow Wiki

3 REPLIES

Mwatkins
ServiceNow Employee

Hi Rick,


This is a great question, as stop words are very important to implement. Stop words can improve the performance of your text search operations dramatically. I've seen search times drop by a factor of 40 in some cases!



The usual recommendation for large tables (over 1 million records) is to implement stop words with a threshold of 50,000. This will take any word that occurs 50,000 times or more and place it into your auto stop word list. Here's the basic process for implementing a new stop word threshold. A more detailed explanation can be found at "Configuring auto stop words and regenerate text indexes".


  1. Set threshold for auto stop words
  2. Reset Text Search Caches
  3. Run the 'TS Index Stats' job. This will evaluate your new stop word threshold and show you which words will become stop words when you re-index. If you do not like the results, adjust the threshold and repeat steps 1 through 3.
  4. Once you think you have the threshold right, Customer Support also recommends setting the Stop Mode of all Stop Words to "Neither Index nor Query". This will stop the system from wasting processing overhead and disk space by indexing words that you will not be searching on. Also, if there are any words that are frequent enough to be stop words but, for whatever reason, you still do not want them to be stop words, you should set the Stop Mode to "Not a Stop Word". This can happen for certain company names or product names that occur frequently but will still need to be included to produce meaningful search results.
  5. The final step is to reindex your table(s). This will take some time and cause some slowness. It should be done during your slowest usage period (usually starting Friday night and running over a weekend). Zing will rebuild the text search index tables using the stop words as a guide for what not to index. The text search tables will be much smaller. This will make your searches quicker.
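As a rough illustration of why the Stop Mode matters in steps 4 and 5, here is a toy inverted index in Python. It is a sketch under stated assumptions, not how Zing stores its index: `build_index`, `search`, and the sample data are hypothetical, and only the three Stop Mode names come from the thread.

```python
import re

NEITHER = "Neither Index nor Query"
INDEX_ONLY = "Index but do not Query"
NOT_STOP = "Not a Stop Word"


def tokenize(text):
    # Simple lowercase alphanumeric tokenizer (an assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents, stop_modes):
    """Map each word to the set of document ids containing it.

    Words whose mode is NEITHER are skipped, so they take no index space.
    """
    index = {}
    for doc_id, text in enumerate(documents):
        for word in set(tokenize(text)):
            if stop_modes.get(word) == NEITHER:
                continue
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, stop_modes, query):
    """AND-search that drops any query term that is a stop word."""
    terms = [w for w in tokenize(query)
             if stop_modes.get(w) not in (NEITHER, INDEX_ONLY)]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = ["the printer is offline", "the VPN is down", "reset the password"]
modes = {"the": NEITHER, "is": INDEX_ONLY}
index = build_index(docs, modes)
print("the" in index, "is" in index)          # False True
print(search(index, modes, "the printer"))    # {0}
```

In this toy model, "is" still takes up index space even though it is never queried, which is the overhead that "Neither Index nor Query" avoids. A word marked "Not a Stop Word" is simply indexed and queried as usual.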


But setting stop words to 50,000 is not a one-size-fits-all solution. Some customers may find that 50,000 words is way too small. Others may find it way too big. As you might imagine, it all depends on how many words are out there. A customer with 1,000,000,000 task records will probably want a higher threshold than a customer with only 100,000 tasks. It is going to be different for each customer and for each table.



That being said, there are a couple of things that can help you make a decision when adjusting your stop word thresholds:



1. Do you really need to raise the threshold, or can you just make exceptions for certain words? Suppose you set your threshold at 50,000 and then your users start complaining that certain words aren't bringing back the results they expect. This can happen when a word they want to search on is now a stop word. If certain words are very common but you still want them to be searchable, you can make an exception for them. In the Text Indexes module, open the record for the table you want to fix, find the word in the Ts index stops related list, and change the Stop Mode of the record to "Not a Stop Word". If the Stop Mode was "Index but do not Query", there is nothing more to do. That word will now show up in search results for that table. However, if the Stop Mode was "Neither Index nor Query", you will need to click the "Regenerate Text Index" button on the same form after you have set the word to "Not a Stop Word". Caution: This will re-index the text search database for the whole table, a process that is resource intensive and may take hours to complete for large tables.



2. If you decide to raise the threshold, you will need to Regenerate Text Index for all desired tables. You might just want to do it for all tables at once, which can be done with the link at the bottom of the list that opens when you click the Text Indexes module. Before you decide on a stop word threshold, have a look at the frequency of the words that are currently being stopped and confirm that your new threshold is higher than the frequency of the words that you do not want to be stopped. You can see the frequency of each word in the Comment field of each record in the Ts index stops related list.



3. You may need to adjust the stop word thresholds of your large tables as they grow. If you are a new customer, your task table might double in size every couple of months. A threshold that worked fine when you first implemented it may then be far too low.
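The frequency check described in point 2 can be sketched in a few lines of Python. This assumes you have copied the word frequencies out of the Comment fields by hand into a dictionary. The function names and numbers below are hypothetical, and the sketch assumes a word is stopped only when its count exceeds the threshold.

```python
def minimum_threshold(frequencies, keep):
    """Smallest threshold at which none of the `keep` words is stopped.

    Words missing from `frequencies` are ignored, since they are not
    stop words today.
    """
    return max((frequencies[w] for w in keep if w in frequencies), default=0)


def still_stopped(frequencies, threshold):
    """Words that remain stop words at the given threshold."""
    return sorted(w for w, n in frequencies.items() if n > threshold)


# Hypothetical frequencies transcribed from the Ts index stops related list.
frequencies = {"the": 900_000, "incident": 120_000, "acme": 65_000, "printer": 52_000}
keep = {"acme", "printer"}

threshold = minimum_threshold(frequencies, keep)
print(threshold)                              # 65000
print(still_stopped(frequencies, threshold))  # ['incident', 'the']
```

If only one or two words need rescuing, marking them "Not a Stop Word" as in point 1 may be cheaper than raising the threshold for the whole table.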



In the end, it all comes down to this: the higher your stop word threshold, the slower text search will operate. I've seen customers go as high as 150,000 for their threshold on some tables with no issues. Hope this helps!



Regards, Matthew Watkins


Mwatkins
ServiceNow Employee

By the way, I'd say that the normal 50,000 recommendation would be for a task table including attachments with between 1,000,000 and 5,000,000 records. That's just a finger in the air estimate but hopefully it will serve as a good starting point for you.


I worked on it a year back, but there were gaps in my understanding.


Thanks a lot for your succinct explanation.