sabell2012
Mega Sage
Mega Sage

 

NOTE: MY POSTINGS REFLECT MY OWN VIEWS AND DO NOT NECESSARILY REPRESENT THE VIEWS OF MY EMPLOYER, ACCENTURE.

 

DIFFICULTY LEVEL: ADVANCED
Assumes having taken the class SSNF and has good advanced level of knowledge and/or familiarity with Scripting in ServiceNow.


So why am I pursuing this, what might seem, very boring topic?  Well, this need has come up so often, and it is so inadequately addressed with the platform; that it needs a library of such functions to meet that need!

 

The idea here is to provide a fuzzy search that will allow the developer the ability to find close matches. Data-being-data you most likely will NOT be in control of what you are asked to process.  Thus the need for this sort of thing!

 

So, the questions that present themselves: How many times have you had to go manually looking for a particular user in the sys_user table and been frustrated by effort? Sometime it takes you a bit to track them down via the source field? Wouldn't it be nice to have something human-like go through and find such things for you and just present you with the results?  Quickly?

 

Answer for me: You bet! Happens all the time!

 

Okay, getting off my soap-box here. 🙂

 

In this next-to-last article on string search algorithms (yes, yes, there will be one more) I will present a really great fuzzy search method: Jaro-Winkler Distance.

 

I searched the internet over (sounds like an old Hee-Haw song doesn't it?), and found several implementations in JavaScript. I converted a few of these, that looked reasonable, into a single Script Include Library, and then created a Fix Script that utilized the same data and results from my previous article for comparison purposes.

 

Note: I have attached the Fix Script and Script Includes I created to this article for download.  You accept the install as-is and unsupported if you decide to use it.

 

See: Community Code Snippets: Scripted String Matching Techniques - Part 2 (Levenshtein)

 

In running the Fix Script the results were very interesting! For the "Buddy" matches I used my Levenshtein results as control values.

 

The results below are from the Fix Script. I edited them to include the control information and the variance (if significant):

 

*** Script: ---> 

	Jordan Thomas script:
	aaapppp: 0
	frog: 92.50
	fly: 0
	elephant: 44.17
	hippo: 44.17
	hippo: 0
	hello: 88.00
	ABC : 90.67
	D N H Enterprises: 95.25
	My Gym 94.20
	PENNSYLVANIA: 89.80
	Buddy: 66.71
	Buddy (flipped): 66.71

	Shannan Archer script:
	aaapppp: 0
	frog: 92.5
	fly: 0
	elephant: 44.166666666666664
	hippo: 44.166666666666664
	hippo: 0
	hello: 88.0
	ABC : 92.22222222222223
	D N H Enterprises: 95.25189786059352
	My Gym 95.16666666666667
	PENNSYLVANIA: 91.501554001554
	Buddy: 66.7055167055167
	Buddy (flipped): 66.7055167055167

	thsig script (Apache StringUtils Java library as control values):
	aaapppp: 0
	frog: 75.0
	fly: 0
	elephant: 58.25
	hippo: 44.166666666666664
	hippo: 0
	hello: 88.96000000000001
	ABC : 90.91228070175438
	D N H Enterprises: 87.82239477891652
	My Gym 86.39526508336331
	PENNSYLVANIA: 83.47820512820513
	Buddy: 90.69427268180065
	Buddy (flipped): 87.10764538417449

 

As you can see the differences from the control values were not that far off! I was especially interested in the Levenshtein accuracy. Overall not to different from the Apache Java library, and more accurate than the Levenshtein algorithm.

 

Okay, so now leaning toward Jaro-Winkler Distance as the better string matching method.

 

In my next article I will cover two other algorithms and we will see if they work better than Jaro-Winkler. They are:

  • Smith-Waterman
  • Needleman-Wunsch

Enjoy!

Steven Bell.

 

If you find this article helps you, don't forget to log in and mark it as "Helpful"!

 

sabell2012_0-1702242390318.png


Originally published on: 09-01-2018 7:45 AM

I updated the code and brought the article into alignment with my new formatting standard.

Version history
Last update:
‎12-10-2023 02:38 PM
Updated by:
Contributors