sabell2012
Mega Sage
Mega Sage

 

NOTE: MY POSTINGS REFLECT MY OWN VIEWS AND DO NOT NECESSARILY REPRESENT THE VIEWS OF MY EMPLOYER, ACCENTURE.

 

DIFFICULTY LEVEL: ADVANCED
Assumes having taken the class SSNF and has good advanced level of knowledge and/or familiarity with Scripting in ServiceNow.


In my last article I covered the Jaro-Winkler string matching algorithm (my favorite so far), and compared it to the Levenshtein algorithm. In this last (I promise) article of the the String Matching series I present two other algorithms and we will see if they work better than Jaro-Winkler.

 

They are:

 

So once again, back to the Web! I searched around, but with these I had trouble finding true string-match examples that returned a distance vs. a similarity result. That and the number of JavaScript examples was scanty at best. I ended up converting a couple of Java examples to JavaScript (which went pretty well). After I got those working I found that the matches weren't as good as Jaro-Winkler; which was unexpected. 

 

Especially when dealing with Smith-Waterman. However, in retrospect, I guess I should have expected these results as neither of these are available in the Apache.StringUtils library. Anyway, the exercise was useful in many ways: 1) I got decent at converting Java programs to JavaScript, 2) I proved to myself that Jaro-Winkler was the best approach for string matches.

 

Note: I have attached the Fix Script and Script Includes I created to this article for download. You accept the install as-is and unsupported if you decide to use it.

 

Smith-Waterman

I found two executions of the same Smith-Waterman algorithm. One was based on the other. The root version was in Java, and the second version was a rendering as Pseudo-Code. I converted both to JavaScript, and ran the Jaro-Winkler tests against them. Here were the results:

 

*** Script: ---> 

	______________________joshweir
	aaapppp: 0
	frog: 95.83333333333334
	fly: 0.0
	elephant: 20.0
	hippo: 20.0
	hippo: 0.0
	hello: 80.0
	ABC : 100.0
	D N H Enterprises: 94.64285714285714
	PENNSYLVANIA: 88.54166666666666
	Buddy: 70.625
	Buddy (flipped): 70.625

	______________________mpkorstanje
	aaapppp: 0
	frog: 95.83333333333334
	fly: 0.0
	elephant: 20.0
	hippo: 20.0
	hippo: 0.0
	hello: 80.0
	ABC : 100.0
	D N H Enterprises: 94.64285714285714
	PENNSYLVANIA: 88.54166666666666
	Buddy: 70.625
	Buddy (flipped): 70.625

 

I played around with the gap value, also the max and min values and came relatively close, but not as good as Jaro-Winkler (vs. the Apache.JaroWinkler string library results). As you can see it just wasn't as good.  I messed around quite a bit with the settings, but just couldn't get it close.

 

Jaro-Winkler Results

(From previous article)

 

*** Script: ---> aaapppp: 0  				= 0.0
*** Script: ---> frog: 92.5				= 0.93
*** Script: ---> fly: 0  				= 0.0
*** Script: ---> elephant: 44.166666666666664 		= 0.44
*** Script: ---> hippo: 44.166666666666664 		= 0.44
*** Script: ---> hippo: 0				= 0.0
*** Script: ---> hello: 88				= 0.88
*** Script: ---> ABC : 92.22222222222223		= 0.93
*** Script: ---> D N H Enterprises: 95.25189786059352	= 0.95
*** Script: ---> My Gym 95.16666666666667		= 0.92
*** Script: ---> PENNSYLVANIA: 91.501554001554		= 0.88
*** Script: ---> Buddy: 66.7055167055167		= 61.67
*** Script: ---> Buddy (flipped): 66.7055167055167	= 61.67

 

Needleman-Wunsch

 

I couldn't find a true string-compare JavaScript version of this so again ended up converting a Java program.  Needleman is pretty close in structure to Smith with some slight differences in the array search parameters.  Again, I message around with the gap, min, and max settings. I found a setting that was close to Smith-Waterman, but it ranked poorly against the Jaro-Winkler results.

 

*** Script: ---> 
	______________________programcreek
	aaapppp: 0.0
	frog: 75.0
	fly: 75.0
	elephant: 78.125
	hippo: 78.125
	hippo: 78.125
	hello: 80.0
	ABC : 76.66666666666667
	D N H Enterprises: 94.56521739130434
	PENNSYLVANIA: 84.61538461538461

	Buddy: 84.61538461538461
	Buddy (flipped): 84.61538461538461

 

And there you have it! There were several other possibilities to explore. I will let you pursue that! 

Java String Compare Libraries: link

 

My feeling is the Jaro-Winkler is the best for my purposes, and I will be using that in the future until ServiceNow begins exposing the Packages.Java.Lang libraries again to give us the Java libraries for string searches (and other stuff).

 

Oh, and by-the-way, this is where the infamous Jelly libraries also hide!

 

Enjoy!

Steven Bell.

 

If you find this article helps you, don't forget to log in and mark it as "Helpful"!

 

sabell2012_0-1702242405921.png


Originally published on: 09-04-2018 7:54 AM

I updated the code and brought the article into alignment with my new formatting standard.

Version history
Last update:
‎12-11-2023 06:51 AM
Updated by:
Contributors