
- Post History
- Subscribe to RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content
09-04-2018 05:54 AM - edited 12-11-2023 06:51 AM
NOTE: MY POSTINGS REFLECT MY OWN VIEWS AND DO NOT NECESSARILY REPRESENT THE VIEWS OF MY EMPLOYER, ACCENTURE.
DIFFICULTY LEVEL: ADVANCED
Assumes having taken the class SSNF and has good advanced level of knowledge and/or familiarity with Scripting in ServiceNow.
In my last article I covered the Jaro-Winkler string matching algorithm (my favorite so far), and compared it to the Levenshtein algorithm. In this last (I promise) article of the the String Matching series I present two other algorithms and we will see if they work better than Jaro-Winkler.
They are:
So once again, back to the Web! I searched around, but with these I had trouble finding true string-match examples that returned a distance vs. a similarity result. That and the number of JavaScript examples was scanty at best. I ended up converting a couple of Java examples to JavaScript (which went pretty well). After I got those working I found that the matches weren't as good as Jaro-Winkler; which was unexpected.
Especially when dealing with Smith-Waterman. However, in retrospect, I guess I should have expected these results as neither of these are available in the Apache.StringUtils library. Anyway, the exercise was useful in many ways: 1) I got decent at converting Java programs to JavaScript, 2) I proved to myself that Jaro-Winkler was the best approach for string matches.
Note: I have attached the Fix Script and Script Includes I created to this article for download. You accept the install as-is and unsupported if you decide to use it.
Smith-Waterman
I found two executions of the same Smith-Waterman algorithm. One was based on the other. The root version was in Java, and the second version was a rendering as Pseudo-Code. I converted both to JavaScript, and ran the Jaro-Winkler tests against them. Here were the results:
*** Script: --->
______________________joshweir
aaapppp: 0
frog: 95.83333333333334
fly: 0.0
elephant: 20.0
hippo: 20.0
hippo: 0.0
hello: 80.0
ABC : 100.0
D N H Enterprises: 94.64285714285714
PENNSYLVANIA: 88.54166666666666
Buddy: 70.625
Buddy (flipped): 70.625
______________________mpkorstanje
aaapppp: 0
frog: 95.83333333333334
fly: 0.0
elephant: 20.0
hippo: 20.0
hippo: 0.0
hello: 80.0
ABC : 100.0
D N H Enterprises: 94.64285714285714
PENNSYLVANIA: 88.54166666666666
Buddy: 70.625
Buddy (flipped): 70.625
I played around with the gap value, also the max and min values and came relatively close, but not as good as Jaro-Winkler (vs. the Apache.JaroWinkler string library results). As you can see it just wasn't as good. I messed around quite a bit with the settings, but just couldn't get it close.
Jaro-Winkler Results
(From previous article)
*** Script: ---> aaapppp: 0 = 0.0
*** Script: ---> frog: 92.5 = 0.93
*** Script: ---> fly: 0 = 0.0
*** Script: ---> elephant: 44.166666666666664 = 0.44
*** Script: ---> hippo: 44.166666666666664 = 0.44
*** Script: ---> hippo: 0 = 0.0
*** Script: ---> hello: 88 = 0.88
*** Script: ---> ABC : 92.22222222222223 = 0.93
*** Script: ---> D N H Enterprises: 95.25189786059352 = 0.95
*** Script: ---> My Gym 95.16666666666667 = 0.92
*** Script: ---> PENNSYLVANIA: 91.501554001554 = 0.88
*** Script: ---> Buddy: 66.7055167055167 = 61.67
*** Script: ---> Buddy (flipped): 66.7055167055167 = 61.67
Needleman-Wunsch
I couldn't find a true string-compare JavaScript version of this so again ended up converting a Java program. Needleman is pretty close in structure to Smith with some slight differences in the array search parameters. Again, I message around with the gap, min, and max settings. I found a setting that was close to Smith-Waterman, but it ranked poorly against the Jaro-Winkler results.
*** Script: --->
______________________programcreek
aaapppp: 0.0
frog: 75.0
fly: 75.0
elephant: 78.125
hippo: 78.125
hippo: 78.125
hello: 80.0
ABC : 76.66666666666667
D N H Enterprises: 94.56521739130434
PENNSYLVANIA: 84.61538461538461
Buddy: 84.61538461538461
Buddy (flipped): 84.61538461538461
And there you have it! There were several other possibilities to explore. I will let you pursue that!
Java String Compare Libraries: link
My feeling is the Jaro-Winkler is the best for my purposes, and I will be using that in the future until ServiceNow begins exposing the Packages.Java.Lang libraries again to give us the Java libraries for string searches (and other stuff).
Oh, and by-the-way, this is where the infamous Jelly libraries also hide!
Enjoy!
Steven Bell.
If you find this article helps you, don't forget to log in and mark it as "Helpful"!
Originally published on: 09-04-2018 7:54 AM
I updated the code and brought the article into alignment with my new formatting standard.