
- Post History
- Subscribe to RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content
08-23-2018 05:32 AM - edited 12-09-2023 06:19 AM
NOTE: MY POSTINGS REFLECT MY OWN VIEWS AND DO NOT NECESSARILY REPRESENT THE VIEWS OF MY EMPLOYER, ACCENTURE.
DIFFICULTY LEVEL: ADVANCED
Assumes having taken the class SSNF and has good advanced level of knowledge and/or familiarity with Scripting in ServiceNow.
Continuing where I left off with my previous article on string matching functions (see link) I will now dive into the deep dark mysterious depths of string matching algorithms! Seriously. This is rocket science folks! So, not for me to figure it all out by myself; I went straight to the web. Fortunately, there are some great resources out there, and I was not forced to figure it all out myself!
First, of course, came my due-diligence. Investigation of string matching algorithms. I found there were two that did what I was looking for. Mainly:
- How close were two strings I wanted to match?
- Return percent of sameness
I researched the following (from Comparison of String Distance Algorithms😞
- Hamming Distance - Number of positions with the same symbol in both strings (not useful for my purposes)
- Levenshtein Distance - Minimal insertions, deletions, replacements needed to make the strings look alike
- Damerau-Levenshtein Distance - With transpositions
- Longest Common Substring - Rolled my own one of these, so decided to skip
- n-gram Distance - Also rolled my own one of these; skipped this as well
- Cosine Distance (n-gram) - similar in accuracy to the n-gram. Skipped.
- Jaro-Winkler Distance - Five parameters determine the percentage sameness of the compared strings
- Fuzzy Distance
The n-gram was similar to an algorithm I had created myself, so I didn't bother with that. The ones that looked intriguing were the Levenshtein Distance, and the Jaro-Winkler Distance. Both, btw, are ones chosen by Apache to be in their StringUtils library; along with Fuzzy Distance. None of the others make an appearance there.
In this article I will tackle the Levenshtein Distance solution to string matching. Fortunately, I don't have to write it all up myself, but can borrow from the cumulative work of others at The complete guide to string similarity algorithms.
Note: All examples were run in my Personal Instance on the London version in Scripts - Background or Fix Scripts.
Research
- Community Code Snippets: Scripted String Matching Techniques - Part 1
- Comparison of String Distance Algorithms
- Lucene
- Apache StringUtils
- Levenshtein Distance
- Pre-built JavaScript functions: The complete guide to string similarity algorithms (for my comparison exercise)
- JavaScript example:
Here is what is in Vancouver that we don't have access to (crying here) - Apache Commons StringUtils:
Copyright 2001-2011 The Apache Software Foundation
This product includes software developed by
The Apache Software Foundation (http://www.apache.org/).
This product includes software from the Spring Framework,
under the Apache License 2.0 (see: StringUtils.containsWhitespace()
Note: To find this info navigate to System Diagnostics > Stats > Stats and click on the Open Source software link. Then search the PDF when it opens for the string Lucene. This file is worth poking around with, and keeping a copy for future reference.
Note: Let me interject a note here: I wish like crazy that the ServiceNow development team would expose the Lucene java library for us (really me) to use! or just-as-good: the Apache StringUtils library! Please?!?!? Arrrrgh! (not a pirate sound; think anguish!).
Library
Firstly, I adapted the Levenshtein examples to ServiceNow by creating a Script Include. This required that I clean up the code a bit (fixed up missing semi-colons basically). NO, I didn't go through and replace all the one- and two-character variables. Sigh. I figured if I decided to use one of these functions I could do so later. It was REALLY tempting though. None of these functions, in its current state, is really maintainable. 😛
Fix Script
Finally, I created a Fix Script to test out all of these functions.
Testing
I used the test cases from the Apache StringUtils site, and then a couple that I came up with myself. I then tacked on a percentage for the last two to determine actual match accuracy.
Note: I have attached the Fix Script and Script Include I created to this article for download. You accept the install as-is and unsupported if you decide to use it.
Upload the Script Include attached to this article.
Here were my results:
*** Script: CompareLD1------------------------------
*** Script: ---> 1. 0
*** Script: ---> 2. 7
*** Script: ---> 3. 7
*** Script: ---> 4. 7
*** Script: ---> 5. 23
*** Script: ---> 6. 23
*** Script: ---> 7. 61.66666666666667
*** Script: ---> 8. 61.66666666666667
*** Script: CompareLD2------------------------------
*** Script: ---> 1. 0
*** Script: ---> 2. 7
*** Script: ---> 3. 7
*** Script: ---> 4. 7
*** Script: ---> 5. 23
*** Script: ---> 6. 23
*** Script: ---> 7. 61.66666666666667
*** Script: ---> 8. 61.66666666666667
*** Script: CompareLD3------------------------------
*** Script: ---> 1. 0
*** Script: ---> 2. 7
*** Script: ---> 3. 7
*** Script: ---> 4. NaN
*** Script: ---> 5. NaN
*** Script: ---> 6. NaN
*** Script: ---> 7. NaN
*** Script: ---> 8. NaN
*** Script: ---> 9. NaN
*** Script: CompareLD6------------------------------
*** Script: ---> 1. 0
*** Script: ---> 2. 7
*** Script: ---> 3. 7
*** Script: ---> 4. 7
*** Script: ---> 5. 7
*** Script: ---> 6. 23
*** Script: ---> 7. 23
*** Script: ---> 8. 61.66666666666667
*** Script: ---> 9. 61.66666666666667
*** Script: CompareLD7a------------------------------
*** Script: ---> 1. 0
*** Script: ---> 3. 7
*** Script: ---> 2. undefined
*** Script: ---> 4. undefined
*** Script: ---> 5. undefined
*** Script: ---> 6. undefined
*** Script: ---> 7. NaN
*** Script: ---> 8. NaN
*** Script: CompareLD7b------------------------------
*** Script: ---> 1. 0
*** Script: ---> 2. 7
*** Script: ---> 3. undefined
*** Script: ---> 4. undefined
*** Script: ---> 5. undefined
*** Script: ---> 6. undefined
*** Script: ---> 7. NaN
*** Script: ---> 8. NaN
*** Script: CompareLD8------------------------------
*** Script: ---> 1. 0
*** Script: ---> 2. 7
*** Script: ---> 3. 7
*** Script: ---> 4. 7
*** Script: ---> 5. 23
*** Script: ---> 6. 23
*** Script: ---> 7. 61.66666666666667
*** Script: ---> 8. 61.66666666666667
*** Script: CompareLD9------------------------------
*** Script: ---> 1. 0
*** Script: ---> 2. 7
*** Script: ---> 3. 7
*** Script: ---> 4. 7
*** Script: ---> 5. 23
*** Script: ---> 6. 23
*** Script: ---> 7. 61.66666666666667
*** Script: ---> 8. 61.66666666666667
*** Script: CompareLD10------------------------------
*** Script: ---> 1. 0
*** Script: ---> 2. 7
*** Script: ---> 3. 7
*** Script: ---> 4. 7
*** Script: ---> 5. 23
*** Script: ---> 6. 23
*** Script: ---> 7. 61.66666666666667
*** Script: ---> 8. 61.66666666666667
*** Script: CompareLD13------------------------------
*** Script: ---> 1. 0
*** Script: ---> 2. 7
*** Script: ---> 3. 6.5
*** Script: ---> 4. 6.5
*** Script: ---> 5. 39.5
*** Script: ---> 6. 39.5
*** Script: ---> 7. 34.166666666666664
*** Script: ---> 8. 34.166666666666664
*** Script: CompareLD-Dziemba------------------------------
*** Script: ---> 1. 0
*** Script: ---> 2. 7
*** Script: ---> 3. 6
*** Script: ---> 4. 6
*** Script: ---> 5. 23
*** Script: ---> 6. 23
*** Script: ---> 7. 61.66666666666667
*** Script: ---> 8. 61.66666666666667
*** Script: ------------------------------
[0:00:01.136] Total Time
As you can see, some algorithms failed entirely. While the rest were not quite as good as the 77% match from my previous article. However, interesting none-the-less. It might have its uses later, but it was still not what I was looking for in the way of sameness. In my part 3 article I will tackle a better method!
Enjoy!
Steven Bell.
If you find this article helps you, don't forget to log in and mark it as "Helpful"!
Originally published on: 08-23-2018 7:32 AM
I updated the code and brought the article into alignment with my new formatting standard.
- 1,277 Views