‎05-24-2024 12:45 PM
Our knowledge base was built on a "migrate first, clean up later" approach, and it's now time for the cleanup. Does anyone know the best way to run a report to find duplicate content in knowledge articles? We are implementing better practices so our authors avoid this in the future, but is there an easy way to see current duplicate content? Not duplicate KB numbers, but duplicate content itself. Thanks.

‎05-24-2024 02:26 PM
Hi, @rosenst22 !
What you MIGHT be able to do is pull down a report of all your articles into a spreadsheet. The content field will also be pulled down.
Once you have your spreadsheet, you can either use Excel or some data program (such as CloudPak) to look for duplicates across the content field - even if it's, say, an 80% match (or whatever you consider close enough to be a duplicate article).
If you work with the data this way, NEVER EVER get rid of the sys_id. It's the anchor or identifier that everything hinges upon. If you're not working with the other fields, you can delete those columns if you want to. You only need to keep the field or fields that you're actively wanting to change. HOWEVER, keep that sys_id.
CAVEAT: The content field will load as the article's HTML, not as plain text. Because Excel caps each cell at 32,767 characters, articles can (and absolutely will - ask me how I know) truncate. So, for example, you might only get the first half or first two-thirds of a longer article. And because the limit counts the HTML markup, not just the words, an article with more formatting/coding will load less of the meat of the article.
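If it helps, here is a minimal Python sketch for spotting rows that were probably cut off, assuming you export to CSV and that the columns are named `sys_id` and `text` (both names are assumptions, so swap in whatever your export actually uses). Reading the CSV directly, instead of opening it in Excel first, avoids the cell limit altogether.

```python
import csv
import io

# Excel stores at most 32,767 characters in a single cell.
EXCEL_CELL_LIMIT = 32767


def read_export(csv_text):
    """Parse exported CSV text into a list of dicts, one per article."""
    # The csv module's default field limit (131072) is too small for long bodies.
    csv.field_size_limit(10**9)
    return list(csv.DictReader(io.StringIO(csv_text)))


def flag_possibly_truncated(rows, content_field="text", limit=EXCEL_CELL_LIMIT):
    """Return the sys_id of every row whose content is at or over the limit."""
    return [
        row["sys_id"]
        for row in rows
        if len(row.get(content_field) or "") >= limit
    ]
```

Any sys_id this returns is worth opening in the platform itself before you decide whether the article is a duplicate, since the spreadsheet copy may be incomplete.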
2ND CAVEAT: This method won't work as well if you didn't clean up your coding before migrating articles. What I mean by that is, if you uploaded articles from Word docs and PDFs, you're highly likely to have a bunch of gobbledygook and bloated coding on the back end, unless you did some kind of format cleanup or used the new "strip formatting" tool (which of course didn't exist 5 years ago when I started doing this). If you went with stripped-down coding (i.e., you put everything in Notepad first to strip the formatting, or used the "strip formatting" tool, so you have just a <p>words</p> kind of look to your back end), then you'll be able to compare articles more easily and accurately.
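To make the "80% match" idea concrete, here is a rough Python sketch that strips the HTML first and then compares the remaining text pair by pair. It uses only the standard library (`html.parser` and `difflib`); the function names and the 0.8 threshold are my own choices for illustration, not anything built into ServiceNow. It keeps the sys_id as the key the whole way through.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from itertools import combinations


class _TextExtractor(HTMLParser):
    """Collects only the text between tags, dropping the markup."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return lowercased text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps adjacent paragraphs from running together.
    return " ".join(" ".join(parser.parts).split()).lower()


def find_near_duplicates(articles, threshold=0.8):
    """articles maps sys_id -> HTML body. Returns (sys_id_a, sys_id_b, ratio)."""
    texts = {sys_id: html_to_text(body) for sys_id, body in articles.items()}
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(texts.items(), 2):
        ratio = SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()
        if ratio >= threshold:
            pairs.append((id_a, id_b, round(ratio, 2)))
    return pairs
```

Every article is compared with every other one, so this gets slow on a large knowledge base. For a first pass you might run it one category or one knowledge base at a time.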
That's just what I'm thinking off the top of my head.
Another way you could do this is to do a more manual search by key phrases. Content > contains > key phrase about specific topic. See what comes up in the results. This can be done from list view. If you have a lot of articles or suspect a lot of duplicates, this may not work. However, I've done this for small searches, such as looking for articles that all use the same link to a resource (so I can update the link), and it's worked well.
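If you already have the export on hand, the same key-phrase idea can be run over the file instead of the list view. This is just a sketch of that substitute, not the list filter itself, and the `sys_id` and `text` column names are assumptions about your export.

```python
def group_by_phrase(rows, phrases, content_field="text"):
    """Map each key phrase to the sys_ids of articles whose content contains it."""
    results = {}
    for phrase in phrases:
        needle = phrase.casefold()  # case-insensitive "contains" match
        results[phrase] = [
            row["sys_id"]
            for row in rows
            if needle in (row.get(content_field) or "").casefold()
        ]
    return results
```

For example, passing the URL of a shared resource as the phrase gives you the list of articles that need the link updated.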
I hope this is helpful (please mark as helpful if so), or I hope it gets the wheels turning for you or someone else in the community who may have dealt with a similar issue. All the best to you!

‎06-18-2024 05:59 PM
Hi @rosenst22
We have used ServiceNow's similarity analysis solution and also contextual search.
*Similarity Solution: Here an ML model is trained on fields you choose from the KB article, such as Short description, Meta, etc. When the model is run, it returns articles that are similar to each other, along with a similarity score.
The results can be stored in a custom table, and reports can then be created from it for follow-up actions.
It is important to note that it will not give 100% accuracy, but we found it a good place to start, and given current developments in the AI space, ServiceNow's AI/ML models may improve.
*Contextual Search:
Contextual search is a search technology that focuses on the context of the query as well as the intent of the user in order to fetch the most relevant set of results. Contextual search displays related results within a form or record producer based on the text you enter in a field.
In our case, it appears on the KB creation screen itself. Based on the keywords entered, it shows the top 5 articles to the author, who can browse them and avoid creating a duplicate article.
By implementing the above, we have been able to improve the knowledge base and reduce duplicates.
Hope this helps.
Good Luck.
Regards,
Dipesh