Stopword list not working in Predictive Intelligence clustering
11-11-2024 07:24 AM
Hi all,
I am currently working on a clustering solution for incidents. However, much of our incident content is generated from templates, which causes my clustering solution to register the template text as clusters.
Problem: The stopword list I am using does not seem to work, whichever format I enter it in.
Example:
What I want to exclude: "BSO concerned:"
What I have tried as stopwords in separate stopword configurations:
- "BSO concerned"
- "BSO concerned:"
- BSO, concerned
- BSO,concerned
- BSO, concerned:
However, these terms still show up in my cluster analysis. Does anyone have an idea what I might be doing wrong?
Looking forward to your replies!
11-11-2024 07:38 AM
@GijsBeerens This might be a tokenization issue. Text preprocessing in clustering often breaks text down into tokens (words), so "BSO concerned:" may be split into separate tokens such as "BSO" and "concerned". In that case, listing "BSO concerned" as a single stopword will not work, because the phrase never matches any individual token. Try this and see if it works for you.
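To illustrate why a phrase entry can fail, here is a minimal Python sketch. It is a generic illustration only, not ServiceNow's actual preprocessing: the helpers `tokenize` and `remove_stopwords` are hypothetical, and the sample text "printer offline" is made up for the example. It assumes the preprocessing lowercases text, splits on punctuation and whitespace, and compares stopwords one token at a time.

```python
import re

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit,
    # so "BSO concerned:" becomes ["bso", "concerned"].
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def remove_stopwords(text, stopwords):
    # Compare each token against the stopword set, one token at a time.
    stop = {s.lower() for s in stopwords}
    return [t for t in tokenize(text) if t not in stop]

text = "BSO concerned: printer offline"

# A multi-word phrase never equals a single token, so nothing is removed.
phrase_result = remove_stopwords(text, ["BSO concerned:"])

# Listing each word separately matches the individual tokens.
word_result = remove_stopwords(text, ["BSO", "concerned"])

print(phrase_result)
print(word_result)
```

Under these assumptions, the phrase entry leaves both template words in place, while the two single-word entries remove them. If the platform tokenizes in a similar way, entering each word as its own stopword is the variant most likely to take effect.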