Stopword list not working in Predictive Intelligence clustering

GijsBeerens
Tera Contributor

Hi all, 

I am currently working on a clustering solution for incidents. However, much of our incident content is generated from templates, which causes my clustering solution to register the template text as clusters.

Problem: The stopword list I am using does not seem to work, whichever format I enter it in.

Example: 
What I want to exclude: "BSO concerned:"
What I have tried as stopwords in separate stopword configurations:

  • "BSO concerned"
  • "BSO concerned:" 
  • BSO, concerned
  • BSO,concerned
  • BSO, concerned: 

However, these terms still show up in my cluster analysis. Does anyone have an idea what I might be doing wrong?

Looking forward to your replies!

1 REPLY

Abhay Kumar1
Giga Sage

@GijsBeerens This might be a tokenization issue. Text preprocessing in clustering often breaks text down into tokens (words), so "BSO concerned:" may be split into separate tokens such as "BSO" and "concerned". In that case, a multi-word stopword entry like "BSO concerned" will never match a single token and will have no effect. See if this explanation fits what you are seeing.
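
To illustrate the tokenization point, here is a minimal Python sketch of a generic tokenizer followed by a stopword filter. It is only an assumption about how such preprocessing typically behaves, not the actual Predictive Intelligence implementation; the functions `tokenize` and `remove_stopwords` and the sample incident text are hypothetical.

```python
import re


def tokenize(text):
    # Lowercase the text and split on anything that is not a letter or digit,
    # so punctuation such as ":" never ends up inside a token.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def remove_stopwords(text, stopwords):
    # Each stopword entry is compared against single tokens only
    # (assumed behaviour, not confirmed for Predictive Intelligence).
    stop = {s.lower() for s in stopwords}
    return [t for t in tokenize(text) if t not in stop]


# Hypothetical template-generated incident text.
sample = "BSO concerned: printer on floor 3 is offline"

# A phrase entry never equals a single token, so nothing is removed.
phrase_result = remove_stopwords(sample, ["BSO concerned:"])

# One entry per word matches the individual tokens.
token_result = remove_stopwords(sample, ["bso", "concerned"])

print(phrase_result)
print(token_result)
```

In this sketch the phrase entry leaves "bso" and "concerned" in the output, while the per-word entries remove both. Whether the real stopword configuration expects one word per entry, and how it handles case and punctuation, would need to be checked against the product documentation.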