
Stopword list not working in Predictive Intelligence clustering

GijsBeerens
Tera Contributor

Hi all, 

I am currently working on a clustering solution for incidents. However, much of our incident content is generated from templates, which causes my clustering solution to pick up the template text as clusters. 

Problem: The stopword list I am using does not seem to work, whichever format I enter the stopwords in. 

Example: 
What I want to exclude: "BSO concerned:"
What I have tried as stopwords in separate stopword configurations:

  • "BSO concerned"
  • "BSO concerned:" 
  • BSO, concerned
  • BSO,concerned
  • BSO, concerned: 

However, these terms still show up in my cluster analysis. Does anyone have an idea what I might be doing wrong? 

Looking forward to your replies! 

 

 



1 REPLY

Not applicable

@GijsBeerens This might be a tokenization issue. Text preprocessing in clustering often breaks the text down into tokens (words), so "BSO concerned:" may be split into separate tokens such as "BSO" and "concerned". In that case, listing "BSO concerned" as a single stopword will not match anything. Try adding the words as separate stopwords and see if that works for you.
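To illustrate the tokenization point, here is a minimal Python sketch of a generic tokenize-then-filter step. It is not ServiceNow's actual preprocessing; the `tokenize` and `remove_stopwords` helpers and the sample text are made up for illustration, and the assumption is that stopwords are compared against one token at a time after punctuation has been stripped.

```python
import re

def tokenize(text):
    # Hypothetical tokenizer: lowercase, then split on anything that is
    # not a letter or digit, so punctuation such as ":" is dropped.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def remove_stopwords(text, stopwords):
    # Each token is compared against the stopword entries individually,
    # so an entry containing a space or ":" can never equal a token.
    stop = {s.lower() for s in stopwords}
    return [t for t in tokenize(text) if t not in stop]

sample = "BSO concerned: printer offline"  # made-up incident text

# The whole phrase as one stopword entry: nothing is removed.
phrase_result = remove_stopwords(sample, ["BSO concerned:"])

# Each word as its own stopword entry: the template words are removed.
token_result = remove_stopwords(sample, ["BSO", "concerned"])

print(phrase_result)
print(token_result)
```

If the platform behaves like this sketch, only single-word entries without punctuation can match. If separate entries still do not help, the stopword list may not be applied the way this sketch assumes, so it is worth checking how the solution definition picks up the list.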