Manjeet Singh
ServiceNow Employee


Sentiment analysis capabilities have come a long way in the last two years, but they are still far from perfect. No matter which sentiment API provider you use (IBM, Google, Azure), it's important to understand how to approach measuring performance and accuracy.

Since we launched Survey Sentiment Analysis support in the London release, I have seen this question come up quite often: "How do I measure the accuracy of sentiment results?"

 

As you may know, most sentiment analysis algorithms categorize data as Positive / Neutral / Negative. So the rule of thumb for measuring performance is whether the system categorized the data in accordance with the intuition of the user. This is a very abstract and subjective problem, so accuracy cannot be measured by plain mathematics alone; it has to be measured against human judgment.

Training and Testing data

You need testing data that has been verified by a human. There are plenty of sentiment analysis data sets available for free in which each chunk of text has been pre-categorized and verified by a human. For example, you can use a popular Twitter comment data set consisting of 498 tweets categorized by topic and sentiment as: Negative: 177, Neutral: 139, Positive: 182.
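As a rough sanity check before evaluating any provider, the class counts quoted above already give a useful reference point: a system that always predicted the largest class would be right a little over a third of the time. The sketch below is only an illustration using those three counts; the variable names are my own.

```python
# Class counts quoted above for the 498-tweet data set.
counts = {"Negative": 177, "Neutral": 139, "Positive": 182}

total = sum(counts.values())

# A "majority baseline" always predicts the most frequent label.
majority_label = max(counts, key=counts.get)
baseline_accuracy = counts[majority_label] / total

print(total)                        # 498
print(majority_label)               # Positive
print(round(baseline_accuracy, 3))  # 0.365
```

Any sentiment API you evaluate on this data set should score clearly above that baseline to be considered useful.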

Measuring the performance

There are three very important numbers that go into determining how well a sentiment analysis system works.

1. Accuracy: A measure of how often a sentiment rating is correct. [Num. of Correct Queries / Total Num. of Queries] - You would use this to check the overall accuracy of the system.

 

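2. Recall: A measure of how many of the texts that actually carry a sentiment were rated as carrying it. [Num. of Correctly Identified Items / Num. of Items That Should Have Been Identified] - This could be seen as how accurately the system determines neutrality.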

 

3. F1 Score:

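The F1 Score is a combination of precision and recall, where precision is the share of the items the system gave a particular rating that actually deserve that rating. This is one of the most important measures and will tell you how your system is performing.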

The formula for calculating F1 Score is:

F1 = [ 2 * (Precision * Recall) / (Precision + Recall) ] 

 

The score is in a range of 0.0 - 1.0, where 1.0 would be perfect. The F1 Score is very helpful, as it gives us a single metric that rates a system by both precision and recall.  
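To make the three numbers concrete, here is a minimal sketch in plain Python that computes them from a list of human-verified labels and a list of system predictions. The function names and the six sample labels are made up for illustration; they are not part of any ServiceNow or provider API. Precision, recall and F1 are computed per label (here "Positive").

```python
def accuracy(gold, predicted):
    """Num. of correct ratings / total num. of ratings."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def precision_recall_f1(gold, predicted, label):
    """Precision, recall and F1 for one sentiment label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# Hypothetical sample: human-verified labels vs. what the system returned.
gold = ["Positive", "Positive", "Negative", "Neutral", "Negative", "Positive"]
predicted = ["Positive", "Neutral", "Negative", "Neutral", "Positive", "Positive"]

acc = accuracy(gold, predicted)
precision, recall, f1 = precision_recall_f1(gold, predicted, "Positive")

print(round(acc, 3))                                      # 0.667
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.667 0.667
```

In practice you would run the same calculation for each of the three labels and compare the per-label scores, since a system can do well on Positive and Negative while still struggling with Neutral.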

 

Additional reading:

How to do Sentiment Analysis on ServiceNow Survey Result?
