NLU Utterances - Use of filler words

Joe85 · ‎12-09-2020

Does anyone have any best practices for tuning the NLU? I've tried watching the SNow videos and reading the training, but they do not help as they do not go in-depth enough.

How do you mark filler words like "I want to" and "How do I" as irrelevant? I find that intents with the most filler words have the highest confidence. For example, if the user says "I want to request feedback", our Recognition intent scores 80%, while the Request Feedback intent doesn't even show up in the predicted results. The Recognition intent contains no utterances with "request" or "feedback", but it does have "I want" several times.

Is it better to not include filler words in your utterances? Every training I've seen shows filler words being used. How does everyone else get around this?

Thanks for your help.

D van Heusden · ‎12-10-2020

Hi Joe,

There is no option to mark words as irrelevant, and you shouldn't have to. From the information you provide I cannot tell whether your intents overlap, for example whether the utterances in the Recognition intent are quite similar to the ones in the Request Feedback intent.

Let us take your example utterance: "I want to request feedback"

Questions to answer are:

1. What is the intent that I expect to match this utterance, if any?

2. In my current model what is the scoring for this utterance? Which intents does it score against and what are those scores?

3. If, as you mention, the expected intent is not even in the list of intent scores then:

Check the utterances for the expected intent.
Which utterances in there would partially match the example utterance?
Look at the utterance shapes and see if they capture the example utterance. I'm sure there is a gap here, so I would first add the most obvious one: "request feedback"
This utterance would be good to have for the following reasons:

It is directly related to the intent name "Request Feedback" (there is no bonus for matching intent names and utterances, but it just makes sense to give your intents a sensible name and then add a matching utterance)
It is a very relevant utterance, as you expect many variations that include "request feedback"
From a keyword search perspective you would probably also have included it as one of the keywords

Train the model after making this first change and test. Did it impact the scores? If it works, great!
If not, let's look further. Add more shapes of expected utterances that are not yet covered but still relevant (that is, you expect your users to use them, or they have already been used in conversations), for example:

"Can i get feedback"
"Need feedback on something"

Train the model after making this change and test again. Did it impact the scores? If it works, great!
If not, think about additional shapes that you have not covered. In some cases you will need to add something similar to the example utterance, like:

"I want to request a review" (if with review you mean the same as with feedback)

Train the model after making this change and test again. Did it impact the scores? If it works, great!

4. In general, do not make vocabulary modifications for this. They may have unintended consequences downstream, and they should not be necessary as this is common English.

Think about it: when you add to the vocabulary you are telling the machine to always replace x, y and z with a. For example, "where do i", "can i", "i want to" and "i need to" would then always be replaced with "how do I".
It is better to add various shapes that combine different ways of asking for something with different names for the thing being asked for, like "can i get feedback" and "i would like to request a review" next to "request feedback".
Reserve vocabulary changes for words that are domain, organisation or product specific.

5. Keep the number of utterances per intent limited. They should not cover every exact combination, just the gist of it.
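
To make the point about vocabulary concrete, here is a minimal Python sketch of blanket phrase replacement. It is not how the platform implements vocabulary: the VOCABULARY table and the apply_vocabulary helper are made up for illustration, and the only assumption is that every synonym is always swapped for its base phrase before the text is scored.

```python
import re

# Hypothetical synonym table: base phrase -> phrases that get replaced by it.
VOCABULARY = {
    "how do i": ["where do i", "can i", "i want to", "i need to"],
}


def apply_vocabulary(text, vocabulary):
    """Replace every synonym with its base phrase (illustrative only)."""
    result = text.lower()
    for base, synonyms in vocabulary.items():
        # Try longer phrases first so a short synonym cannot split a longer one.
        for synonym in sorted(synonyms, key=len, reverse=True):
            # Lookarounds keep the match on whole words ("can i" must not hit "scan it").
            pattern = r"(?<![\w'])" + re.escape(synonym) + r"(?![\w'])"
            result = re.sub(pattern, base, result)
    return result


for sample in ["I want to request feedback",
               "Can I get feedback",
               "Where do I request feedback"]:
    print(sample, "->", apply_vocabulary(sample, VOCABULARY))
```

All three inputs come out starting with "how do i", so the difference between stating a want, asking permission and asking where is gone before any intent is scored.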

Here's how I tried your example in my Orlando instance:

1. I had a trained model with a number of intents already, then tested the example utterance without having an intent for it:

2. Logically, there was no match to the intended intent, as it was not there. So I added it with just a single utterance, "request feedback"

3. Hmm, still no score. Okay let's add some extra shapes of how it could be asked:

4. That did it! The other intents are still scoring 76% and 75%, so I might want to look at their utterances to see if they also try to cover this new "Request Feedback" intent. If so, re-phrase those utterances or remove them from those intents.
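
The same sequence can be reproduced with a toy scorer. The Python sketch below is only an illustration: it scores an intent by the best word-overlap (Jaccard) match among its utterances, which is an assumption made for the example and not how the NLU model scores. The Recognition utterances are invented; the Request Feedback ones are the shapes suggested earlier in this reply.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Shared words divided by all distinct words in both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_intents(utterance, intents):
    """Score each intent by its best-matching example utterance."""
    query = tokens(utterance)
    return {
        name: max((jaccard(query, tokens(u)) for u in examples), default=0.0)
        for name, examples in intents.items()
    }


test_utterance = "I want to request feedback"

# Step 1: a model without a Request Feedback intent (utterances are invented).
intents = {
    "Recognition": [
        "i want to recognize a colleague",
        "i want to give kudos",
        "i want to thank someone",
    ],
}
no_intent = score_intents(test_utterance, intents)

# Step 2: add the intent with a single utterance.
intents["Request Feedback"] = ["request feedback"]
single = score_intents(test_utterance, intents)

# Step 3: add more shapes of how it could be asked.
intents["Request Feedback"] += [
    "can i get feedback",
    "need feedback on something",
    "i want to request a review",
]
more_shapes = score_intents(test_utterance, intents)

for label, scores in [("no intent", no_intent),
                      ("single utterance", single),
                      ("more shapes", more_shapes)]:
    print(label, {name: round(value, 2) for name, value in scores.items()})
```

In this toy the single utterance "request feedback" still trails Recognition (0.40 against about 0.43), because three shared filler words outweigh two shared content words. The extra shapes push Request Feedback to about 0.57 while Recognition stays where it was, which is why the utterances of the other intents deserve a second look.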

Let me know your thoughts.

Susan Britt · ‎12-09-2020

I am no expert, as I've only implemented VA a few times, but here's what I found to help. I do normally include those filler words in the Utterances, but also create Vocabulary and Synonyms for the filler words/phrases. For example: "how do I" base vocabulary could have "where do i", "can i", "i want to", "i need to" as synonyms. I also suggest creating synonyms for common filler words that also include slang and regional terms, like "can't" > cannot, can not; "want" > wanna, need, like, desire; "calendar" > diary, event, schedule.

Joe85 · ‎12-10-2020

Hi David,

First, let me say "Wow!". I never expected this kind of support from the community. You really exceeded my expectations as this was my first post.

I tried your suggestions and that increased the score of the Request Feedback intent to 88%. I still have other intents testing at 80% and 79% even though they do not have Request or Feedback in their utterances. I will try reducing the number of times "I want to" and "I need to" show up in their utterances and see if that helps.

Thanks again for taking the time to help.

Anurag Tripathi · ‎12-10-2020

So you should have marked David's response as Correct mate!!

-Anurag