3 weeks ago
We are currently implementing a new knowledge search function for the Virtual Agent.
In preparation for acceptance testing, we are developing test guidelines. Could you please provide us with best practices based on recommendations and examples regarding the following three points?
1. Recommended Test Items
Are there any essential aspects that should be covered to evaluate search accuracy and user experience? We are currently considering the following items:
- Keyword Matching: search accuracy using exact wording.
- Natural Language Understanding: search accuracy using sentences or spoken language.
- Zero-hit Handling: system behavior when no matching articles are found.
2. Number of Test Data
To statistically evaluate search accuracy, what is the typical number of data points (knowledge articles and test queries) required?
3. Evaluation Metrics
Please advise whether there are any metrics generally used as a "passing grade" for evaluating response accuracy.
Solved!
Labels: Virtual Agent
3 weeks ago
Hey @晃秀黒,
Recommended Test Items
To properly evaluate both search accuracy and user experience, testing should cover the following areas:
Keyword Matching
Verify search accuracy using:
- Exact keywords
- Partial keywords
- Synonyms
- Abbreviations
- Common misspellings
The correct article should ideally appear within the top 1 to 3 results.
Natural Language Understanding
Test full conversational queries such as:
- "I forgot my VPN password, how do I reset it?"
- "My email is not working on my phone."
Evaluate:
- Relevance of returned results
- Logical ranking order
- Avoidance of unrelated articles
Zero-hit Handling
Validate system behavior when no article matches:
- User-friendly fallback message
- Suggested alternative keywords
- Option to contact support or create a case
Ranking Quality
Even when the correct article appears, check:
- Is it ranked first?
- Are outdated articles ranked above updated ones?
- Are low-quality articles ranked too high?
Context Awareness
Test behavior based on:
- User roles
- Department-based visibility
- Language filtering
- Authenticated vs non-authenticated users
Conversational User Experience
Assess:
- Number of steps to resolution
- Article preview clarity
- Formatting inside chat
- Need for escalation
Recommended Test Data Volume
There is no strict universal standard, but the following is commonly recommended:
Knowledge Base Size:
- Minimum 200 to 500 articles for meaningful testing
- Preferably 1,000+ for mature environments
Test Queries:
- Minimum 50 queries
- Recommended 100 to 300 queries for statistical reliability
Suggested distribution for 100 queries:
- 20 exact keyword searches
- 20 natural language queries
- 20 ambiguous queries
- 15 typo or misspelled queries
- 15 zero-result scenarios
- 10 complex or edge-case queries
Using at least 100 test queries provides a more stable evaluation baseline.
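As a rough illustration (the names here are my own, not part of any product), the suggested 100-query split can be kept as data so the test plan itself can be sanity-checked in a script:

```python
# Hypothetical encoding of the suggested 100-query distribution,
# so the test plan can be validated programmatically.
QUERY_DISTRIBUTION = {
    "exact_keyword": 20,
    "natural_language": 20,
    "ambiguous": 20,
    "typo_misspelled": 15,
    "zero_result": 15,
    "complex_edge_case": 10,
}

def total_queries(distribution: dict) -> int:
    """Total number of planned test queries across all categories."""
    return sum(distribution.values())
```

Keeping the distribution as data makes it easy to scale the same proportions up to 200 or 300 queries later.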
Evaluation Metrics and Passing Criteria
Precision at Top Results
- Precision@1: the correct article appears at rank 1
- Precision@3: the correct article appears within the top 3
Recommended benchmarks:
- Precision@1 >= 70 to 80 percent
- Precision@3 >= 85 to 90 percent
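As a minimal sketch (function and variable names are my own, not a ServiceNow API), Precision@K can be computed from the ranked results per query and the expected article for each query:

```python
def precision_at_k(ranked_results, expected, k):
    """Fraction of queries whose expected article ID appears
    within the top-k results returned for that query.

    ranked_results: list of ranked article-ID lists, one per query
    expected:       list of the correct article ID per query
    """
    hits = sum(
        1 for ranked, correct in zip(ranked_results, expected)
        if correct in ranked[:k]
    )
    return hits / len(expected)

# Example with two test queries and hypothetical article IDs:
results = [["KB001", "KB002", "KB003"], ["KB009", "KB004", "KB005"]]
answers = ["KB001", "KB004"]
# precision_at_k(results, answers, 1) -> 0.5 (only the first query hits at rank 1)
# precision_at_k(results, answers, 3) -> 1.0 (both hit within the top 3)
```

Run the same function with k=1 and k=3 against your full query set to get both benchmark numbers from one evaluation pass.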
Success Rate
- Percentage of queries resolved without escalation
- Recommended: minimum 75 percent; 85 to 90 percent indicates strong performance
Zero-hit Rate
- Percentage of queries returning no results
- Target: less than 5 to 10 percent
Escalation Rate
- Percentage of sessions requiring live agent support
- Target: less than 20 percent
Average Time to Resolution
- Time from query submission to article access
- Recommended: under 30 seconds on average
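A small sketch of how the rate metrics above could be computed from your test logs (the input shapes and the 'escalated' field are assumptions about how you record results, not a product schema):

```python
def zero_hit_rate(result_counts):
    """Share of test queries that returned zero articles.

    result_counts: list with the number of articles returned per query.
    """
    return sum(1 for n in result_counts if n == 0) / len(result_counts)

def escalation_rate(sessions):
    """Share of sessions that ended with a live-agent handoff.

    sessions: list of dicts with a boolean 'escalated' field (assumed schema).
    """
    return sum(1 for s in sessions if s["escalated"]) / len(sessions)
```

Comparing these numbers against the targets above (zero-hit under 5 to 10 percent, escalation under 20 percent) gives you a simple pass/fail check per test run.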
If this response helps, please mark it as an Accepted Solution and give it a Helpful vote.
Doing so helps others in the community and encourages me to keep contributing.
Regards
Vaishali Singh
2 weeks ago
Hey @晃秀黒,
Hope you are doing well.
Did my previous reply answer your question?
If it was helpful, please mark it as correct ✓ and close the thread. This will help other readers find the solution more easily.
Regards,
Vaishali Singh
a week ago
Hi Vaishali,
Thank you for following up. Yes, your previous reply was very helpful and clearly answered my question.
I have marked it as the correct solution and will close the thread now. Thanks again for your support!
Best regards,
