Scaling Knowledge Cleanup Efforts to Support AI – Recommendations Requested
2 weeks ago
We currently have a Knowledge Base corpus of approx. 35k Knowledge articles in our external domain. A significant portion of these articles have not been reviewed or updated by a content owner in over a year.
As we evaluate AI-powered capabilities across the ServiceNow platform that leverage knowledge content, the business has challenged us on how best to prioritize their knowledge cleanup efforts. Given the volume of articles, a manual review of all content is not practical.
To help identify high-value content, we are currently considering factors such as:
- Article view count / usage frequency
- Last updated date
- Number of cases associated with or resolved using the article
We would appreciate your guidance on the following:
- Are there additional metrics, reports, or best practices that you have seen customers use successfully to prioritize large-scale KB cleanup efforts and identify articles that should be reviewed, retained, updated, archived, or excluded?
- Are there tools or capabilities within ServiceNow that can help identify stale, low-value, or high-value knowledge articles? I understand that Knowledge Center capabilities will help identify and review duplicate articles, optimize articles, and so on.
- Can AI-powered capabilities be configured to utilize only a subset of approved knowledge articles initially, with additional content introduced in phases as the knowledge corpus is validated and refined?
- Can certain knowledge bases, categories, or content sources be prioritized over others for AI-generated recommendations and responses? Are there configurable weighting or relevance mechanisms that favor trusted, validated, or high-confidence content?
- Have you seen customers adopt a phased approach to AI enablement where a curated set of knowledge articles is used initially before expanding to the broader knowledge corpus? If so, what best practices or lessons learned would you recommend?
Our objective is to establish a practical knowledge governance approach that improves the quality of AI-generated results while helping the business focus its cleanup efforts on the content that will provide the greatest value.
2 weeks ago
These are exactly the right questions to be asking before AI enablement, not after.
1. Additional prioritization signals
Beyond what you're already tracking:
- Flagged articles / negative feedback: high flag rates or negative feedback are strong remediation candidates regardless of age
- Deflection correlation: where deflection is being tracked, articles that appear frequently in search but do not contribute to deflection may warrant review
- Owner validity: articles whose assigned owner is no longer active are structurally orphaned regardless of content quality
- Business criticality: low-volume content may still need retention and review if it supports regulatory, security, outage, or seasonal processes; usage frequency alone isn't sufficient
- Article type distribution: segmenting by article type before triaging gives you a more defensible framework; a 35k corpus with no type governance is almost certainly a mixed-format problem
- AQI (Article Quality Index): provides a consistent review checklist, but at this volume apply it strategically or through sampling rather than requiring a manual review of every article
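To make the signals above concrete, here is a minimal sketch of how they could be combined into a single review-priority score on exported article data. It is an illustration only: the `Article` fields, the weights, and the `review_priority` function are all hypothetical, not ServiceNow table columns or platform features, and the weights would need tuning against your own data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    # All field names are hypothetical; map them to whatever your export contains.
    number: str
    views: int                 # view count over the reporting window
    linked_cases: int          # cases attached to or resolved with the article
    last_updated: date
    flags: int = 0             # flags plus negative feedback
    owner_active: bool = True
    business_critical: bool = False


def review_priority(a: Article, today: date) -> float:
    """Higher score means review sooner. Weights are illustrative assumptions."""
    usage = a.views + 10 * a.linked_cases
    years_stale = max((today - a.last_updated).days, 0) / 365
    score = usage * (1 + years_stale)   # heavily used and stale rises first
    score += 50 * a.flags               # flagged content is a remediation candidate
    if not a.owner_active:
        score += 100                    # orphaned ownership
    if a.business_critical:
        score += 500                    # keep low-volume critical content in view
    return score


today = date(2025, 1, 1)
popular_stale = Article("KB0001", views=100, linked_cases=5,
                        last_updated=date(2023, 1, 2))
rare_critical = Article("KB0002", views=2, linked_cases=0,
                        last_updated=today, business_critical=True)

queue = sorted([popular_stale, rare_critical],
               key=lambda a: review_priority(a, today), reverse=True)
print([a.number for a in queue])
```

With these particular weights the rarely viewed but business-critical article sorts ahead of the popular stale one, which is the point of not relying on usage frequency alone.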
2. Platform capabilities for identification
Knowledge Center surfaces several relevant tools:
- Flagged for Optimization: articles surfaced by Article Optimization scans, including configurable and custom quality criteria
- Duplicate Articles: identifies content overlap that dilutes search relevance and AI grounding
- Knowledge Gaps: Knowledge Center can identify potential knowledge gaps based on patterns in cases, incidents, or other task records; separately, search analytics can surface queries returning no useful result, pointing to missing or underperforming content
- Knowledge Management dashboards and reporting: use available KM reporting supplemented by Platform Analytics or custom reporting where needed for view counts, feedback, and deflection metrics
3. Limiting AI to a validated subset initially
For AI Search-powered experiences, Search Sources and filters can be used to limit the searchable corpus to validated content — scoping a Search Source to approved knowledge bases or defined validation criteria and leaving the broader corpus out of scope until remediated. This is a practical and supported approach for phased rollout. The exact configuration model should be validated for the specific AI capability being introduced, as not every Now Assist feature follows the same scoping mechanism.
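As a rough way to size the Phase 1 corpus before touching any configuration, the same scoping logic can be sketched offline against an article export. This is not Search Source configuration; the knowledge base name, the field names (`knowledge_base`, `state`, `validated`), and the helper functions are assumptions made for illustration.

```python
from typing import Iterable

# Hypothetical knowledge base name; replace with your approved set.
APPROVED_KBS = frozenset({"Customer Support KB"})


def in_ai_scope(article: dict, approved_kbs: frozenset = APPROVED_KBS) -> bool:
    """True if the article is in an approved KB, published, and marked validated."""
    return (
        article.get("knowledge_base") in approved_kbs
        and article.get("state") == "published"
        and article.get("validated") is True
    )


def scoped_corpus(articles: Iterable[dict]) -> list:
    """Return only the articles that would be exposed to AI in this phase."""
    return [a for a in articles if in_ai_scope(a)]


export = [
    {"number": "KB0001", "knowledge_base": "Customer Support KB",
     "state": "published", "validated": True},
    {"number": "KB0002", "knowledge_base": "Customer Support KB",
     "state": "published", "validated": False},
    {"number": "KB0003", "knowledge_base": "Legacy Imports",
     "state": "published", "validated": True},
]
print([a["number"] for a in scoped_corpus(export)])
```

Running this kind of check first shows how many articles each candidate scoping rule would actually admit, which helps when agreeing the rule with the business.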
4. Prioritizing or weighting specific content sources
Within AI Search, Result Improvement Rules are the mechanism for boosting, promoting, or blocking specific content. Boosted or promoted content is more likely to rank highly in results. Depending on the configuration, blocked content can be excluded from generated answers, standard search results, or both.
There is no universal trust-score weighting model, but combining Search Source scope filtering with Result Improvement Rules gives you meaningful control over what validated content surfaces and how prominently. The distinction between excluding unvalidated content and deprioritizing it is worth a deliberate design decision — both are achievable.
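The difference between excluding and deprioritizing can be shown with a small conceptual sketch. It does not reproduce how AI Search or Result Improvement Rules are implemented; the `rank` function, the multiplier values, and the knowledge base names are hypothetical and only illustrate the design choice.

```python
def rank(results, boosts=None, blocked=frozenset()):
    """Rank (article_id, knowledge_base, base_score) tuples.

    boosts:  knowledge_base -> score multiplier (deprioritize with values < 1.0)
    blocked: knowledge bases removed from the results entirely
    """
    boosts = boosts or {}
    ranked = [
        (article_id, score * boosts.get(kb, 1.0))
        for article_id, kb, score in results
        if kb not in blocked
    ]
    return sorted(ranked, key=lambda r: r[1], reverse=True)


results = [
    ("KB0001", "Legacy", 0.9),      # unvalidated, but a strong text match
    ("KB0002", "Validated", 0.6),   # validated, weaker text match
]

# Deprioritize: legacy content can still surface, just below trusted content.
deprioritized = rank(results, boosts={"Validated": 2.0})

# Exclude: legacy content never reaches the answer at all.
excluded = rank(results, blocked=frozenset({"Legacy"}))

print([r[0] for r in deprioritized])
print([r[0] for r in excluded])
```

Deprioritizing keeps unvalidated articles available as a fallback when nothing validated matches, while excluding guarantees they never ground a generated answer. Which one fits depends on how costly a wrong answer is compared with no answer.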
5. Phased AI enablement
Yes — this is the pattern that works. A typical approach:
- Phase 1: Identify your highest-value articles by view count, deflection correlation, and business criticality. Scope AI to this set first using approved knowledge bases or a defined validation criterion that can be applied through Search Source filtering.
- Phase 2: Run Article Optimization scans and duplicate detection across the remaining corpus. Remediate by priority tier.
- Phase 3: Expand AI scope as content meets a defined quality threshold: article type completeness, owner assignment, and review date within policy.
The governance piece that makes this sustainable is defining what "validated" means before you start. Without that definition, phased expansion becomes indefinite.
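One way to pin that definition down is to write it as an explicit, testable rule. The sketch below assumes "validated" means the three Phase 3 criteria: required fields populated (standing in for article type completeness), an active owner, and a review date within policy. The field names, the 365-day window, and the `ValidationPolicy` class are hypothetical placeholders for whatever your governance group agrees on.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ValidationPolicy:
    # Hypothetical defaults; set these from your own governance policy.
    required_fields: tuple = ("article_type", "short_description", "body")
    max_review_age: timedelta = timedelta(days=365)


def validation_failures(article: dict, today: date,
                        policy: ValidationPolicy = ValidationPolicy()) -> list:
    """Return the reasons an article is not yet 'validated' (empty list = validated)."""
    failures = []
    for field_name in policy.required_fields:
        if not article.get(field_name):
            failures.append(f"missing {field_name}")
    if not article.get("owner_active"):
        failures.append("no active owner")
    reviewed = article.get("last_reviewed")
    if reviewed is None or today - reviewed > policy.max_review_age:
        failures.append("review overdue")
    return failures


today = date(2025, 6, 1)
ready = {"article_type": "How-to", "short_description": "Reset a password",
         "body": "Steps...", "owner_active": True,
         "last_reviewed": date(2025, 1, 15)}
not_ready = {"article_type": "How-to", "short_description": "Old VPN setup",
             "body": "Steps...", "owner_active": False,
             "last_reviewed": date(2022, 3, 1)}

print(validation_failures(ready, today))
print(validation_failures(not_ready, today))
```

Returning the list of failures rather than a bare true/false gives content owners a concrete remediation checklist per article, and the count of articles with an empty list is a simple measure of how far the phased expansion has progressed.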
At 35k articles, the cleanup effort will surface structural gaps in how knowledge has been governed — missing templates, no article type discipline, and orphaned ownership. The AI enablement initiative is an opportunity to establish that governance framework, not just remediate historical content.
2 weeks ago
Thank you, Mary, for sharing your input.