Building Dashboards and Visualizations for Incident Clustering Using Predictive Intelligence

chaithra9 · 3 weeks ago

We are exploring the Predictive Intelligence Clustering solution in ServiceNow to identify recurring incident patterns and uncover potential underlying systemic issues.

Our objective is to leverage clustering to:

- Identify recurring incident clusters and trends.
- Detect groups of incidents that may indicate broader systemic problems.
- Derive meaningful operational insights from the generated clusters.
- Build dashboards and visualizations that effectively present these insights to stakeholders.

We would appreciate guidance on the following:

- What are some recommended approaches for building dashboards using Predictive Intelligence clustering results?
- Which reports, visualizations, or KPIs have you found most valuable when analyzing incident clusters?
- Are there any best practices or example dashboards that demonstrate how clustering insights can be used for problem identification, trend analysis, or operational improvement?

If anyone has implemented a similar use case, examples of reports, dashboards, or practical experiences would be greatly appreciated.

Vikram Reddy · 3 weeks ago

Hi Chaithra,

I have implemented a similar use case for incident pattern detection, and the biggest lesson is this: do not treat Predictive Intelligence clusters as the final answer. Treat them as signals that help you identify candidates for Problem Management, service improvement, or operational cleanup.

Recommended dashboard approach:

1. Start with the cluster output, then add operational context

A cluster by itself usually tells you only that “these incidents look similar.” For stakeholders, that is not enough. Report the cluster results alongside standard incident fields such as:

- Opened date
- Assignment group
- Business service / Service offering
- Configuration item
- Category / Subcategory
- Priority / Impact / Urgency
- State
- Resolution code
- Resolved by
- MTTR / duration
- Reopen count
- Caller / location, if relevant

This helps convert the cluster from a machine-generated grouping into an operational insight.
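
As a rough illustration of that enrichment step, here is a minimal Python sketch. It assumes you have already exported the cluster assignments and the incident records from your instance; the dictionary keys (`number`, `assignment_group`, `ci`, `cluster_id`) and the sample values are hypothetical and are not actual ServiceNow table or column names.

```python
from collections import Counter

# Hypothetical export of cluster assignments: incident number -> cluster ID.
cluster_of = {
    "INC001": "C1",
    "INC002": "C1",
    "INC003": "C2",
}

# Hypothetical incident records carrying a few of the context fields listed above.
incidents = [
    {"number": "INC001", "assignment_group": "Network", "priority": 2, "ci": "vpn-gw-01"},
    {"number": "INC002", "assignment_group": "Network", "priority": 3, "ci": "vpn-gw-01"},
    {"number": "INC003", "assignment_group": "Messaging", "priority": 4, "ci": "mail-01"},
    {"number": "INC004", "assignment_group": "Desktop", "priority": 4, "ci": "laptop-77"},
]


def enrich(incidents, cluster_of):
    """Attach a cluster_id to each incident; unclustered incidents get None."""
    return [{**inc, "cluster_id": cluster_of.get(inc["number"])} for inc in incidents]


def top_value(rows, cluster_id, field):
    """Most common value of `field` within one cluster, or None if the cluster is empty."""
    counts = Counter(r[field] for r in rows if r["cluster_id"] == cluster_id)
    return counts.most_common(1)[0][0] if counts else None


enriched = enrich(incidents, cluster_of)
print(top_value(enriched, "C1", "assignment_group"))
```

Once every incident row carries its cluster ID, any of the fields above can be used as a grouping or filter dimension in a report.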

2. Create a cluster review dashboard

Useful widgets:

- Top recurring incident clusters by volume
- Cluster volume trend by week/month
- New or emerging clusters in the last 7/30 days
- Clusters with highest average resolution time
- Clusters with highest P1/P2 count
- Clusters with highest reopen rate
- Clusters concentrated on one CI or business service
- Clusters concentrated in one assignment group
- Clusters with no linked Problem record
- Clusters already linked to active Problems

3. KPIs I have found most useful

- Incident count by cluster
- % of total incident volume covered by top clusters
- Week-over-week cluster growth
- Average / median MTTR by cluster
- P1/P2 incident count by cluster
- Reopen rate by cluster
- Number of affected users
- Number of affected CIs
- Number of affected business services
- First seen date / last seen date
- Linked Problem count
- Reduction in incident volume after problem fix
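
To make a few of these KPIs concrete, the sketch below computes them from exported rows in plain Python. The row layout (`cluster_id`, `mttr_hours`, `priority`, `reopen_count`) is an assumption made for the example, and "reopen rate" is taken here to mean the share of incidents reopened at least once; adjust both to your own definitions.

```python
from collections import defaultdict
from statistics import median

# Hypothetical enriched rows: one per clustered incident.
rows = [
    {"cluster_id": "C1", "mttr_hours": 4.0, "priority": 2, "reopen_count": 0},
    {"cluster_id": "C1", "mttr_hours": 10.0, "priority": 3, "reopen_count": 1},
    {"cluster_id": "C1", "mttr_hours": 6.0, "priority": 1, "reopen_count": 0},
    {"cluster_id": "C2", "mttr_hours": 2.0, "priority": 4, "reopen_count": 0},
]


def cluster_kpis(rows):
    """Return per-cluster KPIs keyed by cluster ID."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["cluster_id"]].append(r)

    total = len(rows)
    kpis = {}
    for cid, items in grouped.items():
        n = len(items)
        kpis[cid] = {
            "incident_count": n,
            "share_of_volume": n / total,
            "median_mttr_hours": median(i["mttr_hours"] for i in items),
            "p1_p2_count": sum(1 for i in items if i["priority"] <= 2),
            # Share of incidents in the cluster reopened at least once.
            "reopen_rate": sum(1 for i in items if i["reopen_count"] > 0) / n,
        }
    return kpis


def week_over_week_growth(this_week, last_week):
    """Relative change in incident count; None when last week had no incidents."""
    if last_week == 0:
        return None
    return (this_week - last_week) / last_week


kpis = cluster_kpis(rows)
print(kpis["C1"]["incident_count"], kpis["C1"]["median_mttr_hours"])
```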

4. Best visualizations

For an operational dashboard, I would use:

- Single score: Total clustered incidents this month
- Single score: Top cluster incident count
- Bar chart: Top 10 clusters by incident volume
- Trend chart: Cluster volume over time
- Heatmap: Cluster vs assignment group
- Heatmap: Cluster vs business service / CI
- Bubble chart: Volume vs MTTR, with bubble size or color showing priority impact
- List report: Sample incidents in selected cluster
- List report: Clusters without Problem records
- Pie/donut: Cluster distribution by category or service

5. Problem identification pattern

A good practical rule is to define thresholds for problem candidates.

Example:

Flag a cluster as a Problem candidate when any of these is true:
- The cluster has more than 10 incidents in 30 days
- The cluster has 3 or more P1/P2 incidents
- The cluster's MTTR is above the SLA target
- The cluster shows repeated reopens
- The cluster is tied to the same CI/service repeatedly

This makes the dashboard actionable instead of just analytical.
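
The threshold rule above could be sketched roughly as follows. Only the "more than 10 incidents in 30 days" and "3 or more P1/P2" thresholds come from the example; the SLA target of 8 hours, the reopen-rate limit of 0.2, and the CI-concentration limit of 0.6 are placeholder assumptions, as are all the field names.

```python
from dataclasses import dataclass


@dataclass
class ClusterStats:
    incidents_last_30_days: int
    p1_p2_count: int
    avg_mttr_hours: float
    reopen_rate: float   # fraction of incidents reopened at least once
    top_ci_share: float  # fraction of incidents on the single most common CI


def candidate_reasons(stats, sla_target_hours=8.0, max_reopen_rate=0.2, max_ci_share=0.6):
    """Return the rules a cluster trips; an empty list means it is not a candidate."""
    reasons = []
    if stats.incidents_last_30_days > 10:
        reasons.append("more than 10 incidents in 30 days")
    if stats.p1_p2_count >= 3:
        reasons.append("3 or more P1/P2 incidents")
    if stats.avg_mttr_hours > sla_target_hours:  # assumed SLA target
        reasons.append("MTTR above SLA target")
    if stats.reopen_rate > max_reopen_rate:      # assumed limit
        reasons.append("repeated reopens")
    if stats.top_ci_share > max_ci_share:        # assumed limit
        reasons.append("concentrated on one CI/service")
    return reasons


vpn = ClusterStats(incidents_last_30_days=14, p1_p2_count=1,
                   avg_mttr_hours=5.5, reopen_rate=0.05, top_ci_share=0.9)
print(candidate_reasons(vpn))
```

Returning the list of reasons rather than a plain yes/no lets the dashboard show why a cluster was flagged, which tends to make the weekly review quicker.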

6. Recommended dashboard layout

Executive section:
- Total clustered incidents
- Top recurring cluster
- High-severity cluster count
- Estimated incident reduction opportunity

Trend section:
- Weekly incident trend by cluster
- New/emerging clusters
- Growing vs declining clusters

Operational section:
- Top clusters by assignment group
- Top clusters by service/CI
- High MTTR clusters
- Reopened incident clusters

Problem Management section:
- Clusters with linked Problems
- Clusters without Problems
- Problem candidates
- Incident volume before/after Problem resolution

Drilldown section:
- Incident list for selected cluster
- Common keywords/phrases
- Sample incidents
- Owner/action/status

7. Important best practices

- Do not show raw cluster IDs only. Give each reviewed cluster a human-readable label such as “VPN login failures” or “Email delivery delay.”
- Have SMEs review the top clusters before presenting to leadership.
- Exclude noisy or irrelevant tickets before training, such as test incidents, duplicates, cancelled records, and incidents with very short descriptions.
- Use recent data windows, such as the last 90 or 180 days, so old patterns do not dominate the model.
- Snapshot cluster results periodically if you want historical trending. Clusters may change after retraining, so weekly/monthly snapshots are useful.
- Do not assume clustering equals root cause. It identifies similarity, not the confirmed cause.
- Link reviewed clusters to Problem records, Known Errors, KB articles, or remediation tasks.
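
The noise-exclusion and recent-window points can be expressed as a simple filter. This is only a sketch of the idea in Python; in practice the same conditions would more likely be applied as a filter on the training data in the platform. The `is_test` and `is_duplicate` flags, the `"Cancelled"` state value, and the 15-character minimum are all assumptions for illustration.

```python
from datetime import date, timedelta


def select_training_incidents(incidents, today, window_days=180, min_chars=15):
    """Keep recent, non-noisy incidents for training."""
    cutoff = today - timedelta(days=window_days)
    kept = []
    for inc in incidents:
        if inc["opened"] < cutoff:
            continue  # outside the recent data window
        if inc["state"] == "Cancelled":
            continue  # cancelled record
        if inc.get("is_test") or inc.get("is_duplicate"):
            continue  # hypothetical flags for test and duplicate tickets
        if len(inc["short_description"].strip()) < min_chars:
            continue  # too little text to cluster meaningfully
        kept.append(inc)
    return kept


sample = [
    {"number": "INC1", "opened": date(2024, 6, 1), "state": "Closed",
     "short_description": "VPN login fails after password change"},
    {"number": "INC2", "opened": date(2023, 1, 10), "state": "Closed",
     "short_description": "VPN login fails after password change"},
    {"number": "INC3", "opened": date(2024, 5, 20), "state": "Cancelled",
     "short_description": "Email delivery delayed to external domains"},
    {"number": "INC4", "opened": date(2024, 6, 10), "state": "Closed",
     "short_description": "help"},
    {"number": "INC5", "opened": date(2024, 6, 12), "state": "Closed",
     "short_description": "Test incident for integration check", "is_test": True},
]

kept = select_training_incidents(sample, today=date(2024, 6, 30))
print([inc["number"] for inc in kept])
```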

8. Practical implementation approach

Step 1:
Train the clustering solution on incident short description, description, category, subcategory, CI/service, and other fields that improve similarity.

Step 2:
Review the top clusters with Incident/Problem/Service owners.

Step 3:
Create a lightweight “Cluster Insight” tracking table or reportable dataset if needed.

Suggested fields:
- Cluster ID
- Cluster label
- Incident count
- First seen
- Last seen
- Top service
- Top CI
- Top assignment group
- Average MTTR
- P1/P2 incident count
- Linked Problem
- Owner
- Status
- Notes / recommended action
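
One possible shape for such a record, sketched as a Python dataclass with a helper that summarizes the incidents of one cluster. The field names mirror the suggested list but are hypothetical, the priority field is modeled as a P1/P2 count (an assumption), and the input row keys (`opened`, `service`, `ci`, `assignment_group`, `mttr_hours`, `priority`) are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ClusterInsight:
    cluster_id: str
    label: str
    incident_count: int
    first_seen: date
    last_seen: date
    top_service: str
    top_ci: str
    top_assignment_group: str
    avg_mttr_hours: float
    p1_p2_count: int
    linked_problem: Optional[str] = None
    owner: Optional[str] = None
    status: str = "New"
    notes: str = ""


def most_common(values):
    """Most frequent value in a non-empty iterable."""
    return Counter(values).most_common(1)[0][0]


def build_insight(cluster_id, label, rows):
    """Summarize the incidents of one cluster; `rows` must be non-empty."""
    return ClusterInsight(
        cluster_id=cluster_id,
        label=label,
        incident_count=len(rows),
        first_seen=min(r["opened"] for r in rows),
        last_seen=max(r["opened"] for r in rows),
        top_service=most_common(r["service"] for r in rows),
        top_ci=most_common(r["ci"] for r in rows),
        top_assignment_group=most_common(r["assignment_group"] for r in rows),
        avg_mttr_hours=sum(r["mttr_hours"] for r in rows) / len(rows),
        p1_p2_count=sum(1 for r in rows if r["priority"] <= 2),
    )


rows = [
    {"opened": date(2024, 5, 2), "service": "Remote Access", "ci": "vpn-gw-01",
     "assignment_group": "Network", "mttr_hours": 4.0, "priority": 2},
    {"opened": date(2024, 5, 20), "service": "Remote Access", "ci": "vpn-gw-02",
     "assignment_group": "Network", "mttr_hours": 8.0, "priority": 3},
    {"opened": date(2024, 6, 3), "service": "Remote Access", "ci": "vpn-gw-01",
     "assignment_group": "Service Desk", "mttr_hours": 6.0, "priority": 1},
]

insight = build_insight("C1", "VPN login failures", rows)
print(insight.label, insight.incident_count, insight.top_ci)
```

The review fields (linked Problem, owner, status, notes) are left at their defaults so that they can be filled in by hand during the cluster review.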

Step 4:
Build Platform Analytics reports/dashboards on that enriched dataset.

Step 5:
Run a weekly review:
- Which clusters are growing?
- Which clusters need Problem records?
- Which remediations reduced incident volume?
- Which clusters are noise and should be excluded/refined?
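
For the "growing vs declining" question, one simple option is to compare two periodic snapshots. The sketch below keys the snapshots by the reviewed cluster label rather than the raw cluster ID, on the assumption that labels stay stable across retraining while IDs may not; the 10% tolerance band is also an arbitrary assumption.

```python
def classify_trend(previous, current, tolerance=0.1):
    """Compare two snapshots (cluster label -> incident count) and tag each cluster."""
    result = {}
    for label in sorted(set(previous) | set(current)):
        before = previous.get(label, 0)
        after = current.get(label, 0)
        if before == 0:
            result[label] = "new" if after > 0 else "stable"
        elif after == 0:
            result[label] = "gone"
        elif after > before * (1 + tolerance):
            result[label] = "growing"
        elif after < before * (1 - tolerance):
            result[label] = "declining"
        else:
            result[label] = "stable"
    return result


# Hypothetical weekly snapshots.
last_week = {"VPN login failures": 10, "Email delivery delay": 20, "Printer offline": 5}
this_week = {"VPN login failures": 15, "Email delivery delay": 12,
             "Printer offline": 5, "Password reset": 4}

trend = classify_trend(last_week, this_week)
for label, status in trend.items():
    print(f"{label}: {status}")
```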

In short, the most valuable dashboard is not just “clusters found by PI.” The most valuable dashboard is “recurring incident patterns, operational impact, ownership, and action status.” That is what helps stakeholders move from AI output to actual problem reduction.

Thank you,

Vikram Karety

Octigo Solutions INC