Selva Arun
Mega Sage

MID Server Upgrade During Patching: Why Your Change Window Needs to Be 5+ Hours

A Real Production Incident: How Manual Restarts Caused 2+ Hours of Yo-Yo Behavior

📚 Continuation of My Previous Article

This article is a continuation of my previous community post: "MID Server Pre-Upgrade Readiness Checklist for Any Upgrades" (October 2025), which focused on pre-upgrade validation.

While that article helps you prepare BEFORE the upgrade, this article focuses on what happens DURING and AFTER the upgrade process, based on a real production incident we experienced in December 2025.

Together, these two articles provide a complete guide to MID Server upgrade management.

Why I Created This Article

In December 2025, our organization experienced a production incident during a ServiceNow patching event. What should have been a routine 5-10 minute MID Server upgrade turned into a 2+ hour crisis with our MID Servers going up and down multiple times (yo-yo behavior).

After extensive investigation using Event Viewer logs, wrapper.log analysis, ServiceNow heartbeat data, and team discussions, we identified the root cause and implemented solutions. I'm sharing our findings in the hope that they help someone in the ServiceNow community avoid the same pain.

🚨 The Real Production Incident: December 11, 2025

What Happened

Time | Event
17:08 | Instance patched to Yokohama Patch 7 HF2b
18:45 | Change window started (CHG0106263)
20:36 | Change window ended ⚠️ (too early!)
20:38 | MID Server started upgrade (OUTSIDE change window!)
20:39 | Alerts sent to NOC (incidents created)
21:27 | NOC manually restarted MID Server (1st restart)
22:16 | NOC manually restarted (2nd restart)
22:25 | NOC manually restarted (3rd restart)
22:34 | NOC manually restarted (4th restart)
22:42 | MID Server started upgrade again (left alone this time)
22:48 | Upgrade completed successfully (5 minutes!)

The Numbers Tell the Story

  • Total restart cycles: 5
  • Total disruption time: 2+ hours
  • Actual upgrade time (when left alone): 5 minutes
  • Manual stops by users: 0 (all system-initiated)
  • System reboots: 0

Root Cause Analysis

Finding #1: The Upgrade Process Works Correctly

When left alone, the final upgrade attempt (22:42) completed in 5 minutes 19 seconds with zero errors. The wrapper.log showed: "Upgrade process completed successfully".

Finding #2: Change Window Was Too Short

Issue | What Happened
Change window duration | 1 hour 51 minutes (18:45 - 20:36)
MID Server upgrade started | 20:38 (2 minutes AFTER change closed!)
Result | Alerts sent to NOC (outside maintenance window)
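
The arithmetic behind Finding #2 can be checked with a short Python sketch. The timestamps come from the incident timeline; the helper names `at` and `in_window` are illustrative only and are not part of any ServiceNow tooling.

```python
from datetime import datetime, timedelta

INCIDENT_DAY = "2025-12-11"

def at(hhmm):
    """Build a datetime on the incident day from a 24-hour HH:MM string."""
    return datetime.strptime(f"{INCIDENT_DAY} {hhmm}", "%Y-%m-%d %H:%M")

def in_window(ts, start, end):
    """True when ts falls inside the change window (inclusive)."""
    return start <= ts <= end

window_start = at("18:45")
window_end = at("20:36")
upgrade_start = at("20:38")

window_length = window_end - window_start   # how long the change was open
overrun = upgrade_start - window_end        # how late the upgrade began

print("Window length:", window_length)
print("Upgrade inside window:", in_window(upgrade_start, window_start, window_end))
print("Upgrade started after close by:", overrun)
```

The same check could be run against a planned window before a change is approved, by substituting the planned start and end times.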

Finding #3: NOC Manual Restarts Caused Yo-Yo Behavior

Each time NOC manually restarted the MID Server service, it interrupted the natural upgrade process:

  1. NOC receives "MID Server Down" alert
  2. NOC restarts service per standard procedure
  3. Service starts, upgrade process begins again
  4. Upgrade stops service to deploy files
  5. ServiceNow marks "Down" after 100 seconds (heartbeat timeout)
  6. NOC receives another alert
  7. Cycle repeats...

Finding #4: Heartbeat Timeout vs Upgrade Duration Mismatch

Setting | Value | Impact
Heartbeat Interval | 40 seconds | MID sends "I'm alive" every 40 sec
Heartbeat Timeout | 100 seconds | Marked "Down" after 100 sec silence
Upgrade Duration | 300+ seconds (5 min) | Always exceeds timeout!

⚠️ Key Insight: Since the 100-second timeout is shorter than the 300+ second upgrade, alerts will ALWAYS be triggered during a normal upgrade. This is expected behavior, which is why maintenance windows are critical!
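
A rough Python sketch of the timeout arithmetic, using the values from Finding #4 and the observed 5 minutes 19 seconds from Finding #1. It assumes the MID Server sends no heartbeats for the whole upgrade and that the last heartbeat arrived just before the service stopped; both are simplifications, and the function names are made up for illustration.

```python
HEARTBEAT_INTERVAL_S = 40          # MID sends a heartbeat every 40 seconds
HEARTBEAT_TIMEOUT_S = 100          # instance marks the MID "Down" after 100 seconds of silence
UPGRADE_DURATION_S = 5 * 60 + 19   # observed upgrade: 5 minutes 19 seconds

def triggers_down_alert(silence_s, timeout_s=HEARTBEAT_TIMEOUT_S):
    """True when a silent period is long enough to mark the MID Server Down."""
    return silence_s > timeout_s

def heartbeats_missed(silence_s, interval_s=HEARTBEAT_INTERVAL_S):
    """Number of whole heartbeat intervals that fit into the silent period."""
    return silence_s // interval_s

def seconds_shown_down(silence_s, timeout_s=HEARTBEAT_TIMEOUT_S):
    """Assumed time the MID Server is displayed as Down before it reports in again."""
    return max(0, silence_s - timeout_s)

print("Alert expected:", triggers_down_alert(UPGRADE_DURATION_S))
print("Heartbeats missed:", heartbeats_missed(UPGRADE_DURATION_S))
print("Seconds shown as Down:", seconds_shown_down(UPGRADE_DURATION_S))
```

Under these assumptions a clean upgrade misses 7 heartbeats and shows the server as Down for about 219 seconds, so a "Down" alert on its own says nothing about whether the upgrade is failing.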

What We Discussed and Decided

After our investigation, we held a meeting with our ServiceNow Team, DevOps, and NOC to discuss findings and make decisions:

Decision 1: Change Window Must Be Minimum 5 Hours

Instance Patch:     Hour 0
MID Detection:      Hour 1-3 (staggered across servers)
MID Download:       Hour 1-4 (upgrade packages from install.service-now.com)
MID Upgrade:        Hour 3-5 (5-10 min per server)
All Complete:       Hour 5
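
To turn the relative timeline above into clock times for a specific patch, a sketch like the following may help. The phase offsets are copied from the timeline; the patch time is the one from our incident, and `phase_schedule` and `window_must_stay_open_until` are hypothetical helper names.

```python
from datetime import datetime, timedelta

# (phase, first hour, last hour) relative to the instance patch at Hour 0
PHASES = [
    ("MID Detection", 1, 3),
    ("MID Download", 1, 4),
    ("MID Upgrade", 3, 5),
]

def phase_schedule(patch_time, phases=PHASES):
    """Convert hour offsets into absolute start/end times for each phase."""
    return [
        (name, patch_time + timedelta(hours=first), patch_time + timedelta(hours=last))
        for name, first, last in phases
    ]

def window_must_stay_open_until(patch_time, hours=5):
    """Earliest safe end of the change window, counted from the instance patch."""
    return patch_time + timedelta(hours=hours)

patch_time = datetime(2025, 12, 11, 17, 8)  # instance patched at 17:08
for name, start, end in phase_schedule(patch_time):
    print(f"{name:<14} {start:%H:%M} - {end:%H:%M}")
print(f"Keep window open until: {window_must_stay_open_until(patch_time):%H:%M}")
```

For the 17:08 patch this places the MID Upgrade phase between 20:08 and 22:08, which covers the 20:38 upgrade start we observed.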

Decision 2: No Manual Intervention During Upgrades

Manual restarts during the change window cause yo-yo behavior and extend downtime from 10 minutes to 2+ hours.

Decision 3: ServiceNow Team Validates (Not NOC)

Since we are the application owners, the ServiceNow Team is responsible for validating MID Server upgrade success - not NOC. NOC's role is monitoring only.

Decision 4: Post-Implementation Validation Process

At the end of the change window, the ServiceNow Team checks:

  • Status: Up
  • Validated: Yes
  • Version: Matches new patch version

What Happens During a MID Server Upgrade

Understanding the process helps explain why manual intervention causes problems:

  1. Instance Patched → Instance upgraded to new version
  2. MID Detection → MID Servers detect upgrade needed (hourly AutoUpgrade.3600 check)
  3. Download → MID Servers download upgrade packages from install.service-now.com
  4. Pre-Upgrade Check → Validates prerequisites (permissions, disk space, PowerShell)
  5. Service Stop → MID Server stops Windows service (5-10 min downtime begins)
  6. File Deployment → ServiceNow Platform Distribution Upgrade replaces files
  7. Auto-Restart → Service restarts automatically via start.bat
  8. Validation → MID Server sends heartbeat, status changes to "Up"

Key Point: When left alone, this entire process completes automatically in 5-10 minutes per MID Server!

Critical Rules During Change Window

DO NOT During Change Window:

  • Restart MID Server services manually
  • Respond to MID Server "Down" alerts by restarting services
  • Run troubleshooting scripts on MID Servers

DO During Change Window:

  • Acknowledge alerts (but take no action)
  • Wait for automatic recovery
  • Monitor change ticket for updates

Post-Implementation Validation Steps

Step 1: Navigate to MID Servers

All > MID Server > Servers

Step 2: For each MID Server, verify:

Field | Expected Value
Status | Up
Validated | Yes
Version | Matches new patch version

Step 3: Compare version to expected version in change ticket
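
If the MID Server list is exported (for example, copied from the list view), the three checks can be applied to the whole fleet at once. The sketch below assumes each server is a plain dictionary whose keys mirror the list columns; how the export is produced is not covered here, and the server names and version strings are placeholders.

```python
def failing_servers(records, expected_version):
    """Return {server name: [problems]} for servers that fail any of the three checks."""
    failures = {}
    for rec in records:
        problems = []
        if rec["status"] != "Up":
            problems.append(f"Status is {rec['status']}, expected Up")
        if rec["validated"] != "Yes":
            problems.append(f"Validated is {rec['validated']}, expected Yes")
        if rec["version"] != expected_version:
            problems.append(f"Version is {rec['version']}, expected {expected_version}")
        if problems:
            failures[rec["name"]] = problems
    return failures

# Placeholder data: names and version strings are illustrative only.
records = [
    {"name": "MIDSERVER01", "status": "Up", "validated": "Yes", "version": "new-build"},
    {"name": "MIDSERVER02", "status": "Down", "validated": "Yes", "version": "old-build"},
]

failures = failing_servers(records, expected_version="new-build")
for name, problems in failures.items():
    print(name, "->", "; ".join(problems))
```

An empty result means every server passed and the change can be closed.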

Decision Tree for Troubleshooting

All MID Servers show Status=Up, Validated=Yes, Version=Expected?
│
├── YES → Change successful. Close change ticket.
│
└── NO → Which issue?
          │
          ├── Status = Down?
          │   → Wait 20 more minutes
          │   → Check wrapper.log - did upgrade complete successfully?
          │   → If upgrade completed, restart MID Server service (from UI or on server)
          │   → If upgrade failed, troubleshoot errors in wrapper.log
          │
          ├── Validated = No?
          │   → Click "Validate" button, wait 5 minutes
          │   → If still No, check MID Server issues
          │
          └── Version = Wrong?
              → Check wrapper.log for upgrade errors
              → Restart MID Server service to trigger upgrade retry
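
The decision tree can also be written as a small function, which some teams may find easier to attach to a runbook. This is a sketch only: it evaluates the branches in the order shown above and returns the first one that applies. The `upgrade_completed` flag is an assumed input that records what was found in wrapper.log.

```python
def next_steps(status, validated, version, expected_version, upgrade_completed=None):
    """Return the actions the decision tree suggests for one MID Server.

    upgrade_completed: True or False once wrapper.log has been checked, None if not yet.
    """
    if status == "Up" and validated == "Yes" and version == expected_version:
        return ["Change successful. Close change ticket."]
    if status == "Down":
        steps = [
            "Wait 20 more minutes",
            "Check wrapper.log: did the upgrade complete successfully?",
        ]
        if upgrade_completed is True:
            steps.append("Restart MID Server service (from UI or on server)")
        elif upgrade_completed is False:
            steps.append("Troubleshoot errors in wrapper.log")
        return steps
    if validated == "No":
        return ['Click "Validate" button, wait 5 minutes',
                "If still No, check MID Server issues"]
    if version != expected_version:
        return ["Check wrapper.log for upgrade errors",
                "Restart MID Server service to trigger upgrade retry"]
    # Any other combination is not covered by the tree.
    return ["No branch of the tree applies; review the record manually"]

for step in next_steps("Down", "Yes", "old-build", "new-build", upgrade_completed=True):
    print("-", step)
```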

Troubleshooting MID Server Upgrade Issues

If your MID Server upgrade fails, here are the steps to diagnose and fix:

Step 1: Check Upgrade History in ServiceNow

All > MID Server > Upgrade History

Look for failed stages: Pre Upgrade Check, Download, Extract, Deploy Binary Files

Step 2: Check wrapper.log on the MID Server (the path below is where it is located on our servers)

D:\ServiceNow MID Server <server_name>\agent\logs\wrapper.log

Key phrases to look for:

Phrase | Meaning
"Checking to see if MID server needs to upgrade" | Upgrade check started
"Setting mid status to Upgrading" | Upgrade beginning
"Pre-upgrade validation tests successful" | Pre-checks passed
"Upgrading MID server" | File extraction starting
"Stopping MID server. Bootstrapping upgrade." | Service stopping for file deployment
"Upgrade complete" | Files deployed successfully
"Upgrade process completed successfully" | Full success
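
Scanning a long wrapper.log by eye is slow, so here is a minimal Python sketch that reports which of the phrases above appear in a list of log lines. It uses plain case-sensitive substring matching, which is an assumption about how the phrases appear in your logs. The sample lines are synthetic and contain only the phrases themselves; real wrapper.log lines carry additional prefixes that are not shown here.

```python
# Phrase -> meaning, in the order the upgrade normally progresses.
UPGRADE_MILESTONES = [
    ("Checking to see if MID server needs to upgrade", "Upgrade check started"),
    ("Setting mid status to Upgrading", "Upgrade beginning"),
    ("Pre-upgrade validation tests successful", "Pre-checks passed"),
    ("Upgrading MID server", "File extraction starting"),
    ("Stopping MID server. Bootstrapping upgrade.", "Service stopping for file deployment"),
    ("Upgrade complete", "Files deployed successfully"),
    ("Upgrade process completed successfully", "Full success"),
]

def milestones_reached(log_lines):
    """Return the meanings of all milestone phrases found, in first-seen order."""
    reached = []
    for line in log_lines:
        for phrase, meaning in UPGRADE_MILESTONES:
            if phrase in line and meaning not in reached:
                reached.append(meaning)
    return reached

def upgrade_succeeded(log_lines):
    """True when the final success phrase appears anywhere in the log."""
    return "Full success" in milestones_reached(log_lines)

# Synthetic sample: an upgrade that was interrupted after the service stopped.
interrupted = [phrase for phrase, _ in UPGRADE_MILESTONES[:5]]
print(milestones_reached(interrupted)[-1])
print("Succeeded:", upgrade_succeeded(interrupted))
```

To run it against a real file, read the log with `open(path, errors="replace").read().splitlines()` and pass the result to `milestones_reached`.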

Step 3: Check Windows Event Viewer

Event Viewer > Windows Logs > System
Filter: Event ID 7036 (Service state changes)

PowerShell command to check service restarts:

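# List MID Server service state changes (Event ID 7036) from the last 24 hours, oldest first.
# Get-EventLog is available in Windows PowerShell 5.1 but not in PowerShell 7+.
Get-EventLog -LogName System -After (Get-Date).AddDays(-1) |
  Where-Object { $_.EventID -eq 7036 -and $_.Message -like "*ServiceNow MID*" } |
  Sort-Object TimeGenerated |
  Format-Table TimeGenerated, Message -AutoSize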

Step 4: Common Issues and Fixes

Issue | Cause | Fix
MID stuck in "Upgrading" status | Upgrade process hung | Wait 30 min, then restart service once
Version not updated | Upgrade failed silently | Check wrapper.log for errors, restart to retry
"Access denied" errors | Service account permissions (PRB1547917) | Grant FullControl to service account on MID folder
Multiple restart cycles (yo-yo) | Manual intervention during upgrade | Stop intervening! Let upgrade complete naturally
"null (vnull)" capabilities | PowerShell execution policy restricted | Set execution policy to RemoteSigned
Pre-upgrade check failed | Various (permissions, disk, PowerShell) | Run my Pre-Upgrade Validation Script (previous article)
File lock errors | Antivirus or Application Experience | Whitelist MID folder, enable Application Experience

Step 5: If All Else Fails - Manual Restart

After confirming in wrapper.log that the upgrade completed, restart the service:

From ServiceNow UI:

MID Server record > Related Links > Restart MID

From Windows Server:

services.msc > ServiceNow MID Server_[name] > Right-click > Restart

From Command Line:

net stop "ServiceNow MID Server_MIDSERVERNAME"
net start "ServiceNow MID Server_MIDSERVERNAME"

Change Ticket Requirements

For future ServiceNow patches, ensure your change ticket includes:

1. Change Window Duration

Minimum: 5 hours
Recommended: 6 hours (with buffer)

2. Affected CIs - Include All MID Servers

MIDSERVER01
MIDSERVER02
... (all production MID servers)

3. Instructions for Operations Team

IMPORTANT: MID Server Upgrade Instructions

During this change window:
- MID Server services will stop and restart AUTOMATICALLY
- DO NOT manually restart MID Server services
- DO NOT respond to MID Server "Down" alerts
- Upgrade takes 5-10 minutes per server - this is NORMAL

Post-implementation (at end of change window):
- ServiceNow Team validates MID Server Status/Validated/Version

Summary: Key Takeaways

Key Information

Item | Value
Change Window | Minimum 5 hours
Manual Intervention | NOT required - upgrade is automatic
Expected Downtime | 5-10 minutes per MID Server
Heartbeat Timeout | 100 seconds (alerts expected during upgrade)
Root Cause of Yo-Yo | Manual restarts interrupt upgrade process
Solution | Maintenance windows + no manual intervention

Related ServiceNow Documentation

  • KB0696937: MID Server upgrade process - What actually happens
  • KB0596459: Troubleshoot MID Server upgrade issues
  • KB0713557: How to manually restore or upgrade a MID Server after failed auto-upgrade
  • KB0779816: How to continue a MID Server upgrade after it has crashed
  • KB1001745: MID Server fails to restart after upgrade (PRB1547917)


Tags: MID Server | Upgrade | Patching | ITOM | Best Practices | Change Management | NOC | Troubleshooting | KB0696937 | KB0596459 | Heartbeat | Maintenance Window | Yo-Yo Behavior | Production Incident | Healthcare
