Health Sentinel for ServiceNow: Bringing Sensor Logic to Platform Monitoring

Maik Skoddow · ‎05-17-2025

In complex enterprise environments, proactive monitoring is critical to maintaining stability and performance. Inspired by how physical sensors detect changes in environmental conditions and trigger appropriate responses, I developed a custom ServiceNow application called "Health Sentinel".

Its mission:
Monitor ServiceNow itself and respond automatically to signs of trouble.

Why It’s Useful for Operations Teams and System Administrators

Health Sentinel is designed with operational simplicity in mind. You don’t need to write code to benefit from it. Instead, you configure what to watch, define thresholds, and select how the system should respond. This allows you to:

Detect issues before they escalate
Automate notifications and reactions
Reduce manual monitoring and intervention

Whether you’re maintaining a large-scale production instance or want to strengthen your internal governance, Health Sentinel provides a smart, low-effort way to increase visibility and improve response times.

From Physical Sensors to Platform Intelligence

In the physical world, sensors measure real-world conditions—like temperature or pressure—and send signals when thresholds are breached. A blinking warning light on a car dashboard or a spike on a heart monitor translates these signals into meaningful feedback.

Health Sentinel brings this concept into the ServiceNow universe. Each "sensor" in this application is a specialized scheduled job, implemented as a custom child table of sysauto_script. Thanks to the fact that this base table is exempted from being counted as a custom table (see Custom Table Guide), we gain a lot of flexibility - without affecting your license usage.

Each sensor runs at a configurable frequency and queries a specific table with a dedicated condition. You can even include related list conditions, significantly broadening the search scope and enabling more nuanced detection scenarios. For example:

Identifying recurring error messages in the syslog table
Spotting user transactions that fail due to quota restrictions
Detecting prolonged (sub)flow executions that exceed expected runtimes
Monitoring open incidents without updates for a defined period

Detection alone isn’t enough - action is what keeps systems healthy. For this reason, when the number of matching records exceeds your set threshold, the sensor triggers a so-called signal. This is implemented as a custom ServiceNow event (sysevent) and includes a rich JSON payload with all the information needed to initiate a follow-up action.
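The detection-and-signal step described above can be sketched in plain JavaScript. This is only an illustration under assumptions, not the application's actual code: the function `evaluateSensor`, the sensor object and all property names in the payload are hypothetical, and in-memory arrays stand in for table queries.

```javascript
// Hypothetical sketch: names and payload layout are assumptions for illustration.
function evaluateSensor(sensor, records) {
  // Keep only the records that match the sensor's detection condition.
  const findings = records.filter(sensor.condition);

  // A signal is raised only when the number of findings exceeds the threshold.
  if (findings.length <= sensor.threshold) {
    return null;
  }

  // The signal is modeled as an event with a JSON payload.
  return {
    name: sensor.firedEvent,
    parm1: sensor.id,
    parm2: JSON.stringify({
      sensor: sensor.name,
      table: sensor.table,
      total: findings.length,
      data: findings,
    }),
  };
}

const sensor = {
  id: 'sensor-1',
  name: 'Errors in the syslog Table',
  table: 'syslog',
  threshold: 2,
  firedEvent: 'health_sentinel.send_email',
  condition: (record) => record.level === 'error',
};

// Made-up sample rows standing in for a table lookup.
const rows = [
  { level: 'error', message: 'A' },
  { level: 'info', message: 'B' },
  { level: 'error', message: 'C' },
  { level: 'error', message: 'D' },
];

const signal = evaluateSensor(sensor, rows);
console.log(signal ? JSON.parse(signal.parm2).total : 'no signal'); // 3
```

With a threshold of 3 the same rows would produce no signal, because the count has to exceed the threshold rather than merely reach it.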

Each consumer can now decide how to process the signal and translate it into an appropriate mitigation process, such as

sending an email alert,
pushing notifications to a chat system,
executing any script or integration or
triggering a subflow.

To help you get started quickly, the application includes a working notification based on a custom event called health_sentinel.send_email, paired with a clean email template that can be easily adjusted to your organization's needs.

How to get started

(1) Import Application into your Instance

To import and install the Health Sentinel application in your own ServiceNow instance, follow these steps:

Make sure your instance is on at least the Xanadu release, as I'm using the "Deny-Unless" ACL type.
Fork my GitHub repository using the instructions provided in the GitHub documentation.

Tip: A clone is generally not required because tools like the ServiceNow Studio can connect to and work with remote repositories.
Link and import the forked repository into ServiceNow by following the instructions in a great article by Jesse.
Hint: As the new ServiceNow Studio still has no Git capabilities, you have to perform all Git-based activities in the legacy Studio.

(2) Enable Users by Assigning the right Roles

After installing the application, you need to assign at least one of the two contained roles to users who should work with it. Without at least one of the application-specific roles, even users with the "admin" role will not be able to do anything.

The two roles differ in their capabilities as follows:

health_sentinel.user

can create sensors from scratch or as copies of OOTB sensors in any application other than "Health Sentinel"
cannot modify or delete any of the existing application files or create new application files within the "Health Sentinel" application

health_sentinel.admin

can create new sensors within the "Health Sentinel" application scope
has full access to the application's own files

(3) Create a Sensor

Now go to the sensors table (u_sensor) via the navigation module Health Sentinel -> Sensors -> All Sensors.

Here you have two options:

create a new sensor from scratch or
copy one of the prebuilt sensors via the "Insert and Stay" option.

In both situations, you should create the new sensor in any application scope other than "Health Sentinel".

The reason for this is that in most cases, a sensor is tailored specifically to the needs of a particular instance and is therefore not universally applicable. However, if a sensor is designed to cover a broader use case and should serve as a template for future sensors, it can be included in the Health Sentinel application in a deactivated state. This allows it to be deployed alongside the application. On the target instance, any number of new sensors can then be created by duplicating this preconfigured template - offering a fast, consistent way to implement common monitoring patterns.

As an example, we want to be informed about cache flushes as a high number of cache flushes in ServiceNow is generally detrimental to system performance and user experience. Fortunately, these flushes are tracked as system events in the table diagnostic_event. Therefore, the first configuration of a sensor must be done on the "Detection criteria" tab and can look like this:

After saving (and without setting to "Active"!) you can leverage the UI Action "Simulate" which creates a signal payload just like you would find in the sysevent table:

In the above screenshot, 52 total findings have been detected, as the "Simulate" action always returns all records regardless of what you have configured in the "Lookup strategy" field. The "data" branch contains the details of the findings, but not of all of them. Rather, it contains only a restricted number of records, as specified in the "Included findings" field. This protects the system from overloading. For example, a sensor run could deliver hundreds of thousands of findings, resulting in a payload in the megabyte range that would exceed the capacity of a record in the "sysevent" table.

To enrich the findings data with more content we have to go to the "Signal contents" tab of our sensor record and fill in the technical fields and the maximum number of characters that should be included in a finding.
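How the two limits interact (a capped number of included findings, and a capped number of characters per field) can be illustrated with a small sketch. It is written in plain JavaScript under assumptions: `buildSignalData`, its parameters and the sample records are hypothetical and do not reflect the application's internal implementation.

```javascript
// Hypothetical sketch: function name, parameters and payload layout are assumptions.
function buildSignalData(findings, includedFindings, fields, maxChars) {
  // Only the first `includedFindings` records make it into the payload.
  return findings.slice(0, includedFindings).map((record) => {
    const entry = {};
    for (const field of fields) {
      // Copy only the configured fields and cut each value to `maxChars`.
      const value = String(record[field] ?? '');
      entry[field] = value.slice(0, maxChars);
    }
    return entry;
  });
}

// 52 made-up findings, mirroring the total mentioned in the text.
const findings = Array.from({ length: 52 }, (_, i) => ({
  sys_id: `id-${i}`,
  message: 'Cache flush requested for table sys_properties',
  ignored: 'not part of the payload',
}));

const payload = {
  total: findings.length, // the total still reflects all findings
  data: buildSignalData(findings, 5, ['sys_id', 'message'], 20),
};

console.log(payload.total, payload.data.length); // 52 5
```

The total count stays untouched, so a consumer can still see the full extent of the problem even though only a few shortened records travel with the signal.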

Now let's test a real execution. To prevent accidental sensor activation, the value "On Demand" is configured by default on the "Execution cadence" tab. You can now execute the sensor once using the "Execute now" UI action. A new entry will then appear under the "Fired Events" tab.

The value in the "Parm1" field is only used to connect the event to the sensor. In the "Parm2" field, you will find the same signal payload as before when using the "Simulate" UI action.

So far, there are no consumers of the signal. For this reason, we still have to configure what exactly should happen when a signal is sent. The most straightforward way to get started is likely by sending an email. As an HTML-based medium, email provides the flexibility to present all technical details from the signal in a visually structured and engaging format.

(4) Create an (email) notification and configure the sensor accordingly

I’ve developed a blueprint for email notifications that can be customized to suit your specific needs. To access it, navigate to: Health Sentinel -> Administration -> Notifications. Open the notification with the name "Health Sentinel Email Notification" and create a copy ("Insert and stay") in your own application scope with "Active" set to "true".

There is no direct link between a sensor and an email notification, as the core design principle is to decouple the Health Sentinel - which generates signals - from the components that consume them. This architectural separation provides maximum flexibility. The desired response behavior can be controlled by specifying the appropriate event type. To trigger the configured email notification, set the "Fired event" field under the "Signal contents" tab in the sensor record to the value "health_sentinel.send_email", as this is the event type to which the previously activated email notification is subscribed.
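The decoupling via event types can be pictured as a simple publish/subscribe registry. The following plain JavaScript sketch is an assumption-laden illustration, not platform code: `subscribe`, `dispatch` and the event name `x_demo.post_to_chat` are hypothetical, while `health_sentinel.send_email` and the Parm1/Parm2 roles are taken from the text.

```javascript
// Hypothetical sketch of event-based decoupling; not ServiceNow's event engine.
const subscriptions = new Map();

// A consumer registers for an event type without knowing any sensor.
function subscribe(eventName, handler) {
  const handlers = subscriptions.get(eventName) || [];
  handlers.push(handler);
  subscriptions.set(eventName, handlers);
}

// A fired event reaches only the consumers subscribed to its name.
function dispatch(event) {
  const handlers = subscriptions.get(event.name) || [];
  const payload = JSON.parse(event.parm2); // Parm2 carries the signal payload
  return handlers.map((handler) => handler(event.parm1, payload));
}

subscribe('health_sentinel.send_email', (sensorId, payload) =>
  `email: sensor ${sensorId} reported ${payload.total} findings`);

// `x_demo.post_to_chat` is an invented event name for a second consumer.
subscribe('x_demo.post_to_chat', (sensorId, payload) =>
  `chat: sensor ${sensorId} reported ${payload.total} findings`);

const results = dispatch({
  name: 'health_sentinel.send_email',
  parm1: 'sensor-1', // Parm1 links the event back to the sensor
  parm2: JSON.stringify({ total: 52, data: [] }),
});

console.log(results); // [ 'email: sensor sensor-1 reported 52 findings' ]
```

Switching a sensor from email to another reaction then only means changing the event name it fires; neither the sensor nor the other consumers need to be touched.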

Now execute the sensor again and go to the sys_email table. There you should find a prepared or sent email that looks similar to the following:

How to measure what is not available with a simple table lookup?

Use the "Related List Conditions"

For the "Filter" field, I have used a dictionary attribute to activate the so-called "Related List Conditions" feature. This lets you filter records based on related tables, making it easy to include linked data in your queries. It helps you create more precise filters and automations without custom code, saving time and simplifying maintenance.

Combine several tables via database views

Everything has already been said about database views by other authors. Just keep that option in mind if a single table does not provide all the necessary information for detection and/or reporting. A good example is the OOTB database view incident_metric.

Extend existing tables with valuable information

ServiceNow provides diagnostic tools such as the /stats.do page and the legacy "System Diagnostics" dashboard, which offer valuable insights into the health of individual application nodes. These tools present real-time metrics like memory usage, thread counts, and transaction rates, aiding administrators in monitoring system performance. The underlying data for these diagnostics is stored in the sys_cluster_node_stats table, where metrics are encapsulated in an XML-based structure.

In a previous initiative, the "Node Health Monitor" project, I extended the sys_cluster_state table to include additional fields such as "Available Integration Semaphores" and "Logged-In End User Sessions". These fields are populated via a Business Rule that parses the XML data from the sys_cluster_node_stats table. This enhancement enables a consolidated list view, providing administrators with a comprehensive overview of critical health indicators across all application nodes:

However, this setup lacks automated alerting mechanisms when specific metrics breach predefined thresholds, potentially delaying response to critical issues. The Health Sentinel application can now address this gap by introducing a fully automated monitoring and alerting framework.

Create new collector tables for aggregated data

In another initiative, I developed the so-called Table Footprint Generator. At its core, this application maintains a daily record count for each table over the past 30 days, capturing both the absolute and percentage changes compared to the previous day. Each table has a dedicated record and corresponding report, which I enhanced with automated visualizations to highlight data trends and patterns. This historical dataset has unlocked a wide range of new sensor scenarios. The screenshot below, for instance, shows the report for the syslog_cancellation table. This table logs all cancelled transactions—not only those explicitly cancelled by users, but more critically, those automatically terminated by the platform due to transaction quota rules. A sudden spike in cancelled transactions from one day to the next can signal underlying performance or stability issues and should prompt further investigation.
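The day-over-day calculation and a spike check on top of it can be sketched as follows. This is a minimal illustration in plain JavaScript, assuming a simple array of daily counts; the function names, the record layout and the sample numbers are invented and do not come from the Table Footprint Generator.

```javascript
// Hypothetical sketch: layout of the history entries is an assumption.
function withDailyChanges(history) {
  return history.map((entry, index) => {
    // The first day has no predecessor to compare against.
    if (index === 0) {
      return { ...entry, absolute: null, percent: null };
    }
    const previous = history[index - 1].count;
    const absolute = entry.count - previous;
    // Avoid a division by zero when the previous day had no records.
    const percent = previous === 0 ? null : (absolute * 100) / previous;
    return { ...entry, absolute, percent };
  });
}

// Returns the days whose growth over the previous day exceeds `maxPercent`.
function findSpikes(history, maxPercent) {
  return withDailyChanges(history).filter(
    (entry) => entry.percent !== null && entry.percent > maxPercent
  );
}

// Made-up daily record counts for one table.
const history = [
  { day: '2025-05-14', count: 100 },
  { day: '2025-05-15', count: 110 },
  { day: '2025-05-16', count: 330 },
];

console.log(findSpikes(history, 50).map((entry) => entry.day)); // [ '2025-05-16' ]
```

A sensor pointed at such a collector table would only need a condition on the percentage field, which keeps the sensor itself a plain table lookup.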

Leveraging Remote Tables for advanced Monitoring

In many cases, critical thresholds are not represented by individual records but are the result of aggregations, groupings, or multi-level calculations. While it is technically feasible to extend the Health Sentinel application to accommodate a wide range of such use cases through configurable options, doing so would introduce significant complexity and may not justify the development effort.

A more efficient and scalable approach involves utilizing remote tables. These tables allow for script-based data generation, enabling the creation of dynamic datasets that can represent complex metrics without the need to store data persistently. Notably, remote tables are considered "exempt" from custom table licensing in ServiceNow, meaning they do not count against the customer's licensed custom table quota.
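As a rough idea of what such script-based data generation could look like, the sketch below groups raw rows and emits one aggregated row per group. It is plain JavaScript under assumptions: the `addRow` call is modeled on the `v_table.addRow()` method used in remote table scripts as far as I know it (please verify against the official documentation), the table object here is only a stub, and the function names as well as the fields `u_source` and `u_count` are hypothetical.

```javascript
// Hypothetical sketch: `stubTable` only imitates a remote table's row collector.
function groupCounts(rows, key) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row[key], (counts.get(row[key]) || 0) + 1);
  }
  return counts;
}

// Emits one aggregated row per group instead of the raw records.
function fillRemoteTable(table, sourceRows) {
  for (const [source, count] of groupCounts(sourceRows, 'source')) {
    table.addRow({ u_source: source, u_count: count });
  }
}

// Stand-in for the table object a remote table script would receive.
const stubTable = {
  rows: [],
  addRow(row) {
    this.rows.push(row);
  },
};

// Made-up raw records, e.g. error entries grouped by their source.
const rawRows = [
  { source: 'REST', level: 'error' },
  { source: 'Flow', level: 'error' },
  { source: 'REST', level: 'error' },
];

fillRemoteTable(stubTable, rawRows);
console.log(stubTable.rows);
// [ { u_source: 'REST', u_count: 2 }, { u_source: 'Flow', u_count: 1 } ]
```

A sensor can then apply an ordinary condition such as "count greater than N" to the generated rows, so the aggregation logic stays outside of Health Sentinel itself.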

List of included sensors

⚠️ Please note
The sensors listed below have been created and added to the application. However, they are intended as examples and sources of inspiration only. They should not be copied and activated without careful consideration. Because each organization has its own unique processes, structures, and risk landscape, sensors must be reviewed and adapted to reflect those specific circumstances. They must also be thoroughly tested before use.

Name	Table	Description
Broken Outbound Requests	`sys_outbound_http_log`	Detects outbound requests which were responded with a status code greater than 299
Occurences of System Cache Flushes	`diagnostic_event`	A high number of cache flushes in ServiceNow is generally detrimental to system performance and user experience.
Application Node's Memory Consumption more than 80%	`sys_cluster_state`	Detects if the memory consumption on an application node is greater than the configured threshold.
Limited Semaphores for incoming Integration Requests	`sys_cluster_state`	Detects if the number of semaphores for inbound integration requests on an application node is at a critical level.
Limited Semaphores for incoming User Requests	`sys_cluster_state`	Detects if the number of semaphores for inbound user requests on an application node is at a critical level.
Scheduled Jobs on an Application Node are waiting to be executed	`sys_cluster_state`	Detects if the application nodes' waiting queue for background jobs to be executed has exceeded the configured value.
Erroneous (Sub-)Flow Executions	`sys_flow_context`	This sensor detects any (sub-)flow executions with problematic states.
Errors in the syslog Table	`syslog`	Detects whether the number of errors in the syslog table has exceeded the configured threshold since the last run time.
High Response Time of User Transactions	`syslog_transaction`	Detects user transactions that take longer than the configured number of seconds.
New Store Application Version available	`sys_store_app`	Identifies store apps whose "latest_version" differs from the value in "version" and which therefore have an update available. If so, the related store applications can be updated.
Large number of emails are not sent	`sys_email`	It checks whether a certain number of emails have not yet been sent. In this way, problems with the SMTP server can be identified.
Performance Analytics Jobs finished with erros	`pa_job_logs`	It detects performance analytics data collection jobs that were not completed successfully during the data gathering process.

Where to get Inspiration for new Sensors?

System administrators seeking inspiration for key health metrics to monitor can begin by consulting the official documentation, which outlines fundamental indicators.

Another great source for inspiration is the excellent and must-read document Fine tune ServiceNow platform with regular performance administration which explains how to:

perform the activities your instance needs to give you the most value
review your log data for errors and warnings
maintain your tables for peak performance.

Many of the activities described in that document can be translated into corresponding sensors.

And the best source of new sensors is daily practice, which yields individual, context-dependent KPIs for each environment. If possible, configure the associated sensor while a specific problem is still occurring, so that you can test against real data whether the sensor would respond the next time the critical situation arises.

ChrisP_AMP · ‎05-19-2025

HI @Maik Skoddow - this is fantastic and I love the concept. Enabling and empowering a Platform Owner, the Centre of Excellence and its key resources with pro-active insights.

Forking is above my ability, even following the guide provided, but I trust you in that it works!

Keep posting and sharing, it would be great to get other input on this

Martin Ivanov · ‎05-20-2025

Another masterpiece. Thanks, @Maik Skoddow !

Maik Skoddow · ‎05-21-2025

Hi @ChrisP_AMP

please update your forked repo as I committed some bugfixes today.

Thanks
Maik

brianjoseph · ‎05-25-2025

Simply amazing. Keep up the good work!

Kamil22 · ‎07-09-2025

Hi @Maik Skoddow
It looks great. But I have a problem.
I imported your app. I am admin and I added role health_sentinel.admin (health_sentinel.user was added too). I created copy of sensor: "Occurences of System Cache Flushes" in global from scratch. "Insert and Stay" was not available, maybe because I was in global app scope, I am not sure. I changed name (and app scope). Rest is the same. I saved it as NOT active.
My problem is that button "Simulate" is not available.
I checked UI Action. Looks like condition is only for not new record and health_sentinel.user role.
Can you please help. Maybe something with cross application permissions?

Maybe this is an issue. In the UI Action and in your scripts, scripts are being called from the global app:
global.HealthSentinelUtils.isHealthSentinelUser()

return HealthSentinelUtils.hasRole(global.HealthSentinelConst.ROLE_USER);
But both scripts: HealthSentinelUtils and HealthSentinelConst are in Health Sentinel application.

Maik Skoddow · ‎07-09-2025

@Kamil22

After assigning the role "health_sentinel.admin" to you: have you logged out and then logged in again? Only that way, you can be sure that all role-based functionalities are working as expected.

And for the rest: It's working like a charm and as designed.

Kamil22 · ‎07-10-2025

OMG. I forgot about this little piece. Today, after logging in, I have the buttons. Thank You @Maik Skoddow !!!

Sebastian R_

Hi @Maik Skoddow,

love the idea for the app. Do you have any plans to enhance this further? E.g. creating tasks for the operations teams so that it can be worked on. Thinking to implement this myself otherwise (maybe as an add-on application).