IT Health Monitoring and the End of Crying Wolf

tom_molfetto · 08-14-2014

For Israel's El Al Airlines, an application outage could threaten something as critical as keeping planes in the sky, or it could simply take an office printer offline temporarily. The point is that with accurate, up-to-date service dependency maps, El Al's IT team can assess the business impact of problems as they occur. Coupled with an instant "mean time to know" (MTTK), that results in much quicker issue resolution.

Without this service-centric view, discussions around IT health monitoring often focus on the ability to monitor the health of a single entity, like a server or router. Each IT component is monitored separately and the information is compiled, but not necessarily correlated. It could be thought of as a bottom-up approach. When an organization monitors its overall IT health this way, the result is a disjointed effort prone to information gaps, which in turn lead to unnecessary system outages and extended business service downtime.

Under the bottom-up approach, each IT alert or system failure is weighted equally. Tickets are issued to IT teams and problems are resolved in the order in which they are received, regardless of each issue's relative importance to maintaining normal operations. This old system of managing IT issues is akin to a child who cries wolf. A stubbed toe elicits the same reaction as a broken arm, and very quickly the parent learns to respond to all of the child's cries with the same level of urgency.

We believe a more strategic approach is to look at IT service monitoring from the top down: focus on the health of business services, then work backward to determine which individual components are contributing to that result. Monitoring a single server to see if it's up and running is great, but understanding every IT component that plays a part in successfully fulfilling a business service is better.

This way of approaching IT service issues puts an end to the crying wolf conundrum. IT alerts are understood in relation to the system as a whole, and each one is weighted by its impact and given appropriate priority.
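To make the idea of impact-weighted alerts concrete, here is a minimal Python sketch. It is not how ServiceWatch works internally; the component names, service names, and criticality scores are all hypothetical and exist only for illustration. It shows alerts being ordered by the most critical business service they touch rather than by arrival time.

```python
from dataclasses import dataclass

# Hypothetical criticality score per business service (higher = more critical).
SERVICE_CRITICALITY = {
    "flight-operations": 10,
    "online-reservations": 8,
    "office-printing": 1,
}

# Hypothetical mapping of each component to the services that depend on it.
COMPONENT_SERVICES = {
    "db-01": ["flight-operations", "online-reservations"],
    "web-03": ["online-reservations"],
    "print-srv": ["office-printing"],
}


@dataclass
class Alert:
    component: str
    message: str


def impact(alert):
    """Score an alert by the most critical business service it touches."""
    services = COMPONENT_SERVICES.get(alert.component, [])
    return max((SERVICE_CRITICALITY[s] for s in services), default=0)


def triage(alerts):
    """Return alerts ordered by business impact, highest first."""
    return sorted(alerts, key=impact, reverse=True)
```

With this ordering, an alert on the hypothetical `db-01` is handled before one on `print-srv` even if the printer alert arrived first. An alert on a component that maps to no service scores zero and falls to the end of the queue.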

In addition, IT teams are immediately aware of how their work, including routine maintenance and change management operations, will impact each business service offering. This facilitates smooth communication between IT and the business side of operations. It works to reduce surprise outages and unintended effects that come along with IT additions, changes, and upgrades.

ServiceWatch starts with an understanding of exactly which entities within an IT infrastructure are involved in ensuring that a business service, like online reservations, runs without error. Having this type of service dependency map is vital to running ITIL-compliant change management operations. It allows IT to forewarn other departments about possible outages and downtime when IT is making changes. It also reduces the mean time to know (MTTK) when service interruptions do occur.

Without accurate, up-to-date service maps, IT staff are left to dig through service logs and attempt to recreate the trail of IT components to figure out where the technology broke down. Managing the health of these siloed technologies (applications, servers, networks) while they're buried under huge volumes of disconnected data leads to wasted time, effort, and money when system errors do occur.

ServiceWatch starts with the user-accessed entry point to the business service. It then drills down surgically through the applications and associated IT infrastructure, automatically mapping and monitoring only those things that are relevant to that service. Until now, this process had to be done manually. Traditional IT service management tools are capable of discovering the components in an IT infrastructure, but those components then have to be put into a database, at which point someone has to manually correlate them to business services. Because this style of mapping is so time-consuming, the map is quite often out of date by the time it's completed. On top of that, changes, updates, and additions to the IT infrastructure need to be reflected in the map, so the mapping process must be redone time and time again.
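The top-down idea described above, starting at the entry point and following only what that service depends on, can be sketched as a simple graph walk. This is an illustrative Python sketch under stated assumptions, not ServiceWatch's actual discovery mechanism. The `DEPENDS_ON` table and every component name in it are hypothetical stand-ins for whatever a real discovery process would report.

```python
from collections import deque

# Hypothetical "depends on" edges between components.
DEPENDS_ON = {
    "reservations-url": ["load-balancer"],
    "load-balancer": ["web-01", "web-02"],
    "web-01": ["app-server"],
    "web-02": ["app-server"],
    "app-server": ["booking-db"],
    "print-srv": ["office-printer"],  # not part of the reservations service
}


def map_service(entry_point, depends_on):
    """Breadth-first walk from the entry point.

    Returns every component reachable from the entry point, in visit order.
    Components the service never touches are not included.
    """
    ordered = [entry_point]
    visited = {entry_point}
    queue = deque([entry_point])
    while queue:
        node = queue.popleft()
        for dep in depends_on.get(node, []):
            if dep not in visited:
                visited.add(dep)
                ordered.append(dep)
                queue.append(dep)
    return ordered
```

Calling `map_service("reservations-url", DEPENDS_ON)` yields only the components behind the reservations entry point. The hypothetical print server never appears in the map because nothing on the path from the entry point depends on it. Rerunning the walk after each infrastructure change is what keeps such a map current without manual correlation.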

Next… Reduce "Mean Time to Know" by Mapping Business Services
