AI comes to IT Ops

New AIOps tools can improve IT diagnostics and maintenance

Saving money with AIOps tools
  • IT downtime costs are pushing companies to automate IT operations
  • New AIOps tools can help eliminate manual tasks and provide real‑time diagnostics
  • Companies need large amounts of training data and the right human talent to get value from AIOps platforms

IT system failures, whether caused by cyberattacks, poor maintenance or other problems, can cost organizations huge amounts of money. Gartner has estimated the average cost of IT downtime at $5,600 per minute. So it’s little wonder that many companies are looking for better ways to avoid these interruptions altogether.

Emerging artificial intelligence applications can help organizations identify potential maintenance and security problems before they happen. Many large organizations are starting to use AI tools in IT operations (or “AIOps”) to automate routine manual tasks, stay on top of IT system upkeep with minimal human intervention, and manage data generation and storage 24/7.

“The objective of AIOps is to make IT operations efficient and fast,” says Chanda Dani, senior director of product marketing at ServiceNow. “We do that by processing massive amounts of data and bringing it down to a few actionable items, and eventually using it to drive a more meaningful set of actions.”

The new AI tools can offer major assistance to an overworked IT staff, says Eric Moller, CTO of Atomic X, a chatbot vendor.

“IT professionals already deal with information overload,” Moller says. “When a service breaks, it’s up to IT to sift through log files and leverage their intuition and experience to isolate what went wrong. A huge advantage of AIOps is simply how much information an AI system can leverage in diagnosing issues.”

Even modern IT architectures can be a nightmare for diagnosing problems, Moller says, because of the amount of data they generate. AIOps tools can help consolidate diagnostic data and make it actionable.

Rapid adopters of AIOps seeing benefits

More than two‑thirds of all enterprises with more than 500 employees have begun to experiment with some form of AIOps, according to a recent survey from OpsRamp, an AIOps vendor.

Roughly 70% of those early adopters use AIOps for data insights, while nearly 75% see the main benefit in eliminating tedious manual tasks. Other benefits cited in the survey included anomaly detection and faster resolution of incidents.

Early results such as these help explain why the global market for AIOps tools is expected to grow from $2.5 billion in 2018 to more than $11 billion in 2023, according to a study by ReportBuyer. The study found that general adoption of AI and concern about IT uptime are the top drivers of AIOps investment.

Companies are putting the new tools to work on a range of IT management tasks, says Tom Petrocelli, research fellow at Amalgam Insights, an IT consulting and strategy firm. A company running databases, for instance, could use AI tools to help fine‑tune and manage them.

“This reduces the amount of time database admins spend monitoring and troubleshooting problems,” Petrocelli says. “Using machine learning to find patterns in logs can help find problems before they become acute.”
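As a rough illustration of what spotting a pattern in logs can look like, the sketch below flags minutes in which the number of error lines jumps well above its recent baseline. It is a plain statistical check rather than a trained machine learning model, and it does not represent any vendor's product. The function name `flag_spikes`, the five-minute baseline and the threshold of three standard deviations are all assumptions made for this example.

```python
from statistics import mean, stdev

def flag_spikes(error_counts, baseline=5, threshold=3.0):
    """Return the indexes of minutes whose error count is unusually high.

    error_counts: per-minute counts of error lines, oldest first
                  (assumed to be extracted from the logs already).
    baseline:     how many preceding minutes form the "normal" window.
    threshold:    how many standard deviations above the window mean
                  a count must be before it is flagged.
    """
    flagged = []
    for i in range(baseline, len(error_counts)):
        window = error_counts[i - baseline:i]
        # Floor the spread at 1.0 so a perfectly flat window
        # does not turn every tiny change into an alarm.
        spread = max(stdev(window), 1.0)
        if error_counts[i] > mean(window) + threshold * spread:
            flagged.append(i)
    return flagged

# Hypothetical per-minute error counts with one obvious burst.
counts = [2, 3, 2, 3, 2, 40, 3]
spikes = flag_spikes(counts)
```

In this made-up series only the sixth minute is flagged. A production tool would typically learn seasonality and correlate many signals, which this sketch does not attempt.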

AIOps tools can also analyze large volumes of data from IT operations tools and devices, then use machine learning and analytics to drive continuous, automated improvements, says Paul Mercina, director of innovation at Park Place Technologies, a data center maintenance vendor.

Doing so can “eliminate labor‑intensive manual tasks in IT operations to drive greater efficiency and uptime, while also offering a holistic view of an IT system,” he says.

Beyond maintenance

Apart from maintenance tasks, AIOps systems can also help organizations keep an eye on their IT systems as they shift processes to the cloud, says Enzo Signore, CMO of FixStream, a provider of AIOps applications.

The tools can process big data in real time, Signore says. “This is critically important because IT environments are very dynamic, and the location of a workload can change in minutes. As IT environments become more agile, visibility is decreasing, leaving IT operations to deal with constantly moving targets and increasing the risk of downtime.”

AIOps tools can crunch a company’s data in real time, giving IT managers an accurate foundation on which to manage operations. As with any AI platform, training the system is critical, says Bhanu Singh, vice president of product development at OpsRamp.

“Without access to vast sets of training data, AIOps tools will not be able to make the right predictions, extract signal from alert noise or drive preventive actions before any IT outage hits you hard,” he says.
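One simple way to picture extracting signal from alert noise is to collapse repeated alerts into a single incident. The sketch below groups alerts that share a host and check name and arrive within a few minutes of each other. The `Alert` fields, the `group_alerts` function and the five-minute window are hypothetical choices for this example, and commercial tools generally rely on learned correlations rather than a fixed rule like this one.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    host: str
    check: str

def group_alerts(alerts, window_seconds=300):
    """Collapse repeated alerts for the same host and check into incidents."""
    incidents = []
    latest = {}  # (host, check) -> most recent incident for that pair
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.host, alert.check)
        incident = latest.get(key)
        if incident and alert.timestamp - incident["last_seen"] <= window_seconds:
            # Same problem still firing: fold it into the existing incident.
            incident["count"] += 1
            incident["last_seen"] = alert.timestamp
        else:
            # First sighting, or the earlier incident has gone quiet.
            incident = {
                "host": alert.host,
                "check": alert.check,
                "first_seen": alert.timestamp,
                "last_seen": alert.timestamp,
                "count": 1,
            }
            incidents.append(incident)
            latest[key] = incident
    return incidents

# Five raw alerts (hypothetical hosts and checks) become three incidents.
raw = [
    Alert(0, "db1", "disk"),
    Alert(60, "db1", "disk"),
    Alert(120, "db1", "disk"),
    Alert(30, "web1", "cpu"),
    Alert(1000, "db1", "disk"),
]
incidents = group_alerts(raw)
```

Here the three disk alerts on `db1` in the first two minutes become one incident, while the later one starts a new incident because the quiet gap exceeds the window.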

Implementing a successful AIOps system also requires hiring talent that is currently rare in many IT organizations.

“Without skilled data scientists on your team, you won’t be able to continuously apply, refine, and adjust machine learning algorithms that help optimize digital operations,” Singh adds. “You will need the right human expertise that can aid and support machines to drive IT operations at scale.”

Many companies still rely on older hardware systems and networks that require constant monitoring and human intervention. That’s why automation of IT ops is becoming more critical today, says Jeff Bittner, founder and president of Exit Technologies, a computer asset management and IT asset disposition company.

“The way we think about it is, ‘What are the tasks that require a human and what can we automate?’” says ServiceNow’s Dani. “We can offer a massive amount of automation because we learn from human behavior. In the past, when a problem happened, how did a human fix it? We can automate that sequence of steps if the problem happens again.”
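The idea of replaying a human's fix when a problem recurs can be sketched in a few lines. The example below is a toy illustration, not ServiceNow's implementation: the `RunbookMemory` class, the problem signature string and the step functions are all invented for this sketch. It stores the ordered steps an operator took for a known problem and runs them again when the same problem signature reappears.

```python
class RunbookMemory:
    """Remember how a problem was fixed and replay the steps later."""

    def __init__(self):
        self._runbooks = {}  # problem signature -> ordered list of steps

    def record_fix(self, problem_signature, steps):
        # Store a copy so later changes to the caller's list have no effect.
        self._runbooks[problem_signature] = list(steps)

    def remediate(self, problem_signature, context):
        steps = self._runbooks.get(problem_signature)
        if steps is None:
            return None  # unknown problem: leave it to a human
        # Run each recorded step in order and collect what it reports.
        return [step(context) for step in steps]

# Hypothetical steps; real ones would call infrastructure tooling.
def clear_cache(context):
    return f"clear cache on {context['host']}"

def restart_service(context):
    return f"restart {context['service']} on {context['host']}"

memory = RunbookMemory()
memory.record_fix("web-5xx-spike", [clear_cache, restart_service])

actions = memory.remediate("web-5xx-spike", {"host": "web1", "service": "nginx"})
unknown = memory.remediate("disk-latency", {"host": "db1"})
```

Returning `None` for an unrecognized problem keeps a person in the loop for anything the system has not seen fixed before.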

That doesn’t mean pushing people entirely out of the loop. Companies that implement AIOps should backstop these new systems with human staffers who check and occasionally override decisions made by the AI. But the real-time data crunching should be left to the machines.

“No mere mortal can absorb, analyze, and act on the huge amounts of data streaming in across the virtual enterprise,” Bittner says. “Networks are fast, but even a lag of microseconds could result in a machine overheating or a missed opportunity to execute a financial transaction.”

As IT environments become more complex, fewer companies are willing to take those risks.