This post originally appeared on MIT Technology Review
Sercompe Business Technology provides essential cloud services to roughly 60 corporate clients, supporting a total of about 50,000 users. So, it’s crucial that the Joinville, Brazil, company’s underlying IT infrastructure deliver reliable service with predictably high performance. But with a complex IT environment that includes more than 2,000 virtual machines and 1 petabyte (equivalent to a million gigabytes) of managed data, it was overwhelming for network administrators to sort through all the data and alerts to figure out what was going on when problems cropped up. And it was tough to ensure network and storage capacity were where they should be, or to know when to schedule the next upgrade.
To help untangle the complexity and increase its support engineers’ efficiency, Sercompe invested in an artificial intelligence operations (AIOps) platform, which uses AI to get to the root cause of problems and warn IT administrators before small issues become big ones. Now, according to cloud product manager Rafael Cardoso, the AIOps system does much of the work of managing its IT infrastructure—a major boon over the old manual methods.
“Figuring out when I needed more space or capacity—it was a mess before. We needed to get information from so many different points when we were planning. We never got the number correct,” says Cardoso. “Now, I have an entire view of the infrastructure and visualization from the virtual machines to the final disk in the rack.” AIOps brings visibility over the whole environment.
Before deploying the technology, Cardoso was where countless other organizations find themselves: snarled in an intricate web of IT systems, with interdependencies between layers of hardware, virtualization, middleware, and finally, applications. Any disruption or downtime could lead to tedious manual troubleshooting, and ultimately, a negative impact on business: a website that won’t function, for example, and irate customers.
AIOps platforms help IT managers master the task of automating IT operations by using AI to deliver quick intelligence about how the infrastructure is doing—areas that are humming along versus places that are in danger of triggering a downtime event. Credit for coining the term AIOps in 2016 goes to Gartner: it’s a broad category of tools designed to overcome the limitations of traditional monitoring tools. The platforms use self-learning algorithms to automate routine tasks and understand the behavior of the systems they monitor. They pull insights from performance data to identify and monitor irregular behavior on IT infrastructure and applications.
Market research company BCC Research expects the global market for AIOps to balloon from $3 billion in 2021 to $9.4 billion by 2026, at a compound annual growth rate of 26%. Gartner analysts write in their April “Market Guide for AIOps Platforms” that the increasing rate of AIOps adoption is being driven by digital business transformation and the need to move from reactive responses to infrastructure issues to proactive actions.
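The growth rate in that forecast can be checked from the two market sizes. The snippet below is only a quick arithmetic sketch: the function name `cagr` is illustrative, and it assumes the forecast spans the five years from 2021 to 2026.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in billions of dollars, as quoted in the BCC Research estimate.
rate = cagr(start_value=3.0, end_value=9.4, years=2026 - 2021)
print(f"Implied annual growth: {rate:.1%}")
```

The result rounds to the 26% figure cited in the estimate.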
“With data volumes reaching or exceeding gigabytes per minute across a dozen or more different domains, it is no longer possible for a human to analyze the data manually,” the Gartner analysts write. Applying AI in a systematic way speeds insights and enables proactivity.
According to Mark Esposito, chief learning officer at automation technology company Nexus FrontierTech, the term “AIOps” evolved from “DevOps”—the software engineering culture and practice that aims to integrate software development and operations. “The idea is to advocate automation and monitoring at all stages, from software construction to infrastructure management,” says Esposito. Recent innovation in the field includes using predictive analytics to anticipate and resolve problems before they can affect IT operations.
AIOps helps infrastructure fade into the background
Network and IT administrators harried by exploding data volumes and burgeoning complexity could use the help, says Saurabh Kulkarni, head of engineering and product management at Hewlett Packard Enterprise. Kulkarni works on HPE InfoSight, a cloud-based AIOps platform for proactively managing data center systems.
“IT administrators spend tons and tons of time planning their work, planning the deployments, adding new nodes, compute, storage, and all. And when something goes wrong in the infrastructure, it’s extremely difficult to debug those issues manually,” says Kulkarni. “AIOps uses machine-learning algorithms to look at the patterns, examine the repeated behaviors, and learn from them to provide a quick recommendation to the user.” Beyond storage nodes, every piece of IT infrastructure will send a separate alert so issues can be resolved speedily.
The InfoSight system collects data from all the devices in a customer’s environment and then correlates it with data from HPE customers with similar IT environments. The system can pinpoint a potential problem so it’s quickly resolved, and if the problem crops up again, the fix can be automatically applied. Alternatively, the system sends an alert so IT teams can clear up the issue quickly, Kulkarni adds. Take the case of a storage controller that fails because it has lost power. Rather than assuming the problem relates exclusively to storage, the AIOps platform surveys the entire infrastructure stack, all the way to the application layer, to identify the root cause.
“The system monitors the performance and can see anomalies. We have algorithms that constantly run in the background to detect any abnormal behaviors and alert the customers before the problem happens,” says Kulkarni. The philosophy behind InfoSight is to “make the infrastructure disappear” by bringing IT systems and all the telemetry data into one pane of glass. Looking at one giant set of data, administrators can quickly figure out what’s going wrong with the infrastructure.
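To make the idea of background anomaly detection concrete, here is a minimal sketch of one common approach: flagging a reading that strays far from the average of the readings just before it. It is not InfoSight’s algorithm, which the article does not describe. The function name, the window size, the threshold, and the latency numbers are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that sit more than `threshold` standard
    deviations away from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against; skip it.
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical storage latency readings in milliseconds, with one spike.
latency_ms = [10.0, 11.0, 10.5, 9.8, 10.2, 10.4, 48.0, 10.1, 10.3]
flagged = find_anomalies(latency_ms)
print(flagged)
```

In this made-up series only the 48-millisecond spike is flagged. A production platform would presumably draw on many more signals, and on data from comparable environments, rather than a single metric.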
Kulkarni recalls the difficulty of managing a large IT environment from past jobs. “I had to manage a large data set, and I had to call so many different vendors and be on hold for multiple hours to try to figure out problems,” he says. “Sometimes it took us days to understand what was really going on.”
By automating data collection and tapping a wealth of data to understand root causes, AIOps enables companies to reallocate core personnel, including IT administrators, storage administrators, and network admins. Roles can be consolidated as the infrastructure is simplified, and staff can spend more time ensuring application performance. “Previously, companies used to have multiple roles and different departments handling different things. So even to deploy a new storage area, five different admins each had to do their individual piece,” says Kulkarni. But with AIOps, AI handles much of the work automatically so IT and support staff can devote their time to more strategic initiatives. That increases efficiency and, in the case of a business that provides technical support to its customers, improves profit margins. For example, Sercompe’s Cardoso has been able to reduce the average time his support engineers spend on customer calls, which reflects a better customer experience as well as greater efficiency.
This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review’s editorial staff.