“AIOps,” which stands for “AI for IT operations,” refers to the use of AI by an IT team to manage the data and information coming from a dev environment. AIOps platforms leverage big data, machine learning, and analytics to enhance IT operations via monitoring, automation, and service desk functions with proactive and personal insights, enabling the use of multiple data sources and data collection methods. In theory, AIOps can provide faster resolutions to outages and other performance problems, in the process decreasing the costs associated with IT challenges.
The benefits of AIOps are driving enterprise adoption. Eighty-seven percent of respondents to a recent OpsRamp survey agree that AIOps tools are improving their data-driven collaboration, and Gartner predicts that AIOps service usage will rise from 5% in 2018 to 30% in 2023.
But when deploying an AIOps solution, businesses without a clear idea of potential blockers can run into challenges. That’s why it’s important to have a holistic understanding of AIOps before formulating a strategy.
What is AIOps?
AIOps platforms collect data from various IT operations tools in order to automatically spot issues while providing historical analytics. They typically have two components — big data and machine learning — and require a move away from siloed IT data in order to aggregate observational data alongside the engagement data in ticket, incident, and event recording.
As Seth Paskin, director of operations at BMC Software, writes: “The outcomes IT professionals expect from AIOps can be categorized generally as automation and prediction … Their first expectation from AIOps is that it will allow them to automate what they are currently doing manually and thus increase the speed at which those tasks are performed. Some specific examples I’ve heard include: correlate customer profile information with financial processing applications and infrastructure data to identify transaction duration outliers and highlight performance impacting factors; evaluate unstructured data in service tickets to identify problem automation candidates; categorize workloads for optimal infrastructure placement; and correlate incidents with changes, work logs, and app dev activities to measure production impact of infrastructure and application changes.”
An AIOps platform canvasses data on logs, performance alerts, tickets, and other items using an auto-discovery process that automatically collects data across infrastructure and application domains. The process identifies infrastructure devices, running apps, and business transactions and correlates all the data in a contextual form. Automatic dependency mapping then determines the relationships between these elements, such as the physical and virtual connections at the networking layer, by mapping app flows to the supporting infrastructure and business transactions to the apps.
AIOps’ automated dependency mapping has another benefit: helping to track relationships between hybrid infrastructure entities. AIOps platforms can create service and app topology maps across technology domains and environments, allowing IT teams to accelerate incident response and quantify the business impact of outages.
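To make the idea of a dependency map concrete, here is a minimal Python sketch. It assumes the relationships have already been discovered and are available as simple (dependent, dependency) pairs. The element names and the `impacted_by` helper are hypothetical and do not come from any particular AIOps product; real platforms build and query these maps in their own ways.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges, written as (dependent, dependency).
# A business transaction depends on apps, which depend on infrastructure.
DEPENDS_ON = [
    ("checkout-transaction", "payments-app"),
    ("checkout-transaction", "cart-app"),
    ("payments-app", "db-server-1"),
    ("cart-app", "db-server-1"),
    ("cart-app", "cache-server-1"),
    ("db-server-1", "switch-a"),
]


def build_reverse_map(edges):
    """Map each element to the set of elements that depend on it directly."""
    dependents = defaultdict(set)
    for dependent, dependency in edges:
        dependents[dependency].add(dependent)
    return dependents


def impacted_by(failed, edges):
    """Return every element that depends on `failed`, directly or indirectly."""
    dependents = build_reverse_map(edges)
    seen = set()
    queue = deque([failed])
    # Breadth-first walk "upward" from the failed element.
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(impacted_by("cache-server-1", DEPENDS_ON)))
# ['cart-app', 'checkout-transaction']
```

Walking the map upward from a failed element is one simple way to estimate which apps and business transactions an outage touches.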
To identify patterns and predict future events, like service outages, AIOps employs supervised learning, unsupervised learning, and anomaly detection based on expected behaviors and thresholds. Particularly useful is unsupervised machine learning, which enables AIOps platforms to learn to recognize expected behavior and set thresholds across data and performance metrics. The platforms can analyze event patterns in real time and compare those to expected behavior, alerting IT teams when a sequence of events (or groups of events) demonstrates activity that indicates anomalies are present.
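As a rough illustration of learning expected behavior and setting thresholds from data, the sketch below derives a band from historical metric values (mean plus or minus three standard deviations) and flags new readings that fall outside it. This is a deliberately simple statistical stand-in, not the method any specific AIOps platform uses. The function names, the sample numbers, and the choice of three standard deviations are all assumptions made for illustration.

```python
from statistics import mean, stdev


def learn_threshold(history, k=3.0):
    """Derive (lower, upper) bounds as mean +/- k sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def find_anomalies(history, new_values, k=3.0):
    """Return (index, value) pairs from new_values that fall outside the learned band."""
    lower, upper = learn_threshold(history, k)
    return [(i, v) for i, v in enumerate(new_values) if v < lower or v > upper]


# Hypothetical response times (ms) observed during normal operation.
baseline = [102, 98, 101, 99, 100, 103, 97, 100, 101, 99]
# Hypothetical new readings to check against the learned band.
incoming = [100, 104, 180, 99, 20]

print(find_anomalies(baseline, incoming))
# [(2, 180), (4, 20)]
```

No one hand-picks the threshold here; it falls out of the observed data, which is the basic appeal of the approach. Production systems would typically also account for seasonality and trends.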
The insights from AIOps platforms can be turned into a range of intelligent actions performed automatically, from expediting service desk requests to end-to-end provisioning to deployment of network, compute, cloud, and applications. In sum, AIOps brings together data from both IT operations management and IT service management, allowing IT teams to observe, engage, and act on issues more efficiently than before.
Not every AIOps deployment goes as smoothly as planned. Challenges can stand in the way, including poor-quality data and IT team errors. Employees sometimes have difficulty learning how to use AIOps tools, and handing over control to autonomous systems can raise concerns among the C-suite. Moreover, adopting new AIOps solutions can be time-consuming: a majority of respondents to the OpsRamp survey said it takes three to six months to implement an AIOps solution, with 25% saying that it takes more than six months.
Because AIOps platforms rely so heavily on machine learning, challenges in data science can impact the success of AIOps strategies. For example, getting access to quality data to train machine learning systems isn’t easy. According to a 2021 Rackspace Technology survey, poor data quality was the main reason for machine learning R&D failure among 34% of respondents. Thirty-one percent said they lacked production-ready data.
Beyond data challenges, the skills gap also presents a barrier to AIOps adoption. A majority of respondents in a 2021 Juniper report said their organizations were struggling with expanding their workforce to integrate with AI systems. Laments over the AI talent shortage have become a familiar refrain from private industry — O’Reilly’s 2021 AI Adoption in the Enterprise paper found that a lack of skilled people and difficulty hiring topped the list of challenges in AI, with 19% of respondents citing it as a “significant” blocker.
Unrealistic expectations from the C-suite are another top reason for failure in machine learning projects. While 9 in 10 C-suite respondents to an Edelman survey characterized AI as the “next technological revolution,” Algorithmia found that a lack of executive buy-in contributes to delays in AI deployment.
Successfully adopting AIOps isn’t a sure-fire thing, but many businesses find the benefits worth wrestling with the challenges. AIOps systems reduce the torrent of alerts that inundate IT teams and learn over time which types of alerts should be sent to which teams, reducing redundancy. They can be used to handle routine tasks like backups, server restarts, and low-risk maintenance activities. And they can predict events before they occur, such as when network bandwidth is reaching its limit.
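The bandwidth example can be sketched with a simple trend projection. The Python snippet below fits a straight line to recent utilization samples and estimates how many sampling intervals remain before the trend crosses a limit. It assumes evenly spaced samples and roughly linear growth. The function names and numbers are hypothetical, and real platforms may use far more sophisticated forecasting.

```python
def fit_line(ys):
    """Least-squares slope and intercept for ys sampled at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def intervals_until_limit(ys, limit):
    """Estimate intervals after the last sample until the trend reaches `limit`.

    Returns None when usage is flat or falling, since the trend never reaches the limit.
    """
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    x_cross = (limit - intercept) / slope
    return max(0.0, x_cross - (len(ys) - 1))


# Hypothetical link utilization (%) sampled once per hour.
utilization = [50, 55, 60, 65, 70]
print(intervals_until_limit(utilization, limit=100))
# 6.0
```

In this made-up series, utilization grows five points per hour, so the projection suggests the link would hit 100% about six hours after the last sample. That is enough lead time to alert a team or trigger an automated action.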
As Accenture explains in a recent whitepaper, AIOps ultimately improves an IT organization’s ability to be an effective partner to the business. “An IT operations platform with built-in AIOps capabilities can help IT operations proactively identify potential issues with the services and technology it delivers to the business and correct them before they become problems,” the consultancy wrote. “That’s the value of having a single data model that service and operations management applications can share seamlessly.”