AI-Powered Apps Bring a New Level of Observability Challenges

For at least a decade, the IT industry has been shaped by a paradox: The more efficient and scalable technologies become, the more difficult they are to monitor, observe, and manage.

Generative AI is writing a major new chapter in this story. As applications and digital services come to depend on complex AI engines, it will become even harder than it already is to observe, monitor, debug, and predict the behavior of the software that powers modern workloads.

Here's why, and what AI could mean for observability and modern applications.

Observability Challenges: A Brief History

The IT industry has a long history of shifting toward technologies that are more difficult to manage, monitor, and observe than the ones they supplant.

The story goes back at least to the mid-2000s and the advent of "first-generation" cloud services, such as virtual machine (VM) instances. When organizations first moved workloads into the cloud, they sacrificed a degree of observability because they no longer had access to the physical infrastructure that powered those workloads. They could monitor and observe only the virtual infrastructure that cloud providers gave them access to.

Then, in the 2010s, the adoption of cloud-native technologies, such as containers and serverless, added another layer of complexity to monitoring and observability. The challenge here wasn't a lack of access to infrastructure. It was the difficulty of analyzing the complex relationships among the various parts of a distributed, cloud-native hosting environment. It's a lot harder to pinpoint the root cause of a performance issue when your application is split into dozens of microservices spread across a cluster of servers than when it is a monolith running on a single VM.

Now, I'm not saying that the adoption of modern technology was a bad thing. The advantages that the cloud and cloud-native architectures offer — such as greater scalability and the ability to consume infrastructure resources more efficiently — outweigh the observability challenges that they introduce.

Still, it's worth noting that modern technology brings greater observability difficulties, and these in turn create manageability challenges. When workloads and hosting environments are harder to monitor and observe, it's harder to predict their behavior and troubleshoot problems, and that makes them harder to manage.

Monitoring and Troubleshooting AI: The Next Frontier in Observability

Increased integration of AI into software applications and cloud environments will take the monitoring and observability challenges I just described to a whole new level.

The reason is simple: When your application relies on complex machine learning models to make decisions, even the people who wrote the application code can't always predict or understand why the application processes data the way it does. The tendency of AI-powered chatbots like ChatGPT to "hallucinate," or (to put it less euphemistically) make stuff up, drives this point home.

Hallucination in AI-based applications happens because the training process can produce models that generate results software developers didn't anticipate or want. In other words, when you write an application that relies on a machine learning model to process data, rather than solely on instructions written out in code, you lose the ability to interpret the application's output precisely.

What this means from a monitoring and observability standpoint is that traditional methods of debugging and troubleshooting applications won't always work for AI-powered apps. If a traditional app doesn't perform as it should, you can collect monitoring and observability data from it, and possibly run some traces, to track the problem back to the code that triggered it. Then, you update your code and move on.

With an AI-driven app, however, this approach often won't work. You might be able to detect unusual application behavior using monitoring and observability tools, but tracing it back to its root cause is much more difficult, because the cause may not lie in any specific lines of code within the app. The problem might instead stem from a fluke in the training data that was used to teach the application how to make decisions.

To be sure, there are methods and tools available to help debug machine learning models. But this task is a less exact science than traditional application monitoring, observability, and troubleshooting. Machine learning debugging strategies are useful for improving models overall, but they can't always tell you exactly why an AI-powered app produces a certain result or how to stop it from doing so.

In short, as more and more applications rely on AI to process data and make decisions, it's going to get a lot harder for admins to figure out exactly why those apps do what they do, and to fix problems when the apps don't behave as they should.

Conclusion: Is AI the Next Frontier in Observability?

The IT industry has been fairly successful at solving the observability challenges that have arisen in the past. Monitoring and observability tools have evolved significantly over the past decade, making it easier to troubleshoot software hosted in complex, cloud-native environments. In most cases, the struggles teams once faced in this regard are no longer very serious.

Hopefully, the industry will make the same strides in observability for AI-powered apps. Until it does, and until we have better tools and techniques for monitoring and observing applications that depend on complex AI models, businesses' ability to benefit from AI will be constrained by their inability to troubleshoot AI-based applications as effectively as traditional apps.

About the author

Christopher Tozzi is a technology analyst with subject matter expertise in cloud computing, application development, open source software, virtualization, containers and more. He also lectures at a major university in the Albany, New York, area. His book, “For Fun and Profit: A History of the Free and Open Source Software Revolution,” was published by MIT Press.
