How Does Inference at the Edge Fit Into Your AI Strategy?

Edge inference is an indicator of AI maturity, giving enterprises a significant competitive advantage over those with more nascent AI operations.

Industry Perspectives

August 8, 2024

6 Min Read

By Kevin Cochrane, CMO, Vultr

When more than 80% of 1,000 AI practitioners and leaders say that AI operations are moving to distributed edge environments, enterprises scaling AI should listen. That's the finding of a recent S&P Global Market Intelligence study commissioned by Vultr, in which 8 in 10 respondents reported that their organization is either "likely" or "extremely likely" to conduct more training and inference at the edge in the near future.

Inference at the edge is both a goal and an indicator of AI maturity, and enterprises that can achieve it see significant advantages over their competitors with more nascent AI operations. The operating model to support distributed inference at the edge is sophisticated but accessible to enterprises that make suitable AI infrastructure investments. The research from S&P Global makes it clear that such investments pay off.

Here's what mature AIOps looks like, and how AI inference in edge environments fits into the picture.

Why Inference at the Edge?

It's no longer a question of whether enterprises need to pursue AI maturity; it's a question of how quickly they can get there. According to the S&P Global study, nearly three-quarters of respondents (72%) reported that their organizations had reached mature AI use. These AI-driven organizations performed significantly better in 2023 than they did in 2022 across key business outcomes:

  • 90% saw improved customer satisfaction.

  • 91% experienced increased revenue.

  • 89% gained market share.

  • 88% achieved cost reduction/margin expansion.

  • 87% enhanced their risk management.

These significant improvements are only possible when companies make substantial investments in AI, which means deploying models close to where business activities occur. This translates to distributing AI operations across the enterprise and delivering inference in edge environments.

Inference at the edge makes sense for three critical reasons:

  1. AI application performance: Whether an AI app requires ultra-low latency or needs to process large volumes of data, backhauling data over significant distances is either operationally impractical or financially infeasible.

  2. Relevance to local user bases: To make AI applications maximally valuable to end users, models must reflect the unique cultural attributes of each community in each geography. Local data science teams must apply regional knowledge to fine-tune models to these particular attributes.

  3. Data residency and privacy requirements: Proper data governance requires that data residency be maintained at all times. To ensure compliance, fine-tuning on local data must occur in the regions where the data is collected.

This helps explain why, according to the S&P Global study, enterprises with mature AIOps have, on average, 158 models in production at any given time. It also provides insight into a recent Menlo Ventures study that found over 90% of funds invested in machine learning operations (MLOps) and large language model operations (LLMOps) are dedicated to inference rather than model training. Enterprises should train centrally, fine-tune regionally, and deploy and monitor locally.

Inference at the Edge Starts with Centralized Model Development

Mature AI operations start from a central location within the enterprise and extend outward to edge environments. A hub-and-spoke operating model is the most efficient way to enable inference at the edge.

The "hub" is the AI Center of Excellence, a centralized development facility where the enterprise's primary data science team develops and trains the foundation models used across the enterprise. For operational and cost efficiency, enterprises frequently draw on open-source models sourced from public registries. (In other instances, AI Center of Excellence data scientists retrain existing models within the enterprise's inventory to apply them to new use cases.)

Once trained on proprietary data, these models are containerized and stored in a private registry. This makes them discoverable and available to all regional data science teams working in edge locations across the enterprise's geographies.
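
Packaging and publishing that trained model could look something like the following sketch, which assumes the Docker SDK for Python; the registry host, image name, and credentials are hypothetical.

```python
# Minimal sketch: build a container image that bundles the saved model plus a
# serving entrypoint, and push it to a private registry so regional teams can
# discover and pull it. Uses the Docker SDK for Python; the registry host,
# image name, and credentials are hypothetical.
import docker

IMAGE = "registry.ai-coe.internal/foundation/enterprise-base:1.0"

client = docker.from_env()

# The build context is assumed to contain a Dockerfile alongside the saved
# model weights from the previous step.
image, build_logs = client.images.build(path="./models/enterprise-base-v1", tag=IMAGE)

client.login(username="ci-bot", password="<token>", registry="registry.ai-coe.internal")
for line in client.images.push(IMAGE, stream=True, decode=True):
    print(line)
```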

The hub-and-spoke model allows the core data science team to control the initial model development and training, ensuring consistency and quality. Meanwhile, regional teams can focus on fine-tuning and deploying these models to meet local needs, leveraging their unique insights into local user bases and regulatory requirements to keep models relevant to each audience and compliant with local legislation.

Fine-Tuning, Deployment, and Monitoring Occur in Edge Data Center Locations

Data science teams working in different geographies set up Kubernetes clusters in edge locations and deploy the containerized AI models to these edge clusters, where the data scientists fine-tune the models based on regional or local data.
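
A regional fine-tuning pass might resemble the sketch below, which assumes the Hugging Face transformers and datasets libraries; the dataset path and hyperparameters are illustrative rather than prescriptive.

```python
# Minimal sketch: fine-tune the shared base model on regional data inside an
# edge cluster. Assumes Hugging Face transformers and datasets; the dataset
# path and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "./models/enterprise-base-v1"             # model pulled from the private registry
LOCAL_DATA = "data/regional_support_corpus.txt"  # hypothetical local dataset

tokenizer = AutoTokenizer.from_pretrained(BASE)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token    # needed to batch causal-LM examples
model = AutoModelForCausalLM.from_pretrained(BASE)

dataset = load_dataset("text", data_files={"train": LOCAL_DATA})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out/regional-finetune",
                           num_train_epochs=1, per_device_train_batch_size=4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("out/regional-finetune")  # candidate for local deployment and monitoring
```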

Often, certain proprietary data, especially highly sensitive or confidential information, is excluded from the core training data and instead stored as embeddings in vector databases, from which relevant records are retrieved and supplied to the model as context at inference time. This practice enhances the quality, accuracy, and factuality of model outputs.

Storing such data as embeddings offers three key benefits, illustrated in the brief sketch after this list:

  1. Incorporating up-to-date information: Data scientists can include real-time information from external sources that may not be present in the original training data.

  2. Increasing transparency: Because the retrieved context is accompanied by its sources, the model's outputs become more transparent, which allows data scientists to evaluate model performance more efficiently.

  3. Reducing retraining needs: As new data becomes available, embeddings minimize the need for complete model retraining.
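
As referenced above, a minimal retrieval sketch might look like the following, using sentence-transformers and FAISS as stand-ins for an embedding model and a vector database; the documents and query are invented for illustration.

```python
# Minimal sketch: embed sensitive local documents, index them, and retrieve the
# most relevant one as context at inference time. Uses sentence-transformers and
# FAISS as stand-ins for an embedding model and a vector database; the documents
# and query are invented for illustration.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

documents = [
    "Regional refund policy: refunds are processed within 14 days.",
    "Local residency rule: customer records never leave the region.",
]
doc_vectors = encoder.encode(documents, normalize_embeddings=True)

index = faiss.IndexFlatIP(doc_vectors.shape[1])  # inner product on unit vectors = cosine
index.add(np.asarray(doc_vectors, dtype="float32"))

query_vec = encoder.encode(["How long do refunds take?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query_vec, dtype="float32"), 1)

# The retrieved passage (and its source) is appended to the prompt so the model
# answers from current, locally governed data rather than stale training data.
context = documents[ids[0][0]]
```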

The data scientists working in edge data center locations then move the fine-tuned models into production and leverage observability tools to monitor model performance continuously. This localized monitoring ensures that data scientists can quickly adapt the models to account for any changes or anomalies in the local environment and correct any instances of model drift or bias.
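
A simple drift check in this spirit might look like the sketch below, which uses a two-sample Kolmogorov-Smirnov test from SciPy on a model's output scores; the data and alert threshold are synthetic, and dedicated observability tooling would normally handle this end to end.

```python
# Minimal sketch: a statistical drift check comparing recent production scores
# against a reference window captured at deployment time. Uses SciPy's
# two-sample Kolmogorov-Smirnov test; the data and threshold are synthetic.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference_scores = rng.normal(0.70, 0.05, size=5_000)  # scores at deployment time
current_scores = rng.normal(0.62, 0.07, size=5_000)    # scores from the latest window

statistic, p_value = ks_2samp(reference_scores, current_scores)

if p_value < 0.01:  # illustrative alert threshold
    print(f"Possible model drift (KS statistic={statistic:.3f}); "
          "flag for the regional data science team to review.")
```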

Mature AI Can't Exist Without Responsible AI

As we enter the age of AI regulation, evidenced by the passage of the EU AI Act earlier this year, enterprises moving toward mature AI must make responsible AI practices foundational to their AI operations. Operationalizing responsible AI at scale requires two critical components:

Comprehensive observability across the AI/ML lifecycle: From initial training to deployment to continuous monitoring, observability ensures models operate ethically, securely, and in compliance with given standards or regulations. This includes tracking model performance, identifying potential biases, and ensuring transparency in decision-making processes.

Robust data governance: This encompasses data quality and lineage, federated data governance, model governance, and data security and privacy. Effective data governance ensures that data used in AI models is accurate, consistent, and compliant with regulatory requirements, safeguarding both the enterprise and its customers.

Responsible AI through model observability and data governance must underpin all AI operations. Without it, enterprises risk running afoul of legislation and exposing themselves to crippling sanctions. Forfeiting a reputation as a standard-bearer for ethical AI also puts the brand at risk, which can be just as damaging as any financial penalties incurred. Building responsible AI practices into the infrastructure for mature AI and inference at the edge ensures enterprises can scale their AI operations confidently and sustainably.

Looking Ahead

AI inference at the edge is the end state for mature AI operations. The roadmap is available: enterprises that invest in AI infrastructure and partner with cloud providers able to guide their adoption of comprehensive operating models like hub-and-spoke position themselves to leverage AI's full potential and reap significant returns on their investments.
