
Speak Easy: LLMs and the Future of Natural Voice Interactions

Large language models are poised to revolutionize voice assistance, enabling more natural, human-like, precise, and contextually aware interactions.

Industry Perspectives

August 22, 2024


By Lakshmikanth Alluru

Large language models (LLMs) are on the brink of revolutionizing voice assistance technology, promising better accuracy, deeper understanding, and an improved overall user experience. Market estimates project that the AI voice assistant market will reach $31.9 billion (USD) by 2033, a compound annual growth rate (CAGR) of 28.5% over the forecast period.

According to Juniper Research, LLM-based chatbots could soon handle as much as 70% of customer interactions. This trend indicates a significant shift toward more advanced AI-driven communications, enhancing the user experience by providing more natural and contextually aware responses.

Broad, general-purpose assistants like Alexa, Siri, and Google Assistant will become markedly more conversational and capable of human-like dialogue. Beyond these well-known voice assistants, LLMs can enable new voice agent types in business-to-business (B2B) and business-to-consumer (B2C) settings. In B2B settings, LLM-powered voice agents can automate or replace human phone calls to complete tasks such as scheduling appointments, processing leads for small businesses, and verifying health insurance benefits. In B2C settings, a newer wave of voice assistants will emerge for use cases like tutoring, therapy, and companionship, making these services more accessible and cost-effective.


Understanding LLMs in Voice Assistance

Unlike basic natural language processing (NLP), which often results in robotic, predefined responses from voice assistants, LLMs are trained on extensive datasets, giving them the potential to produce contextually appropriate and conversationally fluent answers. They differ from traditional voice assistants in the following ways: 

  • Understanding complex queries. Their expanded knowledge base enables LLMs to understand complex and ambiguous queries, moving beyond simple commands to grasp user intent in context. For example, while current voice assistants can quickly respond to direct questions like "Is the Apple Watch waterproof?" they may struggle with more nuanced queries such as "Can I swim with an Apple Watch on?" LLMs have the potential to bridge this gap by considering user context, previous interactions, and subtle conversational nuances to provide more accurate and helpful responses.

  • Maintained conversational context. LLMs have the potential to generate human-like responses and maintain conversational context across multiple turns. This adaptability lets users speak more naturally, with the model gracefully handling interruptions and follow-up questions (a minimal sketch follows this list).

  • Broader task support. They can support a broader range of tasks, promising a more comprehensive language processing experience. (It's important to note that LLMs are generally more computationally expensive than traditional NLP models.)

  • Language and dialect recognition. LLMs can recognize multiple languages, especially high-resource languages.
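
To make the context-maintenance point concrete, here is a minimal sketch of how an LLM-backed assistant can carry conversation history across turns. The call_llm function is a placeholder, not any real vendor's API; an actual model endpoint would take its place.

```python
# Minimal sketch of multi-turn context handling: the assistant keeps the
# full conversation history and sends it with every new user turn, so a
# follow-up like "Can I swim with it on?" can be resolved against the
# earlier Apple Watch question. `call_llm` is a placeholder for whatever
# model API an implementation actually uses.

def call_llm(messages: list[dict]) -> str:
    """Placeholder: a real system would send `messages` to an LLM API."""
    return f"(model reply conditioned on {len(messages)} messages of context)"

class ConversationalAssistant:
    def __init__(self, system_prompt: str):
        # History starts with a system prompt and grows with each turn.
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text: str) -> str:
        # Append the user's turn, then generate a reply conditioned on
        # everything said so far, not just the latest utterance.
        self.messages.append({"role": "user", "content": user_text})
        reply = call_llm(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

assistant = ConversationalAssistant("You are a helpful voice assistant.")
print(assistant.ask("Is the Apple Watch waterproof?"))
print(assistant.ask("Can I swim with it on?"))  # pronoun resolved from history
```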


Current Business Implications and Use Cases

Global leaders are increasingly adopting voice assistants in business settings, with 88% believing that voice assistant technology can contribute to the growth of their organizations. The ability of voice assistants to automate routine tasks, enhance customer service, and improve operational efficiency is the catalyst driving this adoption. Voice is already one interaction mode in many applications, and more apps will add it as an input modality in the future.

eMarketer reports that Gen Z is driving growth in voice assistant use. The generation's familiarity with technology and preference for hands-free interactions contribute to its increased adoption of voice assistants. Key use cases include playing music, listening to podcasts, asking questions, controlling smart devices, and making purchases.

Ensuring Accuracy and Reducing Hallucinations

One challenge in implementing LLMs for voice assistance is ensuring accuracy and minimizing "hallucinations," instances where the model generates false or misleading information. Several strategies can help address this issue, such as automatic evaluations using other LLMs and reference datasets, human evaluation and user feedback to measure response quality, and retrieval-augmented generation (RAG) to ground LLM responses in specific, verified data sources. Human oversight is still needed for sensitive use cases.

Other tactics include implementing strategies to detect and handle potential hallucinations and encouraging LLMs to admit uncertainty by saying "I don't know" when appropriate. Organizations can establish baseline hallucination rates and set goals for reduction. Citing sources and asking clarifying questions can also help minimize inaccurate responses.
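
A minimal sketch of the RAG grounding pattern described above follows. The keyword-overlap retriever, the call_llm stub, and the sample documents are illustrative assumptions standing in for a real vector store and model API; note how the prompt both requests source citations and invites the model to say "I don't know."

```python
# Hedged sketch of retrieval-augmented generation (RAG) for grounding:
# retrieve verified snippets, put them in the prompt, and instruct the
# model to answer only from those snippets or admit uncertainty.

VERIFIED_DOCS = [
    "Apple Watch Series 9 is water resistant to 50 meters (WR50).",
    "Standard return windows for most retailers are 14 to 30 days.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    return "(grounded answer, citing the provided source)"

def grounded_answer(question: str) -> str:
    sources = retrieve(question, VERIFIED_DOCS)
    prompt = (
        "Answer ONLY from the sources below and cite them. "
        "If the sources do not contain the answer, say \"I don't know.\"\n\n"
        + "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
        + f"\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(grounded_answer("Can I swim with an Apple Watch on?"))
```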

Implementing LLM-Powered Voice Assistants

Organizations looking to implement LLM-powered voice assistants should consider developing robust quality evaluation processes that combine human and automated metrics. Testing with diverse datasets helps measure and reduce potential biases. Exploring multiple LLM options can determine the best fit for specific use cases. Implementing privacy safeguards and clear data usage policies is essential, as is allowing users to opt in to personalized experiences and data sharing.
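
The automated half of such an evaluation process could look like the sketch below. The token-overlap metric, review threshold, and dataset are all hypothetical stand-ins; production pipelines typically use LLM-as-judge scoring or semantic similarity instead.

```python
# Illustrative sketch of an automated quality check: score assistant
# responses against a small reference dataset and flag low scorers for
# human review. Token-overlap F1 is a deliberately simple stand-in.

def token_f1(candidate: str, reference: str) -> float:
    """F1 over shared tokens between a response and its reference answer."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = len(set(cand) & set(ref))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

EVAL_SET = [  # (question, reference answer, assistant's actual response)
    ("Is the Apple Watch waterproof?",
     "It is water resistant to 50 meters, not waterproof.",
     "The Apple Watch is water resistant to 50 meters."),
]

REVIEW_THRESHOLD = 0.5  # below this score, route to a human reviewer
for question, reference, response in EVAL_SET:
    score = token_f1(response, reference)
    print(f"{question!r}: F1={score:.2f}, "
          f"human review={score < REVIEW_THRESHOLD}")
```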

Privacy and security implications are important considerations when using LLM-powered voice assistants. On-device processing can help protect user privacy, while transparent data collection and usage policies, including clear explanations of what data is collected, how it is used, and how it is stored, can build trust. As voice assistants execute more complex tasks, robust security measures, such as encryption and user authentication, will be crucial to prevent potential misuse or unauthorized access.
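
As one illustration of protecting voice data at rest, the sketch below encrypts a transcript with a symmetric key before storage. It uses the third-party cryptography package; the key-management details (secure storage, rotation) that a real deployment needs are omitted.

```python
# One possible approach to encrypting voice transcripts at rest.
# Requires the third-party package: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # in practice, fetch from a key store/KMS
cipher = Fernet(key)

transcript = "User asked to reorder their prescription.".encode("utf-8")
token = cipher.encrypt(transcript)   # ciphertext is safe to persist

# Only services holding the key can recover the plaintext.
assert cipher.decrypt(token) == transcript
```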

While LLMs show improved accuracy for common languages and dialects, they may struggle with low-resource languages and regional variations. To address this, developers can utilize diverse training data to improve accuracy on accents and idioms. Implementing strategies for handling low-resource languages is also crucial. Additionally, exploring multimodal integration of text, voice, and visual inputs can lead to better understanding and more comprehensive language processing capabilities.

Several trends are shaping the future of LLM-powered voice assistance, including natural human-like conversation capabilities, leveraging multimodal inputs, and the emergence of voice agents that perform actions on users' behalf. As these technologies mature, voice assistants will be capable of executing complex, multi-step tasks, detecting emotions, and integrating broader contextual awareness. Additionally, voice interfaces could become standard across various products and services.

Recent demonstrations, such as Apple's updates to Siri — improved contextual understanding, enhanced functionality through Apple Intelligence, and ChatGPT integration — showcase some of these capabilities. While no organization has fully implemented LLM-powered voice assistants, the technology is rapidly advancing. Other examples include voice AI technology that helps contact centers reduce hold times and dropped calls, AI that answers phone calls and books appointments seamlessly, and AI that answers questions and takes reservations.

These developments bring both opportunities and challenges for enterprises. Improved customer support automation through more natural, conversational interactions is now feasible. Enhanced accessibility for voice-activated interface users creates additional interaction opportunities. Introducing voice-driven interfaces into products and services where they were previously unfeasible may open new avenues for user engagement and tailored experiences.

It's vital for organizations to carefully consider the ethical implications of voice cloning and its possible abuse. As these technologies become more common at work and home, companies need to establish clear rules and ethical AI practices.

Integrating LLMs into voice assistance technology marks a significant leap forward, offering more natural, human-like, accurate, and contextually aware interactions. By embracing these advancements while upholding ethical standards, protecting privacy, and investing in diverse training data, organizations and developers can fully capitalize on the potential of LLM-powered voice assistants.

About the Author:

Lakshmikanth Alluru is a seasoned principal product manager with over 10 years of experience in consumer product management and 15 years in creating software products. He has a proven track record at top-tier companies including LinkedIn, Amazon, IMDb, Deloitte, and IBM. Lakshmikanth has a strong background in leveraging AI/ML to drive engagement and growth. He holds an MBA from UCLA's Anderson School of Management and a master's degree in engineering management from Duke University. Connect with him on LinkedIn.

The views and opinions expressed in this article are those of the author and may not reflect those of his employer.
