One of voice AI’s most persistent problems is the awkward delays that appear when a caller is speaking to a machine. A three-way partnership between AI phone support company Phonely, inference optimization platform Maitai, and chip maker Groq has led to a breakthrough.
The collaboration has allowed Phonely to cut response times by more than 70% while boosting accuracy from 81.5% to 99.2% across four model iterations, surpassing GPT-4o’s benchmark of 94.7% by 4.5 percentage points. The gains stem from Groq’s new ability to switch instantly between multiple specialized AI models without added overhead, orchestrated through Maitai’s optimization platform.
The breakthrough addresses what industry experts call the “uncanny valley” of voice AI: the subtle cues that make artificial conversations feel distinctly non-human. The implications for call centers and customer service could be transformative; one of Phonely’s customers is replacing 350 human agents this month alone.
Why AI phone calls still sound robotic: the four-second problem
Traditional large language models, including OpenAI’s GPT-4o, have long struggled with what seems like a simple challenge: responding quickly enough to maintain natural conversation flow. While a delay of a few seconds barely registers in text-based interactions, the same pause feels interminable during live phone conversations.
“One of the things that most people don’t realize is that major LLM providers, such as OpenAI, Claude, and others, have a very high degree of latency variance,” said Will Bodewes, Phonely’s founder and CEO, in an exclusive interview with VentureBeat. When a caller is talking to a voice AI on the phone, four seconds feels like an eternity, and that delay is what makes most voice AI today feel non-human.
The issue occurs roughly once every ten requests, which means that standard conversations inevitably include at least one or two awkward pauses that immediately reveal the artificial nature of the interaction. For businesses considering AI phone agents, these delays have created a significant barrier to adoption.
“This kind of latency is unacceptable for real-time phone support,” Bodewes said. Beyond latency, conversational accuracy and humanlike responses are areas where legacy LLM providers have yet to crack the problem in the voice realm.
How three startups solved AI’s biggest conversational challenge
The breakthrough rests on Groq’s development of what the company calls “zero-latency LoRA hotswapping”: the ability to switch instantly between multiple specialized AI model variants with no performance penalty. LoRA, or low-rank adaptation, lets developers create lightweight, task-specific modifications to an existing model rather than training entirely new ones from scratch.
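To make the concept concrete, here is a minimal sketch of how a LoRA adapter modifies a frozen base weight matrix. It is illustrative only: the dimensions, rank, and NumPy implementation are arbitrary choices for explanation, not details from Phonely, Maitai, or Groq.

```python
# Minimal LoRA sketch (illustrative only).
# Shows how a low-rank "delta" adapts a frozen base weight matrix.
import numpy as np

d_in, d_out, rank = 512, 512, 8               # rank << d_in keeps the adapter tiny
rng = np.random.default_rng(0)

W_base = rng.standard_normal((d_out, d_in))   # frozen base model weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection (starts at zero)

def forward(x: np.ndarray) -> np.ndarray:
    """Base output plus the low-rank, task-specific correction."""
    return W_base @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
y = forward(x)

# The adapter stores only A and B: rank * (d_in + d_out) values,
# versus d_in * d_out for a fully fine-tuned weight matrix.
print(A.size + B.size, "adapter params vs", W_base.size, "base params")
```

Because only the two small matrices change per task, many adapters can share a single copy of the base model.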
“Groq’s combination of fine-grained, software-controlled architecture, high-speed on-chip memory, streaming architecture, and deterministic execution means it is possible to access multiple hot-swapped LoRAs with no latency penalty,” explained Chelsey Kantor, Groq’s chief marketing officer, in an interview with VentureBeat. The LoRAs are stored and managed in SRAM alongside the original model weights.
This infrastructure advance enabled Maitai to build what the company’s Christian DalSanto describes as a “proxy-layer orchestration” system. “Maitai acts as a thin proxy layer between customers and their model providers,” DalSanto said. “This allows us to dynamically select and optimize the best model for each request, automatically applying evaluation, optimizations, and resiliency strategies such as fallbacks.”
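A rough sketch of what such a proxy layer could look like appears below. The variant names, the `call_model` stub, and the fallback logic are hypothetical stand-ins for illustration, not Maitai’s actual API or routing policy.

```python
# Hypothetical proxy-layer router (illustrative, not Maitai's implementation).
# The proxy sits between the application and its model providers, picks a
# specialized variant per request, and falls back when a call fails.
import time

MODEL_VARIANTS = {                      # assumed adapter names for illustration
    "appointment_scheduling": "base-llm+lora-scheduling",
    "lead_qualification": "base-llm+lora-leads",
}
FALLBACK_MODEL = "general-purpose-llm"

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a provider API call."""
    return f"[{model}] reply to: {prompt}"

def log_signal(task: str, model: str, latency_ms: float) -> None:
    """Collect per-request performance data (placeholder for later fine-tuning)."""
    pass

def route(prompt: str, task: str) -> str:
    """Pick the task-specific variant; fall back to a general model on failure."""
    primary = MODEL_VARIANTS.get(task, FALLBACK_MODEL)
    for model in (primary, FALLBACK_MODEL):
        start = time.perf_counter()
        try:
            reply = call_model(model, prompt)
            log_signal(task, model, (time.perf_counter() - start) * 1000)
            return reply
        except Exception:
            continue                    # resiliency: try the fallback model
    raise RuntimeError("all model variants failed")

print(route("Can I book a demo for Tuesday?", "appointment_scheduling"))
```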
The system works by collecting performance data from every interaction, identifying weak points, and iteratively improving the models without requiring customer intervention. “Since Maitai sits in the middle of the inference flow, we collect strong signals identifying where models underperform,” DalSanto explained. “These ‘soft spots’ are clustered, labeled, and incrementally fine-tuned to address specific weaknesses without causing regressions.”
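As a rough illustration of that feedback loop (an assumed design, not Maitai’s disclosed pipeline), the sketch below clusters logged underperforming interactions so that each cluster can seed a targeted fine-tuning set. The scikit-learn and sentence-transformers usage, the model name, and the data shape are all assumptions made for the example.

```python
# Illustrative "soft spot" clustering loop (assumed design, not Maitai's code).
from collections import defaultdict

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def build_finetune_sets(failed_interactions: list[dict], n_clusters: int = 5) -> dict:
    """Group underperforming calls by topic so each weakness gets targeted data."""
    texts = [item["transcript"] for item in failed_interactions]
    embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)
    labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(embeddings)

    clusters: dict[int, list[dict]] = defaultdict(list)
    for interaction, label in zip(failed_interactions, labels):
        clusters[int(label)].append(interaction)
    # Each cluster becomes a labeled fine-tuning set for the next adapter iteration;
    # evaluating against held-out examples from other clusters guards against regressions.
    return clusters
```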
The numbers behind the breakthrough: from 81.5% to 99.2% accuracy
The results show significant improvements across multiple performance dimensions. Time to first token, the measure of how quickly an AI begins responding, dropped 73.4%, from 661 milliseconds to 176 milliseconds at the 90th percentile. Overall completion times fell 74.6%, from 1,446 milliseconds to 339 milliseconds.
Perhaps more significantly, accuracy improvements followed a clear upward trajectory across four model iterations, starting at 81.5% and reaching 99.2%, a level that exceeds human performance in many customer service scenarios.
“We’ve been seeing about 70%+ of people who call into our AI unable to tell it apart from a person,” Bodewes said. “Latency is, or was, the dead giveaway that it was an AI. With a custom fine-tuned model that talks like a person, and super low-latency hardware, there isn’t much stopping us from crossing the uncanny valley of sounding completely human.”
The performance gains translate directly into business results. One of the company’s biggest customers saw a 32% increase in qualified leads compared with a previous version built on earlier state-of-the-art models, according to Bodewes.
350 human agents replaced in one month: call centers go all-in on AI
The improvements arrive as call centers face mounting pressure to reduce costs while maintaining service quality. Traditional human agents require training, scheduling coordination, and significant overhead costs that AI agents can eliminate.
“Call centers are really seeing huge benefits from using Phonely to replace human agents,” Bodewes said. “One of the call centers we work with is essentially replacing 350 human agents with Phonely just this month. From the call center’s perspective this is a game changer, because they don’t have to coordinate supply and demand, train agents, and manage human support agent schedules.”
The technology shows particular strength in specific use cases. “Phonely really excels in a few areas, including industry-leading performance in appointment scheduling and lead qualification in particular, beyond what traditional providers are able to handle,” Bodewes said. The company has forged partnerships with major firms handling customer interactions in the automotive, legal, and insurance industries.
The hardware edge: why Groq’s chips make sub-second AI possible
Groq’s specialized AI inference chips, called language processing units (LPUs), provide the hardware foundation that makes the multi-model approach viable. Unlike the general-purpose graphics processors typically used for AI inference, LPUs are optimized specifically for the sequential nature of language processing.
“The LPU architecture is optimized for precisely controlling data movement and computation at a fine-grained level with high speed and predictability, allowing the efficient management of multiple small ‘delta’ weight sets (the LoRAs) on a common base model with no additional latency,” Kantor said.
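To see why keeping many adapters resident is feasible, a back-of-the-envelope comparison helps. Every number below (hidden size, layer count, rank, precision, variant count) is an assumption chosen for illustration, not a Groq or Phonely specification.

```python
# Back-of-the-envelope memory comparison (illustrative assumptions only):
# many LoRA "delta" weight sets are cheap to keep resident compared with
# many full fine-tuned copies of the base model.
hidden = 4096          # assumed hidden size of an adapted weight matrix
layers = 32            # assumed number of adapted layers
rank = 16              # assumed LoRA rank
n_variants = 24        # number of task-specific variants kept "hot"
bytes_per_param = 2    # fp16/bf16 precision

full_copy = hidden * hidden * layers * bytes_per_param            # one full copy
one_adapter = 2 * hidden * rank * layers * bytes_per_param        # A and B matrices

print(f"one full fine-tuned copy : {full_copy / 1e9:.2f} GB")
print(f"one LoRA adapter         : {one_adapter / 1e6:.2f} MB")
print(f"{n_variants} adapters total       : {n_variants * one_adapter / 1e6:.2f} MB")
```

Under these assumptions, two dozen adapters together occupy a small fraction of the memory of a single extra full model copy, which is why swapping them per request is cheap.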
The cloud-based infrastructure also addresses scalability concerns that have historically limited AI deployment. “The benefit of using a cloud-based solution like GroqCloud is that Groq handles orchestration and dynamic scaling for our customers for any AI model we offer, including fine-tuned LoRA models,” Kantor explained.
For enterprises, the economics appear compelling. “The simplicity and efficiency of our system design, low power consumption, and high performance of our hardware allow Groq to provide customers with the lowest cost per token without sacrificing performance as they scale,” Kantor said.
Same-day AI deployment: how businesses skip months of integration planning
One of the partnership’s most compelling aspects is implementation speed. Unlike traditional AI deployments that require months of integration work, Maitai’s approach lets companies already running general-purpose models in production make the transition the same day.
“We typically transition companies that are already in production using general-purpose models to Maitai on the same day, with no disruption,” DalSanto said. “We begin immediate data collection, and within days to a week, we can deliver a fine-tuned model that’s faster and more reliable than their original setup.”
This rapid deployment addresses a common enterprise frustration with AI projects: long implementation timelines that delay return on investment. The proxy-layer approach means companies can keep their existing API integrations while gaining access to continuously improving performance.
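A hypothetical example of that drop-in pattern: an application keeps its existing OpenAI-style client code and simply points it at a proxy endpoint. The base URL, model name, and credential below are placeholders, not documented Maitai or Phonely values.

```python
# Hypothetical drop-in proxy usage: the app's existing OpenAI-style client code
# is unchanged; only the endpoint it targets is different.
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.example.com/v1",  # placeholder proxy endpoint
    api_key="PROXY_API_KEY",                  # placeholder credential
)

response = client.chat.completions.create(
    model="phone-support-assistant",          # hypothetical task-specific variant
    messages=[{"role": "user", "content": "I'd like to reschedule my appointment."}],
)
print(response.choices[0].message.content)
```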
The future of enterprise AI: specialized models replace one-size-fits-all
The collaboration reflects a broader shift in enterprise AI architecture, away from monolithic, general-purpose models and toward specialized, task-specific systems. “We’re seeing growing demand from teams breaking their applications into smaller, highly specialized workloads, each benefiting from individual adapters,” DalSanto said.
This trend reflects a maturing understanding of AI deployment challenges. Rather than expecting a single model to handle every task well, enterprises increasingly recognize the value of purpose-built solutions that can be continuously refined based on real-world performance data.
“Multi-LoRA hotswapping eliminates traditional cost and complexity barriers, allowing companies to deploy faster, more accurate models that are specifically tailored for their applications,” DalSanto said. “This fundamentally shifts how enterprise AI gets built and deployed.”
The technical foundation also enables more sophisticated applications as the technology matures. Groq’s infrastructure can run dozens of specialized models on a single instance, potentially allowing businesses to create highly personalized AI experiences for different customer segments and use cases.
“Multi-LoRA hotswapping enables low-latency, high-accuracy inference tailored to specific tasks,” DalSanto said. The company’s roadmap emphasizes further investment in infrastructure, tools, and optimization to make fine-grained, application-specific inference the new standard.
For the broader conversational AI market, the partnership demonstrates that technical limitations once deemed insurmountable can be overcome through specialized infrastructure and careful system design. As more enterprises deploy AI phone agents, the competitive advantages demonstrated by Phonely may set new baseline expectations for performance and responsiveness in automated customer interactions.
The success also supports the emerging business model of AI infrastructure companies working together to address challenging deployment issues. As specialized capabilities come together to deliver solutions that surpass what any single provider could achieve independently, this collaborative approach may accelerate innovation across the enterprise AI sector. If this partnership is any indication, the era of obviously artificial phone conversations may be coming to an end faster than anyone expected.