According to The Information, OpenAI has reorganised several engineering, product and research teams around improving its voice technology, after internal assessments found that its audio models lag behind its text-based systems in both accuracy and responsiveness.
While OpenAI’s text models have become synonymous with cutting-edge artificial intelligence, its spoken interactions through ChatGPT have reportedly struggled to feel as fluid or reliable.
That gap matters because OpenAI’s first hardware product - expected in 2026 - is designed to be largely audio-first.
Rather than relying on screens, the device is reportedly intended to let users interact naturally through speech, with the AI responding in real time - handling interruptions and even speaking simultaneously with the user.
Internally, OpenAI is said to believe this kind of conversational flow is essential if AI is to feel less like software and more like a companion.
The company is reportedly developing a new audio-model architecture that produces more emotive, natural-sounding responses, while also delivering deeper and more accurate answers.
The model is expected to launch in the first quarter of 2026 and would underpin OpenAI’s wider push into audio-driven experiences.
This strategy places OpenAI alongside rivals such as Google, Apple, Amazon and Meta, all of which are exploring post-smartphone devices built around AI.
The difference, however, is OpenAI’s ambition to strip things back, starting with the screen.
Former Apple design chief Jony Ive, who is working with OpenAI following its multibillion-dollar acquisition of his startup io, has argued that screenless devices could help reduce digital addiction rather than amplify it.
Still, there is a hurdle to clear, as many ChatGPT users rarely speak to the chatbot at all, because they are either unaware of the feature or unimpressed by its current performance.