OpenAI unveils ‘magic’ new AI that can see, hear and speak

ChatGPT creator’s latest flagship product can respond as quickly as a human

Anthony Cuthbertson
Tuesday 14 May 2024 05:00 EDT


OpenAI, the creator of viral chatbot ChatGPT, has unveiled a new AI model that can interact with the world via audio, vision and text in real time.

GPT-4o is the latest flagship product for the Microsoft-backed company, aiming to offer users a “more natural human-computer interaction”.

In a presentation on Monday, OpenAI said its latest AI could respond to queries in less than a third of a second – similar to human response time in conversation.

Using a smartphone’s camera and microphone, GPT-4o is capable of understanding audio and visual inputs, while using the speakers to respond in a personalised and natural voice.

OpenAI CEO Sam Altman said the new technology “feels like magic”, writing in a blog post that it was “the best computer interface” he had ever used.

“It feels like AI from the movies; and it’s still a bit surprising to me that it’s real,” he wrote.

“The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful.”

Unlike the company’s other advanced AI models, GPT-4o will be offered for free, OpenAI said, with availability rolling out within the next few weeks.

In an effort to prevent misuse or potential harm, OpenAI said it carried out extensive testing that covered everything from cyber security to psychology.

“We tested both pre-safety-mitigation and post-safety-mitigation versions of the model, using custom fine-tuning and prompts, to better elicit model capabilities,” the company explained in a blog post introducing the product.

“GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities... We will continue to mitigate new risks as they’re discovered.”

OpenAI acknowledged that its latest AI model has several limitations that it hopes to overcome with future versions.

Videos of the AI making mistakes showed GPT-4o switching between languages without being prompted, making errors with language translation, and mispronouncing someone’s name as “Nacho”.

The announcement comes just a day ahead of Google I/O, the tech giant’s biggest event of the year, which is expected to focus heavily on artificial intelligence.

“All eyes will be on how AI becomes more integrated into connected devices, particularly smartphones, given the sheer scale of the opportunity,” Leo Gebbie, a principal analyst at CCS Insight, told The Independent ahead of the event.

“Google needs to clearly articulate the benefits of AI to avoid consumers succumbing to AI fatigue.”
