OpenAI Debuts an Audio Function
OpenAI is releasing early test results for a feature that can read words aloud in a convincing human voice, revealing a new frontier for artificial intelligence and raising the specter of deepfake risks.
The company is sharing early demos and use cases from a small-scale preview of the text-to-speech model, called Voice Engine, which it has shared with approximately ten developers so far, according to a spokesperson.
OpenAI decided against rolling out the feature more broadly, contrary to what it had told reporters earlier this month.
According to a spokesperson, OpenAI decided to scale back the release in response to feedback from educators, policymakers, industry experts, and creatives, among others. In an earlier press briefing, the company had said it intended to release the tool to as many as one hundred developers through an application process.
“We recognize that generating speech that resembles people’s voices carries significant risks, which are especially prominent in an election year,” the company wrote in a blog post on Friday. “We are engaging with US and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build.”
Other AI technologies have already been used to fake voices in some instances. In January, a fake but convincing phone call purporting to be from President Joe Biden urged New Hampshire residents not to vote in the primaries, an incident that sparked concerns about artificial intelligence ahead of crucial global elections.
Unlike OpenAI’s previous efforts to generate audio content, Voice Engine can produce speech that sounds like specific individuals, down to their cadence and intonation. The software needs only 15 seconds of recorded audio of a person speaking to recreate their voice.
During a demonstration of the tool, Bloomberg was shown a clip of OpenAI CEO Sam Altman briefly explaining the technology in a voice that closely resembled his own but was entirely AI-generated.
“With the proper audio configuration, it’s essentially a human-calibre voice,” said OpenAI product lead Jeff Harris. “It’s a pretty impressive technical quality.” However, Harris added, “there’s obviously a lot of safety delicacy around the ability to really accurately mimic human speech.”
One of OpenAI’s developer partners, the Norman Prince Neurosciences Institute at the non-profit health system Lifespan, is using the technology to help patients recover their voice.
OpenAI’s custom speech model can also translate the audio it generates into different languages, which makes it useful for companies in the audio business such as Spotify Technology SA. OpenAI also pointed to other beneficial applications of the technology, including bringing a wider range of voices to children’s educational materials.
In the testing program, OpenAI requires its partners to agree to its usage policies, obtain the consent of the original speaker before using their voice, and notify audiences that the voices they are hearing are AI-generated.
OpenAI says it is seeking input from outside experts before deciding whether to make the feature available to a wider audience. “[...] whether we ultimately deploy it widely ourselves or not,” the company stated in a blog post.
OpenAI said it hopes the software preview “encourages the necessity to strengthen societal resilience” in the face of the challenges that more sophisticated AI technologies will present. It also aims to increase public awareness of deceptive AI content and to advance the development of methods for distinguishing authentic audio from AI-generated audio.
Bimal Mardi is a professional content writer. He works at First Santal Broadcast Network TV/News channel in India and writes about technology, education, and tech product reviews.