OpenAI’s voice cloning AI model only needs a 15-second sample to work

1 year ago

OpenAI is offering limited access to a text-to-voice generation platform it developed called Voice Engine, which can create a synthetic voice based on a 15-second clip of someone’s voice. The AI-generated voice can read out text prompts on command in the same language as the speaker or in a number of different languages. “These small scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries,” OpenAI said in its blog post.

Companies with access include the education technology company Age of Learning, visual storytelling platform HeyGen, frontline health software maker Dimagi, AI communication app creator Livox, and health system Lifespan.

In these samples posted by OpenAI, you can hear what Age of Learning has been doing with the technology to generate pre-scripted voice-over content, as well as to read out “real-time, personalized responses” to students written by GPT-4.

First, the reference audio in English:

And here are three AI-generated audio clips based on that sample:

OpenAI said it began developing Voice Engine in late 2022 and that the technology has already powered preset voices for the text-to-speech API and ChatGPT’s Read Aloud feature. In an interview with TechCrunch, Jeff Harris, a member of OpenAI’s product team for Voice Engine, said the model was trained on “a mix of licensed and publicly available data.” OpenAI told the publication the model will only be available to about 10 developers.

AI text-to-audio generation is an area of generative AI that’s continuing to evolve. While most efforts focus on instrumental or natural sounds, fewer have focused on voice generation, partially due to the questions OpenAI cited. Some names in the space include companies like Podcastle and ElevenLabs, which provide AI voice cloning technology and tools the Vergecast explored last year.

At the same time, the US government is trying to curb unethical uses of AI voice technology. Last month, the Federal Communications Commission banned robocalls using AI voices after people received spam calls from an AI-cloned voice of President Joe Biden.

According to OpenAI, its partners agreed to abide by its usage policies, which say they will not use Voice Generation to impersonate people or organizations without their consent. It also requires the partners to get the “explicit and informed consent” of the original speaker, to not build ways for individual users to create their own voices, and to disclose to listeners that the voices are AI-generated. OpenAI also added watermarking to the audio clips to trace their origin and actively monitors how the audio is used.

OpenAI suggested several steps that it thinks could limit the risks around tools like these, including phasing out voice-based authentication to access bank accounts, policies to protect the use of people’s voices in AI, greater education on AI deepfakes, and development of tracking systems for AI content.

Source: The Verge