Microsoft’s VALL-E artificial intelligence mimics human voice perfectly after just 3 seconds


Microsoft is getting into the artificial intelligence game too (Credit: Getty)

Artificial intelligence is having a bit of a moment right now, thanks to the likes of Dall-E and ChatGPT.

Microsoft apparently wants in on the act and has revealed its own artificial intelligence program that can creepily mimic a human voice.

Called VALL-E, it needs only a three-second sample of the target voice to generate a high-quality text-to-speech (TTS) example in that exact same voice.

Which is, frankly, a bit spooky.

At present, VALL-E only converts text into speech in the chosen voice. It can’t generate new content the way ChatGPT does. It’s also not available to the public.

Microsoft calls VALL-E a ‘neural codec language model’ and says it trained it on 60,000 hours of English-language speech from over 7,000 speakers. All this data was pulled from an audio library owned by Facebook’s parent company, Meta.

You can listen to VALL-E yourself and compare it to the original samples by following this link.

‘VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt,’ Microsoft’s engineers explain in the abstract of their paper about the AI.

‘Experiment results show that VALL-E significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity.

‘In addition, we find VALL-E could preserve the speaker’s emotion and acoustic environment of the acoustic prompt in synthesis.’

A diagram showing the VALL-E model’s input and output (Credit: Microsoft)

The obvious question is: where could all this lead?

On the one hand, this kind of technology could be hugely beneficial for the world. Stephen Hawking famously used a text-to-speech generator to continue his work in physics even though he had motor neurone disease (also known as ALS).

Then again, it doesn’t take much to imagine a dark future where scammers or criminals are able to use this kind of technology to impersonate someone without their knowledge.

And we’ll just casually mention here that Microsoft doesn’t have the best track record when it comes to artificial intelligence bots.

