AI-generated voices are becoming indistinguishable from human ones (in Spanish too)

In 2018, experts agreed that fakes (AI-generated voices) were still far from altering the democratic process. Five years of intensive development have passed since then. Today, in 2023, the technology to fake audio already exists… in Spanish.

This progress caught experts off guard because media coverage of the generative artificial intelligence explosion has not put much emphasis on it. While we were analyzing ChatGPT’s ability to pass licensing exams, we did not see how voice-cloning training algorithms were advancing by leaps and bounds.

The consequences are already visible in Mexico, especially in CDMX, where in a month Morena, the party currently in power, will decide who its candidate to govern the entity will be.

It all started with a controversial WhatsApp audio. In it, a voice alleged to be that of Martí Batres, the current Head of the CDMX Government, is heard conspiring to stop the candidacy of Omar García Harfuch, one of the contenders within Morena, the most popular party in the country, for next year's municipal elections.

This audio snippet, originally published in a TikTok post, became the subject of widespread debate about its authenticity. Is it really Martí Batres? Is the recording evidence of a possible internal conspiracy within the party? Batres himself was quick to state his position on the matter: “This is not me; this is a voice generated by artificial intelligence,” he said in a tweet.


In the audio, fillers, breathing and the accent of the capital can be detected, and the voice is very similar to that of Martí Batres. The fact-checking journalism community immediately got to work. Users of X also did not take long to venture an answer. There were audio analyses made with free tools used to identify cloned voices. Some people categorically asserted that the audio was fake. Others, such as El Sabueso, from Animal Político, stated that there was no conclusive evidence to determine the truth. Other journalists said that artificial intelligence is not yet developed enough to do such things. Some users commented: “Artificial intelligence doesn’t sound like this in Spanish.” The elements needed to reach a conclusive answer may be scarce, but one thing is certain: these last two beliefs are outdated.

In one of the user verification

However, the software those users show is software that transforms a text input into a voice output. Many analyses start from this premise: that if artificial intelligence was involved in the audio, it must have been through programs that convert text to speech. The level of sophistication of the WhatsApp audio means it could not have been produced with software like this. This is where verification has to be more clever, because there are companies that use voice, not text, as the input to produce and imitate other voices.


“These generative AI and ‘speech-to-speech’ voice-over systems, at least the software I know of in Spanish, have a truly extraordinary ability to imitate… the rate of progress in Spanish is very high,” said Natalia Martos Díaz, former head of legal at Tuenti and now CEO of Legal Army, a Spanish company dedicated to protecting and advising companies that create voice-cloning software for charitable purposes.

Natalia stated that, if the controversial audio attributed to Martí Batres was indeed generated by AI, then “it would be impossible to do it from [a] text [program]”. To generate a synthetic voice, you first need a large amount of data from the original speaker. Interviews, press conferences and audio recordings, all of good quality and clearly defined, are needed to train the AI. For politicians, this kind of material is practically endless.

The larger the data set, in this case audio, provided to the AI, the better the cloned voice. “You have to train the algorithm and the data set in that language to be able to produce perfect cloned voices. And you can achieve very similar things… The results will only be as good as the amount of data you can feed in during data ingestion to train the voice.”


Then a speaker is needed to deliver, as naturally as possible, the sentences to be imitated, including fillers, breathing, slang, accent and manner of speaking. If the AI is well trained, the imitation can come close to perfection. Ambient noise is no challenge for the software; some even add it so that the audio file does not arouse suspicion. So breathing, accents and noise are no longer an obstacle.

“With capital, investment, time and knowledge, you can do very good things. A perfect clone,” says Natalia, who has seen companies charge as little as $3,000 to train an AI. Once the imitation is achieved, the provider usually charges per second of audio.

If private individuals want to try it, there are conditions, “because it all depends on the amount of data they have”: the data set, the phrases, the time available and the knowledge. It is perfectly plausible for an individual to do it. “If you have the capacity to train the model, you can do it.”

Among the evidence some people have put forward to show that the audio is real is the tool AI Voice Detector, a free website where you upload audio and it reports whether the voice may be artificial. However, this is not conclusive: “It’s not definitive, it’s just an aid. They detect probabilities, not certainties. They will never give you absolute certainty,” said Natalia.

Roderick Gilbert