According to a recent study by British scientists, people struggle to tell whether the voice they hear on the phone belongs to a human or is an artificial recording.
The researchers used a text-to-speech (TTS) algorithm trained on English and Mandarin datasets to synthesize 50 deepfake speech samples in each language.
The scientists then played both the computer-generated and human samples to 529 participants to test whether they could distinguish machine speech from human speech. It turned out that the participants mistook an artificial sample for a human recording 27 percent of the time.
Moreover, special training offered by the research team proved surprisingly ineffective, as participants' detection rates improved only slightly. The rates for English and Mandarin speakers were almost identical, though the two groups pointed to different cues for telling the recordings apart: English speakers found breathing patterns most helpful, while Mandarin speakers focused on cadence, pacing, and fluency.
“Our findings confirm that humans are unable to reliably detect deepfake speech, whether or not they have received training to help them spot artificial content,” study first author Kimberly Mai of UCL Computer Science noted in a press release.
Although generative AI audio technology can improve quality of life for individuals with speech limitations, it could also be exploited for malicious purposes, whether by governments seeking to influence the citizens of other nations or by criminals, according to the scientists.
“With generative artificial intelligence technology getting more sophisticated and many of these tools openly available, we’re on the verge of seeing numerous benefits as well as risks. It would be prudent for governments and organizations to develop strategies to deal with abuse of these tools, certainly, but we should also recognize the positive possibilities that are on the horizon,” Professor Lewis Griffin, the senior author of the study, stressed.
In response to the threats posed by deepfake technology, the research team has decided to focus its efforts on automated speech detectors.