Humans Struggle to Detect Deepfake Speech, According to New Study
A recent study by researchers at University College London (UCL) found that humans struggle to detect deepfake speech. Deepfakes are synthetic media, such as voice recordings or videos, designed to resemble a real person. In the study, listeners correctly identified deepfake speech only 73 percent of the time.
Deepfakes are created with generative artificial intelligence (AI): machine-learning models trained to reproduce a real person's voice or likeness from patterns in their training data. To test how well humans distinguish real from synthetic speech, the researchers used a text-to-speech (TTS) algorithm trained on two publicly available datasets, one in English and one in Mandarin, and generated 50 deepfake speech samples in each language. The generated samples deliberately differed from the training utterances, so that the algorithm was not simply reproducing its input and biasing the results.
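To make the generation step concrete, here is a minimal sketch of producing synthetic speech with an off-the-shelf open-source TTS library. The Coqui TTS library, the VITS/LJSpeech model name, and the example sentences are assumptions for illustration; the article does not specify the study's actual pipeline.

```python
# Minimal sketch: generating synthetic ("deepfake") speech with a
# pretrained open-source TTS model. The Coqui TTS library and the
# VITS/LJSpeech model are illustrative choices, not the study's system.
from TTS.api import TTS

# Load a pretrained English VITS model (trained on the public LJSpeech dataset).
tts = TTS(model_name="tts_models/en/ljspeech/vits")

# Generate a few synthetic utterances. Mirroring the study's design, the
# sentences are novel, i.e. they do not appear in the training data.
sentences = [
    "The weather forecast predicts light rain this afternoon.",
    "Please confirm the meeting time before you leave the office.",
]
for i, text in enumerate(sentences):
    tts.tts_to_file(text=text, file_path=f"deepfake_{i}.wav")
```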
The researchers then played these synthetic samples, interleaved with genuine speech, to 529 participants and measured how often each clip was identified correctly. Participants detected deepfake speech with only 73 percent accuracy, and even after training on the telltale characteristics of deepfakes, their accuracy improved only slightly.
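For readers unfamiliar with how such a headline number arises, the sketch below shows how detection accuracy is computed from individual listening-trial responses. The trial data here is entirely made up; it does not reproduce the study's results.

```python
# Toy illustration of computing detection accuracy from listening trials.
# Each trial pairs a listener's judgment with the ground truth
# (True = "this clip is a deepfake"). The data below is invented.
trials = [
    (True, True), (False, True), (True, True), (False, False),
    (True, False), (False, False), (True, True), (False, False),
]

correct = sum(1 for judged, actual in trials if judged == actual)
accuracy = correct / len(trials)
print(f"Detection accuracy: {accuracy:.0%}")  # 75% on this toy data
```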
Kimberly Mai, a researcher at UCL, emphasized the significance of these findings, noting that humans cannot reliably identify deepfake speech, with or without training. The study also raised a further concern: as generation techniques improve, listeners may fare even worse against the most sophisticated deepfake speech produced by future systems.
As a next step, the researchers aim to develop better automated speech detectors as part of ongoing efforts to counter the potential harm caused by artificially generated audio and imagery. While generative AI audio technology offers benefits such as accessibility for individuals with speech limitations or loss, there is growing apprehension that criminals and nation-states could exploit this technology for malicious purposes.
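As a rough illustration of what such an automated detector can look like, here is a minimal baseline that summarizes each clip with MFCC features (via librosa) and trains a logistic-regression classifier on labeled real and fake recordings. This is a toy sketch under assumed file paths and labels, not the researchers' detector; production systems use far richer features and models.

```python
# Minimal sketch of an automated deepfake-speech detector: mean MFCC
# features plus logistic regression. A toy baseline for illustration;
# the directory layout below (real/*.wav, fake/*.wav) is assumed.
import glob

import librosa
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def mfcc_features(path: str) -> np.ndarray:
    """Load a clip and summarize it as mean MFCCs (a fixed-length vector)."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Label 0 = genuine speech, label 1 = deepfake.
real_paths = glob.glob("real/*.wav")
fake_paths = glob.glob("fake/*.wav")
paths = real_paths + fake_paths
labels = [0] * len(real_paths) + [1] * len(fake_paths)

X = np.stack([mfcc_features(p) for p in paths])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.0%}")
```

Averaging MFCCs over time discards temporal detail, which keeps the example simple but limits what the classifier can catch; stronger detectors typically operate on full spectrograms with neural models.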
Reliable detection of deepfake speech is crucial to mitigating the risks of its misuse, and researchers continue to work toward effective detection capabilities. The study's findings underscore the urgency of that work: if human listeners cannot dependably tell real speech from synthetic speech, automated detectors become the primary line of defense.
In conclusion, the UCL research highlights how poorly humans fare at detecting deepfake speech and the corresponding need for better automated detection tools. Closing that gap will help society protect individuals from the malicious manipulation of audio and imagery.