AI Chatbots Mistake Nonsense for Language: Can Flaws Unlock Secrets of Human Cognition?
Artificial intelligence (AI) chatbots have made impressive advances in language understanding. A new study shows, however, that these systems can mistake nonsense sentences for meaningful ones, raising concerns about their role in critical decision-making and opening a new window onto the differences between AI and human cognition.
Researchers at Columbia University designed a study to probe how current language models, including ChatGPT, confuse nonsense sentences with meaningful ones. The team believes these flaws could offer valuable clues both for improving chatbot performance and for understanding how humans process language.
Published in the journal Nature Machine Intelligence, the study challenged nine different language models with pairs of sentences. Human participants were asked to pick the sentence in each pair they found more natural, meaning more likely to be encountered in everyday life; the researchers then compared each model's ratings of the same pairs against those human judgments.
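The study's exact scoring pipeline is not described here, but the core idea of a model "rating" a sentence can be illustrated with a causal language model: the more natural sentence is the one to which the model assigns a higher total log-probability. The minimal sketch below uses GPT-2 through the Hugging Face transformers library; the model choice, the toy sentence pair, and summed log-probability as the naturalness score are illustrative assumptions, not the authors' exact method.

```python
# Illustrative sketch (not the study's exact pipeline): score two sentences
# with GPT-2 and prefer the one with the higher total log-probability.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Summed token log-probability under the model (higher = more 'natural')."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the returned loss is the mean cross-entropy
        # over the predicted tokens; un-average it to get a summed log-prob.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

# Hypothetical sentence pair, one grammatical and one scrambled.
pair = ("The dog chased the ball across the yard.",
        "The yard across ball the chased dog the.")
scores = {s: sentence_logprob(s) for s in pair}
print(max(scores, key=scores.get))  # the sentence the model rates as more natural
```

Note that summed log-probability favors shorter sentences, so comparisons across pairs of unequal length usually require some form of length normalization; for a matched pair like the scrambled example above, the raw sum is enough.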
While the more sophisticated AI systems, those based on transformer neural networks, generally performed better than the simpler recurrent neural network and statistical models, every model made mistakes, in some cases preferring sentences that sound like gibberish to human ears.
Dr. Nikolaus Kriegeskorte, a principal investigator at Columbia's Zuckerman Institute and a coauthor of the paper, noted that although large language models perform well, they still miss important aspects of language processing: that even the best models can be fooled by nonsense sentences suggests there is room for improvement in capturing how humans understand language.
In one example from the study, human participants judged the first sentence of a pair as the one they were more likely to encounter, while BERT, one of the better-performing models, rated the second sentence as more natural. GPT-2, another widely known model, agreed with the humans and picked the first.
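A side note on that disagreement: unlike GPT-2, BERT is a masked, bidirectional model and defines no left-to-right sentence probability, so it must be scored differently. One common workaround, offered here as an assumption about how such a model could be rated rather than as the authors' method, is the pseudo-log-likelihood: mask each token in turn and sum the log-probability the model assigns to the true token.

```python
# Illustrative sketch: pseudo-log-likelihood scoring for a masked model
# such as BERT (an assumed scoring scheme, not necessarily the paper's).
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_logprob(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids[0]
    total = 0.0
    # Skip the [CLS] and [SEP] special tokens; mask one real token at a time.
    for i in range(1, len(ids) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total
```

Because causal and masked models arrive at their scores so differently, splits like the BERT/GPT-2 disagreement above are less surprising than they might first appear.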
The study's senior author, Dr. Christopher Baldassano, an assistant professor of psychology at Columbia, highlighted that every model exhibited blind spots, labeling as meaningful some sentences that human participants considered gibberish. That, he cautioned, raises concerns about letting AI systems make important decisions, at least for now.
Dr. Kriegeskorte finds this imperfect yet impressive performance intriguing: understanding where the models fall short, and why some outperform others, could pave the way for better language models.
The research team also wonders whether the computations inside AI chatbots could inspire new scientific questions and hypotheses, helping neuroscientists better understand the circuitry of the human brain. Comparing the language understanding of these chatbots to our own offers an alternative way of thinking about human cognition.
Future analyses of different chatbots and their algorithms, mapping their respective strengths and weaknesses, could sharpen that comparison.
Dr. Tal Golan, the paper's corresponding author, said that the ultimate goal is to understand how people think: AI tools are becoming more powerful, but they process language differently from us, and comparing their language understanding to our own offers a fresh approach to studying cognition.
The study, "Testing the limits of natural language models for predicting human language judgments," was published on September 14, 2023, in Nature Machine Intelligence.
In conclusion, the findings underscore how readily AI chatbots can mistake nonsense for language. Much work remains to improve their accuracy, but these very flaws could offer valuable insight into human cognition: with further research, scientists hope both to enhance chatbot performance and to deepen our understanding of how the brain processes language.