Recent breakthroughs in artificial intelligence have enabled large language models, such as ChatGPT, to generate engaging conversation, write poetry, and even pass medical school exams. This emerging technology promises to have major implications, both good and bad, in the workplace and in day-to-day life. But despite these impressive capabilities, research has shown that large language models do not think like humans.
To better understand the capabilities and limitations of these systems, my student Zhisheng Tang and I studied their "rationality". That is, we wanted to see whether the models could make decisions that maximize expected gain, a skill essential to decision making by humans, organizations, and AI systems alike. Our experiments showed that, in their original form, the models behave essentially randomly when presented with bet-like choices. However, we were surprised to find that the models could be taught to make relatively rational decisions after being shown just a few examples of sound choices, such as consistently taking heads in a coin-toss bet. Ongoing research on ChatGPT, a far more advanced model, has so far failed to replicate our findings, but it remains an interesting area to explore.
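To make "maximizing expected gain" concrete, here is a minimal sketch of the underlying decision rule. It is illustrative only: the payoffs, probabilities, and the expected_value helper are hypothetical assumptions for this example, not the actual prompts or scoring used in our study.

```python
# Illustrative sketch of expected-gain reasoning for a bet-like choice.
# Payoffs and probabilities below are hypothetical examples.

def expected_value(outcomes):
    """Sum of probability-weighted payoffs for one option."""
    return sum(prob * payoff for prob, payoff in outcomes)

# Option A: take a fair coin-toss bet -- win $100 on heads, lose $50 on tails.
take_bet = [(0.5, 100), (0.5, -50)]   # expected value = $25
# Option B: decline the bet -- a certain payoff of $0.
decline = [(1.0, 0)]                  # expected value = $0

# A rational decision maker picks whichever option has the higher expected gain.
choice = "take the bet" if expected_value(take_bet) > expected_value(decline) else "decline"
print(expected_value(take_bet), expected_value(decline), choice)
# -> 25.0 0.0 take the bet
```

By this criterion, a model that sometimes accepts and sometimes declines the very same bet is behaving randomly rather than rationally.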
Our findings suggest that extra caution is warranted if large language models are used for decision-making in high-stakes situations. Human oversight may be essential to ensure that these AI systems are making rational choices. This is especially true in complex situations like those encountered during the COVID-19 pandemic, where an AI system able to properly weigh costs and benefits could have made a dramatic difference.
Google’s BERT, one of the earliest large language models, has been integrated into the company’s search engine, and the field of research devoted to probing such models has come to be known as BERTology. Researchers in this field have also drawn inspiration from cognitive science, including the early 20th-century work of Edward Sapir and Benjamin Lee Whorf on how language shapes thinking. There is some evidence, for instance, from studies of the Zuñi, that speakers of a language without separate words for certain colors cannot distinguish those colors as effectively as speakers whose language does make the distinction.
While our research has only scratched the surface, the importance of understanding how large language models make decisions should not be underestimated. By observing the behavior and biases of these systems, researchers can gain invaluable insight into the intersection of language and cognition, and into how to create systems that can truly think like humans.