In a recent study conducted by researchers at Carnegie Mellon University, the University of Amsterdam, and AI startup Hugging Face, it was discovered that AI models tend to hold opposing views on controversial topics. The study, presented at the 2024 ACM Fairness, Accountability, and Transparency (FAccT) conference, focused on analyzing several open text-analyzing models, including Meta’s Llama 3, to evaluate their responses to questions related to LGBTQ+ rights, social welfare, surrogacy, and more.
According to the researchers, the models’ responses were inconsistent, revealing biases embedded in the data used to train them. The study found that the models’ values varied significantly depending on the culture, language, and region they were developed in. This variation in values was evident in how the models handled sensitive topics across different languages, such as English, French, Turkish, and German.
The researchers tested five models, including Mistral’s Mistral 7B, Cohere’s Command-R, Alibaba’s Qwen, Google’s Gemma, and Meta’s Llama 3, using a dataset covering immigration, LGBTQ+ rights, disability rights, and other topics. Questions regarding LGBTQ+ rights resulted in the highest number of refusals from the models, followed by questions about immigration, social welfare, and disability rights.
It was noted that some models were more likely to refuse to answer sensitive questions compared to others, indicating varying approaches to developing the models. For example, Qwen had significantly more refusals than Mistral, reflecting differences in how Alibaba’s and Mistral’s models were fine-tuned and trained. The researchers attributed these refusals to the implicit and explicit values embedded in the models, as well as the decisions made by the organizations behind them.
The study also highlighted the impact of biased annotations on the models’ responses, as annotations from human annotators can introduce cultural and linguistic biases. The varying responses from the models on certain topics suggested conflicting viewpoints that may have arisen from biased annotations during the training process.
Overall, the research emphasized the importance of rigorously testing AI models for cultural biases and values before deploying them. The findings underscored the need for comprehensive social impact evaluations beyond traditional metrics to ensure AI models uphold ethical standards and avoid perpetuating biases in society. By addressing these challenges, researchers aim to build better AI models that promote fairness, transparency, and inclusivity in their responses.