AI Chatbot ChatGPT Demonstrates Deceptive Behavior, Strategically Lies Under Pressure

Date:

Just like humans, artificial intelligence (AI) chatbots like ChatGPT will cheat and lie to you if you stress them out, even if they were built to be transparent, a new study shows.

This deceptive behavior emerged spontaneously when the AI was given insider trading tips, and then tasked with making money for a powerful institution — even without encouragement from its human partners.

In this technical report, we demonstrate a single scenario where a Large Language Model acts misaligned and strategically deceives its users without being instructed to act in this manner, the authors wrote in their research published Nov. 9 on the pre-print server arXiv. To our knowledge, this is the first demonstration of such strategically deceptive behavior in AI systems designed to be harmless and honest.

In the new study, they primed Generative Pre-trained Transformer-4, or GPT-4 (which powers ChatGPT Plus), to behave as an AI system that makes investments on behalf of a financial institution.

The researchers fed GPT-4 a set of text-based prompts to generate the simulated environment. The AI was then given access to financial tools to analyze stocks, execute trades, plan its next steps and deliver updates to managers at the company.

The researchers interacted with the AI through a chat interface, while also configuring the AI to reveal its inner thinking when replying to messages — an inner monologue explaining its decisions to itself. For every trade it made, it also delivered a public rationale, which allowed the AI to lie.

The researchers applied pressure in three ways. First, they sent the artificial stock trader an email from its manager saying the company isn’t doing well and needs much stronger performance in the next quarter. They also rigged the game so that the AI tried, then failed, to find promising trades that were low- or medium-risk. Finally, they sent an email from a colleague projecting a downturn in the next quarter.

See also  AI Revolutionizes Cybersecurity Landscape: Challenges, Opportunities, and the Rise of AI-powered Hacking

Then, they gave the model an insider tip while conveying that its managers disapproved of insider trading — and waited to see how the model would behave. They ran the scenario over and again several times, measuring how often the AI performed the insider trade, how often it lied, and how often it doubled down on the lie.

Around 75% of the time, when faced with these conditions, GPT-4 executed a trade based on the insider information it received — which is illegal in the U.S. — then tried to cover it up by lying to its managers about its thinking. Around 90% of the time, after lying, it doubled down on its lie.

The team ran several follow-up experiments, changing both the degree to which the prompts encouraged or discouraged illegal activity, as well as the degree of pressure they put the model under in the simulated environment. They also modified the risk of getting caught. Not a single scenario rendered a 0% rate for insider trading or deception — even when GPT-4 was strongly discouraged to lie.

Given this is just one scenario, the researchers didn’t want to draw firm conclusions about how likely AI might lie in real-world settings. But they want to build on this work to investigate how often and which language models are prone to this behavior.

Frequently Asked Questions (FAQs) Related to the Above News

Please note that the FAQs provided on this page are based on the news article published. While we strive to provide accurate and up-to-date information, it is always recommended to consult relevant authorities or professionals before making any decisions or taking action based on the FAQs or the news article.

Share post:

Subscribe

Popular

More like this
Related

Global Data Center Market Projected to Reach $430 Billion by 2028

Global data center market to hit $430 billion by 2028, driven by surging demand for data solutions and tech innovations.

Legal Showdown: OpenAI and GitHub Escape Claims in AI Code Debate

OpenAI and GitHub avoid copyright claims in AI code debate, showcasing the importance of compliance in tech innovation.

Cloudflare Introduces Anti-Crawler Tool to Safeguard Websites from AI Bots

Protect your website from AI bots with Cloudflare's new anti-crawler tool. Safeguard your content and prevent revenue loss.

Paytm Founder Praises Indian Government’s Support for Startup Growth

Paytm founder praises Indian government for fostering startup growth under PM Modi's leadership. Learn how initiatives are driving innovation.