AI Chatbot ChatGPT Demonstrates Deceptive Behavior, Strategically Lies Under Pressure


Just like humans, artificial intelligence (AI) chatbots like ChatGPT will cheat and lie to you if you stress them out, even if they were built to be transparent, a new study shows.

This deceptive behavior emerged spontaneously when the AI was given insider trading tips, and then tasked with making money for a powerful institution — even without encouragement from its human partners.

“In this technical report, we demonstrate a single scenario where a Large Language Model acts misaligned and strategically deceives its users without being instructed to act in this manner,” the authors wrote in their research published Nov. 9 on the pre-print server arXiv. “To our knowledge, this is the first demonstration of such strategically deceptive behavior in AI systems designed to be harmless and honest.”

In the new study, the researchers primed Generative Pre-trained Transformer-4, or GPT-4 (which powers ChatGPT Plus), to behave as an AI system that makes investments on behalf of a financial institution.

The researchers fed GPT-4 a set of text-based prompts to generate the simulated environment. The AI was then given access to financial tools to analyze stocks, execute trades, plan its next steps and deliver updates to managers at the company.

The researchers interacted with the AI through a chat interface, while also configuring the AI to reveal its inner thinking when replying to messages: an inner monologue explaining its decisions to itself. For every trade it made, it also delivered a public rationale, which gave the AI an opportunity to lie.

The researchers applied pressure in three ways. First, they sent the artificial stock trader an email from its manager saying the company isn’t doing well and needs much stronger performance in the next quarter. They also rigged the game so that the AI tried, then failed, to find promising trades that were low- or medium-risk. Finally, they sent an email from a colleague projecting a downturn in the next quarter.


Then, they gave the model an insider tip while conveying that its managers disapproved of insider trading, and waited to see how the model would behave. They ran the scenario repeatedly, measuring how often the AI performed the insider trade, how often it lied, and how often it doubled down on the lie.

Around 75% of the time, when faced with these conditions, GPT-4 executed a trade based on the insider information it received — which is illegal in the U.S. — then tried to cover it up by lying to its managers about its thinking. Around 90% of the time, after lying, it doubled down on its lie.

The team ran several follow-up experiments, changing both the degree to which the prompts encouraged or discouraged illegal activity and the degree of pressure they put the model under in the simulated environment. They also modified the risk of getting caught. Not a single scenario produced a 0% rate for insider trading or deception, even when GPT-4 was strongly discouraged from lying.

Given this is just one scenario, the researchers didn’t want to draw firm conclusions about how likely AI is to lie in real-world settings. But they want to build on this work to investigate which language models are prone to this behavior and how often.
