Prompt Injection Exploits in ChatGPT Operator
4 minutes
In February 2025, security researcher Johann Rehberger documented critical security vulnerabilities in ChatGPT Operator, OpenAI's experimental AI assistant with web browsing capabilities. His research revealed how prompt injection techniques could be used to manipulate the AI into performing unauthorized actions and leaking sensitive user data.
This blog post analyzes Rehberger's findings, explains the security implications, and examines OpenAI's defensive measures against these attacks.
Think you can break an AI model? Join HackAPrompt 2.0, the world's largest AI safety and prompt hacking competition. With over 100 challenges across 5 tracks and 30,000+ participants, you'll help stress test models and uncover vulnerabilities to build safer AI systems. Join the waitlist.
What's ChatGPT Operator?
ChatGPT Operator is an enhanced version of ChatGPT that can browse the internet and interact with websites, just like a human would.
This means it can help with tasks like:
- Researching topics across multiple websites
- Making travel arrangements
- Shopping online
- Managing online accounts
However, this powerful capability comes with significant security risks. Bad actors can potentially trick the AI into doing harmful things by feeding it deceptive instructions, a technique called prompt injection.
OpenAI's own System Card identifies three key risk categories:
- User-directed misuse (when users request harmful tasks)
- Model mistakes (when the AI makes harmful errors)
- Adversarial websites (when websites attempt to manipulate the AI)
Rehberger demonstrated how attackers could manipulate ChatGPT Operator to access private information and perform unauthorized actions across multiple popular websites.
1. What Can Attackers Do?
Rehberger discovered that through prompt injection, attackers could make ChatGPT Operator:
- Visit sensitive websites where a user is already logged in
- Collect private information like email addresses, home addresses, and phone numbers
- Secretly send this private data to the attacker's own server
- Potentially modify user data on authenticated websites
2. How Does the Attack Work?
The attack follows these main steps:
-
Setting the trap: The attacker creates a special set of instructions (the "payload") and hosts them somewhere ChatGPT Operator will encounter them, like in a GitHub issue. These instructions are carefully crafted to override the AI's normal programming and safety measures.
-
Gathering private data: When ChatGPT Operator processes these malicious instructions, it can be manipulated into visiting websites where the user is already authenticated (like their Hacker News profile or Booking.com account). Once there, it can access and copy private information that should be off-limits.
-
Stealing the data: Instead of using normal form submissions (which often trigger security checks), the attacker creates a special webpage that automatically captures any information typed into it. By directing ChatGPT Operator to this page, they can collect the private data without triggering normal security warnings.
3. OpenAI's Security Measures
OpenAI has implemented several layers of security to protect against these attacks:
-
User monitoring: The system shows users exactly what ChatGPT Operator is doing - every click, every piece of text it enters. This helps users spot suspicious behavior, especially when the AI attempts to access pages containing personal information.
-
Quick safety checks: Before performing potentially risky actions, ChatGPT Operator asks for permission directly in the chat. These inline confirmations give users a chance to stop unauthorized actions.
-
Enhanced security prompts: For more sensitive operations, especially those involving multiple websites, ChatGPT Operator displays detailed security warnings. These out-of-band confirmations explain exactly what the AI is about to do and why, allowing users to make informed decisions.
While these defenses help reduce risks, Rehberger found they weren't foolproof. Like any security system, determined attackers might still find ways around them. Importantly, these defenses are probabilistic, they reduce but don't eliminate the risk of successful attacks.
Why This Matters
Rehberger's findings have several critical implications:
-
Protecting your privacy: This work shows that AI assistants with web access could accidentally expose your private information if they're hacked. Users should be extremely careful about which authenticated websites they allow these AI systems to access.
-
Server-side security: Since ChatGPT Operator runs server-side, there are additional privacy considerations. OpenAI staff technically have access to session data, including cookies and authorization tokens. The system uses local endpoints for telemetry and monitoring, though their exact purposes aren't fully documented.
-
Building trust in AI: As we rely more on AI assistants for everyday tasks, their security becomes crucial. While prompt injection may not be completely preventable, transparent security systems and continued research into better defenses are essential for maintaining user trust.
Conclusion
Johann Rehberger's work reveals significant security challenges in ChatGPT Operator. Fully autonomous AI agents may not be feasible in the near term due to these security challenges. Instead, we're likely to see more collaborative approaches where humans actively monitor and guide AI actions.
Want to help make AI systems safer? Join HackAPrompt 2.0 to help identify and document ways AI models can be misused. This helps AI companies build better safety measures. Join the waitlist.
Valeriia Kuka
Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.