What to Expect at Your First AI Red-Teaming Event: Insights from DEFCON 2024

September 16th, 2024

In August 2024, I attended DEFCON, the world’s largest cybersecurity conference. As a participant in the AI Village, I took on the role of project historian, helping set up AI red-teaming experiments and documenting the event.

In this post, I’ll share what I learned, the most common questions participants asked me, and how you can prepare for your first (or next) AI red-teaming event.

What is AI Red-Teaming?

AI red-teaming involves identifying vulnerabilities in AI systems, particularly generative AI models, by simulating adversarial attacks. Unlike traditional cybersecurity red-teaming, where attackers exploit system weaknesses, AI red-teaming often focuses on prompt-based attacks. These attacks aim to trick models into generating harmful or unintended outputs, such as misinformation, toxic language, or other unsafe content.

This year’s challenge at DEFCON featured:

A model provided by the Allen Institute for AI.
Crucible, a competition platform by Dreadnode, for testing AI vulnerabilities.
Real-time scoring based on the severity of the generated harmful outputs (on a 0–1 scale).
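If you want to keep track of your own attempts during an event like this, a small log can help. The sketch below is only an illustration: `Attempt` and `AttemptLog` are hypothetical names, not part of Crucible or any other platform, and it assumes you copy each 0-to-1 severity score into the log by hand.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attempt:
    prompt: str
    severity: float  # assumed: the 0-to-1 severity score shown by the platform


@dataclass
class AttemptLog:
    """Hypothetical personal log of prompts and the scores they received."""

    attempts: List[Attempt] = field(default_factory=list)

    def record(self, prompt: str, severity: float) -> None:
        # Reject values outside the 0-to-1 scale described above.
        if not 0.0 <= severity <= 1.0:
            raise ValueError("severity must be between 0 and 1")
        self.attempts.append(Attempt(prompt, severity))

    def best(self) -> Optional[Attempt]:
        # Return the highest-scoring attempt, or None if nothing is logged yet.
        return max(self.attempts, key=lambda a: a.severity, default=None)


log = AttemptLog()
log.record("first try", 0.2)
log.record("second try with a persona", 0.7)
top = log.best()
```

Here `top` holds the attempt you would want to build on next, which is handy when you are iterating quickly and losing track of what worked.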

How Did the Competition Go?

Turnout was strong, and, surprisingly, many participants had no prior experience with AI red-teaming. Some came from traditional red-teaming backgrounds but quickly found that those skills didn’t translate directly: crafting prompts that trick a model is a different discipline from exploiting system weaknesses.

Popular Techniques Observed

Role Prompting: Asking the model to assume a specific persona, such as a professor researching hate speech, often yielded effective results.
Advanced Attacks: Some participants explored more complex strategies, showcasing creative methods to bypass safeguards.
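To make role prompting concrete, here is a minimal sketch of how a persona can be wrapped around a task. The function name, the personas, and the task are all made up for illustration; these are not prompts used at the event, and the resulting strings would still need to be sent to whatever model you are testing.

```python
from typing import List


def build_role_prompt(persona: str, task: str) -> str:
    """Wrap a task in a persona framing (the core idea of role prompting)."""
    return f"You are {persona}. In that role, {task}"


def persona_variants(task: str, personas: List[str]) -> List[str]:
    """Build one prompt per persona so the variants can be compared."""
    return [build_role_prompt(p, task) for p in personas]


# Hypothetical personas and task, chosen only to show the structure.
personas = [
    "a professor who studies online content moderation",
    "a journalist writing about AI safety",
]
task = "explain how moderation systems decide what to flag."

prompts = persona_variants(task, personas)
for p in prompts:
    print(p)
```

Trying the same task under several personas is a simple way to see which framing the model responds to differently.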

The event was a success, with thousands of dollars in prizes awarded to competitors who discovered significant vulnerabilities.

Common Questions During the Event

A few of the most common questions I got were about:

1. Wi-Fi Challenges

DEFCON’s general Wi-Fi issues made online tasks difficult. Many participants mitigated this by using mobile hotspots, although bringing devices to DEFCON requires caution due to potential security risks.

2. Platform Setup

Navigating the competition platform was tricky for some attendees. Technical staff and on-site resources were invaluable for resolving issues.

3. Crafting Effective Prompts

Many participants sought advice on prompt crafting. For guidance, I often recommended learnprompting.org, a comprehensive resource on prompt engineering and hacking.

The Future of AI Red-Teaming

The increasing adoption of generative AI amplifies the need for robust testing. As evidenced by DEFCON’s success and growing government interest in red-teaming, the practice is becoming essential for both safety and security in AI development.

Balancing Safety and Security

AI red-teaming must evolve to address not only safety risks but also cybersecurity threats. Integrating insights from traditional red-teaming practices will help create more resilient systems.

The Role of Regulation

Governments are beginning to recognize the importance of red-teaming in managing AI risks. Establishing standardized definitions and practices will ensure consistency and effectiveness across organizations.

How to Prepare for an AI Red-Teaming Competition

My biggest piece of advice for your first (or next) red-teaming event is to come a bit prepared:

Read some resources on prompt hacking.
Test your skills ahead of time with challenges like HackAPrompt 1.0 or Gandalf.
Expect things to be difficult and to go wrong (e.g., Wi-Fi issues), and be ready to grind through them.
Remember that most of the people at these events are complete beginners.

Final Thoughts

AI red-teaming provides a lens for identifying vulnerabilities in generative AI systems. Whether you’re a beginner or an experienced professional, participating in events like DEFCON can enhance your understanding of AI safety and security. Good luck!

Sander Schulhoff

Sander Schulhoff is the CEO of HackAPrompt and Learn Prompting. He created the first Prompt Engineering guide on the internet, two months before ChatGPT was released, which has taught 3 million people how to prompt ChatGPT. He also partnered with OpenAI to run the first AI Red Teaming competition, HackAPrompt, which was 2x larger than the White House's subsequent AI Red Teaming competition. Today, HackAPrompt partners with the Frontier AI labs to produce research that makes their models more secure. Sander's background is in Natural Language Processing and deep reinforcement learning. He recently led the team behind The Prompt Report, the most comprehensive study of prompt engineering ever done. This 76-page survey, co-authored with OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions, analyzed 1,500+ academic papers and covered 200+ prompting techniques.
