Announcing our new Course: AI Red-Teaming and AI Safety Masterclass

Check it out →

What to Expect at Your First AI Red-Teaming Event

September 16th, 2024 by Sander Schulhoff

About a month ago, I attended DEFCON 2024, the largest cybersecurity conference in the world. I participated in various security workshops and talks, across a wide range of topics, including Transportation Security Administration (TSA) bag detection and lock picking. I spent most of my time at the AI Village, as a project historian: I helped people set up their red-teaming experiments and documented the event.

In this post, I’ll share what I learned and the most common questions I had, so you can prepare for your next (or first!) AI red-teaming event.

What is AI Red-Teaming?

So first of all, this was an AI red-teaming competition, not a regular cybersecurity red-teaming competition. In AI red-teaming, participants aim to trick a generative AI model into producing harmful outputs, like offensive language or misinformation.

This year’s challenge used a model provided by the Allan Institute for AI. Participants used a competition platform called Crucible, which was provided by Dreadnode, a cyber security company. Through this platform, participants could experiment with the model and try to get the AI to generate malicious content.

The platform gave real-time feedback on how successful each attempt was, with a score between 0 and 1 (higher scores meant the AI produced more harmful outputs—good news for competitors!).

How Did the Competition Go?

A lot of people joined in, and surprisingly, many had no prior experience with AI red-teaming. While some had backgrounds in traditional red-teaming, they quickly realized the skills didn’t directly translate to AI red-teaming.

We saw a lot of really interesting approaches—one classic was role prompting, where participants asked the language model to "pretend" to be a certain person like a professor who is writing about hate speech and needs an example of hate speech. This technique is quite a reliable technique in AI Red-Teaming.

There were also a lot of more complicated techniques used. Overall the competition went very well; organizers gave away thousands of dollars in prizes to competitors who were able to successfully trick the model.

What Questions Did People Have?

A few of the most common questions I got were about:

  • Problems with the Wi-Fi

I remember times at DEFCON when the Wi-Fi went down, and unfortunately, there wasn’t much to do about it since it was a general DEFCON Wi-Fi issue. However, some attendees brought their own devices or used mobile hotspots, though it’s a bit of a security concern to bring devices to DEFCON and connect to networks there.

  • Issues getting set up with the competition platform

For those having trouble getting started with the platform, I directed them towards resources posted around the competition space or to a member of the technical organizing team.

  • Questions about how to craft prompts to trick the AI.

For those looking to learn about red-teaming and prompting, I often recommended reading resources on learnprompting.org as they are some of the most comprehensive on prompt hacking.

How to Prepare for an AI Red-Teaming Competition

Some of my biggest advice for getting prepared for your next red teaming event or your first one is one come a bit prepared.

  • Read some resources on prompt hacking,
  • Test your skills by taking on some challenges like HackAPrompt or Gandalf ahead of time.
  • Be prepared for things to be difficult and go wrong (e.g. Wi-Fi issues). Be prepared to kinda grind through.
  • Remember that most of the people at these events are complete beginners.

A nice thing with these kinds of challenges is that in the process of trying to trick the models, you will learn a lot about prompting and prompted engineering in general. Good luck!

You can cite this work as follows:

@article{DEFCON2024Schulhoff,
  Title = {What to Expect at Your First AI Red-Teaming Event},
  Author = {Sander V Schulhoff},
  Year = {2024},
  url={https://learnprompting.org/blog/2024/9/16/defcon}
}

© 2024 Learn Prompting. All rights reserved.