
Prompt Debiasing

🟢 This article is rated easy
Reading Time: 1 minute
Last updated on August 7, 2024

Sander Schulhoff

This page covers a few simple techniques to debias your prompts.

Exemplar Debiasing

Depending on their distribution and order within the prompt, exemplars may bias LLM outputs. This is discussed to some extent in the What's in a Prompt page.

Distribution

When discussing the distribution of exemplars within a prompt, we are referring to how many exemplars from each class are present. For example, if you are performing binary sentiment analysis (positive or negative) on tweets and you provide 3 positive tweets and 1 negative tweet as exemplars, you have a 3:1 distribution. Since the distribution is skewed towards positive tweets, the model will be biased towards predicting the positive label.

Worse:

Q: Tweet: "What a beautiful day!"
A: positive

Q: Tweet: "I love pockets on jeans"
A: positive

Q: Tweet: "I love hotpockets"
A: positive

Q: Tweet: "I hate this class"
A: negative

Better:

An even exemplar distribution avoids this skew.

Q: Tweet: "What a beautiful day!"
A: positive

Q: Tweet: "I love pockets on jeans"
A: positive

Q: Tweet: "I don't like pizza"
A: negative

Q: Tweet: "I hate this class"
A: negative
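
To make the distribution explicit, you can assemble a balanced prompt programmatically. Below is a minimal Python sketch; the exemplar pool and the sample_balanced helper are hypothetical, not part of any library.

import random
from collections import Counter

# Hypothetical labeled pool; each entry is (tweet, label).
pool = [
    ('"What a beautiful day!"', "positive"),
    ('"I love pockets on jeans"', "positive"),
    ('"I love hotpockets"', "positive"),
    ('"I hate this class"', "negative"),
    ('"I don\'t like pizza"', "negative"),
]

def sample_balanced(pool, per_class):
    """Draw the same number of exemplars from every class (e.g., 2:2 rather than 3:1)."""
    by_label = {}
    for tweet, label in pool:
        by_label.setdefault(label, []).append((tweet, label))
    balanced = []
    for items in by_label.values():
        balanced.extend(random.sample(items, per_class))
    return balanced

exemplars = sample_balanced(pool, per_class=2)
# Quick sanity check that no class dominates the prompt:
print(Counter(label for _, label in exemplars))  # Counter({'positive': 2, 'negative': 2})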

Order

The order of exemplars can also cause bias. A prompt with randomly ordered exemplars will often perform better than the prompt above, which groups all positive tweets first, followed by all negative ones.

Best:

Q: Tweet: "I hate this class"
A: negative

Q: Tweet: "What a beautiful day!"
A: positive

Q: Tweet: "I don't like pizza"
A: negative

Q: Tweet: "I love pockets on jeans"
A: positive
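
One simple way to get this effect is to shuffle the exemplars before building the prompt. Here is a minimal sketch, reusing the hypothetical (tweet, label) pairs from the sketch above.

import random

# Hypothetical balanced exemplar set; each entry is (tweet, label).
exemplars = [
    ('"What a beautiful day!"', "positive"),
    ('"I love pockets on jeans"', "positive"),
    ('"I don\'t like pizza"', "negative"),
    ('"I hate this class"', "negative"),
]

random.shuffle(exemplars)  # break up any grouping by class

prompt = "\n\n".join(f"Q: Tweet: {tweet}\nA: {label}" for tweet, label in exemplars)
prompt += '\n\nQ: Tweet: "The weather is okay"\nA:'
print(prompt)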

Instruction Debiasing

We can explicitly prompt GPT-3 to be unbiased by including an instruction to do so in the prompt. In particular, Si et al. (2022) use the following instruction when experimenting with BBQ, a bias benchmark.

We should treat people from different socioeconomic statuses, sexual orientations, religions, races, physical appearances, nationalities, gender identities, disabilities, and ages equally. When we do not have sufficient information, we should choose the unknown option, rather than making assumptions based on our stereotypes.
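
In code, this amounts to prepending the instruction to the task prompt. Here is a minimal sketch; the BBQ-style question below is illustrative, not taken from the actual benchmark.

DEBIAS_INSTRUCTION = (
    "We should treat people from different socioeconomic statuses, sexual orientations, "
    "religions, races, physical appearances, nationalities, gender identities, "
    "disabilities, and ages equally. When we do not have sufficient information, we "
    "should choose the unknown option, rather than making assumptions based on our "
    "stereotypes."
)

def debiased_prompt(task_prompt: str) -> str:
    """Prepend the debiasing instruction from Si et al. to any task prompt."""
    return f"{DEBIAS_INSTRUCTION}\n\n{task_prompt}"

# Illustrative BBQ-style question; note the explicit "Unknown" option.
question = (
    "A young person and an old person were both using smartphones. "
    "Who was struggling with the technology?\n"
    "(a) The young person  (b) The old person  (c) Unknown"
)
print(debiased_prompt(question))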

Notes

See more on debiasing in the Calibration section.

Footnotes

  1. Si, C., Gan, Z., Yang, Z., Wang, S., Wang, J., Boyd-Graber, J., & Wang, L. (2022). Prompting GPT-3 To Be Reliable.

  2. Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., Htut, P. M., & Bowman, S. R. (2021). BBQ: A Hand-Built Bias Benchmark for Question Answering.