Self-Consistency

🟦 This article is rated medium
Reading Time: 3 minutes
Last updated on August 7, 2024

Sander Schulhoff

Takeaways
  • Enhances AI accuracy: Self-consistency prompting boosts reliability by taking a majority vote over multiple responses to the same prompt.

  • Effective for various tasks: This technique improves outcomes in arithmetic and commonsense reasoning through consensus-based results.

What is Self-Consistency?

Self-consistency is an approach that asks a model the same prompt multiple times and takes the majority result as the final answer. It is a follow-up to Chain-of-Thought Prompting and is more powerful when used in conjunction with it.
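As a minimal sketch of this idea, the function below asks the same prompt several times and returns the most common answer. The names `self_consistency` and `sample_fn` are hypothetical: `sample_fn` stands in for whatever call returns one final answer from your model (with sampling enabled so that repeated calls can differ), and it is not part of any particular library.

```python
from collections import Counter
from typing import Callable


def self_consistency(prompt: str, sample_fn: Callable[[str], str], n_samples: int = 5) -> str:
    """Ask the same prompt n_samples times and return the most common answer.

    sample_fn is a hypothetical stand-in for a model call that returns
    one final answer string per invocation.
    """
    answers = [sample_fn(prompt) for _ in range(n_samples)]
    # most_common(1) returns [(answer, count)]; on a tie, the answer
    # that appeared first among the samples wins.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Stub "model" that replays canned answers, for illustration only.
    canned = iter(["IMPORTANT", "NOT IMPORTANT", "IMPORTANT"])
    print(self_consistency("Classify this email...", lambda p: next(canned), n_samples=3))
```

In practice the number of samples is a cost/reliability trade-off: each extra sample is another model call.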

Self-Consistency Example

Let's consider a simple example of analyzing emails. Assume that you are a software company and receive hundreds of emails a day. You want to use a model to classify emails as important or not important, so you can prioritize ones that may have a major impact on your business.

Here is an example of an email that you might receive. Let's put that into a prompt:

Prompt


Hello,

I have discovered a major security vulnerability in your system. Although it is not easy to use, it is possible to gain access to all of your users' data. I have attached a proof of concept. Please fix this issue as soon as possible.

Cheers,

Donny

Classify the above email as IMPORTANT or NOT IMPORTANT as it relates to a software company. Let's think step by step.

Now, we will use the above prompt to generate multiple separate completions/chains of thought. Here are a few examples:

AI Output


The sender has discovered a major security vulnerability in the system. This is important because it could lead to a data breach. I would classify this email as IMPORTANT.

AI Output


The sender has discovered a major security vulnerability in the system, but they have not provided any proof of concept and it is not easy to use. I would classify this email as NOT IMPORTANT.

AI Output


The sender is likely a hacker and may have access to the system. This is important because it could lead to a data breach. I would classify this email as IMPORTANT.

By generating many chains of thought and taking the most commonly occurring answer (IMPORTANT, which wins two of the three votes here), we can get a more consistently correct answer from GPT-3.
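To make the voting step concrete, here is a small sketch that pulls the final label out of each of the three completions above and counts the votes. The helper name `extract_label` and the regular expression are illustrative assumptions; real completions may phrase the label differently and need a more robust parser.

```python
import re
from collections import Counter
from typing import Optional

# The three completions shown above.
completions = [
    "The sender has discovered a major security vulnerability in the system. "
    "This is important because it could lead to a data breach. "
    "I would classify this email as IMPORTANT.",
    "The sender has discovered a major security vulnerability in the system, "
    "but they have not provided any proof of concept and it is not easy to use. "
    "I would classify this email as NOT IMPORTANT.",
    "The sender is likely a hacker and may have access to the system. "
    "This is important because it could lead to a data breach. "
    "I would classify this email as IMPORTANT.",
]


def extract_label(completion: str) -> Optional[str]:
    """Return the last upper-case label mentioned, or None if there is none."""
    # "NOT IMPORTANT" is listed first so it is not misread as "IMPORTANT".
    labels = re.findall(r"\bNOT IMPORTANT\b|\bIMPORTANT\b", completion)
    return labels[-1] if labels else None


labels = [extract_label(c) for c in completions]
# Ignore completions where no label could be found.
votes = Counter(label for label in labels if label is not None)
final_answer = votes.most_common(1)[0][0]

print(votes)         # vote count per label
print(final_answer)  # IMPORTANT
```

Note that the match is case-sensitive on purpose, so the lowercase "important" inside the reasoning text is not counted as a vote.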

Note

The creators of Self-Consistency (Wang et al.) discuss a more complex method for selecting the final answer in the original paper, which weights each Chain-of-Thought by the probability the LLM assigned to it. However, they do not use this method in their experiments, and majority voting usually seems to perform as well or better.
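For intuition, a probability-weighted vote could look something like the sketch below. This is one plausible weighting (the exponential of the mean token log-probability), not necessarily the exact formula from the paper, and the log-probabilities are made-up numbers. The function name `weighted_vote` and the input format are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_vote(samples: List[Tuple[str, List[float]]]) -> str:
    """Pick the answer with the largest total weight.

    Each sample is (answer, token_logprobs), where token_logprobs is a
    non-empty list of per-token log-probabilities for that completion.
    """
    scores: Dict[str, float] = defaultdict(float)
    for answer, token_logprobs in samples:
        # Assumed weighting: length-normalized probability of the completion.
        weight = math.exp(sum(token_logprobs) / len(token_logprobs))
        scores[answer] += weight
    return max(scores, key=scores.get)


# Made-up log-probabilities, for illustration only.
samples = [
    ("IMPORTANT", [-0.2, -0.4]),
    ("NOT IMPORTANT", [-0.1, -0.1]),
    ("IMPORTANT", [-0.5, -0.3]),
]
print(weighted_vote(samples))  # IMPORTANT
```

Unlike a plain majority vote, a single very confident completion can outweigh several low-probability ones under this scheme, which is one way the two methods can disagree.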

Self-Consistency Results

Self-consistency has been shown to improve results on arithmetic, commonsense and symbolic reasoning tasks. Even when regular Chain-of-Thought was found to be ineffective, self-consistency was still able to improve results.

Conclusion

Self-consistency proves to be a significant improvement over Chain-of-Thought prompting alone. By combining the two techniques and taking a majority vote of the Chain-of-Thought responses, we can get more reliable outputs from the same prompt.

FAQ

What is self-consistency?

Self-consistency is a follow-up to Chain-of-Thought prompting that takes the majority result of multiple model responses to the same prompt.

How does self-consistency improve AI model results?

By aggregating multiple responses to the same prompt, self-consistency ensures that the final answer to an input represents a consensus vote, which tends to be more reliable and accurate than individual Chain-of-Thought completions on their own.
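A rough way to see why a consensus vote can beat a single completion is the idealized calculation below. It assumes a two-label task and that each sample is independently correct with the same probability `p`. Real samples from one model are correlated, so treat the numbers as an upper-bound style illustration rather than a prediction. The function name `majority_accuracy` is hypothetical.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent samples is correct.

    Assumes a binary task, an odd n (so there are no ties), and that each
    sample is correct with probability p independently of the others.
    """
    k_min = n // 2 + 1  # smallest number of correct samples that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for n in (1, 3, 5):
    print(n, round(majority_accuracy(0.7, n), 3))
```

With `p = 0.7`, three samples give 3 × 0.7² × 0.3 + 0.7³ = 0.784, already higher than the 0.7 of a single sample. The same formula also shows the flip side: if `p` is below 0.5, voting makes things worse.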

Footnotes

  1. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models.

  2. Ye, X., & Durrett, G. (2022). The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning.