When crafting prompts for Large Language Models (LLMs), there are several factors to consider. The format and label space both play crucial roles in the effectiveness of the prompt.
The format of the exemplars in a prompt is crucial. It instructs the LLM on how to structure its response. For instance, if the exemplars use all capital words as answers, the LLM will follow suit, even if the answers provided are incorrect.
Consider the following example:
What is 2+2? FIFTY What is 20+5? FORTY-THREE What is 12+9? TWENTY-ONE
Despite the incorrect answers, the LLM correctly formats its response in all capital letters.
Interestingly, the actual answers or 'ground truth' in the exemplars are not as important as one might think. Research shows that providing random labels in the exemplars (as seen in the above example) has little impact on performance. This means that the LLM can still generate a correct response even if the exemplars contain incorrect information.
While the ground truth may not be crucial, the label space is. The label space refers to the list of possible labels for a given task. For example, in a classification task, the label space might include "positive" and "negative".
Providing random labels from the label space in the exemplars can help the LLM understand the label space better, leading to improved results. Furthermore, it's important to represent the distribution of the label space accurately in the exemplars. Instead of sampling uniformly from the label space, it's better to sample according to the true distribution of the labels. For example, if you have a dataset of restaurant reviews and 60% of them are positive, your prompt should contain a 3:2 ratio of positive/negative prompts.
When creating prompts, using between 4-8 exemplars tends to yield good results. However, it can often be beneficial to include as many exemplars as possible.
In conclusion, understanding the importance of format, ground truth, and label space can greatly enhance the effectiveness of your prompts.
Format and label space are two important factors that can greatly enhance the effectiveness of your prompts.
Although it may seem contradictory, research has shown that LLM performance is not negatively impacted even when random labels with incorrect information are provided for exemplars. Format and label space are much more crucial to ensuring accurate model responses.
Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.