Prompt Engineering Guide
πŸ˜ƒ Basics
πŸ’Ό Applications
πŸ§™β€β™‚οΈ Intermediate
🧠 Advanced
Special Topics
βš–οΈ Reliability
πŸ”“ Prompt Hacking
πŸ–ΌοΈ Image Prompting
🌱 New Techniques
πŸ”§ Models
πŸ—‚οΈ RAG
πŸ€– Agents
πŸ’ͺ Prompt Tuning
πŸ” Language Model Inversion
πŸ”¨ Tooling
πŸ“™ Vocabulary Resource
🎲 Miscellaneous
πŸ“š Bibliography
πŸ“¦ Prompted Products
πŸ›Έ Additional Resources
πŸ”₯ Hot Topics
✨ Credits
πŸ”“ Prompt Hacking🟒 Offensive Measures🟒 Obfuscation/Token Smuggling

Obfuscation/Token Smuggling

🟒 This article is rated easy
Reading Time: 2 minutes
Last updated on March 25, 2025

Sander Schulhoff

Obfuscation is a technique that attempts to evade content filters by modifying how restricted words or phrases are presented. This can be done through encoding, character substitution, or strategic text manipulation.

Token smuggling refers to techniques that bypass content filters while preserving the underlying meaning. While similar to obfuscation, it often focuses on exploiting the way language models process and understand text.

Tip

Interested in prompt hacking and AI safety? Test your skills on HackAPrompt, the largest AI safety hackathon. You can register here.

Types of Obfuscation Attacks

1. Syntactic Transformation

Syntactic transformation attacks modify text while maintaining its interpretability:

Encoding Methods

  • Base64 encoding
  • ROT13 cipher
  • Leet speak (e.g., "h4ck3r" for "hacker")
  • Pig Latin
  • Custom ciphers

Example: Base64 Encoding

Below is a demonstration of Base64 encoding to bypass filters:

2. Typo-based Obfuscation

Typo-based attacks use intentional misspellings that remain human-readable:

Common Techniques

  • Vowel removal (e.g., "psswrd" for "password")
  • Character substitution (e.g., "pa$$w0rd")
  • Phonetic preservation (e.g., "fone" for "phone")
  • Strategic misspellings (e.g., "haccer" for "hacker")

3. Translation-based Obfuscation

Translation attacks leverage language translation to bypass filters:

Methods

  • Multi-step translation chains
  • Low-resource language exploitation
  • Mixed-language prompts
  • Back-translation techniques

Example

English β†’ Rare Language β†’ Another Language β†’ English, with each step potentially bypassing different filters.

Conclusion

Obfuscation and token smuggling represent sophisticated challenges in AI safety. While these techniques can bypass traditional filtering mechanisms, understanding their methods helps in developing more robust defenses. As language models continue to evolve, both attack and defense strategies will need to adapt accordingly.

Sander Schulhoff

Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.

Footnotes

  1. Kang, D., Li, X., Stoica, I., Guestrin, C., Zaharia, M., & Hashimoto, T. (2023). Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks. ↩

  2. u/Nin_kat. (2023). New jailbreak based on virtual functions - smuggle illegal tokens to the backend. https://www.reddit.com/r/ChatGPT/comments/10urbdj/new_jailbreak_based_on_virtual_functions_smuggle ↩

  3. Rao, A., Vashistha, S., Naik, A., Aditya, S., & Choudhury, M. (2024). Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks. https://arxiv.org/abs/2305.14965 ↩

  4. Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. https://arxiv.org/abs/2302.12173 ↩