🔓 Prompt Hacking🟢 Offensive Measures🟢 Obfuscation/Token Smuggling

Obfuscation/Token Smuggling

🟢 This article is rated easy

Reading Time: 2 minutes

Last updated on March 25, 2025

Obfuscation is a technique that attempts to evade content filters by modifying how restricted words or phrases are presented. This can be done through encoding, character substitution, or strategic text manipulation.

Token smuggling refers to techniques that bypass content filters while preserving the underlying meaning. While similar to obfuscation, it often focuses on exploiting the way language models process and understand text.

Tip

Interested in prompt hacking and AI safety? Test your skills on HackAPrompt, the largest AI safety hackathon. You can register here.

Types of Obfuscation Attacks

1. Syntactic Transformation

Syntactic transformation attacks modify text while maintaining its interpretability:

Encoding Methods

Base64 encoding
ROT13 cipher
Leet speak (e.g., "h4ck3r" for "hacker")
Pig Latin
Custom ciphers

Example: Base64 Encoding

Below is a demonstration of Base64 encoding to bypass filters:

2. Typo-based Obfuscation

Typo-based attacks use intentional misspellings that remain human-readable:

Common Techniques

Vowel removal (e.g., "psswrd" for "password")
Character substitution (e.g., "pa$$w0rd")
Phonetic preservation (e.g., "fone" for "phone")
Strategic misspellings (e.g., "haccer" for "hacker")

3. Translation-based Obfuscation

Translation attacks leverage language translation to bypass filters:

Methods

Multi-step translation chains
Low-resource language exploitation
Mixed-language prompts
Back-translation techniques

Example

English → Rare Language → Another Language → English, with each step potentially bypassing different filters.

Conclusion

Obfuscation and token smuggling represent sophisticated challenges in AI safety. While these techniques can bypass traditional filtering mechanisms, understanding their methods helps in developing more robust defenses. As language models continue to evolve, both attack and defense strategies will need to adapt accordingly.

Sander Schulhoff

Sander Schulhoff is the CEO of HackAPrompt and Learn Prompting. He created the first Prompt Engineering guide on the internet, two months before ChatGPT was released, which has taught 3 million people how to prompt ChatGPT. He also partnered with OpenAI to run the first AI Red Teaming competition, HackAPrompt, which was 2x larger than the White House's subsequent AI Red Teaming competition. Today, HackAPrompt partners with the Frontier AI labs to produce research that makes their models more secure. Sander's background is in Natural Language Processing and deep reinforcement learning. He recently led the team behind The Prompt Report, the most comprehensive study of prompt engineering ever done. This 76-page survey, co-authored with OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions, analyzed 1,500+ academic papers and covered 200+ prompting techniques.

Footnotes

Kang, D., Li, X., Stoica, I., Guestrin, C., Zaharia, M., & Hashimoto, T. (2023). Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks. ↩
u/Nin_kat. (2023). New jailbreak based on virtual functions - smuggle illegal tokens to the backend. https://www.reddit.com/r/ChatGPT/comments/10urbdj/new_jailbreak_based_on_virtual_functions_smuggle ↩
Rao, A., Vashistha, S., Naik, A., Aditya, S., & Choudhury, M. (2024). Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks. https://arxiv.org/abs/2305.14965 ↩
Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., & Fritz, M. (2023). Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. https://arxiv.org/abs/2302.12173 ↩

DIFFICULTY LEVEL

RECOMMENDED COURSES

ChatGPT for Everyone

Introduction to Prompt Engineering

AI Red-Teaming and AI Security Masterclass

Live Courses