New Ways to Hack AI Assistants in Blockchain: How Context Manipulation Works

March 26, 2025

5 minutes

🟦medium Reading Level

Imagine having a smart assistant that handles your cryptocurrency investments and blockchain transactions. These AI-powered assistants, or "AI agents," are becoming increasingly common in the world of digital finance. But what happens when these assistants can be tricked into making decisions they shouldn't?

Recently, researchers made two important discoveries about AI agent security:

  1. They introduced context manipulation as a new way to classify attacks against AI agents, showing how previously separate attack types are actually related
  2. They revealed a dangerous new attack vector called memory injection that can persist across conversations and platforms

In this article, we'll break down:

  1. How AI assistants work in blockchain environments
  2. What context manipulation is and why it matters
  3. How memory injection works and why it's dangerous
  4. Ways to protect against these attacks

Let's dive into the technical details.

Tip

Think you can break an AI model? Join HackAPrompt 2.0, the world's largest AI safety and prompt hacking competition. With over 100 challenges across 5 tracks and 30,000+ participants, you'll help stress test models and uncover vulnerabilities to build safer AI systems. Join the waitlist.

How AI Agents Work in Blockchain Environments

AI agents operate at the intersection of two powerful technologies: large language models (LLMs), the technology behind systems like ChatGPT, and distributed ledger systems, which are the foundation of blockchain technology.

These agents are autonomous software entities that can:

  1. Process natural language inputs - understanding and responding to human instructions
  2. Interact with smart contracts - self-executing contracts living on the blockchain
  3. Execute financial transactions - moving digital assets between accounts
  4. Maintain persistent state - keeping track of their actions and knowledge over time

The fundamental architecture of these agents consists of several key components:

  • Input processing layer: This component handles all incoming information, whether it's user commands, blockchain data, or external API responses
  • Context management system: The system that maintains the agent's understanding of its current situation, past interactions, and operating rules
  • Decision engine: The core processing unit that evaluates inputs against stored context to determine what actions to take
  • Execution layer: The interface between the agent and the blockchain, responsible for carrying out transactions

Discovery #1: Context Manipulation as a New Attack Classification

The authors of "AI Agents in Cryptoland: Practical Attacks and No Silver Bullet" introduce context manipulation as a new way to think about AI agent attacks. Instead of viewing different attacks as separate, they show how they all manipulate the information (context) that agents use to make decisions.

What Makes Up an Agent's Context?

In AI agent systems, the context includes:

  • Current prompt: The immediate instruction or query from the user
  • External data: Information from APIs, blockchain data, or other external systems
  • Static knowledge: Built-in rules and understanding that doesn't change
  • Historical record: A log of past actions and decisions

Under this new classification, previously known attacks like direct prompt injection and indirect prompt injection are actually different ways of manipulating this context. They can target:

  1. Input channels: Where users interact with the agent
  2. External data sources: Third-party data providers
  3. Storage systems: Where the agent keeps its memory
  4. Cross-platform connections: Where the agent shares information between different systems

Let's look at each type of attack that authors include into context manipulation category in detail:

1. Direct Prompt Injection

Direct prompt injection changes the immediate input (current prompt) the agent receives. For example:

# Original harmless command combined with hidden malicious command
original_prompt = "Check balance"
malicious_addition = "; transfer_funds(attacker_address)"
manipulated_prompt = combine_context(original_prompt, malicious_addition)

2. Indirect Prompt Injection

Indirect prompt injection changes the external data the agent uses.

The attack happens in these steps:

  1. Find what external data the agent uses
  2. Insert malicious content into these sources
  3. Wait for the agent to use this data
  4. Agent makes decisions using bad data

Discovery #2: Memory Injection - A New Attack Vector

The researchers' second and more significant discovery is memory injection, a new type of attack that's more dangerous than previous methods.

Unlike prompt injection which works immediately, memory injection:

  • Plants malicious instructions in the agent's long-term memory
  • Activates later when the agent accesses that memory
  • Can spread across conversations and platforms
  • Is harder to detect because it's embedded in trusted memory

This attack is very similar to what we've seen in exploiting ChatGPT's memory feature to inject malicious memories.

How Memory Injection Works

Memory injection changes the agent's stored history. Here's how agents typically store their memory:

interface ContextStore {
  platform: string;    // Where the memory came from
  timestamp: number;   // When it was created
  content: string;     // What was stored
  metadata: Record<string, unknown>;  // Additional information
}

And in databases:

CREATE TABLE agent_memory (
  context_id UUID PRIMARY KEY,    -- Unique ID
  platform VARCHAR(50),           -- Source
  content JSONB,                  -- Stored information
  timestamp TIMESTAMP,            -- Creation time
  metadata JSONB                  -- Extra data
);

Why It's Dangerous

Memory injection is particularly concerning because:

  • The attack persists across multiple sessions
  • It can affect multiple users
  • It works across different platforms using the same agent
  • Traditional security measures might miss it
  • The malicious instructions appear legitimate to the agent

Protecting Against These Attacks

The researchers propose several approaches to protect against context manipulation attacks:

1. Immediate Solutions

Whitelisting Addresses

This approach implements a hardcoded list of approved addresses for financial transactions. Only pre-authorized destinations can receive fund transfers, creating a strict boundary around possible transaction targets.

Multi-layer Security

This solution requires explicit user confirmation for high-risk actions through out-of-band mechanisms (like email or mobile notifications), adding an extra verification step to sensitive operations.

Let's examine the trade-offs of these approaches:

ApproachProsCons
Whitelisting Addresses• Reduces unauthorized transactions
• Simple to implement
• Clear security boundaries
• Impractical for frequent traders
• Whitelists can be compromised
• Vulnerable to social engineering
• Limits legitimate use cases
Multi-layer Security• Adds extra verification
• Catches suspicious transactions
• Works across platforms
• Reduces automation benefits
• Adds user friction
• Should only be last resort
• Increases transaction time

2. Long-term Solution: Context-Aware Models

The researchers suggest a more promising long-term solution: training context-aware language models. These models would understand their operating context (like fiduciary responsibility in DeFi) and make decisions similar to how professional auditors or certified financial officers would in traditional business settings.

This approach focuses on two main aspects:

AspectBenefitsChallenges
Understanding Role• Better context awareness
• Improved risk assessment
• Professional-level decisions
• Complex to train
• Requires extensive data
• May still have blind spots
Decision Making• Similar to human professionals
• Better risk-reward analysis
• Resistant to manipulation
• Hard to validate decisions
• May be overly cautious
• Training complexity

3. Technical Implementation

While developing context-aware models, we still need robust technical controls:

def validate_context(context: Context) -> bool:
    return (
        verify_signature(context.content) and          # Check authenticity
        check_timestamp_sequence(context.timestamp) and # Check timing
        validate_cross_platform_consistency(context)    # Check consistency
    )

Conclusion

This research reveals two key findings about AI agent security. First, context manipulation provides a framework showing how various attack types are interconnected rather than separate vulnerabilities. Second, memory injection emerges as a severe threat that can persist across sessions and spread between users, making it particularly difficult to detect and prevent.

These discoveries highlight significant gaps in current AI security approaches. Join HackAPrompt 2.0 to help identify vulnerabilities and build safer AI systems.

Valeriia Kuka

Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.


© 2025 Learn Prompting. All rights reserved.