
Context Engineering, Lessons from Building Manus

The article “Context Engineering for AI Agents: Lessons from Building Manus” by Yichao ‘Peak’ Ji discusses the critical role of context engineering in building effective AI agents, drawing lessons from the development of Manus.

I decided to summarise the article since its insights are truly worth reading and its ideas worth spreading.

Here’s a breakdown of the key concepts and lessons presented:

The Rationale for Context Engineering

  • Shift from Fine-tuning to In-context Learning: The author emphasises a significant change in NLP, moving from time-consuming model fine-tuning (common in the BERT era) to leveraging the “in-context learning abilities” of frontier models like GPT-3 and Flan-T5. This allows for faster iterations and product development, crucial for fast-moving applications.

  • “Rising Tide” Analogy: Manus aims to be the “boat” that benefits from the “rising tide” of model progress, rather than a “pillar stuck to the seabed” (i.e., being tied to specific underlying models). This means focusing on how context is managed and presented to the AI, making the system adaptable to new, more powerful LLMs.

  • “Stochastic Graduate Descent”: The process of refining context engineering is described as a manual, experimental process involving “architecture searching, prompt fiddling, and empirical guesswork.”

Key Principles and Practices of Context Engineering

Design Around the KV-Cache (Prioritise KV-Cache Hit Rate):

  • What it is: The KV-cache (Key-Value cache) stores previously computed token representations, significantly reducing inference time (Time-To-First-Token or TTFT) and cost, especially for long contexts.

  • Why it’s crucial for agents: Agents operate in a loop where context grows with every step but outputs stay short, producing a very high input-to-output token ratio. Maximising KV-cache hits is therefore vital for efficiency and cost savings (e.g., roughly a 10x price difference between cached and uncached input tokens with Claude Sonnet).
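To make the ratio concrete, here is a rough cost model. The prices and the `step_cost` helper are illustrative assumptions for this sketch, not Manus internals or official pricing:

```python
# Illustrative per-million-token input prices (assumed, not quoted pricing).
CACHED_PER_MTOK = 0.30    # price for cache-hit input tokens
UNCACHED_PER_MTOK = 3.00  # price for cache-miss input tokens

def step_cost(input_tokens: int, hit_rate: float) -> float:
    """Input cost of one agent step, given a KV-cache hit rate in [0, 1]."""
    cached = input_tokens * hit_rate
    uncached = input_tokens * (1 - hit_rate)
    return (cached * CACHED_PER_MTOK + uncached * UNCACHED_PER_MTOK) / 1_000_000

# A 50k-token context, typical of a long agent loop:
full_cache = step_cost(50_000, hit_rate=1.0)
no_cache = step_cost(50_000, hit_rate=0.0)
# With these prices, an all-miss step costs ~10x an all-hit step,
# and the gap compounds over every iteration of the loop.
```

Because the loop re-sends the whole growing context each step, even a small drop in hit rate multiplies across the entire task.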

Practices for high hit rate

  • Stable Prompt Prefix: Avoid dynamic elements like precise timestamps at the beginning of system prompts, as even a single token difference invalidates the cache.

  • Append-Only Context: Don’t modify previous actions or observations. Ensure deterministic serialisation (e.g., stable JSON key ordering).

  • Explicit Cache Breakpoints: Manually insert breakpoints if the inference framework doesn’t handle incremental caching, ensuring they cover the system prompt.

  • Self-hosting considerations: Enable prefix/prompt caching and use session IDs for consistent routing.
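The first two practices can be sketched in a few lines. This is a minimal illustration, not Manus code: deterministic JSON serialisation keeps appended observations byte-identical across runs, and coarse-grained time keeps the prompt prefix stable within a session:

```python
import json
from datetime import date

def serialise_observation(obs: dict) -> str:
    # sort_keys gives stable key ordering; fixed separators avoid
    # incidental whitespace differences between serialisations.
    return json.dumps(obs, sort_keys=True, separators=(",", ":"))

# Bad: a precise timestamp at the top of the system prompt changes on
# every call and invalidates the KV-cache from the first token:
#   system = f"Current time: {datetime.now().isoformat()}\n..."

# Better: if time matters at all, use coarse granularity so the prefix
# stays identical across steps in a session.
system = f"Current date: {date.today().isoformat()}\nYou are an agent..."

# Same dict, same bytes, regardless of insertion order:
assert serialise_observation({"b": 1, "a": 2}) == serialise_observation({"a": 2, "b": 1})
```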

Mask, Don’t Remove (for Action Space Management)

  • Problem: As agents gain more tools, the action space becomes complex, leading to poor action selection. Dynamically adding/removing tools causes KV-cache invalidation and confuses the model.

  • Solution: Instead of removing tools, Manus uses a “context-aware state machine” to mask token logits during decoding. This prevents or enforces the selection of certain actions based on the current state (e.g., forcing a reply, or only allowing browser tools).

  • Benefit: Maintains KV-cache effectiveness and prevents schema violations/hallucinations by keeping tool definitions stable in the context.
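A minimal sketch of the idea, with hypothetical tool and state names: tool definitions stay in the context (cache-friendly), and the state machine only decides which tool names the decoder may select at this step:

```python
# All tool definitions remain in context unchanged; names are hypothetical.
ALL_TOOLS = ["browser_open", "browser_click", "shell_run", "reply_user"]

# State -> allowed tool-name prefixes. Consistent prefixes make masking
# whole tool families easy.
STATE_PREFIX = {
    "browsing": ("browser_",),       # only allow browser tools
    "awaiting_user": ("reply_",),    # force a direct reply
    "default": ("browser_", "shell_", "reply_"),
}

def allowed_tools(state: str) -> list[str]:
    prefixes = STATE_PREFIX.get(state, STATE_PREFIX["default"])
    return [t for t in ALL_TOOLS if t.startswith(prefixes)]

print(allowed_tools("browsing"))       # ['browser_open', 'browser_click']
print(allowed_tools("awaiting_user"))  # ['reply_user']
```

In a real system the allowed names would map to token IDs whose logits are kept (and all others masked) during constrained decoding; the point is that the schemas themselves never enter or leave the context.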

Use the File System as Context (for Long Contexts and Persistence)

  • Problem: Even large context windows (e.g., 128K tokens) are often insufficient for real-world agentic tasks, and model performance degrades with very long inputs. Long inputs are also expensive.

  • Solution: Treat the file system as the “ultimate context” — unlimited, persistent, and directly operable by the agent.

  • Mechanism: The model learns to read and write to files on demand. Compression strategies are restorable (e.g., dropping web page content but keeping the URL).

  • Implication: This externalises long-term memory, potentially paving the way for efficient “Agentic SSMs” (State Space Models) that don’t rely on full attention for long-range dependencies.
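A sketch of restorable compression, with illustrative paths and field names: the full page body goes to a file, and the context keeps only a short reference the agent can re-open later, so nothing is permanently lost:

```python
import tempfile
from pathlib import Path

def compress_web_observation(url: str, html: str, workdir: Path) -> dict:
    """Save the full page body to disk; return a short, restorable reference."""
    path = workdir / "pages" / f"{abs(hash(url))}.html"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html)
    # The context carries the URL and file path instead of the whole body.
    return {"url": url, "saved_to": str(path), "bytes": len(html)}

workdir = Path(tempfile.mkdtemp())
ref = compress_web_observation("https://example.com", "<html>...</html>", workdir)
# The observation in context is now a few short fields; the agent restores
# the content on demand by reading ref["saved_to"].
assert Path(ref["saved_to"]).read_text() == "<html>...</html>"
```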

Manipulate Attention Through Recitation (Self-Correction/Focus)

  • Problem: In long, multi-step tasks, LLM-based agents can drift off-topic or forget earlier goals (“lost-in-the-middle” issues).

  • Solution: Manus creates and repeatedly updates a todo.md file, checking off completed items.

  • Mechanism: By constantly rewriting the to-do list, the agent “recites” its objectives into the end of the context, pushing the global plan into the model’s recent attention span. This biases the model’s focus without architectural changes.
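The recitation loop is simple to picture. A minimal sketch (the goal list and rendering format are illustrative): the todo list is re-rendered every step and appended at the tail of the context, where the model's attention is freshest:

```python
def render_todo(goals: list[str], done: set[int]) -> str:
    """Re-render todo.md with completed items checked off."""
    lines = ["# todo.md"]
    for i, goal in enumerate(goals):
        mark = "x" if i in done else " "
        lines.append(f"- [{mark}] {goal}")
    return "\n".join(lines)

goals = ["collect sources", "draft summary", "review and publish"]
# After finishing step 0, the agent rewrites the list and appends it to the
# end of the context, reciting the global plan into recent attention:
context_tail = render_todo(goals, done={0})
print(context_tail)
# # todo.md
# - [x] collect sources
# - [ ] draft summary
# - [ ] review and publish
```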

Keep the Wrong Stuff In (for Error Recovery and Adaptation)

  • Problem: A natural impulse is to hide or remove errors from the context, but this prevents the model from learning.

  • Solution: Leave failed actions and their resulting observations/stack traces in the context.

  • Mechanism: Seeing failures implicitly updates the model’s internal beliefs, shifting its “prior” away from repeating similar mistakes.

  • Significance: Error recovery is a key indicator of true agentic behavior and is often overlooked in benchmarks.
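A minimal sketch of the principle (the action-runner shape is an assumption, not the Manus implementation): failed actions are recorded verbatim, stack trace included, rather than silently retried or scrubbed from the transcript:

```python
import traceback

def run_action(context: list[dict], action, *args) -> list[dict]:
    """Run a tool call and append the outcome, success or failure, to context."""
    try:
        result = action(*args)
        context.append({"action": action.__name__, "observation": result})
    except Exception:
        # Keep the wrong stuff in: the model sees the failure and its
        # trace, shifting its prior away from repeating the mistake.
        context.append({"action": action.__name__,
                        "error": traceback.format_exc()})
    return context

ctx = run_action([], len, "abc")          # success: observation recorded
ctx = run_action(ctx, lambda x: 1 / x, 0)  # failure: error kept in context
```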

Don’t Get Few-Shotted (Increase Diversity)

  • Problem: While few-shot prompting is useful, in agent systems, it can lead to the model “mimicking” repetitive patterns, causing drift, overgeneralisation, or hallucination.

  • Solution: Introduce small amounts of structured variation in actions, observations, serialisation templates, phrasing, or formatting.

  • Benefit: This “controlled randomness” breaks uniform patterns and tweaks the model’s attention, making the agent less brittle.
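One cheap way to introduce such variation, sketched here with made-up templates: rotate among a few equivalent serialisation formats so the context never settles into one uniform few-shot pattern, while the payload itself is unchanged:

```python
import random

# Equivalent phrasings of the same observation; templates are illustrative.
TEMPLATES = [
    "Observation: {name} -> {result}",
    "[{name}] returned: {result}",
    "Result of {name}: {result}",
]

def format_observation(name: str, result: str, rng: random.Random) -> str:
    """Render an observation with small, controlled formatting variation."""
    return rng.choice(TEMPLATES).format(name=name, result=result)

rng = random.Random(0)  # seeded for reproducibility in this sketch
print(format_observation("fetch_page", "200 OK", rng))
```

Note the trade-off with the KV-cache advice above: this variation belongs in the freshly appended tail of the context, never in the stable prompt prefix.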

Conclusion

The author concludes that context engineering is an “emerging science” but is “essential” for agent systems. It directly impacts an agent’s speed, recovery capability, and scalability. The lessons shared are practical patterns discovered through real-world testing with Manus, emphasising that how context is shaped fundamentally defines an agent’s behaviour.
