What if LLM agents passed KV-cache to each other instead of text? I tried it -- 73-78% token savings across Qwen, Llama, and DeepSeek
If you've used multi-agent setups with LangChain, CrewAI, AutoGen, or Swarm, you've probably noticed: every agent re-tokenizes and re-processes the full conversation from scratch.
2-Minute Brief
- Affects widely-used AI models.
- If you've used multi-agent setups with LangChain, CrewAI, AutoGen, or Swarm, you've probably noticed: every agent re-tokenizes and re-processes the full conversation from scratch.
- Agent 3 in a 4-agent chain is re-reading everything agents 1 and 2 already chewed through.
- Open receipts to verify and go deeper.
8-Minute Deep Dive
Context
If you've used multi-agent setups with LangChain, CrewAI, AutoGen, or Swarm, you've probably noticed: every agent re-tokenizes and re-processes the full conversation from scratch. Agent 3 in a 4-agent chain is re-reading everything agents 1 and 2 already chewed through. When I measured this across Qwen2.5, Llama 3.2, and DeepSeek-R1-Distill, 47-53% of all tokens in text mode turned out to be redundant re-processing. AVP (Agent Vector Protocol) is my attempt to fix this. Instead of passing t
For builders
If you've used multi-agent setups with LangChain, CrewAI, AutoGen, or Swarm, you've probably noticed: every agent re-tokenizes and re-processes the full conversation from scratch.
Verify
Prefer primary announcements, papers, repos, and changelogs over reposts.