PROVENANCE BRIEF
Community

Community-submitted content. Signal comes from upvotes, not editorial vetting. Always check the linked source.

What if LLM agents passed KV-cache to each other instead of text? I tried it -- 73-78% token savings across Qwen, Llama, and DeepSeek

If you've used multi-agent setups with LangChain, CrewAI, AutoGen, or Swarm, you've probably noticed: every agent re-tokenizes and re-processes the full conversation from scratch.

2-Minute Brief
  • Affects multi-agent frameworks (LangChain, CrewAI, AutoGen, Swarm) running widely used open models: Qwen2.5, Llama 3.2, and DeepSeek-R1-Distill.
  • In text mode, every agent re-tokenizes and re-processes the full conversation from scratch; agent 3 in a 4-agent chain re-reads everything agents 1 and 2 already produced.
  • The author measured 47-53% of all tokens as redundant re-processing, and reports 73-78% token savings from passing KV-cache instead of text.
  • Open receipts to verify and go deeper.
8-Minute Deep Dive

Context

If you've used multi-agent setups with LangChain, CrewAI, AutoGen, or Swarm, you've probably noticed: every agent re-tokenizes and re-processes the full conversation from scratch. Agent 3 in a 4-agent chain is re-reading everything agents 1 and 2 already chewed through. When I measured this across Qwen2.5, Llama 3.2, and DeepSeek-R1-Distill, 47-53% of all tokens in text mode turned out to be redundant re-processing. AVP (Agent Vector Protocol) is my attempt to fix this. Instead of passing text between agents, AVP hands off the KV-cache directly, so each token is processed once.
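The redundancy described above is just arithmetic over a growing transcript. A minimal accounting sketch (the agent count and token lengths below are made up for illustration, not taken from the post's benchmarks):

```python
# Prefill-token accounting for a sequential agent chain. In text mode each
# agent re-reads the whole transcript so far; with KV-cache handoff every
# token is prefilled exactly once. Numbers are illustrative, not AVP's.

def text_mode_prefill(prompt_tokens, outputs_per_agent):
    """Total prefill tokens when each agent re-processes the full transcript."""
    total, transcript = 0, prompt_tokens
    for out in outputs_per_agent:
        total += transcript        # this agent re-reads everything so far
        transcript += out          # its reply is appended for the next agent
    return total

def cache_mode_prefill(prompt_tokens, outputs_per_agent):
    """Total prefill tokens when the KV-cache is handed to the next agent."""
    # Each agent only prefills the delta since the handoff; the last
    # agent's output is never prefilled by anyone downstream.
    return prompt_tokens + sum(outputs_per_agent[:-1])

outputs = [300, 300, 300, 300]            # four agents, 300 tokens each (made up)
text = text_mode_prefill(200, outputs)    # 2600 tokens
cache = cache_mode_prefill(200, outputs)  # 1100 tokens
print(f"redundant fraction: {1 - cache / text:.0%}")  # 58% for this toy chain
```

The redundant fraction grows with chain length, which is why longer pipelines benefit more.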

For builders

The takeaway: hand off the KV-cache between agents instead of re-sending text. The author reports 73-78% token savings across Qwen, Llama, and DeepSeek model families.
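The handoff mechanism itself is the standard `past_key_values` path in Hugging Face transformers. A minimal sketch with a tiny randomly initialized GPT-2 (this is not the post's AVP code, and it assumes both agents share the same weights, which raw KV-cache reuse requires):

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# A tiny randomly initialized model stands in for a real agent backbone
# (no download needed); the cache-handoff pattern is identical at scale.
torch.manual_seed(0)
model = GPT2LMHeadModel(GPT2Config(n_layer=2, n_head=2, n_embd=32, vocab_size=100))
model.eval()

prompt = torch.randint(0, 100, (1, 16))   # "agent 1" context
reply = torch.randint(0, 100, (1, 8))     # tokens agent 1 appended

with torch.no_grad():
    # Agent 1 prefill: the returned cache covers the prompt.
    out1 = model(prompt, use_cache=True)
    # Handoff: agent 2 receives the cache and only prefills the 8-token delta.
    out2 = model(reply, past_key_values=out1.past_key_values, use_cache=True)
    # Baseline: agent 2 re-processes the whole 24-token transcript from scratch.
    full = model(torch.cat([prompt, reply], dim=1), use_cache=True)

# Same final-position logits either way, but the cached path processed
# 8 new tokens instead of 24.
assert torch.allclose(out2.logits[:, -1], full.logits[:, -1], atol=1e-4)
```

Within one process this is straightforward; the hard part the post is tackling is serializing and shipping that cache between separate agent processes.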

Verify

Prefer primary announcements, papers, repos, and changelogs over reposts.

Read Original
