PROVENANCE BRIEF
Community

Community-submitted content. Signal comes from upvotes, not editorial vetting. Always check the linked source.

What if LLM agents passed KV-cache to each other instead of text? I tried it -- 73-78% token savings across Qwen, Llama, and DeepSeek

If you've used multi-agent setups with LangChain, CrewAI, AutoGen, or Swarm, you've probably noticed: every agent re-tokenizes and re-processes the full conversation from scratch.

2-Minute Brief
  • Affects multi-agent frameworks (LangChain, CrewAI, AutoGen, Swarm) running widely used open models: Qwen2.5, Llama 3.2, and DeepSeek-R1-Distill.
  • In text mode, every agent re-tokenizes and re-processes the full conversation from scratch; agent 3 in a 4-agent chain re-reads everything agents 1 and 2 already produced.
  • The author measured 47-53% of all tokens as redundant re-processing, and reports 73-78% token savings from passing KV-cache instead of text.
  • Open receipts to verify and go deeper.
8-Minute Deep Dive

Context

If you've used multi-agent setups with LangChain, CrewAI, AutoGen, or Swarm, you've probably noticed: every agent re-tokenizes and re-processes the full conversation from scratch. Agent 3 in a 4-agent chain is re-reading everything agents 1 and 2 already chewed through. When I measured this across Qwen2.5, Llama 3.2, and DeepSeek-R1-Distill, 47-53% of all tokens in text mode turned out to be redundant re-processing. AVP (Agent Vector Protocol) is my attempt to fix this. Instead of passing text between agents, AVP hands off the KV-cache directly, so each token is processed once.
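The redundancy described above is just arithmetic over a growing transcript. A minimal accounting sketch (the agent count and token lengths below are made up for illustration, not taken from the post's benchmarks):

```python
# Prefill-token accounting for a sequential agent chain. In text mode each
# agent re-reads the whole transcript so far; with KV-cache handoff every
# token is prefilled exactly once. Numbers are illustrative, not AVP's.

def text_mode_prefill(prompt_tokens, outputs_per_agent):
    """Total prefill tokens when each agent re-processes the full transcript."""
    total, transcript = 0, prompt_tokens
    for out in outputs_per_agent:
        total += transcript        # this agent re-reads everything so far
        transcript += out          # its reply is appended for the next agent
    return total

def cache_mode_prefill(prompt_tokens, outputs_per_agent):
    """Total prefill tokens when the KV-cache is handed to the next agent."""
    # Each agent only prefills the delta since the handoff; the last
    # agent's output is never prefilled by anyone downstream.
    return prompt_tokens + sum(outputs_per_agent[:-1])

outputs = [300, 300, 300, 300]            # four agents, 300 tokens each (made up)
text = text_mode_prefill(200, outputs)    # 2600 tokens
cache = cache_mode_prefill(200, outputs)  # 1100 tokens
print(f"redundant fraction: {1 - cache / text:.0%}")  # 58% for this toy chain
```

The redundant fraction grows with chain length, which is why longer pipelines benefit more.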

For builders

The takeaway: hand off the KV-cache between agents instead of re-sending text. The author reports 73-78% token savings across Qwen, Llama, and DeepSeek model families.
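The handoff mechanism itself is the standard `past_key_values` path in Hugging Face transformers. A minimal sketch with a tiny randomly initialized GPT-2 (this is not the post's AVP code, and it assumes both agents share the same weights, which raw KV-cache reuse requires):

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# A tiny randomly initialized model stands in for a real agent backbone
# (no download needed); the cache-handoff pattern is identical at scale.
torch.manual_seed(0)
model = GPT2LMHeadModel(GPT2Config(n_layer=2, n_head=2, n_embd=32, vocab_size=100))
model.eval()

prompt = torch.randint(0, 100, (1, 16))   # "agent 1" context
reply = torch.randint(0, 100, (1, 8))     # tokens agent 1 appended

with torch.no_grad():
    # Agent 1 prefill: the returned cache covers the prompt.
    out1 = model(prompt, use_cache=True)
    # Handoff: agent 2 receives the cache and only prefills the 8-token delta.
    out2 = model(reply, past_key_values=out1.past_key_values, use_cache=True)
    # Baseline: agent 2 re-processes the whole 24-token transcript from scratch.
    full = model(torch.cat([prompt, reply], dim=1), use_cache=True)

# Same final-position logits either way, but the cached path processed
# 8 new tokens instead of 24.
assert torch.allclose(out2.logits[:, -1], full.logits[:, -1], atol=1e-4)
```

Within one process this is straightforward; the hard part the post is tackling is serializing and shipping that cache between separate agent processes.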

Verify

Prefer primary announcements, papers, repos, and changelogs over reposts.

Read Original
