Mobrief

Comparing OAI 120B OSS, Qwen 3.5, and Gemini 3.0 Flash with LLM Multi-Agent Avalon

The author has been running a multi-agent test for the social deduction game Avalon.

Reddit LocalLLaMA · ~2 min + comments
Community

Community-submitted content. Signal comes from upvotes, not editorial vetting. Always check the linked source.

  • The game tests context tracking, hidden intentions, and theory of mind.
  • The post breaks down how the different models handled the gameplay.
  • System architecture note (structured non-native CoT): the test prompts all models to generate a JSON response before taking an action or speaking publicly.

Context

Avalon tests context tracking, hidden intentions, and theory of mind, and the post breaks down how the different models handled the gameplay.

System architecture notes:

  • Structured non-native CoT: the test prompts all models to generate a JSON response before taking an action or speaking publicly. Instead of a single reasoning field, it forces a structured breakdown across 4 specific fields: self_check (persona verifica
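As a rough illustration of the structured JSON step described above, a harness could refuse any turn that skips a reasoning field before it lets the model act. This is a minimal sketch, not the author's implementation: only self_check is named in the source, so the other three field names, the action and public_message keys, and the parse_turn helper are hypothetical placeholders. Whether the reasoning is hidden from other players is also an assumption.

```python
import json

# Only "self_check" comes from the source. The remaining three names are
# hypothetical placeholders for the fields the summary does not list.
REASONING_FIELDS = ("self_check", "field_2", "field_3", "field_4")


def parse_turn(raw: str) -> dict:
    """Parse one model turn; reject it if any reasoning field is missing or empty."""
    turn = json.loads(raw)
    if not isinstance(turn, dict):
        raise ValueError("turn must be a JSON object")
    missing = [name for name in REASONING_FIELDS if not turn.get(name)]
    if missing:
        raise ValueError(f"missing reasoning fields: {missing}")
    # Assumption: the reasoning stays private to the harness, and only the
    # action and public message are passed on to the other players.
    return {
        "private": {name: turn[name] for name in REASONING_FIELDS},
        "action": turn.get("action"),                  # hypothetical key
        "public_message": turn.get("public_message"),  # hypothetical key
    }
```

Splitting the reasoning into several required fields, rather than one free-form field, lets the harness check that each step was filled in before the model's action is accepted.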

For builders

The Avalon setup tests context tracking, hidden intentions, and theory of mind.

