PROVENANCE BRIEF
Research 5h ago

Even frontier LLMs from GPT-5 onward lose up to 33% accuracy when you chat too long

Even with newer models like GPT-5.2 and Claude 4.6, AI chatbots still give worse answers the longer a conversation goes on.

Why it matters

Affects widely used AI models, including GPT-5.2 and Claude 4.6.

The Decoder
Research 2h ago

Industry expectations for Machine Learning Engineers in 2026

Reddit MachineLearning: Industry expectations for Machine Learning Engineers in 2026

Why it matters

Relevant to anyone tracking what employers expect from ML engineers in 2026.

Reddit MachineLearning
Research just now

Chatbot Arena Elo Rankings: Top 20 Models

#1 Gemini-2.5-Pro (Score: 1)
#2 Gemini-2.5-Pro-Preview-05-06 (Score: 2)
#3 GLM-4.5 (Score: 2)
#4 Grok-4-0709 (Score: 2)
#5 ChatGPT-4o-latest (2025-03-26) (Score: 3)
#6 o3-2025-04-16 (Score: 3)
#7…

Why it matters

Shows how widely used AI models, including Gemini-2.5-Pro, GLM-4.5, and Grok-4, rank against each other.

LMArena Elo Rankings
Community 4h ago

Qwen3.5 35B-A3B replaced my 2-model agentic setup on M1 64GB

There's been a lot of buzz about Qwen3.5 models being smarter than all previous open-source models in the same size…

Reddit LocalLLaMA
Community just now

My friends trained and benchmarked 4 diffusion model versions entirely on an RTX 2050 (4GB VRAM): the 17.8M model beat the 143.8M one

Reddit LocalLLaMA: My friends trained and benchmarked 4 diffusion model versions entirely on an RTX 2050 (4GB VRAM)…

Reddit LocalLLaMA
Community 5h ago

What if LLM agents passed KV-cache to each other instead of text? I tried it -- 73-78% token savings across Qwen, Llama, and DeepSeek

If you've used multi-agent setups with LangChain, CrewAI, AutoGen, or Swarm, you've probably noticed: every agent…

Reddit LocalLLaMA
THE WIRE
Product 8h ago

Factor out scaled_mm algo checks to non-CUDA ()

Summary: In preparation for an XPU-specific backend for scaledmmv2, move some helpful…

PyTorch Releases
Labs 4h ago

Elevated errors on Claude Opus 4.6

Feb 28, 18:34 UTC Resolved - Between 9:50 PT / 17:50 UTC and 10:12 PT / 18:12 UTC we…

Anthropic Status
Product 3h ago

Support dict in NestedUserFunctionVariable ()

Support for the dict attribute is a little inconsistent in Dynamo.

PyTorch Releases
Labs 7h ago

Elevated errors on claude.ai

Feb 28, 15:50 UTC Resolved - This incident has been resolved.

Anthropic Status
Research 5h ago

Tiny transformers (<100 params) can add two 10-digit numbers with 100% accuracy

Really interesting project.

Reddit MachineLearning