Mobrief

Qwen3.5-24B-A3B-REAP-0.32: 32% Expert-Pruned for Agentic Coding (GGUF)

I forked CerebrasResearch/reap, added some custom patches for Qwen3.5 support, and have just released a REAPed version of Qwen3.5-35B-A3B focused on coding and agentic tasks.

Reddit LocalLLaMA · ~2 min + comments
Community

Community-submitted content. Signal comes from upvotes, not editorial vetting. Always check the linked source.

  • I forked CerebrasResearch/reap, added some custom patches for Qwen3.5 support, and have just released a REAPed version of Qwen3.5-35B-A3B focused on coding and agentic tasks.
  • I wanted to run the MoE model on my 16 GB Nvidia card, and no one had pruned the model yet, so I started this.

Context

I forked CerebrasResearch/reap, added some custom patches for Qwen3.5 support, and have just released a REAPed version of Qwen3.5-35B-A3B focused on coding and agentic tasks. I wanted to run the MoE model on my 16 GB Nvidia card, and no one had pruned the model yet, so I started this. I've added the scripts I used to prune and quantize the model here. I'd recommend the [Qwen3.5-24B-A3B-REAP-0.32-IQ4KS.gguf](https://huggingface.co/sandeshrajx/Qwen3.5-24B-A3B-REAP-0.32-GGUF/blob/main/Qwen3.
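The motivation above is a VRAM budget: a 35B-parameter model quantized to roughly 4-bit weights overshoots 16 GB, while the 32% expert-pruned 24B variant fits with headroom for the KV cache. A minimal back-of-envelope sketch, assuming ~4.25 bits per weight for the IQ4-class quant (an approximation; the real GGUF size also includes embeddings, metadata, and mixed-precision tensors):

```python
def gguf_size_gb(params_b: float, bits_per_weight: float = 4.25) -> float:
    """Rough quantized model size in GB (1 GB = 1e9 bytes).

    params_b: parameter count in billions.
    bits_per_weight: average quantization width; 4.25 is an assumed
    ballpark for IQ4-class quants, not an exact figure.
    """
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

original = gguf_size_gb(35)  # ~18.6 GB: over a 16 GB card's budget
pruned = gguf_size_gb(24)    # ~12.8 GB: fits, with room for KV cache
print(f"35B: {original:.1f} GB, pruned 24B: {pruned:.1f} GB")
```

This is only an estimate of weight storage; context length and batch size determine how much of the remaining VRAM the KV cache consumes.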

For builders

I wanted to run the MoE model on my 16 GB Nvidia card, and no one had pruned the model yet, so I started this.

Read Original