Mobrief

Qwen3.5-24B-A3B-REAP-0.32: 32% Expert-Pruned for Agentic Coding (GGUF)

I forked CerebrasResearch/reap, added some custom patches for Qwen3.5 support, and have just released a REAPed version of Qwen3.5-35B-A3B focused on coding and agentic tasks.

Reddit LocalLLaMA · ~2 min + comments
Community

Community-submitted content. Signal comes from upvotes, not editorial vetting. Always check the linked source.

  • I forked CerebrasResearch/reap, added some custom patches for Qwen3.5 support, and have just released a REAPed version of Qwen3.5-35B-A3B focused on coding and agentic tasks.
  • I wanted to run the MoE model on my 16 GB Nvidia card, and no one had pruned the model yet, so I started this.

Context

I forked CerebrasResearch/reap, added some custom patches for Qwen3.5 support, and have just released a REAPed version of Qwen3.5-35B-A3B focused on coding and agentic tasks. I wanted to run the MoE model on my 16 GB Nvidia card, and no one had pruned the model yet, so I started this. I've added the scripts I used to prune and quantize the model here. I'd recommend the [Qwen3.5-24B-A3B-REAP-0.32-IQ4KS.gguf](https://huggingface.co/sandeshrajx/Qwen3.5-24B-A3B-REAP-0.32-GGUF/blob/main/Qwen3.
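The motivation above is a VRAM budget: a 35B-parameter model quantized to roughly 4-bit weights overshoots 16 GB, while the 32% expert-pruned 24B variant fits with headroom for the KV cache. A minimal back-of-envelope sketch, assuming ~4.25 bits per weight for the IQ4-class quant (an approximation; the real GGUF size also includes embeddings, metadata, and mixed-precision tensors):

```python
def gguf_size_gb(params_b: float, bits_per_weight: float = 4.25) -> float:
    """Rough quantized model size in GB (1 GB = 1e9 bytes).

    params_b: parameter count in billions.
    bits_per_weight: average quantization width; 4.25 is an assumed
    ballpark for IQ4-class quants, not an exact figure.
    """
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

original = gguf_size_gb(35)  # ~18.6 GB: over a 16 GB card's budget
pruned = gguf_size_gb(24)    # ~12.8 GB: fits, with room for KV cache
print(f"35B: {original:.1f} GB, pruned 24B: {pruned:.1f} GB")
```

This is only an estimate of weight storage; context length and batch size determine how much of the remaining VRAM the KV cache consumes.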

For builders

I wanted to run the MoE model on my 16 GB Nvidia card, and no one had pruned the model yet, so I started this.

Read Original