Primary Source

Official announcement from Microsoft. These are the company's own claims, and it has marketing incentives.

Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model

Microsoft Research · Mar 04, 2026 18:05 UTC · ~4 min read

2-Minute Brief

According to Microsoft Research: Phi-4-reasoning-vision-15B is a compact and smart open-weight multimodal reasoning model that balances reasoning power, efficiency, and training data needs. It is a broadly capable model that allows for natural interaction across a wide array of vision-language tasks and excels at math and science reasoning and at understanding user interfaces. We share lessons learned and best practices for training a multimodal reasoning model, showing the benefit of careful architecture choices, rigorous