MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models
Large Language Models can develop reasoning capabilities through online fine-tuning with rule-based rewards.
What’s new (20 sec)
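MiST (mid-stage scientific training) combines SMILES/CIF-aware data mixing, continued pre-training on 2.9B tokens, and supervised fine-tuning on 1B tokens so that RL can then teach 3B and 7B models chemical reasoning.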
Why it matters (2 min)
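- Online RL with rule-based rewards can teach LLMs to reason, but recent studies show it succeeds only when the base model already assigns non-negligible probability to correct answers, a property the authors term 'latent solvability'.
- For chemistry, the authors identify two prerequisites, symbolic competence and latent chemical knowledge; meeting them with MiST lets RL lift top-1 accuracy from 10.9% to 63.9% on organic reaction naming and from 40.6% to 67.4% on inorganic material generation.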
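To make the latent-solvability constraint concrete: RL with a binary rule-based reward only gets a learning signal on prompts where at least some sampled answers are rewarded. The sketch below is a minimal illustration, not the paper's method. It assumes a hypothetical `<answer>...</answer>` output format and exact-match grading of reaction names, and it estimates a rough proxy for latent solvability as the mean pass@k over sampled completions, using the standard unbiased pass@k estimator. The paper's own latent-solvability score may be defined differently; the helper names and the sample data are made up for illustration.

```python
import re
from math import comb

# Assumption: the model puts its final answer inside <answer>...</answer> (hypothetical format).
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def normalize_name(text: str) -> str:
    """Collapse whitespace and lowercase; fine for reaction *names*, not for case-sensitive SMILES."""
    return " ".join(text.split()).lower()


def rule_based_reward(completion: str, reference: str) -> float:
    """Binary rule-based reward: 1.0 for an exact (normalized) match with the reference, else 0.0."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # no parsable answer, no reward
    return 1.0 if normalize_name(match.group(1)) == normalize_name(reference) else 0.0


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def solvability_proxy(samples_per_prompt: list[list[str]], references: list[str], k: int = 1) -> float:
    """Mean pass@k over prompts: an assumed proxy for how often the model can already reach the answer."""
    scores = []
    for completions, reference in zip(samples_per_prompt, references):
        correct = sum(int(rule_based_reward(completion, reference)) for completion in completions)
        scores.append(pass_at_k(len(completions), correct, k))
    return sum(scores) / len(scores)


# Usage with made-up completions (illustrative only, not data from the paper).
samples = [
    ["<answer>Suzuki coupling</answer>", "<answer>Heck reaction</answer>",
     "no tagged answer", "<answer> suzuki  Coupling </answer>"],
    ["<answer>Wittig reaction</answer>"] * 4,
]
references = ["Suzuki coupling", "Diels-Alder reaction"]
print(solvability_proxy(samples, references, k=1))  # 0.25
print(solvability_proxy(samples, references, k=2))  # ~0.417
```

A general design note, not taken from the paper: when all n samples for a prompt are wrong, every reward is 0, so methods that normalize rewards within each prompt's group of samples (such as GRPO) compute zero advantage and get no gradient from that prompt. This is why a near-zero solvability proxy suggests RL alone will stall.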
Go deeper (8 min)
Context
Large Language Models can develop reasoning capabilities through online fine-tuning with rule-based rewards. However, recent studies reveal a critical constraint: reinforcement learning (RL) succeeds only when the base model already assigns non-negligible probability to correct answers, a property we term 'latent solvability'. This work investigates how chemical reasoning capabilities emerge and what these prerequisites mean for chemistry. We identify two necessary conditions for RL-based chemical reasoning: 1) symbolic competence and 2) latent chemical knowledge. We propose mid-stage scientific training (MiST), a set of mid-stage training techniques to satisfy these conditions, including data mixing with SMILES/CIF-aware pre-processing, continued pre-training on 2.9B tokens, and supervised fine-tuning on 1B tokens. These steps raise the latent-solvability score of 3B and 7B models by up to 1.8×, and enable RL to lift top-1 accuracy from 10.9% to 63.9% on organic reaction naming and from 40.6% to 67.4% on inorganic material generation. Similar results are observed for other challenging chemical tasks, while the models produce interpretable reasoning traces. Our results define clear prerequisites…
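The abstract does not say how its SMILES/CIF-aware pre-processing works. As one illustration of what "SMILES-aware" can mean, here is a minimal sketch built on an atom-level regex tokenizer. This is a widely used technique, not necessarily the paper's, and it splits SMILES without breaking multi-character atoms such as Cl, Br, or bracket atoms like [nH]. The `<smiles>` tags and the `wrap_smiles` helper are hypothetical, and CIF handling is not covered.

```python
import re

# Atom-level SMILES regex in the style popularized by the Molecular Transformer
# (Schwaller et al.); keeps multi-character atoms such as "Cl", "Br" and "[nH]" intact.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens; raise if any character is not covered."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in SMILES: {smiles!r}")
    return tokens


def wrap_smiles(smiles: str) -> str:
    """Hypothetical pre-processing step: mark a SMILES span and space-separate its tokens
    so that atom boundaries are explicit in the training text."""
    return "<smiles> " + " ".join(tokenize_smiles(smiles)) + " </smiles>"


print(tokenize_smiles("ClCBr"))  # ['Cl', 'C', 'Br']
print(wrap_smiles("CC(=O)O"))    # <smiles> C C ( = O ) O </smiles>
```

The round-trip check in `tokenize_smiles` (joined tokens must reproduce the input) is a cheap guard against silently dropping characters the regex does not cover.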
For builders
Scan the abstract and experiments; look for released code, datasets, and evaluation setups.
Verify
Prefer primary announcements, papers, repos, and changelogs over reposts.