MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models

Large Language Models can develop reasoning capabilities through online fine-tuning with rule-based rewards.


What’s new (20 sec)

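The paper proposes mid-stage scientific training (MiST): data-mixing with SMILES/CIF-aware pre-processing, continued pre-training on 2.9B tokens, and supervised fine-tuning on 1B tokens, applied to 3B and 7B models so that RL can then develop chemical reasoning.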

Why it matters (2 min)

Online fine-tuning with rule-based rewards can teach Large Language Models to reason, but recent studies reveal a critical constraint: reinforcement learning succeeds only when the base model already assigns non-negligible probability to correct answers -- a property we term…

Go deeper (8 min)

Context

Large Language Models can develop reasoning capabilities through online fine-tuning with rule-based rewards. However, recent studies reveal a critical constraint: reinforcement learning succeeds only when the base model already assigns non-negligible probability to correct answers -- a property we term 'latent solvability'. This work investigates the emergence of chemical reasoning capabilities and what these prerequisites mean for chemistry. We identify two necessary conditions for RL-based chemical reasoning: 1) Symbolic competence, and 2) Latent chemical knowledge. We propose mid-stage scientific training (MiST): a set of mid-stage training techniques to satisfy these, including data-mixing with SMILES/CIF-aware pre-processing, continued pre-training on 2.9B tokens, and supervised fine-tuning on 1B tokens. These steps raise the latent-solvability score on 3B and 7B models by up to 1.8x, and enable RL to lift top-1 accuracy from 10.9 to 63.9% on organic reaction naming, and from 40.6 to 67.4% on inorganic material generation. Similar results are observed for other challenging chemical tasks, while producing interpretable reasoning traces. Our results define clear prerequisites…
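The abstract does not say how the latent-solvability score is computed, so the following is only a rough sketch of one plausible proxy, not the paper's metric. It assumes each task is sampled n times from the base model, scores each task with the standard unbiased pass@k estimator, and reports the fraction of tasks for which that estimate is non-zero. The function names `pass_at_k` and `solvable_fraction` and the `threshold` parameter are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: the probability that at least one of k
    samples, drawn without replacement from n generations of which c are
    correct, is correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k hits a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def solvable_fraction(correct_counts, n: int, k: int, threshold: float = 0.0) -> float:
    """Hypothetical proxy for latent solvability (an assumption, not the
    paper's definition): the share of tasks whose pass@k exceeds threshold."""
    if not correct_counts:
        raise ValueError("correct_counts must not be empty")
    scores = [pass_at_k(n, c, k) for c in correct_counts]
    return sum(score > threshold for score in scores) / len(scores)


if __name__ == "__main__":
    # Four tasks, 10 base-model samples each; entries are correct-sample counts.
    counts = [0, 1, 0, 3]
    print(solvable_fraction(counts, n=10, k=5))
```

Under this proxy, a task with zero correct samples contributes nothing, which mirrors the constraint described above: rule-based RL has no reward signal to amplify when the base model never produces a correct answer.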

For builders

Scan the abstract and experiments; look for code, datasets, and evals.

Verify

Prefer primary announcements, papers, repos, and changelogs over reposts.

Receipts

MiST: Understanding the Role of Mid-Stage Scientific Training in Developing Chemical Reasoning Models (arXiv cs.LG)