When Fine-Tuning Changes the Evidence: Architecture-Dependent Semantic Drift in Chest X-Ray Explanations

Key Takeaways

However, in multi-class settings with overlapping visual features, improvements in accuracy do not guarantee stability of the visual evidence used to support predictions.
Hugging Face Daily Papers define semantic drift as systematic changes in the attribution structure supporting a model's predictions between transfer learning and full fine-tuning, reflecting…
Using a five-class chest X-ray task, Hugging Face Daily Papers evaluates Dense Net201, Res Net50 V2, and Inception V3 under a two-stage training protocol and quantify drift with reference-free…

Why It Matters

Context

However, in multi-class settings with overlapping visual features, improvements in accuracy do not guarantee stability of the visual evidence used to support predictions. Hugging Face Daily Papers define semantic drift as systematic changes in the attribution structure supporting a model's predictions between transfer learning and full fine-tuning, reflecting potential shifts in underlying visual reasoning despite stable classification performance. Using a five-class chest X-ray task, Hugging Face Daily Papers evaluates Dense Net201, Res Net50 V2, and Inception V3 under a two-stage training protocol and quantify drift with reference-free metrics capturing spatial localization and structural consistency of attribution maps. Across architectures, coarse anatomical localization remains stable, while overlap Io U reveals pronounced architecture-dependent reorganization of evidential structure. Beyond single-method analysis, stability rankings can reverse across Layer CAM and Grad CAM++ under converged predictive performance, establishing explanation stability as an interaction between architecture, optimization phase, and attribution objective.

For Builders

However, in multi-class settings with overlapping visual features, improvements in accuracy do not guarantee stability of the visual evidence used to support predictions.