Revisiting the Role of Foundation Models in Cell-Level Histopathological Image Analysis under Small-Patch Constraints -- Effects of Training Data Scale and Blur Perturbations on CNNs and Vision Transformers

Background and objective: Cell-level pathological image analysis requires working with extremely small image patches (40x40 pixels), far below standard ImageNet resolutions.

arXiv cs.CV · Mar 04, 2026 13:52 UTC · Paper: ~15 min

Research

Academic or research source. Check the methodology, sample size, and whether it's been replicated.

Key Takeaways

Cell-level pathology analysis works with 40x40-pixel patches, far below standard ImageNet resolutions, and it was unclear whether modern architectures and foundation models learn robust, scalable representations at that size.
Task-specific models trained from scratch improved consistently as training data grew, whereas foundation models saturated at moderate sample sizes.
A Vision Transformer optimized for small patches (CustomViT) achieved the highest accuracy, outperforming all foundation models evaluated.

What It Means

Context

Background and objective: Cell-level pathological image analysis requires working with extremely small image patches (40x40 pixels), far below standard ImageNet resolutions. It remains unclear whether modern deep learning architectures and foundation models can learn robust and scalable representations under this constraint. We systematically evaluated architectural suitability and data-scale effects for small-patch cell classification.

Methods: We analyzed 303 colorectal cancer specimens with CD103/CD8 immunostaining, generating 185,432 annotated cell images. Eight task-specific architectures were trained from scratch at multiple data scales (FlagLimit: 256 to 16,384 samples per class), and three foundation models were evaluated via linear probing and fine-tuning after resizing inputs to 224x224 pixels. Robustness to blur was assessed using pre- and post-resize Gaussian perturbations.

Results: Task-specific models improved consistently with increasing data scale, whereas foundation models saturated at moderate sample sizes. A Vision Transformer optimized for small patches (CustomViT) achieved the highest accuracy, outperforming all foundation models with substantially lower…
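Two steps in the methods are easy to misread, so here is a minimal sketch of each in Python. It is an illustration under stated assumptions, not the authors' code. The first function caps the number of training samples per class, which is how the "FlagLimit" setting is read here. The second converts a Gaussian blur width applied on the 40x40 patch into the roughly equivalent width on the 224x224 resized input, using the 5.6x resize factor. The function names, the class labels "pos" and "neg", and the random seed are hypothetical. The sigma conversion ignores interpolation effects, and the abstract does not say how blur strength was parameterized.

```python
import random
from collections import defaultdict


def cap_per_class(labels, limit, seed=0):
    """Return sorted indices keeping at most `limit` samples per class.

    Assumption: this is one plausible reading of a per-class cap such as
    the paper's FlagLimit; the actual sampling procedure is not given.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) > limit:
            # Draw `limit` indices without replacement from this class.
            indices = rng.sample(indices, limit)
        kept.extend(indices)
    return sorted(kept)


def post_resize_sigma(pre_sigma, src=40, dst=224):
    """Approximate blur width after resizing from `src` to `dst` pixels.

    A Gaussian of width `pre_sigma` (in source pixels) spans roughly
    pre_sigma * dst / src pixels once the patch is enlarged.
    """
    return pre_sigma * dst / src


# Toy example with hypothetical, imbalanced class labels.
labels = ["pos"] * 1000 + ["neg"] * 300
subset_256 = cap_per_class(labels, limit=256)
subset_512 = cap_per_class(labels, limit=512)
```

With these toy labels, a cap of 256 keeps 256 samples from each class, while a cap of 512 keeps 512 "pos" samples and all 300 "neg" samples, because a class smaller than the cap is left untouched. The sigma conversion shows why pre- and post-resize blur are not interchangeable: the same nominal width removes far more detail when applied before the resize.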

For builders

It remains unclear whether modern deep learning architectures and foundation models can learn robust and scalable representations from patches as small as 40x40 pixels.
