arXiv cs.AI Dec 24, 2025 14:33 UTC

SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation

Human infants, with only a few hundred hours of speech exposure, acquire the basic units of new languages, highlighting a striking efficiency gap compared to data-hungry self-supervised speech models.


What’s new (20 sec)

SpidR-Adapt is a universal speech representation model that adapts rapidly to new languages from minimal unlabeled audio. It casts low-resource speech representation learning as a meta-learning problem (MAdaPT) and trains with a first-order bi-level optimization heuristic (FOBLO), improving over in-domain language models after less than 1 h of target-language audio.

Why it matters (2 min)

  • Human infants acquire the basic sound units of a new language from only a few hundred hours of speech exposure, a striking efficiency gap compared to data-hungry self-supervised speech models.
  • SpidR-Adapt addresses this gap by adapting to new languages from minimal unlabeled data, reported as over 100× more data-efficient than standard training.
  • Open the receipts below to verify and go deeper.

Go deeper (8 min)

Context

Human infants, with only a few hundred hours of speech exposure, acquire the basic units of new languages, highlighting a striking efficiency gap compared to data-hungry self-supervised speech models. To address this gap, the paper introduces SpidR-Adapt for rapid adaptation to new languages using minimal unlabeled data. The authors cast low-resource speech representation learning as a meta-learning problem and construct a multi-task adaptive pre-training (MAdaPT) protocol that formulates the adaptation process as a bi-level optimization. To make meta-training scalable under this framework, they propose a heuristic solution, first-order bi-level optimization (FOBLO), which avoids heavy computational costs. Finally, they stabilize meta-training with a robust initialization obtained through interleaved supervision, which alternates self-supervised and supervised objectives. Empirically, SpidR-Adapt achieves rapid gains in phonemic discriminability (ABX) and spoken language modeling (sWUGGY, sBLIMP, tSC), improving over in-domain language models after training on less than 1 h of target-language audio, over 100× more data-efficient than standard training. These findings…
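
The digest does not spell out FOBLO, but its description (first-order, bi-level, avoids heavy computation) matches the family of first-order meta-learning updates such as FOMAML/Reptile. The sketch below shows that pattern, assuming an inner loop that adapts a copy of the encoder to each language with a self-supervised loss and an outer loop that nudges the shared initialization toward the adapted weights. Every name here (SpeechEncoder, ssl_loss, the hyperparameters) is an illustrative assumption, not SpidR-Adapt's actual code.

    import copy
    import torch
    import torch.nn as nn

    class SpeechEncoder(nn.Module):
        # Stand-in encoder; a placeholder for the real SpidR-Adapt model.
        def __init__(self, dim=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        def forward(self, x):
            return self.net(x)

    def ssl_loss(model, batch):
        # Placeholder self-supervised objective; the paper would use a speech
        # SSL loss, but a reconstruction loss keeps this sketch runnable.
        return ((model(batch) - batch) ** 2).mean()

    def foblo_meta_step(model, tasks, inner_steps=3, inner_lr=1e-2, meta_lr=0.5):
        # Inner loop: adapt a copy of the shared initialization to each task
        # (language) with a few SSL gradient steps. Outer loop: first-order
        # update that moves the initialization toward the adapted weights,
        # skipping second-order terms (Reptile-style approximation).
        deltas = [torch.zeros_like(p) for p in model.parameters()]
        for batches in tasks:
            fast = copy.deepcopy(model)
            opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
            for batch in batches[:inner_steps]:
                opt.zero_grad()
                ssl_loss(fast, batch).backward()
                opt.step()
            for d, p, q in zip(deltas, model.parameters(), fast.parameters()):
                d += q.data - p.data          # direction toward adapted weights
        with torch.no_grad():
            for p, d in zip(model.parameters(), deltas):
                p += meta_lr * d / len(tasks)  # averaged first-order meta-update

    # Tiny usage example: four "languages", each a list of random feature batches.
    model = SpeechEncoder()
    tasks = [[torch.randn(8, 64) for _ in range(3)] for _ in range(4)]
    foblo_meta_step(model, tasks)

A Reptile-style delta is only one way to realize a first-order bi-level update; the paper's FOBLO may compute its outer gradient differently, and its interleaved-supervision phase would additionally alternate this self-supervised inner loss with a supervised one while building the initialization.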

For builders

Builder: scan the abstract + experiments; look for code, datasets, and evals.

Verify

Prefer primary announcements, papers, repos, and changelogs over reposts.

Receipts

  1. SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation (arXiv cs.AI)