Mobrief

Bypassing CoreML to natively train a 110M Transformer on the Apple Neural Engine (Orion)

It is hard to communicate how frustrating the current Apple ML stack is for low-level research.

Reddit MachineLearning · ~4 min read
Research


  • CoreML imposes opaque abstractions that prevent direct ANE programming and do not support on-device training.
  • Despite having up to 38 TOPS (INT8) and ~19 TFLOPS of fp16 compute, the ANE remains almost entirely unused for large language model workloads.
  • Building on the foundational hardware reverse-engineering by maderix (who mapped the private API surface and benchmarked the 32 MB SRAM cliff), I w
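The "32 MB SRAM cliff" mentioned above matters because a 110M-parameter model cannot come close to fitting on-chip. A back-of-envelope check (my own arithmetic, not from the post; the 110M and 32 MB figures are taken from the summary above):

```python
# Why a 110M-parameter transformer overflows the ANE's 32 MB SRAM at fp16,
# forcing weights to be tiled/streamed from DRAM.
PARAMS = 110_000_000          # 110M transformer, per the post title
BYTES_PER_PARAM_FP16 = 2      # fp16 = 2 bytes per weight
SRAM_BYTES = 32 * 1024 ** 2   # 32 MB on-chip SRAM (the "SRAM cliff")

weight_bytes = PARAMS * BYTES_PER_PARAM_FP16
print(f"fp16 weights: {weight_bytes / 1024**2:.0f} MiB")  # ~210 MiB
print(f"SRAM budget:  {SRAM_BYTES / 1024**2:.0f} MiB")    # 32 MiB
print(f"fits in SRAM: {weight_bytes <= SRAM_BYTES}")      # False
```

At fp16 the weights alone are roughly 6-7x the SRAM budget, before counting activations, gradients, or optimizer state for training.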


