Provenance Brief
Research

Academic or research source. Check the methodology, sample size, and whether it's been replicated.

Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets

The reliability of multilingual Large Language Model (LLM) evaluation is currently compromised by the inconsistent quality of translated benchmarks.


