Provenance Brief
Research


What Language is This? Ask Your Tokenizer

Language Identification (LID) is an important component of many multilingual natural language processing pipelines, where it facilitates corpus curation, training data analysis, and cross-lingual evaluation of large…

