The core task of the TidyLang Challenge is utterance-level Spoken Language Recognition (SLR) under controlled speaker-overlap conditions. Systems must predict the language of each utterance and/or score language verification trial pairs.
Submission: Participants must submit results for the closed condition; the open condition is optional.
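Final evaluation (per condition) includes two tasks:
- Language identification: predict the language of each utterance.
- Language verification: score language verification trial pairs.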
During validation (development phase), both an identification set and an enrollment-based verification set are provided so participants can evaluate both tasks locally. See the Evaluation Plan and Baseline Systems for protocols and trial formats.
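For local evaluation on the validation sets, a minimal sketch is shown below. It assumes that each system produces one fixed-size embedding per utterance, that a language is enrolled by averaging the embeddings of its enrollment utterances, and that trials are scored by cosine similarity. It also assumes that identification is measured by accuracy and verification by equal error rate (EER). These are common choices, not necessarily the official metrics or the baseline back-end; the Evaluation Plan defines the actual protocols and trial formats. All function names are hypothetical, and the embeddings in the usage example are toy values, not challenge data.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(embeddings: List[Sequence[float]]) -> List[float]:
    """Assumed enrollment: average several utterance embeddings into one language centroid."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def identification_accuracy(predicted: List[str], reference: List[str]) -> float:
    """Fraction of utterances whose predicted language matches the reference label."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def equal_error_rate(scores: List[float], is_target: List[bool]) -> float:
    """Approximate EER: the point where false-accept rate equals false-reject rate.

    Every distinct score is tried as a threshold (accept if score >= threshold);
    the EER is the mean of FAR and FRR at the threshold where they are closest.
    This is O(n^2), which is fine for a sketch but slow for large trial lists.
    """
    n_target = sum(is_target)
    n_nontarget = len(is_target) - n_target
    best_gap, eer = float("inf"), 1.0
    for thr in sorted(set(scores)):
        frr = sum(1 for s, t in zip(scores, is_target) if t and s < thr) / n_target
        far = sum(1 for s, t in zip(scores, is_target) if not t and s >= thr) / n_nontarget
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy usage with hypothetical 2-D embeddings.
language_model = enroll([[1.0, 0.1], [0.9, 0.0]])
trials = [([0.95, 0.05], True), ([0.1, 1.0], False), ([0.8, 0.3], True), ([0.2, 0.9], False)]
scores = [cosine(language_model, emb) for emb, _ in trials]
labels = [is_target for _, is_target in trials]
print("EER:", equal_error_rate(scores, labels))
print("Accuracy:", identification_accuracy(["de", "fr", "it"], ["de", "fr", "es"]))
```

Averaging enrollment embeddings keeps the back-end simple; a real system might instead use a trained classifier head or a PLDA back-end, and an exact EER is usually computed from the ROC curve rather than by this brute-force threshold sweep.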
The TidyLang Challenge consists of two phases:
Development Phase: This phase is conducted offline by participants. You use the provided training and validation datasets to develop, train, and tune your systems locally. You can experiment with different approaches and evaluate on the validation set without any online submission.
Evaluation Phase: This phase involves online ranking on the CodaBench website. The link to the challenge on CodaBench will be shared when the evaluation phase opens. At that time, the evaluation set and submission procedure will be announced. Rankings will be determined based on performance on the evaluation set, and participants can view their position on the leaderboard.
The TidyLang Challenge is built on Tidy-X, a curated data partition derived from the Mozilla Common Voice (MCV) corpus. Tidy-X is specifically designed to emphasize language switching and contains multilingual speakers.
Important: The use of any other data from the MCV corpus is strictly forbidden. Only the official Tidy-X training and validation partitions may be used from MCV.
Training/validation splits: The training and validation portions used in this challenge are different from the original splits of the Tidyvox dataset. Participants must follow the official manifest provided in the baseline repository: training_manifest.txt.
| Dataset | # Speakers | # Languages | # Utterances | Duration (h) | Domain |
|---|---|---|---|---|---|
| Tidy-X (Total) | 4,474 | 40 | 321,711 | 457 | Read |
| Tidy-X: Train | 3,666 | 40 | 262K | 370 | Read |
| Tidy-X: Valid | 808 | 40 | 60K | 87 | Read |
Details about the evaluation set (e.g., number of languages, speakers, or trials) are not disclosed before the evaluation phase. This ensures a fair and unbiased benchmark. We will release the evaluation data and the evaluation trial pair lists when the evaluation phase opens.
Strict restriction on Common Voice data: The only data permitted from the Mozilla Common Voice (MCV) dataset is the official Tidy-X training (and validation) partition. The use of any other data from the MCV corpus is strictly forbidden in both conditions.
Pre-trained models: Participants are free to use publicly available pre-trained models, including self-supervised (SSL) models (e.g., XLS-R, Whisper, wav2vec2, HuBERT, WavLM). All pre-trained models must be explicitly declared in the system description paper.