Multilingual Speech-to-Text Benchmark

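Upload an audio file, select one or more models, and optionally provide reference text. For each model, the app reports word error rate (WER), character error rate (CER), real-time factor (RTF), and time taken. WER and CER are computed only when reference text is provided.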
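The metrics named above can be sketched in a few lines. This is a minimal illustration using the standard definitions, not the app's actual implementation: the function names are hypothetical, no text normalization (casing, punctuation) is applied, and CER here counts Unicode code points, which may differ from grapheme-based counts for Indic scripts.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra item in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference text must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def rtf(processing_seconds, audio_seconds):
    """Real-time factor: transcription time / audio duration (below 1 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

For example, a model that needs 2 seconds to transcribe a 10-second clip has an RTF of 0.2. Because WER and CER divide by the reference length, both can exceed 1.0 when the hypothesis contains many insertions.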

Upload Audio File (16kHz recommended)

Select Models

AudioX-North (Jivi AI)
IndicConformer (AI4Bharat)
MMS (Facebook)

Reference Text (optional; needed only for WER/CER, paste supported)

Upload an audio file and select models to begin...

Results Comparison

📋 Copy Results

Copy-Paste Friendly Results

💡 Tips:

Reference Text: Paste your ground-truth transcript to compute WER and CER
Copy Results: Use the copy button in the results section to copy the formatted results
AI4Bharat Model: IndicConformer automatically transcribes in Hindi using RNNT decoding
Supported Formats: WAV, MP3, FLAC, M4A (16 kHz sample rate recommended for best results)
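
To check whether a file meets the 16 kHz recommendation before uploading, something like the following may help. It is a rough sketch using Python's standard `wave` module, so it only reads uncompressed WAV files; MP3, FLAC, and M4A would need a separate decoder. The function name `inspect_wav` and the returned keys are illustrative assumptions, not part of the app.

```python
import wave

TARGET_RATE_HZ = 16_000  # sample rate recommended by the tips above


def inspect_wav(path):
    """Return sample rate, channel count, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        channels = wav.getnchannels()
    return {
        "sample_rate_hz": rate,
        "channels": channels,
        "duration_s": frames / rate,
        "matches_target": rate == TARGET_RATE_HZ,
    }
```

The `duration_s` value is also the denominator needed for RTF (processing time divided by audio duration).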