T5 ASR Grammar Corrector Project
A post-ASR correction model trained on 90 million noisy/clean sentence pairs, designed to fix typical speech recognition errors in (near) real time. It cleans up transcriptions from ASR systems such as Whisper and NVIDIA NeMo, improving readability and grammatical correctness with minimal latency.
Designed to:
- Improve readability and professionalism in transcripts
- Make ASR outputs usable for customer service, legal, and healthcare settings
- Help non-native speakers interpret ASR output more easily
- Support real-time captioning and assistive technologies
Fixes common ASR issues:
- Homophones (e.g., "their" vs. "there")
- Subject-verb agreement
- Verb tense errors
- Missing auxiliaries and articles
- Contractions and prepositions
- Pronoun misuse
- Repeated or corrupted words
- Fast inference (<50 ms on GPU)
- Easy to plug into real-time ASR pipelines
- Hugging Face-compatible model loading
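As a sketch of how the corrector could slot into a real-time pipeline, the loop below buffers incoming ASR segments into complete sentences and passes each through a pluggable `correct` callable. The buffering logic and function names are illustrative, not part of the released API; in practice `correct` would wrap the model's `generate` call (a stub stands in for it here).

```python
import re
from typing import Callable, Iterable, Iterator

def correct_stream(segments: Iterable[str],
                   correct: Callable[[str], str]) -> Iterator[str]:
    """Buffer incoming ASR segments and emit corrected sentences.

    Complete sentences (ending in . ! or ?) are flushed as they form;
    any trailing partial sentence is flushed when the stream ends.
    """
    buffer = ""
    for seg in segments:
        buffer += " " + seg.strip()
        # Split after sentence-final punctuation, keeping the delimiter.
        parts = re.split(r"(?<=[.!?])\s+", buffer.strip())
        # Everything except the (possibly unfinished) tail is complete.
        for sentence in parts[:-1]:
            yield correct(sentence)
        buffer = parts[-1] if parts else ""
    if buffer.strip():
        yield correct(buffer.strip())

# Stub corrector standing in for the T5 model call.
def fake_correct(text: str) -> str:
    return text.replace("i i ", "i ")

corrected = list(correct_stream(["i i want to go.", "see you"], fake_correct))
```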
license: mit
language:
- en
base_model:
- google-t5/t5-small
pipeline_tag: text2text-generation
tags:
- ASR
A lightweight grammar correction model fine-tuned from t5-small and t5-base, specifically designed to correct common errors in automatic speech recognition (ASR) outputs, including homophones, verb tense issues, contractions, duplicated words, and more. Optimized for fast inference in (near) real-time ASR pipelines.
- Small model: t5-small
- Base model: t5-base
- Fine-tuned on: 90 million synthetic (noisy → clean) sentence pairs
- Training objective: Correct ASR-style transcription errors into clean, grammatical English
- Framework: Hugging Face Transformers + PyTorch
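A minimal loading and inference sketch with Hugging Face Transformers is below. The repo id matches the model links in this README; whether the model expects a task prefix on the input, and the decoding settings, are assumptions to verify against the model card.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "dayyanj/dj-ai-asr-grammar-corrector-small"

def load(model_id: str = MODEL_ID):
    """Download (or load from cache) the corrector and its tokenizer."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return model, tokenizer

def correct(text: str, model, tokenizer, max_new_tokens: int = 64) -> str:
    """Run one noisy ASR sentence through the corrector."""
    inputs = tokenizer(text, return_tensors="pt")
    ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

# Example usage (downloads the model on first call):
# model, tokenizer = load()
# print(correct("i seen there dog", model, tokenizer))
```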
| Model | Type | Precision | Latency (s/sample) | VRAM (MB) | BLEU | ROUGE-L | Accuracy (%)¹ | Token Accuracy (%)² | Size (MB) |
|---|---|---|---|---|---|---|---|---|---|
| dj-ai-asr-grammar-corrector-t5-base | HF | fp32 | 0.1151 | 24.98 | 78.92 | 90.31 | 44.62 | 90.39 | 5956.76 |
| dj-ai-asr-grammar-corrector-t5-small | HF | fp32 | 0.0648 | 6.27 | 76.47 | 89.54 | 39.59 | 88.76 | 1620.15 |
| dj-ai-asr-grammar-corrector-t5-small-streaming | HF | fp32 | 0.0634 | 14.77 | 76.25 | 89.61 | 39.90 | 88.54 | 1620.65 |
- ¹ Accuracy measures performance across the full sentence: a prediction counts as correct only if the entire corrected sentence exactly matches the reference. If the model fixes one of two errors but the output still differs from the expected sentence, it counts as a fail.
- ² Token Accuracy measures performance at the token level: the percentage of output tokens that match the reference.
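For reference, both metrics can be reproduced in a few lines of plain Python. This is a sketch based on the definitions above; the actual evaluation script may tokenize differently (whitespace splitting is an assumption here).

```python
def exact_match(preds: list[str], refs: list[str]) -> float:
    """Sentence-level accuracy: a hit only when the whole sentence matches."""
    hits = sum(p == r for p, r in zip(preds, refs))
    return 100.0 * hits / len(refs)

def token_accuracy(preds: list[str], refs: list[str]) -> float:
    """Token-level accuracy: fraction of aligned tokens that match.

    Whitespace tokenization is assumed; the real evaluation may use
    the model's subword tokenizer instead.
    """
    correct = total = 0
    for p, r in zip(preds, refs):
        pt, rt = p.split(), r.split()
        total += max(len(pt), len(rt))
        correct += sum(a == b for a, b in zip(pt, rt))
    return 100.0 * correct / total

preds = ["i saw their dog", "he goes home"]
refs  = ["i saw their dog", "he go home"]
# Second prediction fails exact match but still gets most tokens right.
```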
| Use Case | Supported | Notes |
|---|---|---|
| Post-ASR correction | ✅ Yes | |
| Real-time ASR pipelines | ✅ Yes | |
| Batch transcript cleanup | ✅ Yes | |
| Grammar education tools | ✅ Yes | |
| Formal document editing | 🚫 No | Model may be too informal |
| Multilingual input | 🚫 No | English-only fine-tuning |
- Homophone mistakes (`their` → `they're`)
- Subject-verb disagreement (`he go` → `he goes`)
- Verb tense corruption (`i seen` → `i saw`)
- Missing auxiliaries (`you going` → `are you going`)
- Contraction normalization (`she is not` → `she isn't`)
- Repeated words (`i i want` → `i want`)
- Misused articles, prepositions, and pronouns
Example (illustrative):

Input (noisy ASR): `i i seen there dog yesterday`
Output (corrected): `I saw their dog yesterday.`
The models were trained on the DJ-AI Custom Dataset, which includes over 90 million real and synthetic pairs of ASR errors and corrected texts. All models are fine-tuned from pretrained T5 checkpoints.
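The dataset itself is not published here, but synthetic (noisy → clean) pairs of the kind described above can be generated by injecting ASR-style errors into clean text. The sketch below is illustrative only: the homophone table, probabilities, and noise types are assumptions, not the actual data pipeline.

```python
import random

# Illustrative homophone confusions; the real noise model is not documented.
HOMOPHONES = {"their": "there", "they're": "their",
              "saw": "seen", "your": "you're"}

def add_noise(sentence: str, rng: random.Random,
              p_swap: float = 0.5, p_dup: float = 0.3) -> str:
    """Turn a clean sentence into an ASR-style noisy one.

    Applies homophone swaps and word duplication ("stutter") at the
    given probabilities, and lowercases like a raw ASR transcript.
    """
    noisy = []
    for word in sentence.lower().split():
        w = HOMOPHONES.get(word, word) if rng.random() < p_swap else word
        noisy.append(w)
        if rng.random() < p_dup:  # duplicated word, e.g. "i i want"
            noisy.append(w)
    return " ".join(noisy)

rng = random.Random(0)  # seeded for reproducibility
pair = (add_noise("I saw their dog", rng), "I saw their dog")  # (noisy, clean)
```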
https://huggingface.co/dayyanj/dj-ai-asr-grammar-corrector-small
https://huggingface.co/dayyanj/dj-ai-asr-grammar-corrector-small-streaming
https://huggingface.co/dayyanj/dj-ai-asr-grammar-corrector-base
DEMO: https://huggingface.co/spaces/dayyanj/dj-ai-asr-grammar-corrector-demo
MIT License.