Speaking Analysis
Upload an audio file or record yourself — powered by Whisper + GPT-4o
Problem
Students wanted more detailed feedback on their speaking — pronunciation, grammar patterns, vocabulary range. But our tutors were already at capacity handling daily 1:1 sessions, and writing thorough reviews for every student wasn't scalable.
Solution
I built an automated speaking analysis pipeline. Students upload or record their English audio, Whisper transcribes it, and GPT-4o analyzes grammar, vocabulary, fluency, and coherence — delivering instant, detailed feedback that would have taken a tutor 20+ minutes to write.
Considerations
- Two-step pipeline (transcription → analysis) so a failure in one stage stays isolated from the other
- In production, uses AssemblyAI for speaker diarization — separating tutor and student audio before analysis
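
A minimal sketch of the two-step idea, not the production code: the two stages are passed in as plain callables, so the Whisper and GPT-4o calls are not shown here. The names `run_pipeline`, `PipelineResult`, and the `transcript` retry parameter are hypothetical and chosen for illustration. The point is that each stage reports its own failure, and a saved transcript lets the analysis stage rerun without transcribing the audio again.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    transcript: Optional[str] = None     # set once transcription succeeds
    feedback: Optional[dict] = None      # set once analysis succeeds
    failed_stage: Optional[str] = None   # "transcription", "analysis", or None
    error: Optional[str] = None          # message from the failing stage


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],    # assumed wrapper around the speech-to-text call
    analyze: Callable[[str], dict],      # assumed wrapper around the LLM analysis call
    transcript: Optional[str] = None,
) -> PipelineResult:
    """Run transcription, then analysis, recording which stage failed.

    Passing a previously saved `transcript` skips stage one, so an analysis
    failure can be retried without processing the audio a second time.
    """
    result = PipelineResult(transcript=transcript)

    # Stage 1: transcription (skipped when a transcript is already available).
    if result.transcript is None:
        try:
            result.transcript = transcribe(audio_path)
        except Exception as exc:
            result.failed_stage = "transcription"
            result.error = str(exc)
            return result

    # Stage 2: analysis. A failure here leaves the transcript intact.
    try:
        result.feedback = analyze(result.transcript)
    except Exception as exc:
        result.failed_stage = "analysis"
        result.error = str(exc)

    return result
```

If analysis fails, the caller still has `result.transcript` and can call `run_pipeline` again with `transcript=result.transcript`, so only the second stage reruns.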