Students wanted more detailed feedback on their speaking — pronunciation, grammar patterns, vocabulary range. But our tutors were already at capacity handling daily 1:1 sessions, and writing thorough reviews for every student wasn't scalable.

I built an automated speaking analysis pipeline. Students upload or record their English audio, Whisper transcribes it, and GPT-4o analyzes grammar, vocabulary, fluency, and coherence — delivering instant, detailed feedback that would have taken a tutor 20+ minutes to write.

  • Two-step pipeline (transcription → analysis) so a failure in one stage is isolated and can be retried without rerunning the other
  • In production, uses AssemblyAI for speaker diarization — separating tutor and student audio before analysis
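
The two points above can be sketched roughly as follows. This is a minimal illustration, not the production code: `StageResult`, `run_pipeline`, `retry_analysis`, and `student_text` are hypothetical names, the `transcribe` and `analyze` callables stand in for the Whisper and GPT-4o calls, and the utterance shape (a list of dicts with `speaker` and `text` keys) is an assumed simplification of diarized output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class StageResult:
    """Outcome of one stage: either a value or a captured error."""
    ok: bool
    value: Optional[str] = None
    error: Optional[str] = None


@dataclass
class PipelineResult:
    transcription: StageResult
    analysis: Optional[StageResult] = None  # None when analysis never ran


def run_stage(fn: Callable[[str], str], arg: str) -> StageResult:
    """Run one stage and record its failure instead of raising."""
    try:
        return StageResult(ok=True, value=fn(arg))
    except Exception as exc:
        return StageResult(ok=False, error=f"{type(exc).__name__}: {exc}")


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],  # placeholder for the speech-to-text call
    analyze: Callable[[str], str],     # placeholder for the LLM feedback call
) -> PipelineResult:
    """Transcribe, then analyze. Each stage fails on its own terms."""
    transcription = run_stage(transcribe, audio_path)
    if not transcription.ok:
        # Nothing to analyze, so the second stage is skipped entirely.
        return PipelineResult(transcription=transcription)
    analysis = run_stage(analyze, transcription.value or "")
    return PipelineResult(transcription=transcription, analysis=analysis)


def retry_analysis(
    previous: PipelineResult, analyze: Callable[[str], str]
) -> PipelineResult:
    """Re-run only the analysis stage, reusing the stored transcript."""
    if not previous.transcription.ok:
        raise ValueError("no transcript available to analyze")
    analysis = run_stage(analyze, previous.transcription.value or "")
    return PipelineResult(transcription=previous.transcription, analysis=analysis)


def student_text(utterances: List[Dict[str, str]], student_label: str) -> str:
    """Keep only the student's utterances from diarized output (assumed shape)."""
    return " ".join(
        u["text"] for u in utterances if u["speaker"] == student_label
    )
```

Keeping the transcript in the result means a failed analysis call costs only a second analysis attempt, not a second transcription. Filtering by speaker label before analysis keeps the tutor's speech out of the student's feedback.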