
Engcall is a phone-based English conversation service — the entire product depends on reliable, low-latency voice calls between Korean students and Filipino tutors. I needed a call system that could handle daily 20-minute sessions with consistent audio quality, even on unstable mobile networks. Third-party meeting tools like Zoom or Google Meet weren't an option — they add friction, require separate accounts, and don't integrate into the booking and session management flow I was building.

I chose the Agora SDK for the WebRTC layer. It handles the hard parts of real-time audio (NAT traversal, codec negotiation, adaptive bitrate) while giving me full control over the call lifecycle through its API. The architecture is straightforward: the Spring Boot backend generates short-lived RTC tokens and manages session state, while the React frontend handles the Agora connection and UI.

  • Token generation — When a student or tutor enters the call room, the backend generates an Agora RTC token (expires in 1 hour) scoped to their course channel. Channel names follow the format engcall_s{studentId} to ensure each pair gets an isolated audio channel.
  • Session management — The backend tracks each participant's state (OFFLINE → WAITING → IN_CALL) and the overall session status (CREATED → IN_PROGRESS → ENDED). This runs entirely over REST — no WebSockets.
  • Connection flow — Both participants enter the call room and poll the session endpoint every second. Once both are WAITING, the frontend connects through Agora automatically. Each client also sends a heartbeat ping every 5 seconds, and the backend treats a participant who has not pinged for 10 seconds as disconnected.
  • Call links — Tutors receive call links whose URLs carry an AES-GCM-encrypted token identifying the tutor, so they can join without logging in. The backend decrypts and validates the token before granting access.
  • Polling vs WebSocket — I deliberately chose HTTP polling over WebSockets. For a two-person call room, the overhead is negligible, and polling is far simpler to deploy, debug, and scale on a single EC2 instance. The 1-second poll interval is fast enough that participants never notice a delay.
  • Noise suppression — Tutors often call from shared offices in the Philippines. I integrated Agora's AI denoiser extension (WASM-based) to suppress background noise in real time. This made a significant difference in call quality without any server-side processing.
  • Time window validation — Calls are only allowed within a ±5-minute window of the scheduled lesson time. Before issuing a token, the backend also validates the course status, the lesson day, and any long-leave schedule. This prevents unauthorized access and accidental early or late joins.
  • Feedback loop — After each call, both participants can rate quality and satisfaction (1-5). This data helps me identify network issues and improve the experience over time.
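The token-generation step could be sketched as follows. This is a minimal illustration, not the production code. It only prepares the inputs a token builder needs: the `engcall_s{studentId}` channel name and a one-hour expiry. The `CallTokenRequest` class, the numeric `uid` per participant, and the `RtcTokenSigner` interface are hypothetical names. The actual signing would be done by whichever Agora token-builder library the backend uses, and that API is not reproduced here. Depending on the builder version, it may expect a relative lifetime rather than the absolute timestamp computed below.

```java
import java.time.Duration;
import java.time.Instant;

/** Inputs for one participant's RTC token (hypothetical holder class). */
final class CallTokenRequest {
    // Short-lived tokens: one hour, matching the design described above.
    static final Duration TOKEN_TTL = Duration.ofHours(1);

    final String channelName;
    final int uid;                    // assumed: a numeric Agora uid per participant
    final long expiresAtEpochSeconds; // absolute expiry as a Unix timestamp

    private CallTokenRequest(String channelName, int uid, long expiresAtEpochSeconds) {
        this.channelName = channelName;
        this.uid = uid;
        this.expiresAtEpochSeconds = expiresAtEpochSeconds;
    }

    /** engcall_s{studentId}: one isolated channel per student, shared with their tutor. */
    static String channelFor(long studentId) {
        return "engcall_s" + studentId;
    }

    /** Builds the request for either participant; `now` is passed in to keep this testable. */
    static CallTokenRequest forParticipant(long studentId, int uid, Instant now) {
        long expiresAt = now.plus(TOKEN_TTL).getEpochSecond();
        return new CallTokenRequest(channelFor(studentId), uid, expiresAt);
    }
}

/** Hypothetical seam over the Agora token-builder library actually used for signing. */
interface RtcTokenSigner {
    String sign(CallTokenRequest request);
}
```

Taking `now` as a parameter instead of reading the clock inside the method keeps the expiry logic deterministic in tests.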
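The two state machines from the session-management bullet could be encoded as Java enums that reject out-of-order transitions. The constant names mirror the states listed above. Any transition beyond the documented forward order is an assumption, such as a participant dropping back to `OFFLINE` after a lost heartbeat. The same applies to the session advancing strictly one step at a time.

```java
import java.util.EnumSet;
import java.util.Set;

/** Per-participant presence, tracked by the backend over REST. */
enum ParticipantState {
    OFFLINE, WAITING, IN_CALL;

    /** Allowed next states. Falling back to OFFLINE (e.g. after a heartbeat timeout) is assumed. */
    Set<ParticipantState> allowedNext() {
        switch (this) {
            case OFFLINE: return EnumSet.of(WAITING);
            case WAITING: return EnumSet.of(IN_CALL, OFFLINE);
            case IN_CALL: return EnumSet.of(OFFLINE);
            default: throw new AssertionError(this);
        }
    }

    boolean canTransitionTo(ParticipantState target) {
        return allowedNext().contains(target);
    }
}

/** Overall session lifecycle: strictly forward, one step at a time (assumed). */
enum SessionStatus {
    CREATED, IN_PROGRESS, ENDED;

    boolean canTransitionTo(SessionStatus target) {
        return target.ordinal() == ordinal() + 1;
    }
}
```

With the rules kept in the enums, a REST handler can call `canTransitionTo` before persisting a state change and reject anything else, for example with HTTP 409.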
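The backend side of the connection flow might look like the sketch below. It covers the 10-second heartbeat timeout and the "both are WAITING" check that the 1-second poll relies on. It is an in-memory sketch under stated assumptions: `HeartbeatTracker`, `enterRoom`, and `readyToConnect` are hypothetical names, and participant IDs are assumed to be strings. A real deployment would more likely persist this state alongside the session record.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory presence tracking for a call room (hypothetical; state would normally be persisted). */
final class HeartbeatTracker {
    // Clients ping every 5 seconds; silence longer than this counts as a disconnect.
    static final Duration TIMEOUT = Duration.ofSeconds(10);

    private final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();
    private final Set<String> waiting = ConcurrentHashMap.newKeySet();

    /** Participant opened the call room: mark them WAITING and record presence. */
    void enterRoom(String participantId, Instant now) {
        waiting.add(participantId);
        ping(participantId, now);
    }

    /** Heartbeat endpoint handler: refresh the participant's last-seen time. */
    void ping(String participantId, Instant now) {
        lastSeen.put(participantId, now);
    }

    /** Alive if the most recent ping is no older than TIMEOUT. */
    boolean isAlive(String participantId, Instant now) {
        Instant seen = lastSeen.get(participantId);
        return seen != null && Duration.between(seen, now).compareTo(TIMEOUT) <= 0;
    }

    /** What the 1-second poll reports: both sides WAITING and alive, so clients may join Agora. */
    boolean readyToConnect(String studentId, String tutorId, Instant now) {
        return waiting.contains(studentId) && waiting.contains(tutorId)
                && isAlive(studentId, now) && isAlive(tutorId, now);
    }
}
```

With a 5-second ping and a 10-second timeout, a participant survives roughly one lost heartbeat before being treated as disconnected. That margin absorbs brief hiccups on mobile networks without leaving a dead session open for long.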
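The call-link encryption can be sketched with the JDK's built-in `javax.crypto` AES-GCM support. This is an illustration, not the author's implementation. The `CallLinkCipher` class, the `tutorId:courseId`-style payload, and the IV-prefixed, URL-safe Base64 token layout are all assumptions. Key management (where the `SecretKey` comes from) is out of scope.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Encrypts tutor identity into a URL-safe token and back (hypothetical helper). */
final class CallLinkCipher {
    private static final int IV_BYTES = 12;  // 96-bit nonce, the recommended size for GCM
    private static final int TAG_BITS = 128; // full-length authentication tag
    private static final SecureRandom RANDOM = new SecureRandom();

    private final SecretKey key;

    CallLinkCipher(SecretKey key) {
        this.key = key;
    }

    /** Returns Base64url(IV || ciphertext+tag), safe to embed in a query string. */
    String encrypt(String payload) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // fresh IV per link; never reuse one with the same key
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(payload.getBytes(StandardCharsets.UTF_8));
        byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(out);
    }

    /** Decrypts and authenticates; a tampered token fails with AEADBadTagException. */
    String decrypt(String token) throws GeneralSecurityException {
        byte[] in = Base64.getUrlDecoder().decode(token);
        if (in.length < IV_BYTES + TAG_BITS / 8) {
            throw new GeneralSecurityException("token too short");
        }
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
        byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```

GCM authenticates as well as encrypts. A tutor who edits the token in the URL therefore gets a decryption failure rather than someone else's identity, and the backend can reject the link before any further validation.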
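The time-window rule could be expressed as below. This assumes the ±5-minute window is centered on the lesson's scheduled start. `CallWindow`, `mayIssueToken`, and its boolean inputs for course status and long leave are hypothetical stand-ins for the lookups the backend would perform.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.ZonedDateTime;
import java.util.Set;

final class CallWindow {
    // Calls may start at most 5 minutes before or after the scheduled time.
    static final Duration TOLERANCE = Duration.ofMinutes(5);

    private CallWindow() {}

    /** True if `now` lies within [start - 5 min, start + 5 min], inclusive at both ends. */
    static boolean isWithinWindow(ZonedDateTime scheduledStart, ZonedDateTime now) {
        ZonedDateTime opens = scheduledStart.minus(TOLERANCE);
        ZonedDateTime closes = scheduledStart.plus(TOLERANCE);
        // isBefore/isAfter compare instants, so differing time zones are handled correctly.
        return !now.isBefore(opens) && !now.isAfter(closes);
    }

    /** Combines the checks that gate token issuance; the inputs are hypothetical lookups. */
    static boolean mayIssueToken(boolean courseActive,
                                 Set<DayOfWeek> lessonDays,
                                 boolean onLongLeave,
                                 ZonedDateTime scheduledStart,
                                 ZonedDateTime now) {
        return courseActive
                && !onLongLeave
                && lessonDays.contains(scheduledStart.getDayOfWeek())
                && isWithinWindow(scheduledStart, now);
    }
}
```

Because `ZonedDateTime` comparisons work on instants, a start time stored in Korea time (`Asia/Seoul`) checks correctly against a clock reading taken in Philippine time (`Asia/Manila`).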