
ToneTutor launches free AI HSK speaking test with real‑time tone grading
John Lee’s ToneTutor records Mandarin speech in the browser, transcodes it locally, and uses Gemini 2.5 Flash to grade tone and grammar on the HSK scale. The React‑FastAPI app runs on Google Cloud Run and is free for three sessions per user.
ToneTutor went live on June 28, 2026 as a free three‑minute Mandarin speaking test that returns an HSK level estimate and a list of tone or grammar weak points [Dev.to]. The front‑end captures audio with the Web Audio API, converts the WebM recording to 16‑bit PCM (LINEAR16) in the browser, and streams that audio to a FastAPI service on Google Cloud Run. The back‑end wraps the PCM bytes in a WAV header, sends them to Gemini 2.5 Flash for speech‑to‑text, and then prompts the model with a rubric that checks tone accuracy, grammar correctness, and vocabulary range. The model’s response is parsed into a numeric HSK level (1‑6) and a short feedback paragraph, which are displayed in real time.
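The back‑end steps described above can be sketched in Python, the language of the FastAPI service. This is a minimal sketch, not ToneTutor’s code. It assumes the browser sends 16 kHz mono LINEAR16, and it assumes the grading prompt asks Gemini to reply with a JSON object containing hypothetical `hsk_level` and `feedback` fields. The Gemini request itself is omitted, and `pcm_to_wav` and `parse_grade` are illustrative names.

```python
import io
import json
import re
import wave

# Assumed capture format (not stated in the article): 16 kHz, mono, 16-bit PCM.
SAMPLE_RATE_HZ = 16_000
CHANNELS = 1
SAMPLE_WIDTH_BYTES = 2


def pcm_to_wav(pcm: bytes, sample_rate: int = SAMPLE_RATE_HZ) -> bytes:
    """Wrap raw little-endian LINEAR16 bytes in a RIFF/WAV header."""
    if len(pcm) % (SAMPLE_WIDTH_BYTES * CHANNELS):
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    # The stdlib wave module writes the 44-byte PCM header for us.
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def parse_grade(model_text: str) -> dict:
    """Extract an HSK level (1-6) and feedback from the model's reply.

    Hypothetical contract: the prompt asks for {"hsk_level": int, "feedback": str}.
    The object may be surrounded by extra prose or a Markdown fence, so we take
    the span from the first opening brace to the last closing brace.
    """
    match = re.search(r"\{.*\}", model_text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in model reply")
    data = json.loads(match.group(0))
    level = int(data["hsk_level"])
    if not 1 <= level <= 6:
        raise ValueError(f"HSK level out of range: {level}")
    return {"hsk_level": level, "feedback": str(data.get("feedback", "")).strip()}
```

Rejecting an out‑of‑range level instead of clamping it means a malformed model reply surfaces as an error rather than as a silently wrong score card.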
What shipped
- Frontend: React + TypeScript UI, Web Audio API‑based transcoding, live transcript pane.
- Backend: FastAPI on Cloud Run, Firestore for session persistence, Gemini 2.5 Flash for low‑latency transcription and grading [Google AI Blog].
- User experience: Three‑minute recording, instant score card, free for three sessions per user, shareable URL for each result.
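A per‑user free tier and shareable result links can be modeled roughly as follows. This sketch is hypothetical: an in‑memory dict stands in for the Firestore collection, and `SessionStore`, `try_start_session`, and `share_url` are illustrative names. Only the three‑session limit comes from the article.

```python
import secrets
from dataclasses import dataclass, field

FREE_SESSIONS_PER_USER = 3  # limit stated in the article


@dataclass
class SessionStore:
    """In-memory stand-in for a Firestore collection (hypothetical schema)."""

    used: dict[str, int] = field(default_factory=dict)

    def try_start_session(self, user_id: str) -> bool:
        """Consume one free session; return False once the quota is spent."""
        count = self.used.get(user_id, 0)
        if count >= FREE_SESSIONS_PER_USER:
            return False
        self.used[user_id] = count + 1
        return True


def share_url(base: str) -> str:
    """Build an unguessable, URL-safe link for one result (hypothetical route)."""
    # token_urlsafe(8) base64url-encodes 8 random bytes into 11 characters.
    return f"{base.rstrip('/')}/{secrets.token_urlsafe(8)}"
```

Against a real Firestore back‑end, the read‑and‑increment would need to run inside a transaction so that two concurrent requests cannot both claim a user’s last free session.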
Why it matters
ToneTutor makes self‑assessment practical by delivering an HSK‑level speaking estimate in under three minutes, without a human examiner or paid tutor. Converting audio to LINEAR16 in the browser, rather than uploading WebM, sidesteps iOS Safari’s lack of WebM support and removes a server‑side transcoding step, cutting latency and cloud‑compute costs. Gemini 2.5 Flash’s sub‑200 ms response time suggests that a Flash‑tier model can be steered with a plain prompt to perform niche language grading without fine‑tuning, a sign that the approach may be production‑ready for similar tasks.
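The browser‑side conversion boils down to a per‑sample transform. In ToneTutor this step runs in the React/TypeScript front‑end. The sketch below shows the same arithmetic in Python for consistency with the other examples, assuming the recording has already been decoded to Float32 samples in the range −1.0 to 1.0 (the format Web Audio `AudioBuffer` channel data uses, for example after `decodeAudioData`). Resampling to whatever rate the back‑end expects is assumed to happen separately and is not shown.

```python
import struct
from typing import Iterable

INT16_MAX = 32767
INT16_MIN_MAGNITUDE = 32768


def float_to_linear16(samples: Iterable[float]) -> bytes:
    """Convert Web Audio-style float samples to little-endian LINEAR16 bytes.

    Mirrors the arithmetic a browser would apply to each Float32Array frame:
    clamp, scale to the signed 16-bit range, pack.
    """
    ints = []
    for sample in samples:
        clamped = max(-1.0, min(1.0, sample))  # guard against overshoot
        # Asymmetric scaling: -1.0 maps to -32768 and +1.0 maps to +32767.
        scale = INT16_MIN_MAGNITUDE if clamped < 0 else INT16_MAX
        ints.append(int(clamped * scale))  # int() truncates toward zero
    return struct.pack(f"<{len(ints)}h", *ints)  # "<h" = little-endian int16
```

The asymmetric scale factor reflects the int16 range itself, which holds one more negative value than positive.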
Editor’s take
The stack, which combines browser‑side audio processing, a serverless FastAPI back‑end, and a Flash‑tier LLM, offers a repeatable blueprint for language‑learning products. However, relying on Google’s proprietary Gemini service ties grading costs to Gemini API pricing, and hosting on Cloud Run with Firestore deepens the dependence on Google Cloud, which limits experimentation with open‑source alternatives.
Reader poll
Which platform would you trust for AI‑powered language assessment?
- Google Cloud + Gemini
- Open‑source stack (Whisper + custom grading)
- Self‑hosted on‑premise solution
- Human tutors only


