How I Built a Production-Grade Adaptive Study Coach on Amazon Nova

Inspiration
Millions of students sit high-stakes exams every year: GMAT, GRE, SAT, GAMSAT, LSAT, IELTS, TOEFL, UCAT, NCLEX-RN, CFA. Many of them study the wrong things. Not because they're lazy, but because their tools have no idea who they are. They review topics they've already mastered, miss the patterns behind their mistakes, and drift off between questions while static question banks serve the next card.
In a world of infinite tabs and notifications, students sit down to study and lose focus within minutes, not because they're “too distracted” but because nothing in their prep environment responds to them as an individual.
I know this because I lived it. I was a strong student growing up, the kind of kid who picked things up fast and didn't need to try too hard. But as I got older, the world got louder. I loved doing multiple things at once, jumping between interests, chasing whatever felt exciting in the moment.
Somewhere along the way I stopped being able to focus for long stretches. Not because I stopped being capable, but because I didn't know what I wanted, and without that clarity, motivation just evaporates. You sit down to study and twenty minutes later you're three tabs deep in something completely unrelated.
Testera's focus tracker was built for exactly that moment. I wanted to build a coach that actually pays attention, because I needed one myself.
It knows what you're bad at. It remembers what you'll forget. It speaks back.
That's Testera.
What it does
Testera is an adaptive test prep platform powered by Amazon Nova that learns how you think and focuses on what actually hurts your score.
Tera is an agentic AI study coach that adapts your practice in real time using an IRT-based engine and focus tracking. Traditional platforms tell you what you got wrong. Tera tells you why, and makes it less likely you will make the same mistake twice.
| Feature | What it means for the student |
|---|---|
| Adaptive question engine | Questions get harder or easier based on your live ability estimate, no wasted time on topics you've mastered |
| Tera (AI coach) | Agentic companion powered by Nova 2 Lite with tool use, proactively checks your weak topics, generates calibrated questions, and fetches your study plan before every reply. Tool calls render as pills in the chat so the agentic behaviour is visible |
| IELTS Speaking Examiner | Speech-to-speech AI examiner powered by Nova 2 Sonic, follows the full Part 1/2/3 format, streams audio back in real time, scores with 4-criterion band feedback |
| Snap a Problem | Photo any textbook page or whiteboard, Nova multimodal vision identifies the concept and generates an original calibrated practice question in seconds. Multimodal embeddings connect the image to your weak-topic profile |
| Animated Tera | 11 distinct emotion states driven by CSS keyframe animations, bouncing when thinking, sparkling when you nail a question, floating z's when things go quiet |
| Focus tracker | Detects tab switches, idle time, and session drift, surfaces focus metrics so you can see where your attention actually goes |
| Writing scorer | IELTS Task 1/2 and GAMSAT Section 2 essays scored by Nova with band-level feedback in seconds |
| Error diagnosis | Pattern detection across your session, spots recurring traps before you sit the real exam |
| Live score trajectory | Ability estimate mapped to real scales: GMAT 200–800, GRE 130–170, IELTS 1–9 |
| Active workspace | Scratchpad, calculator, formula sheet in a bottom drawer, mimics a real exam desk |
| Gamification | XP, streaks, achievements, and micro-celebrations to keep momentum going |
Engineering at a Glance
| Metric | Value |
|---|---|
| Exams supported | 6 (GMAT, GRE, SAT, GAMSAT, LSAT, IELTS) |
| Nova models used | Nova 2 Lite · Nova 2 Sonic · Nova Multimodal Embeddings |
| Nova capabilities | Text generation · streaming · speech-to-speech · multimodal vision · tool use / function calling |
| Adaptive engine | 3PL IRT · Newton–Raphson MLE · Bayesian prior N(0,1) |
| Spaced repetition | SM-2 algorithm · Ebbinghaus retention curve · per-topic ease factor (min 1.3) |
| Weak-area trigger | ≥ 3 errors on same topic → remediation at θ − 0.8 · SM-2 ease < 1.8 |
| Tera agentic coach | Nova tool use loop · 3 tools · streaming SSE · tool-use pills · auth users only |
| IELTS Speaking | Nova 2 Sonic bidirectional stream · Part 1/2/3 structure · 4-criterion band scoring · Polly fallback |
| Snap a Problem | Nova multimodal image → structured question · 6 exam types · multimodal embeddings (384-d) |
| Study Planner | Nova 2 Lite · IRT θ + SM-2 weak topics + target score → week-by-week schedule · 24 hr cache |
| Writing scorer | Nova 2 Lite · IELTS Task 1/2 + GAMSAT Section 2 · band-level feedback |
| Gamification | XP · streaks · achievements · toast notifications |
| Auth | JWT · Google OAuth · Email OTP (Amazon SES) |
| Frontend | Next.js 14 App Router · TypeScript · Tailwind · PWA (service worker) |
| Fallback question bank | 53 curated questions across all 6 exams |
| Test suite | 1,202 passing tests |
| AWS region | eu-west-1 (Ireland), GDPR jurisdiction · Nova Sonic: us-east-1 |
System Overview
Testera is a full-stack adaptive learning platform. The frontend is a Next.js 14 PWA deployed on Vercel. The backend is a FastAPI service running on AWS ECS Fargate, backed by Amazon Aurora PostgreSQL Serverless v2 and Amazon Bedrock (Nova 2 Lite) for AI inference.
Bayesian θ estimation
Question priority scoring
Prompt construction
IELTS / GAMSAT scoring
Session error patterns
XP · streaks · badges
Tera AI coach
Serverless v2 · Multi-AZ
OTP · transactional
Container images · CI/CD
Architecture on AWS
All AWS resources in eu-west-1 (Ireland), close to the primary user base and within GDPR jurisdiction.
- Frontend: Next.js 14 App Router, TypeScript, Tailwind CSS, deployed on Vercel. All AI calls go through the FastAPI backend, no Bedrock credentials ever touch the browser.
- Backend: FastAPI in Docker on ECS Fargate. Rolling deploys via GitHub Actions → ECR →
ecs update-service --force-new-deployment. - Database: Aurora Serverless v2 scales ACUs up and down automatically, including near-zero when idle. Keeps costs low without sacrificing cold-start performance.
Adaptive Engine: IRT in Depth
Each question has three IRT parameters: discrimination a, difficulty b, and guessing c. After every answer, the student's ability estimate θ is updated using Newton–Raphson MLE with a Bayesian prior N(0,1). The prior prevents wild swings from a single lucky or unlucky answer.
θ is then used to pick the next question via a weighted priority score:
P(q) = w₁·W(q) + w₂·D(dq, θ) + w₃·R(tq)
- W(q), error rate on this topic over recent attempts (surface weak areas)
- D(dq, θ), how close the question's difficulty is to current θ (zone of proximal development)
- R(tq), recency penalty (avoid repeating questions just seen)
Weights are tuned per exam, GMAT gets heavier weakness targeting, IELTS gets more difficulty progression. If a student accumulates errors on the same topic, the selector switches to remediation mode and rebuilds confidence before pushing harder.
Score mapping
| Exam | Scale |
|---|---|
| GMAT | 200–800 |
| GRE | 130–170 |
| IELTS | 1–9 |
| SAT | 400–1600 |
| GAMSAT | 30–90 |
| LSAT | 120–180 |
Amazon Nova Integration
Testera uses all four major Nova capabilities, text generation, speech-to-speech, multimodal understanding, and tool use.
1. Nova 2 Lite: Core intelligence
Every question, explanation, study plan, and writing score runs through eu.amazon.nova-2-lite-v1:0 via the Bedrock Converse API in eu-west-1 (GDPR jurisdiction). Nova 2 Lite was chosen for three reasons: it handles UK English (colour, favour, analyse) natively, essential for IELTS and GAMSAT users, it returns strict JSON reliably, and at ~€0.17/active user/month it makes affordable test prep sustainable.
Streaming: Tera's chat responses stream token-by-token using converseStream → Server-Sent Events → browser. Zero waiting, natural “thinking” effect.
Student types → POST /companion/stream → converseStream → SSE chunks → browser renders liveTera is an agentic system. For authenticated users, Tera runs a Nova tool-use loop before every reply. She has three tools:
| Tool | What it does |
|---|---|
get_weak_topics | Pulls the student's SM-2 weak topics in real time |
generate_question | Generates a calibrated practice question on demand |
get_study_plan | Fetches or generates the student's personalised Nova study plan |
Nova decides autonomously which tool (if any) to call. Each call renders as a pill above Tera's reply, e.g. ✦ Tera checked your weak topics, so the agentic behaviour is visible inside the actual product. Anonymous users get the plain streaming path.
Question generation: Exam-specific system prompts with explicit difficulty calibration (1–10 scale), distractor quality rules, and dual-critic verification. Nova returns strict JSON, no regex hacks. If Nova returns malformed JSON, the circuit breaker falls back to the cached question bank.
DIFFICULTY CALIBRATION:
- Difficulty 7/10 means ~32% of test-takers get it wrong.
- Use advanced concepts with 3+ step reasoning.
QUALITY REQUIREMENTS:
1. Only ONE answer can be correct (unambiguous)
2. All distractors must be plausible common errors
3. Self-contained, no external references needed
4. Consistent numbers and values throughoutStudy Planner: POST /users/study-plan sends the student's IRT θ + SM-2 weak topics + target score to Nova and gets a concrete week-by-week daily schedule back. Cached 24 hours per user.
Writing scorer: Students submit IELTS or GAMSAT essays. Nova evaluates them against official band descriptors and returns a structured response: band score, per-criterion breakdown, specific improvements, and a model answer.
Tera's coaching logic
Below is a simplified illustrative version, the production prompt includes exam-specific vocabulary, structured JSON schemas, and additional guardrails.
TERA_SYSTEM_PROMPT = """
When a student makes a mistake:
1. Acknowledge what they got right first
2. Identify the specific error pattern, not just "wrong answer"
3. Give one actionable tip, not a lecture
4. Offer a follow-up question on the same concept
Be concise. Never condescending.
Domain: {exam_type}. Use appropriate vocabulary and scoring conventions.
"""{exam_type} is replaced at runtime with the student's selected exam (e.g. IELTS, GMAT). Domain vocabulary (GMAT critical reasoning traps, GAMSAT Section III patterns, IELTS band descriptors) roughly doubles output quality on the same model.
2. Nova 2 Sonic: IELTS Speaking Examiner
Route: /speaking, the only AI-powered IELTS speaking practice tool that uses real speech-to-speech.
Student holds mic button → browser captures PCM16 audio (16 kHz, mono)
→ POST /api/speaking/turn (multipart)
→ invoke_model_with_bidirectional_stream (amazon.nova-2-sonic-v1:0, us-east-1)
→ Nova Sonic processes speech, generates examiner response
→ LPCM 24 kHz audio streamed back → browser plays immediatelyThe examiner follows the official IELTS Speaking format across three parts:
- Part 1, 5 personal questions on familiar topics
- Part 2, Cue card given, 1-minute prep timer, candidate speaks 1–2 minutes
- Part 3, Abstract discussion questions related to the Part 2 topic
At the end, Nova 2 Lite evaluates the full conversation transcript and returns a band score breakdown (Fluency & Coherence, Lexical Resource, Grammatical Range, Pronunciation) with specific, actionable feedback.
Fallback: If Nova Sonic is unavailable, the examiner role falls back to Nova 2 Lite text generation + AWS Polly TTS, the student experience is never interrupted. Voice: Ruth (British English neural), appropriate for IELTS.
3. Nova Multimodal: Snap a Problem
Route: /snap, photo any textbook page, whiteboard, or diagram and get a calibrated practice question in seconds.


Student uploads / cameras an image
→ POST /api/snap/question
→ Nova 2 Lite converse() with image bytes + exam_type
→ Structured JSON: topic, difficulty, question, 4 options, correct_index, explanation
→ Student answers inline, sees full explanationNova reads the image, identifies the mathematical concept or logical principle shown, and generates an original practice question inspired by (not copying) that content, calibrated to the chosen exam (GMAT, GRE, SAT, GAMSAT, LSAT, IELTS).
Nova multimodal embeddings (amazon.nova-2-multimodal-embeddings-v1:0, 384 dimensions) also run silently on each image to enable similarity search, connecting a student's photo to the most relevant weak areas in their profile.
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | Next.js 14 App Router, TypeScript, Tailwind CSS, PWA (service worker) |
| Backend | FastAPI (Python 3.11), ECS Fargate, Docker |
| Database | Amazon Aurora PostgreSQL Serverless v2 |
| AI (text + tool use) | Amazon Bedrock, Nova 2 Lite (eu.amazon.nova-2-lite-v1:0, eu-west-1) |
| AI (speech-to-speech) | Amazon Nova 2 Sonic (amazon.nova-2-sonic-v1:0, us-east-1) |
| AI (multimodal embeddings) | Nova multimodal embeddings (amazon.nova-2-multimodal-embeddings-v1:0, 384-d) |
| TTS fallback | Amazon Polly (Ruth, British English neural, streaming PCM) |
| Amazon SES (OTP, transactional) | |
| Container registry | Amazon ECR |
| Frontend hosting | Vercel (Edge CDN) |
| Load balancer | AWS ALB + ACM |
| CI/CD | GitHub Actions → ECR → ECS rolling deploy |
Lessons Learned
Here's what actually mattered:
- Prompt engineering beats model size. Adding exam-specific vocabulary and structured JSON schemas to the system prompt roughly doubled output quality on the same Amazon Nova 2 Lite model. A smarter prompt is almost always cheaper than a bigger model.
- IRT is worth the complexity. A simple difficulty ladder would have been faster to build. But the 3PL model gives you a real ability estimate, one you can map to actual exam scales and explain to a student. That credibility matters.
- Speech-to-speech changes what's possible. Nova Sonic's bidirectional streaming, PCM16 in, LPCM 24 kHz out, makes the IELTS Speaking Examiner feel like a real interview. TTS bolt-ons cannot replicate this; the latency difference is immediately noticeable.
- Visible agentic behaviour builds trust. Tool-use pills showing ✦ Tera checked your weak topics are a one-line UI change that transforms a black-box AI into something students understand and trust. Transparency is a product feature, not just a compliance checkbox.
- Multimodal input removes friction. Students photograph textbook pages they're already looking at. The image → structured question pipeline meets them where they are rather than asking them to reframe the problem into text first.
- Circuit breakers are not optional. Bedrock will occasionally be slow or unavailable. Building fallbacks before launch, question bank, Polly TTS for Sonic, cached study plans, meant zero student-facing failures.
- Aurora Serverless v2 cold starts are real. Near-zero ACU scaling is great for cost, but the first query after idle can be slow. Warm-up pings on a schedule solved it without giving up the savings.
- Focus tracking changes how students feel about the product. It's not the flashiest feature, but students who see their own attention data become more invested in improving it. Behavioural feedback loops are underrated in edtech.
Acknowledgements
Thanks to the AWS Nova Hackathon team for the opportunity and to the IRT research community, the 3PL model and Newton–Raphson ability estimation that power Testera's adaptive engine have decades of rigorous work behind them.
High-stakes exams. All four Nova capabilities. Live at testera.org.
Responsible AI
All practice questions and explanations are AI-generated original content, not reproductions of official exam items.
- No proprietary or licensed exam questions are fed into Nova.
- Questions are clearly presented as AI-generated practice material, not official exam items.
- Testera does not imply endorsement by ETS, GMAC, College Board, ACER, or any other exam body.
- Authenticated users can flag any question as incorrect or low-quality, every report is reviewed.
- Testera is for practice only, not official scoring or academic advice.
IELTS and GAMSAT: Practice questions are AI-generated originals written to reflect the style and cognitive demands of each exam. They are not sourced from, affiliated with, or endorsed by the British Council, IDP, Cambridge Assessment English (IELTS) or ACER (GAMSAT). Testera's IELTS writing scorer evaluates essays against publicly documented band descriptors as a study aid, it does not produce official band scores. See testera.org/terms for full details.
Sources & References

Anya Chueayen
Founder of Aqta. Before this, I worked on integrity at social media platforms, the unglamorous side of AI where human behaviour, edge cases, and ethics collide at scale. That work convinced me that responsible AI needs infrastructure, not just good intentions. Based in Dublin, closely watching how regulation is reshaping what we build and how.
If you're interested in the governance side of AI systems like this, these two pieces go deeper.
Related Articles
The Human Supply Chain Behind AI
The invisible labour that powers AI systems, and why it matters for governance.
Who's Accountable When Healthcare AI Makes a Mistake?
Ireland's Medical Council says doctors remain responsible for AI decisions. But how can they be confident in tools they don't fully understand?