Executive Summary

Measuring how children develop language is exacting work. It requires listening carefully, transcribing accurately, applying consistent scoring rules, and doing it again — for every student, every assessment cycle, across classrooms, clinics, and research studies. Done manually, that process takes hours. It varies between scorers. It delays the insights that teachers and therapists need to act early.

Syntameter was built to change that. Their vision was an AI platform that could analyze children’s speech and writing samples, classify sentence structures, compute standardized language metrics, and return a complete, shareable report — automatically, consistently, and fast.

Aegasis Labs designed and built the platform end-to-end. We translated Syntameter’s research-grounded requirements into a production system: a modular AI pipeline combining speech recognition, OCR, and NLP-based syntactic parsing, wrapped in a teacher-friendly interface and deployed on secure, scalable cloud infrastructure. The result cut assessment processing time by approximately 85%, with complete reports delivered in under 90 seconds.

About Syntameter

Syntameter is an education-focused startup building tools to understand and measure how children develop language. Their users span classrooms, therapy clinics, and research programs — teachers tracking grammar development across a class, speech-language therapists monitoring individual progress, researchers running IRB-approved studies, and parents who want to understand how their child is doing.

What connects all of those users is a shared frustration with the existing process. Manual transcription and hand-scoring of speech or writing samples is the current standard. It works, in the sense that it produces results. But it’s slow, labor-intensive, and prone to scorer variability — which matters enormously when you’re trying to measure growth over time or compare outcomes across a study cohort.

Syntameter’s founding team came from education and language research. They knew exactly what the problem was. What they needed was a technical partner who could turn a well-defined research vision into a reliable, deployable product. That’s the engagement Aegasis Labs took on.

The Challenge

Precision Work That Doesn’t Scale

Language assessment at the syntactic level is not a simple task. Classifying whether a child’s utterance is a simple sentence, a compound sentence, or a complex one requires understanding clause boundaries, subordinating conjunctions, dependency relationships between words — and doing that analysis consistently, regardless of who’s doing the scoring or what language the child is speaking.

Manually, that’s a trained professional’s work. It takes time. And it introduces variability: two experienced scorers applying the same rubric to the same sample will sometimes reach different conclusions. For research purposes, that inconsistency is a methodological problem. For classroom use, it’s a practical one — assessments that take hours don’t happen weekly.

Three specific constraints defined the challenge.

 

  • Speed and consistency had to be solved together. Automating the process only creates value if the automated results are accurate enough to trust. An AI pipeline that’s fast but inconsistent doesn’t replace manual scoring — it just produces unreliable numbers faster.
  • The platform had to work across languages. A significant portion of the student populations Syntameter serves speak English, Spanish, or Mandarin/Cantonese at home. Most existing scoring tools are built for a single language and break down with multilingual input. Syntameter needed language-aware pipelines for all three from the start.
  • Research-grade compliance wasn’t optional. Schools and universities running IRB-approved studies have strict requirements around consent, data handling, and audit trails. Any platform serving that use case had to meet those standards by design — not as an afterthought.

Underneath all three was a product design challenge. The professionals who would use this platform daily are teachers and therapists, not data scientists. A technically sound system that’s difficult to use in a classroom doesn’t get used. The interface had to be as carefully considered as the AI pipeline behind it.

 

The Solution

An End-to-End AI Assessment Platform, Built for the Classroom and the Research Lab

Aegasis Labs designed and built Syntameter from the ground up — UI/UX, data architecture, NLP engines, scoring models, and cloud deployment. The goal was to remove every manual step from the assessment workflow without removing any of the rigor.

The platform accepts two types of input: live speech recorded directly in the app, and written samples uploaded as images or PDFs. From there, the pipeline takes over.

How It Works 

 

  • A teacher or therapist records a speech sample or uploads a writing sample. On mobile or desktop, the intake process takes under a minute.
  • For speech, the ASR engine transcribes the audio and segments it into utterances using pause detection tuned for child speech patterns. For written samples, the OCR engine converts the image or PDF into clean, analyzable text.
  • The NLP pipeline — built on spaCy with custom rule-based logic for clause boundaries, conjunctions, and subordinate structures — parses each utterance and classifies it: Simple, Compound, Complex, or Compound-Complex.
  • The scoring engine computes standardized metrics: Variability (Shannon-index-based), Quality, Non-Repetitious Mean Length of Utterance, and a Composite score. These are the same measures researchers and clinicians already rely on — now calculated automatically and consistently every time.
  • The results appear in a readable transcript annotated with morpheme counts, lexical inventory, sentence-type tags, and confidence indicators. Progress dashboards show growth over time, flag students who may need intervention, and give administrators a class- or district-level view. Reports are shareable with parents in one click.

The full process — intake to report — completes in under 90 seconds.
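The classification and scoring steps above can be illustrated with a minimal Python sketch. It is not Syntameter's implementation: it assumes clause counts have already been extracted by a parser, it uses the textbook definitions of the four sentence types, and it treats Variability as the plain Shannon index over the distribution of sentence-type labels, which is an assumption about how the metric is defined. The function names `classify_sentence` and `shannon_variability` are hypothetical.

```python
import math
from collections import Counter


def classify_sentence(independent_clauses: int, dependent_clauses: int) -> str:
    """Map clause counts to a sentence type (textbook definitions).

    Clause counts are assumed to come from an upstream parser.
    """
    if independent_clauses < 1:
        # The four labels all require at least one independent clause.
        raise ValueError("need at least one independent clause")
    if independent_clauses == 1:
        return "Complex" if dependent_clauses > 0 else "Simple"
    return "Compound-Complex" if dependent_clauses > 0 else "Compound"


def shannon_variability(labels: list[str]) -> float:
    """Shannon index H = -sum(p * ln p) over sentence-type proportions.

    Assumption: the platform's Variability metric may normalize or
    weight this value differently.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical (independent, dependent) clause counts for four utterances.
    sample = [(1, 0), (2, 0), (1, 1), (1, 0)]
    labels = [classify_sentence(i, d) for i, d in sample]
    print(labels, round(shannon_variability(labels), 3))
```

Under this definition, a sample that uses only one sentence type scores 0, and the score rises as a child's utterances spread more evenly across types.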

 

What Was Built

  • Speech and text intake: Live speech recording and audio upload via ASR; image and PDF upload via OCR. Both pipelines convert inputs into clean, segmented text ready for linguistic analysis, across English, Spanish, and Mandarin/Cantonese.
  • Utterance segmentation engine: A linguistically grounded segmentation layer that splits transcribed content into meaningful units using pause detection and punctuation rules, tuned specifically for child speech and multilingual input.
  • Sentence-type classification: spaCy-based dependency parsing with custom rule logic for clause boundaries, conjunctions, and subordination — producing consistent Simple, Compound, Complex, and Compound-Complex labels at utterance level.
  • Standardized scoring engine: Automatic computation of Variability, Quality, Non-Repetitious MLU, and Composite metrics using reproducible mathematical models, so scores are comparable across sites, studies, and time points.
  • Annotated transcripts: Auto-generated transcripts with morpheme counts, lexical inventory, sentence-type tags, and confidence indicators — structured for both classroom use and research export.
  • Progress dashboards: Interactive visualizations built with D3.js and Chart.js showing individual growth trajectories, class-level views, benchmark comparisons, and intervention flags for administrators.
  • Multilingual pipelines: Language-aware segmentation and parsing for English, Spanish, and Mandarin/Cantonese, handling both speech and written input consistently across all three.
  • IRB-compliant data handling: Consent flows, auditable sharing permissions, role-based access for teachers, researchers, and parents, and encrypted storage — built to meet research ethics requirements from the first line of architecture.
  • Scalable cloud backend: AWS Lambda for elastic execution, S3 for encrypted storage, event-driven processing for ASR/OCR and NLP jobs, CloudWatch monitoring, and daily automated backups.
  • Teacher-friendly frontend: A React application built around real classroom workflows — quick recording and upload, one-click report sharing, and accessible progress visualizations that parents can actually read.
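To make the utterance segmentation engine more concrete, here is a minimal Python sketch of pause-based splitting. It assumes the ASR step returns word-level timestamps. The `Word` type, the `segment_by_pause` function, and the 0.7-second threshold are all hypothetical; the source does not state the threshold actually tuned for child speech, and this sketch omits the punctuation rules.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


def segment_by_pause(words: list[Word], min_pause: float = 0.7) -> list[list[str]]:
    """Split a timestamped word stream into utterances at long silences.

    min_pause is an illustrative value, not the platform's tuned setting.
    """
    utterances: list[list[str]] = []
    current: list[str] = []
    prev_end = None
    for word in words:
        # Start a new utterance when the gap since the last word is long enough.
        if prev_end is not None and current and word.start - prev_end >= min_pause:
            utterances.append(current)
            current = []
        current.append(word.text)
        prev_end = word.end
    if current:
        utterances.append(current)
    return utterances


if __name__ == "__main__":
    stream = [
        Word("the", 0.0, 0.2), Word("dog", 0.25, 0.6), Word("ran", 0.7, 1.0),
        Word("he", 2.0, 2.2), Word("barked", 2.25, 2.8),
    ]
    print(segment_by_pause(stream))
```

Keeping the threshold as a parameter is one way such a layer could be tuned per language or age group without changing the splitting logic.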

 

Technologies

The platform was built on a modern, production-grade stack selected for reliability at scale:

  • Languages & Frameworks: Python, FastAPI, Flask, React
  • NLP & AI: spaCy, custom syntactic scoring models, ASR/OCR integrations
  • Database: PostgreSQL
  • Cloud: AWS Lambda, S3, SAM, CloudWatch
  • Testing & CI/CD: Pytest, automated pipelines
  • Visualization: D3.js, Chart.js
  • Security & Compliance: IRB consent management, encrypted storage
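As an illustration of how event-driven processing on AWS Lambda and S3 might route uploads, the Python sketch below reads a standard S3 event notification and picks a pipeline by file extension. The extension sets, the `route_for_key` helper, and the returned job format are assumptions made for this example; the actual Syntameter handlers are not described in the source.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical extension sets; the real platform may accept other formats.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a"}
DOCUMENT_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}


def route_for_key(key: str) -> str:
    """Choose a pipeline name from the uploaded object's file extension."""
    ext = os.path.splitext(key)[1].lower()
    if ext in AUDIO_EXTENSIONS:
        return "asr"
    if ext in DOCUMENT_EXTENSIONS:
        return "ocr"
    return "unsupported"


def handler(event, context):
    """Lambda entry point for S3 object-created notifications."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        jobs.append({"bucket": bucket, "key": key, "pipeline": route_for_key(key)})
    return {"jobs": jobs}
```

In a real deployment each job would be handed to the ASR or OCR step rather than returned; the sketch stops at the routing decision.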

 

The Results

Syntameter launched as a production platform used by real teachers, therapists, and researchers, not as a pilot or a proof of concept. The outcomes below map to the core problems it was built to solve and come directly from rollout data.

 

  • Fast assessments: Assessment processing time dropped by approximately 85%. What previously required manual transcription and hand-scoring — a process measured in hours — now completes in under 90 seconds. That shift doesn’t just save time. It changes what’s operationally possible: weekly assessments become realistic where monthly ones were already a stretch.
  • High accuracy: ~95% OCR accuracy and >90% accuracy for ASR-based utterance segmentation across English, Spanish, and Mandarin/Cantonese. Those numbers matter because the platform’s value depends entirely on whether educators and researchers can trust the scores it produces.
  • Proven adoption: 80% of teachers ran weekly assessments in the first rollout.
  • Parent engagement: 60% of parents opened shared reports within 24 hours.
  • Research-grade compliance: End-to-end IRB-compliant workflows with consent tracking and an auditable history.
  • Rich, actionable insights: Automatic transcripts with sentence types, morpheme counts, lexical inventory, and clear Variability/Quality/MLU/Composite scores.
  • Live dashboards: Real-time progress views for classrooms and districts, with benchmarks and flags for students who may need intervention.
  • Multilingual by design: Supports English, Spanish, and Mandarin/Cantonese for both speech and text samples.
  • Scalable and reliable: Cloud-native architecture (AWS Lambda + S3) handles thousands of samples with automated backups and monitoring.
  • Secure data handling: Encrypted storage, least-privilege access, and daily backups to protect sensitive student information.

 

Build Your AI Product with Aegasis Labs

Syntameter came to us with a research-grounded vision, a technically demanding domain, and users — teachers, therapists, parents — who needed the product to work simply and reliably in their daily routines. That combination requires more than engineering competence. It requires understanding how to translate domain expertise into AI behavior that holds up under real-world conditions.

That translation is what Aegasis Labs does across every engagement — from AI strategy through production deployment.

Ready to Build?

If you’re building an intelligent system that needs to perform accurately in a high-stakes domain, visit aegasislabs.com to start the conversation.

  • Category:
    AI and Machine Learning Software Development
  • Client:
    Syntameter
  • Location:
    UKA
  • Industry:
    SaaS
  • Stack:
    Python, FastAPI, Flask, React, PostgreSQL, AWS Lambda, D3.js, Chart.js
