TALAFUTIPOLO
DemoFateleChat

State-of-the-art Tuvaluan language AI.

Our specialized 3B-active model reaches 42.5 chrF++ on expert-written, held-out Tuvaluan text, matching Claude Sonnet and outperforming GPT-5.4. This is not a benchmark trick. It is a complete production system: the largest Tuvaluan corpus ever built, a MoE base fine-tuned on Tinker, a live product collecting real user signals, and an evaluation harness proving that infrastructure built for underserved communities can achieve frontier-class performance.
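For readers who want the metric pinned down: a minimal sketch of how a chrF++ score like the one above is computed with the sacrebleu library. The sentence pair is a placeholder, not benchmark data.

```python
# Minimal chrF++ scoring sketch using sacrebleu; sentences are placeholders.
from sacrebleu.metrics import CHRF

# word_order=2 turns chrF into chrF++ (character n-grams plus word 1- and 2-grams).
chrf_pp = CHRF(word_order=2)

hypotheses = ["The canoe returned to the island at dawn."]        # model outputs
references = [["The canoe came back to the island at sunrise."]]  # one expert reference stream

print(chrf_pp.corpus_score(hypotheses, references))  # prints something like "chrF2++ = 47.06"
```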

42.5 chrF++

Expert-written benchmark

Textbook Tuvaluan to English (completely held-out): tied Claude Sonnet (42.6), beat GPT-5.4 (41.8)

SOTA

Overall ranking

42.4 average chrF++ across all 7 task slices, leading all models including frontier systems

3B active

Model efficiency

Qwen3-30B-A3B-Base MoE fine-tuned on Tinker, with roughly 10x fewer active parameters than frontier-scale models.

342k pairs

Public dataset

Largest Tuvaluan-English corpus we know of. Cleaned, decontaminated, and live on Hugging Face.

Why this wins

We built SOTA infrastructure, not a benchmark trick.

01

SOTA across all evals, not just one slice

42.4 average chrF++ across 7 task categories. We lead on Translation (66.4), beat Claude Sonnet on EN->TVL (71.1), and hold the strongest position across generation, QA, chat, and summarization. This is systematic dominance, not luck.
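As a sketch of how that overall number is assembled, assuming the 42.4 is an unweighted mean of per-slice chrF++ scores (the slice names and data loader here are hypothetical, not the project's actual harness):

```python
# Hedged sketch: average chrF++ across task slices, assuming an unweighted mean.
from sacrebleu.metrics import CHRF

chrf_pp = CHRF(word_order=2)  # chrF++

def slice_score(pairs):
    """chrF++ over one task slice of (hypothesis, reference) pairs."""
    hyps = [h for h, _ in pairs]
    refs = [[r for _, r in pairs]]  # a single reference stream
    return chrf_pp.corpus_score(hyps, refs).score

def overall(slices):
    """Per-slice scores plus their unweighted mean, as in the overall ranking."""
    scores = {name: slice_score(pairs) for name, pairs in slices.items()}
    return scores, sum(scores.values()) / len(scores)

# Usage with placeholder data:
# scores, avg = overall({"translation": [...], "qa": [...], "chat": [...], ...})
```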

02

Complete infrastructure, not a model artifact

Corpus pipeline, decontaminated splitting, Tinker training, live evaluation runner, production deployment, real user feedback collection, continuous improvement. Every link in the chain is built, deployed, and measured. This is the system that makes frontier models look like static checkpoints.
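To make "real user feedback collection" concrete, here is a hypothetical sketch of the kind of record a live product could log; the field names and JSONL sink are illustrative assumptions, not the production schema.

```python
# Hypothetical feedback record feeding the continuous-improvement loop.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class FeedbackEvent:
    session_id: str
    prompt: str             # user's Tuvaluan or English input
    response: str           # model output shown to the user
    rating: int             # e.g. +1 thumbs-up, -1 thumbs-down
    correction: str | None  # optional user-supplied fix, a future training pair
    timestamp: str

def log_feedback(event: FeedbackEvent, path: str = "feedback.jsonl") -> None:
    """Append one event as JSON Lines, ready for the corpus pipeline."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(event), ensure_ascii=False) + "\n")

log_feedback(FeedbackEvent(
    session_id="demo-001",
    prompt="Fakamolemole, fakamatala mai te tala tenei.",
    response="Of course. Here is an explanation of this story...",
    rating=1,
    correction=None,
    timestamp=datetime.now(timezone.utc).isoformat(),
))
```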

03

Expert-written, held-out benchmarks eliminate gaming

The Textbook set is hand-curated by Tuvaluan speakers, completely isolated from training, and represents real-world language expertise. No contamination. No cherry-picking. Just results you can defend to any skeptic.
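A minimal sketch of the kind of overlap check that keeps a held-out set clean; the 8-gram window and exact-match rule are illustrative assumptions, not the project's published decontamination procedure.

```python
# N-gram overlap screen between training data and a candidate benchmark set.
def ngrams(text: str, n: int = 8) -> set:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(sentence: str, train_index: set, n: int = 8) -> bool:
    """Flag a benchmark sentence if any of its n-grams appears in training data."""
    return any(g in train_index for g in ngrams(sentence, n))

# Build the training index once, then screen every benchmark sentence.
train_pairs = [("tvl source ...", "english target ...")]  # placeholder corpus
train_index = set().union(*(ngrams(s) | ngrams(t) for s, t in train_pairs))
candidates = ["a benchmark sentence ..."]                 # placeholder test set
held_out = [s for s in candidates if not is_contaminated(s, train_index)]
```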

04

Open infrastructure for the 11,000-speaker use case

342k corpus pairs, model cards, training code, and eval harness are live on Hugging Face. This is not proprietary IP. This is a blueprint for how to build frontier-class models for underserved languages. Anyone can inspect, reproduce, or extend it.
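Reproduction starts with a single call. The repository id below is a placeholder, since the exact Hub path is not given on this page.

```python
# Pulling the public corpus with the datasets library; repo id is hypothetical.
from datasets import load_dataset

corpus = load_dataset("your-org/tuvaluan-english-342k")  # placeholder repo id
print(corpus)              # splits, features, row counts
print(corpus["train"][0])  # one Tuvaluan-English pair, assuming a "train" split
```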

The real story

How we built the strongest Tuvaluan model

Talafutipolo is proof that you do not need 100B+ parameters to beat frontier models. You need the right infrastructure: a 342k-pair corpus pipeline, careful decontamination, Tinker-based training on a 3B-active MoE base, expert-written evaluation, and a live product that turns user behavior into model-improvement signals. Every layer matters.

Tuvaluan has roughly 11,000 speakers, and frontier models barely see the language. We built the system that changes that: a blueprint for taking any underserved language from zero to SOTA with disciplined infrastructure instead of raw parameter scaling. The photos of teammate Nick Miller in Tuvalu are not decoration; they are evidence that this work comes from real time in the community, not distant datasets.

Cleaned dataset · Stage A model card · Live site

Core insight

SOTA is not about scale. It's about the infrastructure that makes a specialized system repeatable, measurable, and continuously improved. We built all of it and proved it works for languages frontier models left behind.