TALAFUTIPOLO
DemoFateleChat

State-of-the-art Tuvaluan language AI.

Our specialized 3B-active model reaches 42.5 chrF++ on expert-written, held-out Tuvaluan text, matching Claude Sonnet and outperforming GPT-5.4. This is not a benchmark trick. It is a complete production system: the largest Tuvaluan corpus ever built, a MoE base fine-tuned on Tinker, a live product collecting real user signals, and an evaluation harness proving that infrastructure built for underserved communities can achieve frontier-class performance.
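For readers who want the metric pinned down: a minimal sketch of how a chrF++ score like the one above is computed with the sacrebleu library. The sentence pair is a placeholder, not benchmark data.

```python
# Minimal chrF++ scoring sketch using sacrebleu; sentences are placeholders.
from sacrebleu.metrics import CHRF

# word_order=2 turns chrF into chrF++ (character n-grams plus word 1- and 2-grams).
chrf_pp = CHRF(word_order=2)

hypotheses = ["The canoe returned to the island at dawn."]        # model outputs
references = [["The canoe came back to the island at sunrise."]]  # one expert reference stream

print(chrf_pp.corpus_score(hypotheses, references))  # prints something like "chrF2++ = 47.06"
```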

42.5 chrF++

Expert-written benchmark

Textbook Tuvaluan to English (completely held-out): tied Claude Sonnet (42.6), beat GPT-5.4 (41.8)

SOTA

Overall ranking

42.4 average chrF++ across all 7 task slices, leading all models including frontier systems

3B active

Model efficiency

Qwen3-30B-A3B-Base MoE fine-tuned on Tinker, with roughly 10x fewer active parameters than frontier-scale models.

342k pairs

Public dataset

Largest Tuvaluan-English corpus we know of. Cleaned, decontaminated, and live on Hugging Face.

Why this wins

We built SOTA infrastructure, not a benchmark trick.

01

SOTA across all evals, not just one slice

42.4 average chrF++ across 7 task categories. We lead on Translation (66.4), beat Claude Sonnet on EN->TVL (71.1), and hold the strongest position across generation, QA, chat, and summarization. This is systematic dominance, not luck.
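As a sketch of how that overall number is assembled, assuming the 42.4 is an unweighted mean of per-slice chrF++ scores (the slice names and data loader here are hypothetical, not the project's actual harness):

```python
# Hedged sketch: average chrF++ across task slices, assuming an unweighted mean.
from sacrebleu.metrics import CHRF

chrf_pp = CHRF(word_order=2)  # chrF++

def slice_score(pairs):
    """chrF++ over one task slice of (hypothesis, reference) pairs."""
    hyps = [h for h, _ in pairs]
    refs = [[r for _, r in pairs]]  # a single reference stream
    return chrf_pp.corpus_score(hyps, refs).score

def overall(slices):
    """Per-slice scores plus their unweighted mean, as in the overall ranking."""
    scores = {name: slice_score(pairs) for name, pairs in slices.items()}
    return scores, sum(scores.values()) / len(scores)

# Usage with placeholder data:
# scores, avg = overall({"translation": [...], "qa": [...], "chat": [...], ...})
```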

02

Complete infrastructure, not a model artifact

Corpus pipeline, decontaminated splitting, Tinker training, live evaluation runner, production deployment, real user feedback collection, continuous improvement. Every link in the chain is built, deployed, and measured. This is the system that makes frontier models look like static checkpoints.
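To make "real user feedback collection" concrete, here is a hypothetical sketch of the kind of record a live product could log; the field names and JSONL sink are illustrative assumptions, not the production schema.

```python
# Hypothetical feedback record feeding the continuous-improvement loop.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class FeedbackEvent:
    session_id: str
    prompt: str             # user's Tuvaluan or English input
    response: str           # model output shown to the user
    rating: int             # e.g. +1 thumbs-up, -1 thumbs-down
    correction: str | None  # optional user-supplied fix, a future training pair
    timestamp: str

def log_feedback(event: FeedbackEvent, path: str = "feedback.jsonl") -> None:
    """Append one event as JSON Lines, ready for the corpus pipeline."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(event), ensure_ascii=False) + "\n")

log_feedback(FeedbackEvent(
    session_id="demo-001",
    prompt="Fakamolemole, fakamatala mai te tala tenei.",
    response="Of course. Here is an explanation of this story...",
    rating=1,
    correction=None,
    timestamp=datetime.now(timezone.utc).isoformat(),
))
```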

03

Expert-written, held-out benchmarks eliminate gaming

The Textbook set is hand-curated by Tuvaluan speakers, completely isolated from training, and represents real-world language expertise. No contamination. No cherry-picking. Just results you can defend to any skeptic.
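A minimal sketch of the kind of overlap check that keeps a held-out set clean; the 8-gram window and exact-match rule are illustrative assumptions, not the project's published decontamination procedure.

```python
# N-gram overlap screen between training data and a candidate benchmark set.
def ngrams(text: str, n: int = 8) -> set:
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(sentence: str, train_index: set, n: int = 8) -> bool:
    """Flag a benchmark sentence if any of its n-grams appears in training data."""
    return any(g in train_index for g in ngrams(sentence, n))

# Build the training index once, then screen every benchmark sentence.
train_pairs = [("tvl source ...", "english target ...")]  # placeholder corpus
train_index = set().union(*(ngrams(s) | ngrams(t) for s, t in train_pairs))
candidates = ["a benchmark sentence ..."]                 # placeholder test set
held_out = [s for s in candidates if not is_contaminated(s, train_index)]
```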

04

Open infrastructure for the 11,000-speaker use case

342k corpus pairs, model cards, training code, and eval harness are live on Hugging Face. This is not proprietary IP. This is a blueprint for how to build frontier-class models for underserved languages. Anyone can inspect, reproduce, or extend it.
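Reproduction starts with a single call. The repository id below is a placeholder, since the exact Hub path is not given on this page.

```python
# Pulling the public corpus with the datasets library; repo id is hypothetical.
from datasets import load_dataset

corpus = load_dataset("your-org/tuvaluan-english-342k")  # placeholder repo id
print(corpus)              # splits, features, row counts
print(corpus["train"][0])  # one Tuvaluan-English pair, assuming a "train" split
```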

The real story

How we built the strongest Tuvaluan model

Talafutipolo is proof that you do not need 100B+ parameters to beat frontier models. You need the right infrastructure: a 342k-pair corpus pipeline, careful decontamination, Tinker-based training on a 3B-active MoE base, expert-written evaluation, and a live product that turns user behavior into model-improvement signals. Every layer matters.

Tuvaluan has roughly 11,000 speakers, and frontier models barely see the language. We built the system that changes that: a blueprint for taking any underserved language from zero to SOTA with disciplined infrastructure instead of raw parameter scaling. The photos of teammate Nick Miller in Tuvalu are not decoration; they are evidence that this work comes from real time in the community, not distant datasets.

Cleaned dataset · Stage A model card · Live site

Core insight

SOTA is not about scale. It's about the infrastructure that makes a specialized system repeatable, measurable, and continuously improved. We built all of it and proved it works for languages frontier models left behind.