10 YEARS BUILDING // NOW OPEN SOURCE

OPEN SOURCE AI
FOR 50 MILLION
VOICES.

Burmese-first language models, datasets, and tools, built by the community for the community. We are constructing the digital heritage infrastructure that our people should own.

THE ORIGIN.

They built the revolution in English. We are rebuilding it for ourselves.

For 50 million voices, the AI revolution arrived with a "Tokenization Tax." Big Tech models split Burmese text into far more tokens than English, so the same conversation can cost 13x more, while the few working alternatives remain locked behind closed doors.

We didn't wait for permission. We started in 2016, building Burmese NLU in Yangon long before the ChatGPT hype. Our engine processed 100 million conversations for giants like Samsung and Unilever—proving it could be done when others ignored us.

Now, we are taking control. We are open-sourcing everything. Led by an EB-1A "Extraordinary Ability" founder—recognized by the U.S. for rising to the top of this field—we are building the public utility we wish we had a decade ago.

We aren't just building models. We are securing our digital heritage.

[ TIMELINE // ORIGIN_STORY ]
2016

Pre-LLM Era

Built from scratch in Yangon. Before transformers existed.

2022

LLM Revolution

The breakthrough was in English. Burmese was an afterthought.

2026

Sovereignty Era

Open-sourcing everything. We own our future.

DECADE OF BUILDING // BATTLE-TESTED

Language Sovereignty.

Language access is a human right. Language sovereignty is how we protect it.

When your conversations flow through foreign APIs, you're renting access to your own language. We're building infrastructure that belongs to the community — open, private, permanent.

Every model we release. Every dataset we share. Yours to deploy on your terms.

No API lock-in: Apache 2.0 forever

Your data stays yours: no extraction

Runs offline: on your hardware

Fully customizable: fine-tune for your needs

[ COMPARISON ]
Big Tech                | Own It
Terms change tomorrow   | Apache 2.0 forever
Your prompts train them | Data stays yours
Requires internet       | Runs offline
One-size-fits-all       | Fine-tune it
Pricing increases       | Free forever

THE INFRASTRUCTURE.

What You Can Deploy Today

Available

Base Burmese LLM

Llama-2-7B fine-tuned on 52K Burmese instructions, then quantized to GGUF for local deployment.

PEFT fine-tuning
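To show why quantization makes local deployment practical, here is a minimal sketch of blockwise 4-bit quantization. It is a simplified symmetric scheme, not the exact layout of any llama.cpp/GGUF quantization type; the 32-weight block size, the [-7, 7] integer range, the 16-bit per-block scale used in the size estimate, and all function names are assumptions chosen for illustration.

```python
import math

BLOCK_SIZE = 32  # assumption: weights per block (a common choice, not necessarily this model's)


def quantize_block(block):
    """Map floats to integers in [-7, 7] that share one scale factor."""
    max_abs = max(abs(x) for x in block)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Clamp guards against floating-point overshoot at the block's extreme value.
    q = [max(-7, min(7, round(x / scale))) for x in block]
    return scale, q


def quantize(weights, block_size=BLOCK_SIZE):
    """Split a flat weight list into blocks and quantize each one independently."""
    return [quantize_block(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]


def dequantize(blocks):
    """Reconstruct approximate floats: value = scale * integer."""
    return [scale * v for scale, q in blocks for v in q]


def approx_size_gb(n_params, bits_per_weight):
    """Rough storage size in gigabytes (1 GB = 1e9 bytes), ignoring metadata."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    weights = [math.sin(i) * 0.1 for i in range(100)]
    blocks = quantize(weights)
    restored = dequantize(blocks)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{len(blocks)} blocks, worst-case error {worst:.5f}")

    # Round parameter count for a "7B" model; 4 bits per weight plus one
    # 16-bit scale per 32-weight block gives 4.5 bits per weight.
    n = 7e9
    print(f"fp16:  {approx_size_gb(n, 16):.2f} GB")                  # 14.00 GB
    print(f"4-bit: {approx_size_gb(n, 4 + 16 / BLOCK_SIZE):.2f} GB")  # 3.94 GB
```

Because every weight is rounded to the nearest multiple of its block's scale, the error per weight stays within half a scale step, while storage drops to a little over a quarter of 16-bit weights. Real GGUF formats refine this idea with different block layouts and offsets.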
Open Source

Echopod Companion

Chatbot-based tool for crowdsourcing translation data, and the way we build datasets together with the community.

Crowdsourced data
Published

Burmese Handwritten Digits

Myanmar's MNIST. 60K training samples, 27.5K test samples. Foundation for Burmese OCR.

OCR benchmark
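If the digit images and labels ship in the same IDX binary format as the original MNIST (an assumption: check the published files, which may use a different container), a loader needs only the standard library. The sketch below parses unsigned-byte IDX data and offers a sample-count check against the roughly 60K training and 27.5K test samples listed above; `parse_idx_ubyte` and `check_split` are hypothetical helper names.

```python
import math
import struct


def parse_idx_ubyte(data: bytes):
    """Parse MNIST-style IDX bytes holding unsigned 8-bit values.

    Layout: two zero bytes, a type code (0x08 = unsigned byte), the number of
    dimensions, one big-endian uint32 per dimension, then the raw values.
    """
    if len(data) < 4 or data[0] != 0 or data[1] != 0:
        raise ValueError("missing IDX magic number")
    if data[2] != 0x08:
        raise ValueError("this sketch only handles unsigned-byte data")
    ndim = data[3]
    header_end = 4 + 4 * ndim
    shape = struct.unpack(f">{ndim}I", data[4:header_end])
    payload = data[header_end:]
    if len(payload) != math.prod(shape):
        raise ValueError(f"expected {math.prod(shape)} values, got {len(payload)}")
    return shape, payload


def check_split(shape, expected_count):
    """The first dimension is the number of samples in the split."""
    return shape[0] == expected_count


if __name__ == "__main__":
    # Tiny synthetic file: 2 images of 2x2 pixels (real data would be read from disk).
    fake = struct.pack(">BBBB3I", 0, 0, 0x08, 3, 2, 2, 2) + bytes(range(8))
    shape, pixels = parse_idx_ubyte(fake)
    print(shape, list(pixels))  # (2, 2, 2) [0, 1, 2, 3, 4, 5, 6, 7]
    # With real files: check_split(train_shape, expected_train_count), using
    # the exact count from the dataset's own documentation.
```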
Live Tool

Tokenizer Comparison Tool

See the 1300% tokenization tax in action: an interactive visualization of how LLM tokenizers split Burmese text.
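Where does the tax come from? Tokenizers trained mostly on English text have few merges for Burmese script, so Burmese falls back to smaller pieces, in the worst case one token per UTF-8 byte, and every character in the Myanmar Unicode block takes three bytes. The toy tokenizer below illustrates that worst case. Its vocabulary and function names are hypothetical, and real tokenizers do learn some Burmese merges, so their measured ratios (like the 1300% figure above) differ from this toy's.

```python
# Hypothetical toy vocabulary standing in for an English-heavy BPE vocabulary.
TOY_VOCAB = {"hello", "how", "are", "you", " "}


def byte_fallback_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match over `vocab`; characters not covered by any
    vocabulary entry fall back to one token per UTF-8 byte."""
    max_len = max(len(piece) for piece in vocab)
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary match: emit this character's UTF-8 bytes as tokens.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens


def tokenization_tax(english, burmese, vocab=TOY_VOCAB):
    """Burmese tokens per English token for two texts with the same meaning."""
    return (len(byte_fallback_tokenize(burmese, vocab))
            / len(byte_fallback_tokenize(english, vocab)))


if __name__ == "__main__":
    english, burmese = "hello", "မင်္ဂလာပါ"  # both are everyday greetings
    print(byte_fallback_tokenize(english))       # ['hello']
    print(len(byte_fallback_tokenize(burmese)))  # 3 * len(burmese)
    ratio = tokenization_tax(english, burmese)
    print(f"relative cost: {ratio:.0f}x ({ratio:.0%})")
```

Since API pricing is per token, the same greeting costs as many times more as it has extra tokens.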

THE MOVEMENT.

This isn't just about one language. It's a blueprint for any community that wants to own its AI future.

In Development

Building

Burmese Foundation Model

Next-gen LLM, trained from the ground up for Burmese.

Building

Conversational Bilingual Dataset

Production-quality conversational Burmese-English parallel corpus.

Research

1B Token Intent Dataset

Billion-token intent classification dataset.

Regional Expansion

Ethnic minority languages: ongoing
Khmer: 16M speakers
Lao: 7M speakers

Global Peers

We're not alone. Around the world, communities are building language AI they own.

Masakhane

AI for African languages, by African researchers, across a continent of 2,000+ languages.

Te Hiku Media

Protecting Māori data sovereignty in New Zealand.

Different languages. Same principle.

THE ASK.

This infrastructure exists because people contributed. Here's how you can be part of it.

For Developers

Deploy It

Run Burmese AI on your own infrastructure. Apache 2.0. Fork it. Fine-tune it. Ship it.
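The base model card lists PEFT fine-tuning. One widely used PEFT method is LoRA, which freezes a pretrained weight matrix W and trains only a low-rank update B·A; whether this project uses LoRA specifically is an assumption. The plain-Python sketch below shows the forward pass, the merge back into a single matrix for deployment, and why the trainable-parameter count stays small. Class and function names are hypothetical.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


class LoRALinear:
    """A frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during fine-tuning
        self.scaling = alpha / rank
        # A starts small and random, B starts at zero, so the adapted layer
        # initially behaves exactly like the original one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        """h = W x + scaling * B (A x)"""
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [h + self.scaling * u for h, u in zip(base, update)]

    def merged_weight(self):
        """Fold the update into one matrix: W' = W + scaling * (B @ A)."""
        delta = matmul(self.B, self.A)
        return [[w + self.scaling * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]


def lora_trainable_params(d_out, d_in, rank):
    """A has rank * d_in entries and B has d_out * rank entries."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # A 4096 x 4096 projection (Llama-2-7B's hidden size) adapted at rank 8.
    full = 4096 * 4096
    lora = lora_trainable_params(4096, 4096, 8)
    print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

Merging after training means the deployed model runs at the original speed, with no extra adapter computation per request.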

For Burmese Speakers

Contribute Data

Help build the datasets that train these models. Use Echopod to contribute translations.

Try Echopod

For Organizations

Partner With Us

Fund development. Collaborate on research. Bring this infrastructure to your community.

Contact Us