Traverba: A Flutter App Running 3 ASR Models, a Local LLM, and Camera OCR — All On-Device

I built Traverba — a real-time translator that runs 100% on your phone with no cloud APIs. Voice recognition, translation across 108 languages, camera OCR in 92 scripts, screen translation overlay, file transcription, and offline group chat over Bluetooth for up to 7 people.

The entire app is built with Flutter + native platform code for the heavy lifting. Here is what the app does, why Flutter was the right choice, and the biggest challenges I hit shipping it to production on both iOS and Android.

## What the App Does

Traverba has five core features, all running offline:

**Live Voice Translation** — Speak in one language, hear and read the translation in another. Three modes: in-app, floating system audio capture (translates audio from Zoom/Teams/Meet), and floating microphone. Auto read-aloud in the target language.

**Camera Translation** — Point your camera at a menu, sign, form, or document. Two OCR engines: Fast mode (ML Kit) for quick reads, Thorough mode (PaddleOCR) for complex scripts like CJK, Arabic, Cyrillic, and Devanagari. Supports 92 languages. The app replaces foreign text with translated text directly on the image.

**Screen Translation** — A floating overlay that translates text inside any app. Watching anime, reading Korean news, playing a Japanese game, or receiving a WeChat message? Tap the overlay button and the text is translated in place. No app switching.

**File Transcription** — Import audio or video files (mp3, m4a, wav, mp4) and get timestamped, side-by-side source + translated transcripts.

**Offline Group Chat** — Up to 7 people connect via Bluetooth mesh. Each person speaks their language. Every message is translated into every participant’s language in real time. No internet. No server. Each phone handles its own ASR and translation independently.

All of this runs locally on the phone’s processor. No API keys, no server costs, no data leaving the device.

## Why Flutter

I evaluated Flutter, React Native, and fully native (separate Swift + Kotlin codebases) at the start.

**Cross-platform from a single codebase was non-negotiable.** A solo developer maintaining two native codebases for an app this complex was not realistic. The Dart UI layer — settings, chat bubbles, transcript views, onboarding tour, download sheets, language pickers — is roughly 60% of the total code. Writing that twice would have doubled the project timeline.

**Flutter’s platform channel system made native integration practical.** The AI-heavy work (ASR inference, LLM translation, OCR, Bluetooth mesh, floating overlays) runs in native Swift/Kotlin and is exposed to Dart through platform channels. Flutter does not try to be the AI runtime; it is the UI and orchestration layer that connects the native engines.

**Performance was good enough.** The concern with Flutter for this kind of app is overhead — every millisecond matters when you are chaining ASR → translation → TTS in a conversation. In practice, the Flutter layer adds negligible latency because the heavy computation happens in native code. The Dart layer handles state management, routing, and UI updates, which it does well.

**Hot reload accelerated UI iteration dramatically.** With five distinct feature surfaces (voice, camera, screen, file, group chat), each with multiple states and modes, the ability to iterate on UI without rebuilding was a significant productivity multiplier.

## The Architecture: Flutter as Orchestrator

The app follows a clear split:

**Flutter/Dart layer handles:**

- All UI (Material 3, adaptive layouts for phone and tablet)

- State management and feature routing

- Language/locale selection and user preferences

- Download management for optional models

- Coin economy and subscription gating

- Onboarding guided tour

**Native layer handles (via platform channels):**

- ASR inference (sherpa-onnx / whisper.cpp)

- LLM translation (Gemma via on-device runtime)

- Traditional ML translation (ONNX NMT models)

- Camera OCR (ML Kit + PaddleOCR)

- Text-to-speech

- Bluetooth mesh for group chat

- Floating overlay (screen translation + system audio capture)

- Memory budget management and model loading/eviction

The bridge between these layers is a set of platform channels with well-defined contracts. The Dart side sends commands (“start ASR for Cantonese,” “translate this text from Japanese to English,” “capture screen and OCR”). The native side returns results asynchronously.
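The command/result contract described above can be sketched in a few lines. This is an illustration in Python only; the real bridge is Dart on one side and Swift/Kotlin on the other. The method names (`startAsr`, `translate`, `captureScreen`), the argument keys, and the `FakeNativeBridge` class are all hypothetical, since the actual channel API is not described here.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Command:
    """One request sent from the UI layer to the native layer."""
    method: str                      # hypothetical method name, e.g. "translate"
    args: dict = field(default_factory=dict)


class FakeNativeBridge:
    """Stand-in for the native side; real engines would do the work here."""

    async def invoke(self, command: Command) -> dict:
        await asyncio.sleep(0)       # yield control, as a real async call would
        if command.method == "startAsr":
            return {"ok": True, "engine_language": command.args["language"]}
        if command.method == "translate":
            # Placeholder result: a real engine would return translated text.
            tagged = f"[{command.args['to']}] {command.args['text']}"
            return {"ok": True, "text": tagged}
        # Unknown methods are reported explicitly, not silently dropped.
        return {"ok": False, "error": f"unknown method: {command.method}"}


async def main() -> list:
    bridge = FakeNativeBridge()
    return [
        await bridge.invoke(Command("startAsr", {"language": "yue"})),
        await bridge.invoke(
            Command("translate", {"text": "こんにちは", "from": "ja", "to": "en"})
        ),
        await bridge.invoke(Command("captureScreen")),  # not handled by this fake
    ]


results = asyncio.run(main())
```

Returning an explicit `ok: False` for unhandled methods is a design choice in this sketch: it keeps a missing handler from looking like a slow one.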

## Three ASR Models, Automatic Routing

No single speech recognition model covers all languages well. Rather than accepting poor quality for some languages, the app ships three specialized models:

- **Parakeet:** English-optimized. Bundled with the app for instant English ASR with no download.

- **Whisper.cpp:** Broad multilingual coverage across 30+ languages.

- **Qwen3-ASR:** CJK-optimized with dedicated Cantonese support.

When the user selects a source language, the Dart routing layer picks the best model and tells the native side which engine to activate. The user never sees or manages model selection.
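As a rough sketch of that routing decision (in Python for brevity; the app's routing layer is Dart), the function below maps a language code to an engine. The language codes and the contents of `CJK_LANGUAGES` are assumptions made for illustration; the real routing table is not given here.

```python
# Hypothetical routing table; the real per-model language lists are not published here.
CJK_LANGUAGES = {"zh", "yue", "ja", "ko"}


def pick_asr_model(source_language: str) -> str:
    """Return the ASR engine name for a source language code such as 'en-US'."""
    code = source_language.lower().split("-")[0]   # drop any region suffix
    if code == "en":
        return "parakeet"      # bundled, English-optimized
    if code in CJK_LANGUAGES:
        return "qwen3-asr"     # CJK-optimized, includes Cantonese
    return "whisper.cpp"       # broad multilingual fallback
```

Falling through to the broadest model keeps the function total: every supported language gets some engine, even if it is not a specialized one.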

The tricky part is memory. These models range from 500 MB to 1.9 GB in RAM. Loading all three simultaneously would crash most phones. A memory budget planner (written in Dart, querying native memory stats) tracks what is loaded and evicts the least-recently-used model when a new one needs to load. A backend preference store remembers which compute backend (NPU, GPU, CPU) worked on each specific device, so subsequent runs skip failed backends.
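A minimal sketch of least-recently-used eviction under a fixed budget, in Python for illustration (the planner in the app is written in Dart). The class name, the 2500 MB budget, and the per-model sizes are hypothetical values chosen to fall inside the range quoted above.

```python
from collections import OrderedDict


class MemoryBudgetPlanner:
    """Tracks loaded models and evicts least-recently-used ones to fit a budget."""

    def __init__(self, budget_mb: int):
        self.budget_mb = budget_mb
        self.loaded = OrderedDict()   # model name -> size in MB, oldest use first

    def used_mb(self) -> int:
        return sum(self.loaded.values())

    def request(self, name: str, size_mb: int) -> list:
        """Mark `name` as in use and return the models evicted to make room."""
        if size_mb > self.budget_mb:
            raise ValueError(f"{name} ({size_mb} MB) exceeds the whole budget")
        if name in self.loaded:
            self.loaded.move_to_end(name)   # already resident: refresh recency
            return []
        evicted = []
        while self.used_mb() + size_mb > self.budget_mb:
            victim, _ = self.loaded.popitem(last=False)   # least recently used
            evicted.append(victim)
        self.loaded[name] = size_mb
        return evicted


# Hypothetical sizes and budget, for illustration only.
planner = MemoryBudgetPlanner(budget_mb=2500)
planner.request("parakeet", 600)
planner.request("whisper.cpp", 1500)
planner.request("parakeet", 600)               # reuse: parakeet becomes most recent
evicted = planner.request("qwen3-asr", 1900)   # must free space first
```

Because `parakeet` was touched most recently, the planner evicts `whisper.cpp` to fit the new model. A real planner would also need to ask the native side to actually unload the evicted model.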

## 108 Languages via Two Translation Tiers

Translation happens through two paths:

**Tier 1 — Traditional NMT models (61 languages):** Fast, low memory, good quality for well-resourced language pairs. ONNX models running via sherpa-onnx on both platforms.

**Tier 2 — On-device Gemma LLM (47 additional languages):** For languages without dedicated NMT models (Yoruba, Amharic, Lao, Myanmar, and others), a quantized Gemma 2B model handles translation locally. Slower than Tier 1, but it extends coverage to languages that traditional models serve poorly.

The Dart layer manages which tier to use based on the language pair. The user sees a unified experience.
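One plausible selection rule, sketched in Python: use Tier 1 only when both languages have NMT support, and fall back to the LLM otherwise. That rule is an assumption, as are the language codes. The sets below are small illustrative subsets, not the real lists of 61 and 47 languages.

```python
# Illustrative subsets only; the article gives counts (61 and 47), not full lists.
NMT_LANGUAGES = {"en", "ja", "es", "fr", "de"}
LLM_ONLY_LANGUAGES = {"yo", "am", "lo", "my"}   # Yoruba, Amharic, Lao, Myanmar


def pick_translation_tier(source: str, target: str) -> str:
    """Return 'nmt' when both languages have NMT support, else 'llm'."""
    supported = NMT_LANGUAGES | LLM_ONLY_LANGUAGES
    for code in (source, target):
        if code not in supported:
            raise ValueError(f"unsupported language: {code}")
    if source in NMT_LANGUAGES and target in NMT_LANGUAGES:
        return "nmt"   # Tier 1: fast, low memory
    return "llm"       # Tier 2: slower, wider coverage
```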

## Camera OCR: Dual Engine, 92 Languages

Camera translation runs two engines selectable by the user:

**Fast mode** uses Google ML Kit’s on-device text recognition. Quick, handles Latin, Cyrillic, and CJK well. Good for real-time viewfinder use.

**Thorough mode** uses PaddleOCR running on-device. Better accuracy for complex layouts, mixed scripts, curved text, and low contrast. Uses a coin gate (5 coins per capture) since it is more compute-intensive.

One lesson learned the hard way: on iOS, ML Kit’s base text-recognition pod only includes the Latin script model. CJK, Devanagari, and Arabic require explicitly adding script-specific pods to the Podfile (for example, `GoogleMLKit/TextRecognitionChinese`). The failure is completely silent: ML Kit returns empty results for unsupported scripts with no error. If you are building multilingual OCR on iOS with ML Kit, check your Podfile.

## Floating Overlay: Native, Not Flutter

The screen translation and system audio features run as native overlays outside of Flutter’s rendering surface. On Android, this is a system overlay service written in Kotlin. On iOS, it uses screen capture APIs in Swift.

These overlays communicate back to the Flutter translation engine via platform channels. The architectural challenge is lifecycle management — the overlay must stay alive while the main Flutter activity may be backgrounded.

One critical lesson: on iOS, native bridge instances must be strongly retained (static or held by a singleton). Swift uses automatic reference counting (ARC), not garbage collection, so a bridge that nothing holds a strong reference to is deallocated as soon as the scope that created it exits. Handler closures that capture it weakly then silently become no-ops. I lost days to a download progress indicator stuck at 0% because the bridge had already been deallocated. No crash, no error, just silent failure.
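The retention pitfall can be reproduced in miniature with Python's `weakref`, which behaves much like a `[weak self]` capture in Swift. This is an analogy, not the app's code: the `Bridge` class and handler are hypothetical, and the immediate deallocation shown relies on CPython's reference counting.

```python
import weakref


class Bridge:
    """Stand-in for a native bridge that receives download progress."""

    def __init__(self):
        self.progress = 0

    def on_progress(self, value: int) -> None:
        self.progress = value


def make_handler(bridge: Bridge):
    """Build a handler that holds the bridge weakly, like `[weak self]` in Swift."""
    ref = weakref.ref(bridge)

    def handler(value: int) -> bool:
        target = ref()
        if target is None:
            return False          # bridge is gone: the call silently does nothing
        target.on_progress(value)
        return True

    return handler


def setup_unretained():
    # Case 1: nothing retains the bridge, so it is freed once this call returns.
    return make_handler(Bridge())


# Case 2: a module-level (singleton-style) reference keeps the bridge alive.
RETAINED_BRIDGE = Bridge()

broken = setup_unretained()
working = make_handler(RETAINED_BRIDGE)

broken_delivered = broken(50)     # no error raised, but nothing is updated
working_delivered = working(50)   # progress reaches the retained bridge
```

The broken handler never raises, which is why the symptom looks like a stalled download instead of a bug in the bridge.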

## Bluetooth Group Chat

The offline group chat uses a Bluetooth mesh protocol. Each phone acts as both receiver and relay. Messages broadcast to all connected devices. Each device runs its own ASR and translation independently — there is no “host” or “server” phone.
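A toy simulation of the receive-and-relay behavior, in Python. The duplicate suppression by message ID is an assumption added so that relayed messages do not loop forever; the actual mesh protocol, its message format, and the `Phone` class are not described here and are hypothetical.

```python
class Phone:
    """One mesh participant: receives messages and relays them to neighbors."""

    def __init__(self, name: str):
        self.name = name
        self.neighbors = []       # phones within Bluetooth range
        self.seen_ids = set()     # message IDs already handled
        self.inbox = []

    def connect(self, other: "Phone") -> None:
        self.neighbors.append(other)
        other.neighbors.append(self)

    def send(self, message_id: str, text: str) -> None:
        self.seen_ids.add(message_id)      # never re-deliver our own message
        self._relay(message_id, text)

    def receive(self, message_id: str, text: str) -> None:
        if message_id in self.seen_ids:
            return                         # duplicate: drop it and stop relaying
        self.seen_ids.add(message_id)
        self.inbox.append(text)            # each phone would translate locally here
        self._relay(message_id, text)

    def _relay(self, message_id: str, text: str) -> None:
        for neighbor in self.neighbors:
            neighbor.receive(message_id, text)


# A chain a - b - c: a and c are out of range of each other.
a, b, c = Phone("a"), Phone("b"), Phone("c")
a.connect(b)
b.connect(c)
a.send("msg-1", "hello")
```

In this chain, `c` only gets the message because `b` relays it, and the `seen_ids` check stops `b` and `a` from bouncing it back and forth.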

Flutter handles the chat UI, message state, and participant management. The Bluetooth stack is fully native (CoreBluetooth on iOS, Android Bluetooth API) bridged through platform channels.

Total pipeline latency (voice → ASR → translate → broadcast → display) is about 1 second. The Dart UI uses optimistic display (show “translating…” immediately on send) so the conversation feels responsive.
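The optimistic display can be modeled as a small message state machine, sketched here in Python (the app's UI is Dart). The class, field names, and the `failed` state are hypothetical; only the immediate “translating…” placeholder comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    source_text: str
    translated_text: str = ""
    status: str = "translating"    # shown immediately, before the pipeline finishes

    def display_text(self) -> str:
        if self.status == "translating":
            return "translating…"
        if self.status == "failed":
            return f"{self.source_text} (translation failed)"
        return self.translated_text

    def complete(self, translated_text: str) -> None:
        self.translated_text = translated_text
        self.status = "done"

    def fail(self) -> None:
        self.status = "failed"


message = ChatMessage(source_text="你好")
shown_first = message.display_text()    # placeholder appears right away
message.complete("Hello")
shown_after = message.display_text()    # replaced once translation arrives
```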

## Numbers

- **Languages:** 108 total (61 via NMT models + 47 via the on-device LLM)

- **ASR languages:** 30 with on-device speech recognition

- **OCR languages:** 92 for camera text recognition

- **App base size:** ~150 MB + downloadable models

- **Platforms:** iOS + Android from single Flutter codebase

- **Dart/Flutter code:** ~60% of total codebase (UI, state, routing, business logic)

- **Native code:** ~40% (ASR, LLM, OCR, Bluetooth, overlays, memory management)

- **Team size:** 1

## What Flutter Got Right for This Project

**Single codebase for complex UI.** Five feature surfaces, multiple modes each, download sheets, onboarding tour, adaptive layouts — writing this once instead of twice was the difference between shipping and not shipping as a solo developer.

**Platform channels are well-designed.** The bridge between Dart and native code is clean, well-documented, and reliable. For an app where the heavy compute is native but the user experience is Flutter, this is exactly the right architecture.

**Hot reload for rapid iteration.** Tuning layouts, testing states, iterating on UX flows — hot reload cut iteration time from minutes to seconds across hundreds of UI changes.

**The ecosystem is mature.** State management (Riverpod), navigation, localization (117 locales), adaptive design — the Flutter ecosystem has production-quality solutions for all of these.

## What Was Hard

**Memory pressure is Flutter’s blind spot.** Flutter does not expose fine-grained memory control. When you are co-loading multiple native ML models alongside the Flutter engine, you need to manage memory at the native level and surface it back to Dart. The built-in signals are coarse: `ProcessInfo.currentRss` in `dart:io` reports the process’s resident memory, and `WidgetsBindingObserver.didHaveMemoryPressure` fires when the OS signals low memory, but neither tells you how much headroom remains or how much each native model is holding.

**Native overlay lifecycle.** Floating overlays that operate outside the Flutter surface are architecturally complex. The Flutter activity may be backgrounded while the overlay is active. Keeping the bridge alive, managing state synchronization, and handling platform-specific lifecycle events required significant native code that Flutter cannot abstract.

**iOS build complexity.** The iOS build pipeline has many moving parts that Flutter’s tooling does not fully manage: CocoaPods with ML Kit script pods, entitlements (`com.apple.developer.kernel.increased-memory-limit` for co-loading ASR + LLM), App Tracking Transparency timing, and code signing.

## Try It

Traverba is free on both platforms. Voice, camera, screen translation, and text translation are all free and unlimited. Premium ($1.99/month) unlocks group chat, file transcription, and meeting summaries.

Available on [Google Play](https://play.google.com/store/apps/details?id=com.anthropos.traverba) and [App Store](https://apps.apple.com/app/traverba-translator/id6738243109). Learn more at [traverba.com](https://www.traverba.com).