Mindfulness Coach: Building a Fully Offline, On-Device AI Coaching App with React Native and Google MediaPipe

A deep dive into the architecture and engineering of a privacy-first mobile mindfulness coach that runs entirely on-device using React Native and Google MediaPipe's on-device LLM capabilities.

Published

Thu Oct 30 2025

Technologies Used

React Native · Expo · TypeScript · MediaPipe
View on GitHub

Live Demo


A Personal Wellness Coach That Lives Entirely on Your Phone

This is a production-grade React Native application that delivers an interactive mindfulness coaching experience — no internet connection required, no data ever leaving the device. Built for wellness-conscious users who value privacy, the app pairs an on-device large language model with a philosophically grounded coaching persona rooted in both Buddhist and Stoic traditions. Users can hold multi-session conversations, pick up context from past sessions, and receive streaming, adaptive responses — all from a model running locally on their mobile hardware.

The Privacy Gap Nobody Talks About in Wellness Tech

Most AI-powered wellness apps treat sensitive user data as an acceptable trade-off for capability. Your moments of anxiety, grief, or vulnerability are transmitted to a remote server, logged, potentially used for training, and subject to data breaches you’ll never hear about. For users navigating genuine emotional difficulty, that trade-off is quietly corrosive to trust. There’s also the connectivity problem: the moments when someone most needs a grounding exercise — on a flight, in a rural area, during a network outage — are exactly the moments a cloud-dependent app goes dark. Building a coaching tool that is both private-by-design and always-available isn’t a nice-to-have; it’s the only defensible product decision.

Your Coach, On Your Terms: What the App Actually Does

  • Persistent, multi-session conversations — Users build a relationship with the coach over time. Each conversation is stored locally and retrievable from a chat history view, with auto-generated titles and session previews so past contexts are never lost.

  • Streaming, adaptive responses — The coach’s replies appear word-by-word in real time, mimicking the natural cadence of conversation and dramatically reducing the perception of latency on a resource-constrained device.

  • Contextually aware coaching — The prompt engine dynamically adjusts the coach’s posture based on detected user needs: topic emphasis (anxiety, focus, grief, relationships), time-of-day context, and the user’s self-reported emotional state. The persona fluidly blends Buddhist mindfulness principles with Stoic frameworks depending on what the moment calls for.

  • Quick-action guided exercises — Pre-built coaching prompts let users instantly launch structured exercises (breathing practice, reflection prompts, body scans) without needing to know how to phrase a request.
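The contextually aware coaching described above can be sketched as a pure prompt-assembly function. This is a minimal illustration, not the app's actual implementation: the names (`buildSystemPrompt`, `CoachingContext`) and the specific guidance strings are hypothetical.

```typescript
// Illustrative sketch of dynamic system-prompt assembly from detected context.
// All names and guidance text here are assumptions, not the app's real API.

type Topic = "anxiety" | "focus" | "grief" | "relationships" | "general";

interface CoachingContext {
  topic: Topic;
  hour: number;            // 0-23, local time
  reportedState?: string;  // user's self-reported emotional state, if any
}

const TOPIC_GUIDANCE: Record<Topic, string> = {
  anxiety: "Lean on the Stoic dichotomy of control and slow breathing cues.",
  focus: "Favor short, concrete single-pointed attention exercises.",
  grief: "Emphasize Buddhist impermanence gently; avoid problem-solving language.",
  relationships: "Blend loving-kindness practice with Stoic views on others' actions.",
  general: "Balance mindfulness grounding with practical Stoic reflection.",
};

function timeOfDayNote(hour: number): string {
  if (hour < 5) return "It is late at night; keep exercises calming and brief.";
  if (hour < 12) return "It is morning; intention-setting practices fit well.";
  if (hour < 18) return "It is afternoon; midday resets and focus work fit well.";
  return "It is evening; favor wind-down and reflection practices.";
}

export function buildSystemPrompt(ctx: CoachingContext): string {
  const parts = [
    "You are a mindfulness coach grounded in Buddhist and Stoic traditions.",
    TOPIC_GUIDANCE[ctx.topic],
    timeOfDayNote(ctx.hour),
  ];
  if (ctx.reportedState) {
    parts.push(`The user reports feeling: ${ctx.reportedState}. Acknowledge this first.`);
  }
  return parts.join("\n");
}
```

Because the assembly is a pure function of detected context, it is trivially unit-testable without touching the model at all.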

Engineering the Stack: Why Every Choice Points Back to the Device

Part A — The Stack

| Layer | Technology |
| --- | --- |
| Framework | React Native via Expo (managed + EAS builds) |
| Navigation | Expo Router (file-system routing, drawer + tabs) |
| On-Device LLM | expo-llm-mediapipe (Google MediaPipe, Gemma 1B ~1.5 GB quantized) |
| Storage | react-native-mmkv (synchronous, C++ key-value store) |
| UI | NativeWind (Tailwind CSS) + Gluestack UI component library |
| Animations | React Native Reanimated + Legendapp Motion |
| State | React Context + custom hooks (no Redux/Zustand) |
| Testing | Jest with react-native preset, @testing-library/react-native |
| Build/Deploy | EAS (Expo Application Services) |

Part B — The Decision Matrix

expo-llm-mediapipe over a remote API. The entire product thesis is offline-first privacy. Using a remote LLM endpoint would be architecturally incoherent with that goal. MediaPipe’s on-device inference pipeline, combined with a quantized Gemma 1B model, delivers acceptable latency on mid-range Android hardware while keeping the binary footprint manageable. The trade-off — a ~1.5 GB one-time model download — is consciously surfaced in a dedicated model-setup onboarding flow rather than hidden from the user.

react-native-mmkv over AsyncStorage. Chat history retrieval and settings reads happen on the main thread during navigation transitions. AsyncStorage’s asynchronous, JS-bridge-dependent I/O introduces frame drops at exactly those moments. MMKV’s synchronous C++ implementation eliminates that class of jank entirely. For an app whose UX depends on feeling calm and responsive, this is not a micro-optimization — it’s a user experience decision.
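A synchronous store makes it possible to type the chat-history layer as plain function calls with no `await` in the render path. The sketch below is illustrative: the `SyncKV` interface mirrors the shape of react-native-mmkv's `set`/`getString` calls, and `InMemoryKV` is a stand-in backend so the example runs anywhere; the `ChatHistoryStore` and `ChatSession` names are hypothetical.

```typescript
// Typed chat-history layer over a synchronous key-value store. SyncKV mirrors
// react-native-mmkv's set/getString surface; InMemoryKV is a test stand-in.

interface SyncKV {
  set(key: string, value: string): void;
  getString(key: string): string | undefined;
}

class InMemoryKV implements SyncKV {
  private map = new Map<string, string>();
  set(key: string, value: string): void { this.map.set(key, value); }
  getString(key: string): string | undefined { return this.map.get(key); }
}

interface ChatSession {
  id: string;
  title: string;
  messages: { role: "user" | "coach"; text: string }[];
}

// Synchronous reads mean a session can be loaded inline during a navigation
// transition: no promise, no bridge round-trip, no dropped frame.
class ChatHistoryStore {
  constructor(private kv: SyncKV) {}

  save(session: ChatSession): void {
    this.kv.set(`session:${session.id}`, JSON.stringify(session));
  }

  load(id: string): ChatSession | null {
    const raw = this.kv.getString(`session:${id}`);
    return raw ? (JSON.parse(raw) as ChatSession) : null;
  }
}
```

In production the same `ChatHistoryStore` would simply be constructed with an `MMKV` instance instead of `InMemoryKV`, which also makes the storage layer trivial to mock in Jest.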

Expo Router + Context over a global state library. The screen graph is relatively shallow (drawer → chat, history, settings, model-setup). Reaching for Redux or Zustand would introduce unnecessary indirection. Instead, two root-level React Contexts (LLMContext, AppInitializationContext) provide global LLM state without prop drilling, while each screen composes purpose-built hooks (useChat, useModelManager, useChatHistory). The architecture scales naturally to the actual complexity of the problem.
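One way to keep hooks like `useChat` thin is to put the message-state transitions in a pure reducer that `useReducer` can wrap. The following is a sketch under that assumption; the action names and `Message` shape are illustrative, not the app's actual types.

```typescript
// Pure state transitions a useChat hook could wrap with useReducer.
// Action names and the Message shape are illustrative assumptions.

interface Message {
  id: string;
  role: "user" | "coach";
  text: string;
  streaming?: boolean;
}

type ChatAction =
  | { type: "user_message"; id: string; text: string }
  | { type: "stream_start"; id: string }
  | { type: "stream_token"; id: string; token: string }
  | { type: "stream_end"; id: string };

function chatReducer(state: Message[], action: ChatAction): Message[] {
  switch (action.type) {
    case "user_message":
      return [...state, { id: action.id, role: "user", text: action.text }];
    case "stream_start":
      // Insert an empty coach message that incoming tokens will grow.
      return [...state, { id: action.id, role: "coach", text: "", streaming: true }];
    case "stream_token":
      return state.map((m) =>
        m.id === action.id ? { ...m, text: m.text + action.token } : m);
    case "stream_end":
      return state.map((m) =>
        m.id === action.id ? { ...m, streaming: false } : m);
  }
}
```

Keeping the transitions pure means the streaming logic can be unit-tested in Node without rendering a single component.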

Taming a Native Streaming API That Doesn’t Know When It’s Done

The most technically subtle challenge in the project lives inside the LLM inference layer. The expo-llm-mediapipe native module fires incremental partial-response events as the model generates tokens, but it does not emit an explicit “generation complete” signal. This creates a fundamental ambiguity: from JavaScript’s perspective, a silent event bus could mean either that the model has genuinely finished generating or that it is simply pausing mid-thought.

The solution is a self-healing completion heuristic. A periodic check monitors the timestamp of the most recently received token. If no new token has arrived within a two-second window, the system treats generation as complete, resolves the pending response promise, and commits the full streamed content to the message store. This interval-based observer pattern runs alongside a global timeout guard and an abort controller, giving the system three independent levers — natural completion, silence detection, and user-initiated cancellation — to cleanly terminate a generation without leaving the UI in a hung state. The entire flow is wrapped in retry logic that distinguishes between retriable transient failures and terminal errors (out-of-memory, model not initialized), ensuring the app degrades gracefully rather than crashing silently.
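The silence-detection part of that flow can be isolated as a small watchdog. This is a minimal sketch, not the app's actual code: the `SilenceWatchdog` name is hypothetical, and the clock is injectable so the heuristic is testable without real timers.

```typescript
// Sketch of the silence-detection completion heuristic: if no token arrives
// within silenceMs, treat generation as complete. Names are illustrative.

class SilenceWatchdog {
  private lastTokenAt: number;
  private timer: ReturnType<typeof setInterval> | null = null;

  constructor(
    private onComplete: () => void,
    private silenceMs = 2000,        // 2s of silence counts as "finished"
    private checkMs = 250,           // polling period for the timestamp
    private now: () => number = Date.now, // injectable clock for testing
  ) {
    this.lastTokenAt = this.now();
  }

  // Called from the partial-response event handler on every incoming token.
  onToken(): void {
    this.lastTokenAt = this.now();
  }

  // Returns true (and fires onComplete) once silence exceeds the window.
  check(): boolean {
    if (this.now() - this.lastTokenAt >= this.silenceMs) {
      this.stop();
      this.onComplete();
      return true;
    }
    return false;
  }

  start(): void {
    this.timer = setInterval(() => this.check(), this.checkMs);
  }

  // Also called by the global timeout guard and the abort controller, so all
  // three termination paths converge on the same cleanup.
  stop(): void {
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }
}
```

On completion, the caller would resolve the pending response promise and commit the accumulated text to the message store; `stop()` is idempotent so any of the three termination levers can fire first.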

What Building This Actually Taught Me

Privacy is an architecture decision, not a checkbox. The choice to run inference on-device cascades into nearly every other technical decision in the project — the storage engine, the model format, the build pipeline, the UX of the onboarding flow. Committing to a privacy constraint early forced more rigorous thinking than any feature requirement would have.

The gap between “works on my machine” and “ships on device” is where the real engineering lives. React Native’s native module ecosystem is mature but unforgiving. The moment a library requires a custom dev client — as expo-llm-mediapipe and react-native-mmkv both do — the entire local development and CI/CD workflow must be reconsidered. EAS build profiles, simulator vs. physical device testing strategies, and native module mocking for unit tests all become first-class concerns rather than afterthoughts.
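A typical shape for that native-module mocking, as a Jest setup fragment. The `react-native-mmkv` mock below uses that library's real `set`/`getString`/`delete` surface; the `expo-llm-mediapipe` mock surface is an assumption for illustration, not the library's actual export list, and would need to match whatever the hooks under test import.

```typescript
// jest.setup.ts — sketch of mocking custom-dev-client native modules so unit
// tests run in plain Node without a native runtime.

jest.mock("react-native-mmkv", () => {
  const store = new Map<string, string>();
  return {
    MMKV: jest.fn().mockImplementation(() => ({
      set: (k: string, v: string) => store.set(k, v),
      getString: (k: string) => store.get(k),
      delete: (k: string) => store.delete(k),
    })),
  };
});

jest.mock("expo-llm-mediapipe", () => ({
  // Hypothetical surface: only whatever the hooks under test actually import
  // needs to exist here, stubbed with jest.fn().
}));
```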

Streaming UX is a product feature disguised as an engineering problem. The decision to stream tokens to the screen rather than wait for a complete response wasn’t just a latency optimization — it fundamentally changes how the interaction feels. A 4-second wait for a complete response feels slow. The same 4 seconds spent watching text appear feels like thinking. The psychological dimension of the streaming implementation is as important as the technical one.

Where This Goes Next

MediaPipe sensor integration. The architecture already abstracts the sensing layer in anticipation of real-time posture and gesture detection. The highest-value next step is wiring actual MediaPipe pose landmark data into the prompt engine, so the coach can respond to detected physical cues — slumped posture, shallow breathing patterns, prolonged stillness — and offer adaptive prompts without the user needing to self-report their state.

Platform-native model optimization. The current model pipeline targets a generic quantized format. Separate Android (NNAPI / GPU delegate) and iOS (Core ML / Metal) inference paths would unlock hardware acceleration on flagship devices, pushing response latency down to the sub-two-second range that makes conversation feel truly natural.

Personalization and session analytics. The data to build a longitudinal coaching profile already exists in the local message store. A lightweight on-device analytics pass — tracking recurring topics, session frequency, and emotional patterns over time — would enable the coach to proactively reference past conversations and adapt its style to individual users, moving the product from a stateless chatbot toward a genuinely relational coaching tool.

Try It Out

Check out the source code on GitHub.
