
The Native Module That Never Says "Done": Coordinating Async Events, Refs, and React State in a Streaming LLM Hook

A deep dive into the architecture of a React hook that integrates with a native streaming LLM module lacking completion signals. Learn how to use refs for mutable state, manage event listener lifecycles, and coordinate with React's rendering model to build a leak-proof, responsive streaming experience.

Published

Sun Nov 02 2025

Technologies Used

React Native · LLM
Advanced · 31 minutes

Purpose

The Problem

You’re integrating a native streaming LLM module into a React hook. The module fires events for each token it produces, but — crucially — it never fires a “generation complete” event. Your hook must:

  1. Accumulate streaming tokens into a growing response string
  2. Detect when generation has silently finished
  3. Support user-initiated cancellation mid-stream
  4. Clean up native event listeners on every possible exit path (completion, cancellation, error, component unmount)
  5. Surface the right React state (generating, idle, error) to the UI without triggering redundant re-renders during the streaming hot path

Get any of these wrong and you either leak native memory, leave the UI stuck in a generating state forever, or flood the React reconciler with hundreds of state updates per second.

The Solution

We’ll dissect hooks/useLLM.ts — the hook that drives every LLM interaction in the app. It bypasses a broken SDK-provided hook entirely and talks directly to the ExpoLlmMediapipe native module. This is a masterclass in when to reach for useRef instead of useState, how to manage event listener lifecycles in async contexts, and how to coordinate a native event emitter with a Promise-based async function.

You’ll understand why refs outperform state in streaming contexts, how to write a listener-cleanup system that is leak-proof across all control flow paths, and how to reason about React’s rendering model when integrating with non-React async systems.

What You Must Already Know: Refs, the Event Loop, and Native Bridge Concepts

Knowledge Base:

  • useRef vs useState — when each triggers re-renders and why that matters
  • useCallback and useMemo for stable function references
  • JavaScript’s event loop, microtask queue, and macrotask queue
  • The React Native native module event emitter pattern (addListener / remove)
  • AbortController and the Fetch/async cancellation API
  • useEffect cleanup functions

Environment (from package.json):

react                       19.1.0
react-native                0.81.5
expo-llm-mediapipe          ^0.6.0
react-native-reanimated     ~4.1.0
typescript                  ~5.9.2

Five Moving Parts, One Coherent System: The State Machine Behind the Hook

stateDiagram-v2
    [*] --> idle : hook mounts
    idle --> initializing : initialize(modelName) called
    initializing --> idle : model handle acquired, isReady = true
    initializing --> error : native module throws
    idle --> generating : generateResponse() called
    generating --> idle : generation complete (Promise resolves)
    generating --> idle : stopGeneration() called (abort signal)
    generating --> error : native error event received
    error --> idle : initialize() called again / retry
    idle --> [*] : component unmounts, cleanup fires
    generating --> [*] : component unmounts, cleanup fires (in-flight request abandoned)

Analogy: Think of the hook as an air traffic controller managing a single runway. The model handle (modelHandleRef) is the runway — once acquired, it’s reused for every flight (inference request). Each inference request gets a unique flight number (requestId). Radio communications (token events) from the runway come over the scanner (addListener). When a flight lands (Promise resolves), the controller tears down that flight’s radio channel and marks the runway available again. If the controller’s tower is destroyed (component unmounts), all active channels are cut regardless of whether a flight is in progress.
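The diagram above can be transcribed as a small transition table. A framework-free sketch (the type and helper names are illustrative, not the hook's actual exports):

```typescript
// The hook's four states, as described in the state diagram.
type InferenceState = 'idle' | 'initializing' | 'generating' | 'error';

// Legal transitions, transcribed from the diagram above.
const transitions: Record<InferenceState, InferenceState[]> = {
  idle: ['initializing', 'generating'],
  initializing: ['idle', 'error'],
  generating: ['idle', 'error'],
  error: ['idle'],
};

function canTransition(from: InferenceState, to: InferenceState): boolean {
  return transitions[from].includes(to);
}

console.log(canTransition('idle', 'generating'));        // true
console.log(canTransition('generating', 'initializing')); // false
```

A table like this makes the illegal paths explicit: for instance, re-initializing while generating is not a legal move, which foreshadows failure mode 3 below.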

Twelve Weeks of Bugs Prevented: Walking Through the Hook’s Critical Decisions

Decision 1: Refs for Mutable Values, State for Render-Triggering Values

The hook opens with a deliberate separation of concerns:

export function useLLM(): UseLLMReturn {
  // STATE: These values drive UI re-renders — they MUST be state
  const [inferenceState, setInferenceState] = useState<InferenceState>('idle');
  const [error, setError] = useState<string | null>(null);
  const [isReady, setIsReady] = useState(false);
  
  // REFS: These values are needed by async callbacks but should NEVER
  // trigger re-renders when they change. Putting these in state would
  // cause the component to re-render on every token event.
  const modelHandleRef = useRef<number | null>(null);
  const requestIdCounterRef = useRef(0);
  const abortControllerRef = useRef<AbortController | null>(null);
  const partialListenerRef = useRef<NativeModuleSubscription | null>(null);
  const errorListenerRef = useRef<NativeModuleSubscription | null>(null);

🔵 Deep Dive: During streaming, the native module fires an onPartialResponse event for each token, easily dozens of times per second. If the values above lived in useState, every assignment inside an event callback would schedule a re-render: at 30 tokens/second, that is 30 reconciliation passes per second for values the UI doesn't even display. useRef instead stores a mutable box whose .current value can be read and written by any closure (including async ones and event listeners) without the assignment ever notifying React.
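A framework-free way to see the difference: model state's setter as "write plus notify" and a ref as a bare mutable box. This sketch (all names are illustrative) counts the renders each approach would schedule:

```typescript
// A ref is just a mutable box: writes are invisible to React.
type Ref<T> = { current: T };

let renderCount = 0;

// Stand-in for useState's setter: every call schedules a render.
function setStateLike<T>(box: Ref<T>, value: T) {
  box.current = value;
  renderCount++; // React would schedule a reconciliation pass here
}

const viaState: Ref<string> = { current: '' };
const viaRef: Ref<string> = { current: '' };

const tokens = ['Hello', ',', ' world'];

// State path: one scheduled render per token.
tokens.forEach(t => setStateLike(viaState, viaState.current + t));

// Ref path: identical accumulation, zero renders scheduled.
tokens.forEach(t => { viaRef.current += t; });

console.log(renderCount);    // 3
console.log(viaRef.current); // "Hello, world"
```

Both boxes end up holding the same string; only the state path paid a render per write.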

Decision 2: The Unmount Safety Net

Before any generation logic, the hook installs a cleanup effect that runs when the component unmounts:

useEffect(() => {
  // This cleanup function runs ONLY when the component unmounts.
  // It's the last line of defense against leaked native resources.
  return () => {
    if (modelHandleRef.current !== null) {
      // releaseModel is async — we call it as fire-and-forget here
      // because we can't await inside a cleanup function.
      ExpoLlmMediapipe.releaseModel(modelHandleRef.current).catch(err => 
        console.error('[useLLM] Error releasing model:', err)
      );
    }
    // Remove all event listeners — these hold references to JS closures
    // that in turn hold references to this hook's scope. Without removal,
    // the garbage collector cannot reclaim any of it.
    if (partialListenerRef.current) {
      partialListenerRef.current.remove();
    }
    if (errorListenerRef.current) {
      errorListenerRef.current.remove();
    }
  };
}, []); // Empty deps: run cleanup only on unmount

Decision 3: Request IDs as Event Filters

The native module is a global event emitter — every onPartialResponse event goes to every listener in the app. Without filtering, a second inference request could receive events from a previous one:

const generateResponse = useCallback(async (messages, options, onToken) => {
  // Monotonically increasing counter stored in a ref — no re-render on increment
  const requestId = ++requestIdCounterRef.current;

  // Closure-scoped accumulator for the full response; mutated on every
  // token without ever touching React state
  let fullResponse = '';

  partialListenerRef.current = ExpoLlmMediapipe.addListener(
    'onPartialResponse',
    (event: PartialResponseEventPayload) => {
      // DOUBLE FILTER: request ID must match AND model handle must match.
      // The handle check guards against a race where the hook re-initializes
      // with a new model before the old request's events have all fired.
      if (event.requestId !== requestId || event.handle !== modelHandleRef.current) return;

      // Abort check: if stopGeneration() was called, the signal is already
      // aborted. We still receive events until the native side processes the
      // cancellation — silently discard them.
      if (abortControllerRef.current?.signal.aborted) return;

      // Accumulate chunk into the full response string (in closure scope)
      fullResponse += event.response;
      
      // Pass the CHUNK (not the full accumulated string) to the UI callback.
      // The UI is responsible for appending to its own display buffer.
      onToken(event.response);
    }
  );
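The filtering logic can be exercised without the native module at all. This sketch uses a hypothetical stand-in emitter (not the ExpoLlmMediapipe API) to show two overlapping requests sharing one event stream without cross-contamination:

```typescript
// Minimal global emitter standing in for the native module's event bus.
type TokenEvent = { requestId: number; response: string };
type Listener = (event: TokenEvent) => void;

const listeners = new Set<Listener>();
const emit = (event: TokenEvent) => listeners.forEach(l => l(event));

// Each request attaches its own listener that discards other requests' tokens.
function listenFor(requestId: number) {
  let full = '';
  const listener: Listener = e => {
    if (e.requestId !== requestId) return; // the requestId filter
    full += e.response;
  };
  listeners.add(listener);
  return {
    buffer: () => full,
    remove: () => { listeners.delete(listener); },
  };
}

const first = listenFor(1);
const second = listenFor(2);

// Interleaved events from two in-flight requests on the shared emitter.
emit({ requestId: 1, response: 'A' });
emit({ requestId: 2, response: 'X' });
emit({ requestId: 1, response: 'B' });

console.log(first.buffer());  // "AB"
console.log(second.buffer()); // "X"

first.remove();
second.remove();
```

Without the requestId check, both buffers would contain "AXB": every listener on a global emitter sees every event.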

Decision 4: Awaiting a Promise That Resolves When the Native Side Completes

Unlike the LLMService class which uses a silence-heuristic for completion detection, the useLLM hook relies on generateResponseAsync itself resolving when generation is finished:

  // This await suspends the function until the native module signals
  // completion. While it is suspended, the event listener above fires with
  // individual tokens: the await yields back to the event loop, which
  // processes each incoming token event as its own task.
  await ExpoLlmMediapipe.generateResponseAsync(
    modelHandleRef.current,
    requestId,
    prompt
  );

  // Execution resumes here AFTER all token events have fired.
  // fullResponse now contains the complete generated text.
  
  // Clean up listeners immediately — don't leave them attached
  if (partialListenerRef.current) {
    partialListenerRef.current.remove();
    partialListenerRef.current = null;
  }

  setInferenceState('idle');
  return fullResponse;

Decision 5: The Cleanup-on-Error Guarantee

The catch block must mirror the happy-path cleanup exactly — the most common source of listener leaks is a cleanup path that only runs on success:

  } catch (err) {
    // This block runs for: network errors, OOM, model corruption, timeouts.
    // ALL of them must clean up listeners — otherwise they stay attached
    // until the component unmounts, receiving events from future requests.
    if (partialListenerRef.current) {
      partialListenerRef.current.remove();
      partialListenerRef.current = null;
    }
    if (errorListenerRef.current) {
      errorListenerRef.current.remove();
      errorListenerRef.current = null;
    }
    
    setError(err instanceof Error ? err.message : 'Failed to generate response');
    setInferenceState('error');
    throw err; // Re-throw so the caller (useChat) can handle it
  }

Why This Hook Can Handle 30 Tokens/Second Without Janking the UI: The JS Event Loop and React’s Batching

🔵 Deep Dive: When generateResponseAsync is awaited, the JavaScript engine suspends the generateResponse function’s execution and yields back to the event loop. Native token events arrive as macrotasks (via the native-to-JS bridge). The event loop processes them one at a time, calling the onPartialResponse callback for each.

Inside the callback, fullResponse += event.response updates a closure variable, not React state. This means no re-render is scheduled. Only onToken(event.response) crosses back into the caller’s concern — and in useChat, that callback calls setStreamingMessage(prev => prev + token), which does call setState.

React 19’s automatic batching consolidates multiple setState calls made within the same task into a single re-render. Token events that the bridge delivers in one batch therefore collapse into one render pass, even though each callback calls setStreamingMessage. This is why streaming feels smooth: React coalesces token updates rather than re-rendering for each one.
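The reason batched token updates still all land is that setState(prev => prev + token) enqueues updater functions, and React applies the whole queue in order when it finally renders. A framework-free sketch of that queue semantics:

```typescript
// Apply a queue of functional updates the way React drains its update
// queue: each updater sees the result of the previous one.
function applyUpdates<T>(initial: T, queue: Array<(prev: T) => T>): T {
  return queue.reduce((acc, update) => update(acc), initial);
}

// Three token callbacks batched into a single render still all land:
const queue = ['Hel', 'lo', '!'].map(token => (prev: string) => prev + token);
console.log(applyUpdates('', queue)); // "Hello!"
```

This is also why the functional form matters: setStreamingMessage(streamingMessage + token) would capture a stale value and drop tokens when updates are batched.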

Memory Perspective: The event listener closure captures requestId, onToken, and the fullResponse mutable variable. As long as the listener is attached, none of these can be garbage collected. For a 1000-token response, fullResponse grows to roughly 4-6 KB — trivial. The real risk is the listener itself staying attached indefinitely if cleanup is missed, keeping that entire closure scope alive.

Four Ways This Can Break and How to Prevent Each One

1. The “ghost listener” race condition: If generateResponse is called again before the first call’s cleanup runs (possible if the user sends a second message extremely quickly), two listeners for different requestId values will be active simultaneously. The requestId filter prevents cross-contamination, but both listeners are attached.

🔴 Danger: The hook doesn’t guard against concurrent generateResponse calls. A defensive implementation would check if (inferenceState === 'generating') throw new Error('Already generating') at the top of generateResponse — which LLMService does with its isGenerating boolean guard.
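The guard described above can be sketched framework-free. In the real hook the flag would live in a ref rather than a module variable, since state read from a closure can be stale; the names here are illustrative:

```typescript
// Re-entrancy guard: reject a second call while the first is in flight.
let generating = false;

async function guardedGenerate(run: () => Promise<string>): Promise<string> {
  if (generating) throw new Error('Already generating');
  generating = true;
  try {
    return await run();
  } finally {
    // Reset on every exit path so the guard cannot wedge shut.
    generating = false;
  }
}
```

The finally block is what keeps the guard honest: if an error escaped run() without resetting the flag, every later call would be rejected forever.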

2. Component unmount during active generation: If the component unmounts while await generateResponseAsync is suspended, the useEffect cleanup runs, removing the listeners. The native module continues generating, but events are silently discarded. The awaited Promise from the unmounted context is effectively abandoned: it will still resolve or reject, but its continuation references the now-unmounted hook’s scope. In React’s StrictMode (development), the mount effect and its cleanup run an extra time, so the cleanup must be an idempotent no-op when nothing has been registered yet — which the null checks above guarantee.

3. Native module handle mismatch: If initialize() is called again while a generation is in progress (e.g., the user navigates to model settings and re-initializes), modelHandleRef.current is overwritten with a new handle number. The existing listener’s double-filter (event.handle !== modelHandleRef.current) would then reject all events from the in-flight request, leaving generateResponse awaiting forever.

🔴 Danger: Always call cleanup() (which calls stopGeneration()) before calling initialize() again.

4. AbortController without native-side cancellation: stopGeneration() sets the abort signal, which causes the listener to silently discard future events. However, the native module is not told to stop — it keeps generating and firing events (which are now discarded). This wastes battery and CPU until the native module naturally completes. A production implementation would call a native cancelGeneration(requestId) method if the module exposes one.
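A defensive stopGeneration would pair the JS-side abort with a best-effort native cancellation. This sketch assumes a cancelGeneration-style method exists; the nativeCancel parameter is hypothetical, so check whether the module actually exposes one before relying on it:

```typescript
// Abort on the JS side immediately, then ask the native side to stop too.
function stopGeneration(
  controller: AbortController,
  requestId: number,
  nativeCancel?: (requestId: number) => Promise<void>,
): void {
  controller.abort(); // listeners start discarding events immediately
  // Best-effort native cancellation; without it the native side keeps
  // generating (and burning CPU) until it finishes on its own.
  nativeCancel?.(requestId).catch(() => {
    // The native side may have already completed; ignore.
  });
}
```

Even if the module offers no cancellation method, the abort signal alone keeps the UI correct; what it cannot do is stop the wasted native work.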

You Can Now Integrate Any Streaming Native Module Into React: The Async-Native Coordination Skill

You’ve mastered the hardest class of React Native integration problem: coordinating a fire-and-forget native event emitter with Promise-based async functions and React’s rendering model. The skills you now have:

  • When to use useRef vs useState — and why the streaming hot path demands refs
  • How to write listener cleanup that is leak-proof across success, error, cancellation, and unmount
  • Why request IDs are mandatory when sharing a global native event emitter
  • How React 19’s automatic batching interacts with streaming token callbacks
  • How to reason about race conditions between multiple async call paths sharing mutable ref state

This pattern is directly transferable to any React Native integration with a streaming native module: audio processing, video frame pipelines, Bluetooth device events, or any other native system that produces a stream of events without an explicit termination signal.
