
10 Milliseconds or Failure: Architecting a Real-Time Sensor Pipeline Across Two Threads Without Locking Up the UI

A deep dive into the engineering of a real-time IMU processing pipeline that runs on a worker thread while communicating with a Qt Quick UI on the main thread. Learn how to use `std::atomic` for lock-free shutdown signaling, Qt's queued signals for cross-thread communication, and self-correcting timing loops to maintain a 100 Hz sample rate without ever blocking the UI thread.

Published

Mon Apr 20 2026

Technologies Used

C++ IMU Qt
Advanced 31 minutes

The Impossible Constraint: 100 Hz Sampling and 60 FPS Rendering on One CPU

The Sentinel Fall Detector must do two things simultaneously that are fundamentally at odds. First, it must read sensor data from the I2C bus, run a Kalman filter, execute fall detection logic, and perform FFT-based gait analysis, all within a strict 10-millisecond window, 100 times per second, with as little jitter as the OS scheduler allows. Second, it must render a Qt Quick dashboard with 3D rotations, animated status text, and a live force meter at a smooth frame rate. If the sensor loop blocks the UI thread, the dashboard freezes. If the UI blocks the sensor loop, readings are missed and the fall detection algorithm receives corrupted timing data.

This tutorial examines how imusensor.cpp solves this with a dedicated worker thread, std::atomic for lock-free shutdown signaling, careful cross-thread communication via Qt’s signal-slot mechanism, and a manual timing loop that self-corrects for processing overhead — all without a single mutex.

What You Need to Follow This Concurrency Deep Dive

Knowledge Base:

  • Solid understanding of C++ classes, lambdas, and member function pointers
  • Familiarity with threading concepts: what a thread is, why shared mutable state is dangerous, what a race condition means
  • Basic Qt knowledge: signals and slots, QObject, Q_PROPERTY
  • Understanding of the previous two tutorials (ring buffer and Kalman filter)

Environment:

  • C++17 compiler
  • Qt 6 (modules: Core, Gui, Quick)
  • Linux with I2C support (for actual hardware; the architecture principles apply universally)
  • CMake 3.16+ with CMAKE_AUTOMOC ON

Two Worlds, One Object: The Thread Architecture of ImuSensor

The central challenge is that ImuSensor is a single C++ object that must live in two worlds simultaneously: its methods are called from the main (UI) thread, but its sensor loop runs on a worker thread. These threads share data — the m_pitch, m_roll, and m_totalAccel member variables that the QML UI reads, and the m_running flag that controls the worker’s lifetime.

sequenceDiagram
    participant Main as Main Thread (UI)
    participant Worker as Worker Thread (Sensor)
    participant I2C as I2C Bus (Hardware)

    Main->>Worker: QThread::create(lambda) → start()
    Note over Main: UI renders at display refresh rate

    loop Every 10ms
        Worker->>I2C: write(register address)
        I2C-->>Worker: read(12 bytes: gyro + accel)
        Worker->>Worker: Kalman filter update
        Worker->>Worker: detectFall()
        Worker->>Worker: FFT gait analysis (every 128 samples)
        Worker->>Worker: Write m_pitch, m_roll (every 2nd sample)
        Worker-->>Main: emit pitchChanged() [queued signal]
    end

    Main->>Worker: m_running.store(false) [atomic]
    Worker-->>Main: Thread exits → wait() → delete

The analogy: Think of a factory with two departments. The assembly line (worker thread) runs at a fixed pace, stamping out a product every 10 milliseconds. The showroom (UI thread) displays the latest products to visitors at its own pace. A conveyor belt (Qt signals) carries finished products from the assembly line to the showroom. The factory’s emergency stop button (std::atomic<bool>) can be pressed by anyone and is read by the assembly line on every cycle — no need to stop the line to check it.

Line by Line Through the Pipeline: From Thread Launch to Graceful Shutdown

Step 1: Launching the worker thread

The sensor loop cannot run on the main thread because it is a long-running while (m_running) loop that sleeps for up to 10ms in every iteration. That would freeze the Qt event loop and make the UI unresponsive. Instead, startSensing() creates a new QThread with a lambda that captures this and runs processSensorLoop() on the new thread.

void ImuSensor::startSensing() {
    if (m_running) return;  // Prevent double-start

    if (!initI2C()) {
        emit statusUpdated("Error: I2C Init Failed");
        return;  // Never launch the thread if hardware isn't ready
    }

    m_running = true;  // Set BEFORE thread starts — the thread checks this immediately
    m_workerThread = QThread::create([this]{ processSensorLoop(); });
    m_workerThread->start();
    emit statusUpdated("Monitoring (Advanced)...");
}

🔴 Danger: Setting m_running = true before start() is critical. If the flag were set after start(), the worker could enter processSensorLoop(), read false, and exit immediately. Because m_running is a std::atomic<bool>, a value written on the main thread is guaranteed to be visible to the worker thread, both at startup and later when the flag is cleared, with no mutex required.

Step 2: The timing-critical sensor loop

This is the heart of the system. The loop must complete all processing within 10 milliseconds. It measures its own execution time and sleeps only for the remaining time, self-correcting for variable processing overhead.

void ImuSensor::processSensorLoop() {
    while (m_running) {  // Atomic read — checked every iteration
        auto start = std::chrono::steady_clock::now();

        // --- All processing happens here (I2C, Kalman, detection, FFT) ---
        // [See below]

        // Self-correcting timing: sleep only for the remaining budget
        auto end = std::chrono::steady_clock::now();
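        auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
            end - start).count();
        // Microseconds, not milliseconds: truncating to whole milliseconds
        // would lengthen every cycle by up to 1 ms.
        long long sleepTime = 10000 - elapsed;
        if (sleepTime > 0) QThread::usleep(static_cast<unsigned long>(sleepTime));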
    }
}

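🔵 Deep Dive: std::chrono::steady_clock is used instead of system_clock because steady_clock is monotonic: it never jumps when NTP or an administrator adjusts the wall clock. (Daylight saving changes are not the risk here; on typical implementations system_clock counts Unix time, which daylight saving does not shift.) In a timing-critical loop, a backward jump would produce a negative elapsed time and an excessively long sleep, while a forward jump would make the loop skip its sleep. steady_clock is guaranteed never to move backward.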
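The loop above measures from the start of each iteration, so any time lost to an oversleep is never recovered. One common alternative is to sleep until an absolute deadline that advances by one period per cycle. The following is a minimal sketch in standard C++ rather than Qt; `runPeriodic` and its parameters are hypothetical names, not part of imusensor.cpp.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Runs `work` `iterations` times, waking on absolute deadlines spaced
// `period` apart. Returns the total wall time measured on steady_clock.
std::chrono::steady_clock::duration runPeriodic(
        std::chrono::microseconds period,
        int iterations,
        const std::function<void()>& work) {
    using clock = std::chrono::steady_clock;
    const auto begin = clock::now();
    auto deadline = begin;

    for (int i = 0; i < iterations; ++i) {
        work();
        deadline += period;                       // absolute, not relative to "now"
        std::this_thread::sleep_until(deadline);  // returns at once if already past
    }
    return clock::now() - begin;
}
```

Because each deadline is derived from the previous deadline and not from the current time, a cycle that wakes late shortens the next sleep. The trade-off is that after a long stall this version runs several iterations back to back to catch up, which may or may not be what a sensor loop wants.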

Step 3: Raw I2C reads and unit conversion

The sensor provides 12 raw bytes: 6 for the gyroscope (X, Y, Z as signed 16-bit integers) and 6 for the accelerometer. Each pair of bytes must be combined into a signed integer, then scaled by the sensor’s sensitivity factor to produce physical units.

        char reg = LSM_OUTX_L_G;       // Register address for gyro X low byte
        write(i2c_file, &reg, 1);       // Tell the sensor which register to read from
        char data[12];
        if (read(i2c_file, data, 12) == 12) {  // Read all 12 bytes in one burst

            // Combine two bytes into a signed 16-bit integer (little-endian)
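            // Cast each byte to uint8_t first so a signed char is never shifted
            int16_t gx_raw = static_cast<int16_t>(
                (static_cast<uint8_t>(data[1]) << 8) | static_cast<uint8_t>(data[0]));

            // Convert to physical units
            // Gyro: ±2000 dps range → 70 mdps/LSB → convert to rad/s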
            cur.gx = gx_raw * 0.070 * (M_PI / 180.0);

            // Accel: ±8g range → 0.244 mg/LSB → convert to g
            cur.ax = ax_raw * 0.000244;
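The byte-combining and scaling steps above can be pulled into small helpers that are easy to test off-target. This is a sketch under stated assumptions: the helper names (`decodeLe16`, `gyroRadPerSec`, `accelG`) are hypothetical, and the scale factors are the ones quoted in the listing.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Scale factors quoted in the listing above.
constexpr double kGyroDpsPerLsb = 0.070;     // 70 mdps/LSB at the 2000 dps range
constexpr double kAccelGPerLsb  = 0.000244;  // 0.244 mg/LSB at the 8 g range
constexpr double kPi            = 3.14159265358979323846;

// Combines two little-endian bytes into a signed 16-bit value without
// shifting a negative number or relying on the signedness of char.
inline std::int16_t decodeLe16(const unsigned char* p) {
    const std::uint16_t bits =
        static_cast<std::uint16_t>(p[0] | (static_cast<unsigned>(p[1]) << 8));
    std::int16_t value;
    std::memcpy(&value, &bits, sizeof value);  // reinterpret the bit pattern
    return value;
}

// Raw gyro count to radians per second.
inline double gyroRadPerSec(std::int16_t raw) {
    return raw * kGyroDpsPerLsb * (kPi / 180.0);
}

// Raw accelerometer count to g.
inline double accelG(std::int16_t raw) {
    return raw * kAccelGPerLsb;
}
```

With a `char data[12]` buffer as in the listing, the gyro X axis would be decoded as `decodeLe16(reinterpret_cast<const unsigned char*>(data))`, and the remaining axes follow at offsets 2, 4, and so on.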

Step 4: Throttled UI updates via cross-thread signals

The sensor loop runs at 100 Hz, but updating the UI at 100 Hz is wasteful — the display refreshes at 60 Hz and the QML property bindings have overhead. A simple counter throttles updates to 50 Hz (every second sample).

            // Only update QML-visible properties on even iterations
            if (m_analysisCounter++ % 2 == 0) {
                m_pitch = cur.pitch;
                m_roll = cur.roll;
                m_yaw += cur.gz * (DT * 2.0);
                m_totalAccel = cur.total_accel;

                // These signals cross the thread boundary
                emit pitchChanged();
                emit rollChanged();
                emit yawChanged();
                emit sensorUpdated();
            }

🔵 Deep Dive: When a signal is emitted from a thread other than the one the receiver lives in, Qt’s default connection type (Qt::AutoConnection) delivers it as a queued connection. The call, along with copies of any arguments, is packaged into an event and posted to the receiver thread’s event queue. The receiver’s event loop (in this case, the main thread’s QCoreApplication::exec()) processes it asynchronously. The worker thread is therefore never blocked waiting for the UI to handle the signal: it posts the event and immediately continues to the next sensor read. The values behind the Q_PROPERTY getters (m_pitch, etc.) are written by the worker thread and read by the main thread. They are plain doubles (8 bytes, and a naturally aligned 8-byte store is typically atomic on 64-bit hardware), so in practice the worst case is that the UI reads a value from the previous update. Because updates are throttled to every second sample, that is a discrepancy of about 20ms, which is invisible to the user. The language-level caveat about this unsynchronized access is covered later in this tutorial.
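To make the "post and continue" behaviour concrete, here is a deliberately simplified model of queued delivery in standard C++. It is not how Qt implements its event queue; `EventQueue`, `post`, and `drain` are hypothetical names chosen for illustration. Note that the short lock lives inside the queue, which is why the sensor code itself never has to declare one.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Conceptual model only: a mailbox owned by the receiving thread.
class EventQueue {
public:
    // Called from any thread. Stores the call and returns immediately;
    // the sender never waits for the receiver to run it.
    void post(std::function<void()> event) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_events.push_back(std::move(event));
    }

    // Called by the receiving thread's loop. Runs everything posted so far
    // and returns how many events were delivered.
    std::size_t drain() {
        std::deque<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            batch.swap(m_events);  // hold the lock only for the swap
        }
        for (auto& event : batch) event();
        return batch.size();
    }

private:
    std::mutex m_mutex;
    std::deque<std::function<void()>> m_events;
};
```

The handlers run on whichever thread calls `drain()`, which is the property that matters for the UI: slots connected to the worker's signals execute on the main thread, in posting order, at a time the main thread chooses.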

Step 5: Graceful shutdown without deadlock

Stopping the system requires coordinating between two threads. The main thread sets the flag; the worker thread sees it, exits its loop, and the main thread joins it.

void ImuSensor::stopSensing() {
    m_running = false;  // Atomic write — worker will see this within 10ms
    if (m_workerThread) {
        m_workerThread->quit();     // Signal the thread's event loop to stop (if any)
        m_workerThread->wait();     // Block until the thread actually exits
        delete m_workerThread;      // Clean up the QThread object
        m_workerThread = nullptr;   // Prevent double-delete
    }
}

🔴 Danger: The wait() call is a blocking operation — the main thread stops until the worker finishes. If the worker is in the middle of a 10ms cycle, the main thread blocks for up to 10ms. This is acceptable during shutdown but would be catastrophic if called from within a rendering frame.
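The same start/stop handshake can be exercised without Qt or hardware. In this sketch `std::thread` stands in for QThread and `join()` plays the role of `wait()`; the class `LoopWorker` and its members are hypothetical, and the 1 ms sleep is only a placeholder for the real sensor work.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical Qt-free stand-in for ImuSensor's start/stop pair.
class LoopWorker {
public:
    ~LoopWorker() { stop(); }  // a joinable std::thread must not be destroyed

    void start() {
        if (m_running.exchange(true)) return;      // already running
        m_thread = std::thread([this] { run(); });
    }

    void stop() {
        m_running.store(false);                    // ask the loop to exit
        if (m_thread.joinable()) m_thread.join();  // block until it has exited
    }

    bool isRunning() const { return m_running.load(); }
    long iterations() const { return m_iterations.load(); }

private:
    void run() {
        while (m_running.load()) {                 // checked once per cycle
            m_iterations.fetch_add(1);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> m_running{false};
    std::atomic<long> m_iterations{0};
    std::thread m_thread;
};
```

Calling `stop()` twice is harmless because the second call finds the thread no longer joinable, which mirrors the nullptr check in stopSensing().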

The Race Conditions That Don’t Exist (And the One That Almost Does)

Why std::atomic<bool> is sufficient

The m_running flag is the only synchronization primitive the application code declares (Qt’s queued signal delivery handles its own synchronization internally). It works because the communication pattern is simple: one thread writes false exactly once, and the other thread reads it on every loop iteration. std::atomic guarantees that this read-write pair is free from data races without the overhead of a mutex.

A mutex-based approach would require the worker thread to lock, check the flag, and unlock 100 times per second. On the Raspberry Pi 5’s ARM Cortex-A76, an uncontended pthread_mutex_lock/unlock pair costs approximately 50–100 nanoseconds; the atomic load costs approximately 2–5 nanoseconds. At 100 Hz the loop runs about 8.64 million iterations per day, so even the mutex would add less than one second of CPU time per day. The difference is negligible in absolute terms; the real benefit of the atomic is that it eliminates an entire class of potential deadlock bugs.
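The arithmetic behind "negligible in absolute terms" can be written out as compile-time constants. The per-operation costs below are the rough estimates quoted in the paragraph above, not measurements, and all identifiers are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Upper-end estimates from the text; treat them as order-of-magnitude only.
constexpr long long kLoopHz         = 100;
constexpr long long kSecondsPerDay  = 86'400;
constexpr long long kChecksPerDay   = kLoopHz * kSecondsPerDay;
constexpr long long kMutexNsHigh    = 100;  // lock + unlock, uncontended
constexpr long long kAtomicNsHigh   = 5;    // one atomic load

constexpr long long kMutexNsPerDay  = kChecksPerDay * kMutexNsHigh;
constexpr long long kAtomicNsPerDay = kChecksPerDay * kAtomicNsHigh;

static_assert(kChecksPerDay == 8'640'000, "100 Hz for 24 hours");
static_assert(kMutexNsPerDay == 864'000'000, "0.864 s per day");
static_assert(kAtomicNsPerDay == 43'200'000, "43.2 ms per day");

// Reports whether the flag type is implemented without a hidden lock
// on the platform this is compiled for.
inline bool flagIsLockFree() {
    std::atomic<bool> flag{false};
    return flag.is_lock_free();
}
```

Either way the daily cost is under a second, which supports the point that the choice is about simplicity and deadlock avoidance rather than speed.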

Unsynchronized writes to m_pitch and friends

The m_pitch, m_roll, m_yaw, and m_totalAccel properties are written by the worker thread and read by the main thread through Q_PROPERTY getters. These are plain double values, not std::atomic<double>. On most 64-bit architectures (including the Raspberry Pi 5’s ARM Cortex-A76), a naturally aligned double write is atomic at the hardware level. However, the C++ standard does not guarantee this: unsynchronized concurrent access to a non-atomic object is a data race, which is formally undefined behavior.

On hardware without that guarantee, the worst-case scenario is a torn read: the main thread reads m_pitch while the worker is mid-write, producing a value that is neither the old nor the new estimate. For a display-only property that is overwritten 50 times per second, a single corrupted frame is invisible. For safety-critical decisions (fall detection), the system never reads these properties; detectFall() operates on the local SensorData cur variable, which exists entirely within the worker thread’s stack.
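If the formal data race is a concern, one low-cost option is to store the display values in std::atomic<double> with relaxed ordering. This is a sketch of that alternative, not what imusensor.cpp does; `DisplayState`, `publish`, and `readPitch` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical replacement for the plain double members read by the UI.
struct DisplayState {
    std::atomic<double> pitch{0.0};
    std::atomic<double> roll{0.0};
};

// Worker side: relaxed ordering is enough when each value stands alone
// and no other data is published through it.
inline void publish(DisplayState& s, double pitch, double roll) {
    s.pitch.store(pitch, std::memory_order_relaxed);
    s.roll.store(roll, std::memory_order_relaxed);
}

// UI side: every load returns a value that was actually stored, never a
// mix of two writes.
inline double readPitch(const DisplayState& s) {
    return s.pitch.load(std::memory_order_relaxed);
}
```

This removes torn reads per value but does not make pitch and roll a consistent pair; the UI can still see a pitch from one cycle and a roll from the next. If that matters, passing the values as signal arguments lets Qt copy them together into the queued event.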

The static variable in detectFall()

void ImuSensor::detectFall(const SensorData &cur) {
    // ...
    static int potentialFallTimeout = 0;  // ← Shared across ALL calls

The static local variable potentialFallTimeout persists across calls. Because detectFall() is only ever called from processSensorLoop(), which only ever runs on the worker thread, this is safe: there is only one caller. However, a function-local static is shared by every ImuSensor instance, not just across calls. If detectFall() were ever called from a second thread (e.g., a unit test running in parallel), or if a second sensor object were created, the static would become shared mutable state without synchronization. A member variable (m_potentialFallTimeout) would be the safer design.
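A sketch of the member-variable design follows. The real semantics of potentialFallTimeout are not shown in the excerpt, so the countdown behaviour here is an assumption made purely for illustration, and `FallTimeoutCounter` is a hypothetical class.

```cpp
#include <cassert>

// Hypothetical reduction of the detector to the one counter under discussion.
class FallTimeoutCounter {
public:
    // Arms the timeout window, measured in loop iterations.
    void arm(int cycles) { m_potentialFallTimeout = cycles; }

    // Called once per sample; returns true while the window is still open.
    bool tick() {
        if (m_potentialFallTimeout <= 0) return false;
        --m_potentialFallTimeout;
        return true;
    }

    int remaining() const { return m_potentialFallTimeout; }

private:
    int m_potentialFallTimeout = 0;  // per-instance, unlike a function-local static
};
```

Each object now owns its own counter, so two detectors (or two tests) cannot interfere with each other, and the state can be reset or inspected without touching a hidden static.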

I2C file descriptor sharing

The i2c_file descriptor is opened on the main thread (in initI2C(), called from startSensing()) but used exclusively on the worker thread (in processSensorLoop()). This is safe because the open completes before the thread starts, establishing a happens-before relationship. If initI2C() were to be callable during active sensing, the file descriptor could be overwritten mid-read — a catastrophic race.

The timing budget under overload

If the processing within the loop takes more than 10ms (e.g., due to an I2C bus stall or OS scheduling delay), sleepTime becomes zero or negative, and the if (sleepTime > 0) guard skips the sleep entirely. The next iteration starts immediately, but the lost time is not made up, so the effective sample rate drops temporarily. The Kalman filter does not compensate for this: DT is a constant, so the filter assumes 10ms elapsed even when the real interval was longer, and its time model is slightly wrong during overload frames. For brief overruns, the error is negligible. For sustained overruns, the orientation estimate would accumulate error proportional to the timing discrepancy.
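One way to keep the time model honest during overruns is to measure the interval between samples instead of assuming it. The helper below is a sketch: `measuredDt` is a hypothetical name, the clamp bounds are illustrative, and whether the project's Kalman filter can accept a per-step interval is not shown in the excerpt.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>

using SteadyTime = std::chrono::steady_clock::time_point;

// Returns the time between two samples in seconds, clamped to
// [minDt, maxDt] so that one long stall cannot inject a huge step.
inline double measuredDt(SteadyTime previous, SteadyTime current,
                         double minDt = 0.001, double maxDt = 0.050) {
    const std::chrono::duration<double> delta = current - previous;
    return std::clamp(delta.count(), minDt, maxDt);
}
```

In a loop, the caller would keep the previous timestamp, call `measuredDt(previous, now)` once per sample, and pass the result to the filter update in place of the constant.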

You Now Know How to Build a Deadline-Driven Sensor Pipeline With Thread-Safe Communication

This tutorial covered the engineering required to run a real-time processing pipeline alongside a responsive GUI on resource-constrained hardware. You learned how std::atomic eliminates mutexes for simple flag-based coordination, why steady_clock is non-negotiable for timing loops, how Qt’s queued signal connections provide zero-blocking cross-thread communication, and where the boundaries of thread safety lie when sharing plain data members between threads. These patterns — dedicated worker threads with atomic shutdown, self-correcting timing loops, throttled UI updates — appear in any system that bridges the gap between deterministic processing and interactive display: robotics control, audio engines, game physics, and industrial monitoring.
