
Automating AI Agent Diagnostics with n8n on TrueNAS Scale

A deep-dive tutorial demonstrating how to deploy n8n—a powerful, self-hosted workflow automation tool—to automatically monitor and alert on AI agent failures within your unified infrastructure.

Published

Tue Nov 18 2025

Technologies Used

n8n · TrueNAS Scale · Docker
Intermediate · 21 minutes

As you continue your journey to reclaim digital independence by repurposing legacy hardware, self-hosting applications on TrueNAS Scale is only half the battle. The true test of an enterprise-grade home lab is observability: knowing, without manual spot checks, when a service has quietly degraded.

Orchestrating the Invisible: Automating AI Agent Diagnostics on Bare-Metal ZFS

When running complex, self-hosted environments—especially those deploying local LLMs or AI agents via tools like Ollama—silent failures are your worst enemy. A GPU memory overallocation can quietly force an AI container to fall back to CPU inference, crippling your server’s performance without throwing a surface-level error.

This deep dive bridges the gap between raw hardware potential and operational awareness. We are going to deploy an n8n orchestration container natively on TrueNAS Scale and build a customized, event-driven diagnostic pipeline. This workflow will autonomously SSH into the host, scrape real-time Docker logs from your AI agents, parse the raw standard output in an isolated JavaScript sandbox, and instantly dispatch a webhook payload to Discord when hardware-level anomalies are detected.

By the end of this guide, you will have built a performant, zero-external-dependency monitoring parser that operates entirely within the encrypted perimeter of your Tailscale mesh network.

The Sovereign Cloud Armory: Assembling the Hardware and Software Stack

To build this architecture, you must be comfortable with local networking, container orchestration, and basic JavaScript evaluation. Here is the exact toolkit required for this implementation:

  • Host OS: TrueNAS Scale (Debian-based) utilizing the ZFS file system for bit-rot protection and dataset isolation.
  • Orchestration Engine: Docker (via TrueNAS Apps or Dockge) to isolate our workflow engine.
  • Workflow Engine: n8n (specifically docker.n8n.io/n8nio/n8n:latest).
  • Database Backend: PostgreSQL (postgres:15-alpine) to store workflow execution states securely.
  • Target Workload: A locally running Docker container (e.g., ollama) to monitor.
  • Alerting Endpoint: A configured Discord Webhook URL.

The Nervous System of the Home Lab: Event-Driven Topology

Think of this architecture as a highly trained night watchman for a high-security vault. Instead of statically staring at a wall of security footage (running heavy monitoring stacks like Prometheus and Grafana), the watchman patrols the perimeter every 5 minutes, checks the specific locks (logs) of the AI container, and radios headquarters (Discord) only if a breach is detected.

Here is the exact state machine of the workflow we are constructing:

graph TD
    A[Cron Node: 5 Min Interval] -->|Triggers| B(SSH Node: Authenticate to TrueNAS)
    B -->|Executes| C[docker logs --tail 500 ollama]
    C -->|Passes stdout| D{V8 Sandbox Code Node}
    D -->|Parses String| E{Regex/Keyword Match}
    E -->|No Match found| F[Return Empty: Halt Execution]
    E -->|Match Found| G[Format JSON Payload]
    G -->|POST Request| H((Discord Webhook Node))

Wiring the Brain: Constructing the Stateful Log-Parsing Workflow

Our implementation is broken into two distinct phases: provisioning the infrastructure securely on TrueNAS, and constructing the business logic within the n8n canvas.

Phase 1: ZFS Dataset and Infrastructure Provisioning

First, we must provision persistent ZFS datasets (for example, zfs create -p tank/configs/n8n from the TrueNAS shell) so that rebuilding the application container never wipes our stored credentials, database state, or active workflows.

We then map those host paths into the container via Docker Compose. Below is the production-ready YAML configuration, leveraging environment variables for secrets.

# docker-compose.yml
version: '3.8'

services:
  n8n:
    image: docker.n8n.io/n8nio/n8n:latest
    container_name: n8n_core
    restart: unless-stopped
    ports:
      - "5678:5678"
    environment:
      # Advertise the TrueNAS host IP inside the container
      - N8N_HOST=10.99.0.191
      - N8N_PORT=5678
      - N8N_PROTOCOL=http
      - NODE_ENV=production
      # Base URL required for webhook callbacks
      - WEBHOOK_URL=http://10.99.0.191:5678/
      # Connect to our isolated Postgres database for execution state
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=n8n_db
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=n8n_user
      - DB_POSTGRESDB_PASSWORD=${DB_PASSWORD}
      - N8N_ENCRYPTION_KEY=${ENCRYPTION_KEY}
    volumes:
      # Map our TrueNAS ZFS dataset to the internal n8n configuration path
      - /mnt/tank/configs/n8n/data:/home/node/.n8n
    depends_on:
      - n8n_db

  # The Postgres backend referenced above; without it, n8n_core cannot start.
  n8n_db:
    image: postgres:15-alpine
    container_name: n8n_db
    restart: unless-stopped
    environment:
      - POSTGRES_DB=n8n
      - POSTGRES_USER=n8n_user
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      # Persist the database on the same ZFS dataset hierarchy
      - /mnt/tank/configs/n8n/postgres:/var/lib/postgresql/data

💡 Pro-Tip: Notice the N8N_ENCRYPTION_KEY. This is a critical cryptographic string. Because n8n stores your third-party API keys and SSH passwords in its database, it uses this key to encrypt those secrets at rest. If you lose this key, you lose access to every credential stored in your workflow.

Phase 2: Building the Workflow Nodes

Once n8n is running and accessible at http://10.99.0.191:5678, initialize your admin account and create a new workflow.

  1. The Trigger: Add a Schedule Trigger Node. Set the rule to execute every 5 minutes.
  2. The Connector: Add an Execute Command (SSH) Node. Create a new credential utilizing your TrueNAS IP (10.99.0.191), username, and password/private key.
  3. The Command: In the SSH Node command input, enter: docker logs --tail 500 ollama 2>&1 — the 2>&1 matters because many services (including most inference runtimes) write their logs to stderr, which would otherwise never reach the node's stdout field.

🔴 Danger: Do not run a blanket docker logs [container] command. Logs can grow to gigabytes in size. Pulling the entire log file into an n8n node will overwhelm the container’s heap memory and crash your orchestration engine. Always use --tail to limit the scope to recent events.
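As a belt-and-braces guard inside the Code Node itself, you can also cap the text you parse, so a future edit that drops the --tail flag upstream cannot blow the heap. A minimal sketch (the 512 KB ceiling is an arbitrary assumption; tune it to your memory budget):

```javascript
// Defensive cap: keep only the most recent slice of a potentially huge log string.
const MAX_CHARS = 512 * 1024; // ~512 KB ceiling (assumed limit, tune to taste)

function capLog(text) {
  // Keep the tail end of the string, since the newest lines matter most for alerting.
  return text.length > MAX_CHARS ? text.slice(-MAX_CHARS) : text;
}
```

Short inputs pass through untouched; oversized inputs are truncated from the front so the freshest events survive.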

Phase 3: The Custom V8 Parsing Logic

We will now process the raw standard output passed from the SSH node using a Code Node.

The Naive Approach: A junior engineer might stringify the entire input object and run a simple RegEx over it. That can work, but it flattens the JSON structure and produces brittle code that breaks the moment the preceding node returns extra metadata alongside the stdout.

The Refined Solution: We will cleanly map the incoming items, combine them, and iterate over an array of known failure keywords.

// Step 1: Safely extract standard output from the previous SSH node's JSON payload.
// $input.all() returns the array of items; each item exposes the command output on
// item.json.stdout. We fall back to stringifying the item if that field is absent.
const logItems = $input.all();
const logInput = logItems
    .map(item => item.json.stdout ?? JSON.stringify(item.json))
    .join(' ');

// Step 2: Define specific hardware and execution failure states we care about.
const fallbackKeywords = [
    "falling back to cpu",
    "defaulting to cpu",
    "out of memory",
    "no compatible gpu",
    "could not allocate gpu memory"
];

// Step 3: Utilize .some() to short-circuit the loop as soon as a match is found, saving CPU cycles.
const isFailing = fallbackKeywords.some(keyword => logInput.toLowerCase().includes(keyword));

// Step 4: Conditional Routing.
if (isFailing) {
    // If true, we return a cleanly formatted JSON object to be passed to Discord.
    return [{
        json: {
            alert: true,
            message: "🚨 CRITICAL: Ollama has fallen back to CPU inference or encountered GPU memory issues."
        }
    }];
}

// Step 5: If no keywords are found, return an empty array.
// In n8n, returning an empty array immediately halts the workflow execution.
return [];

Finally, attach a Discord Node (using the Send a message action). Create a webhook in your Discord server settings, paste the URL into the node’s credentials, and drag the dynamically generated {{ $json.message }} variable into the message field.
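Under the hood, the Discord node is simply issuing an HTTP POST whose body follows Discord's webhook schema, where the message text lives in the content field. Conceptually (a hypothetical helper, not n8n's internal code):

```javascript
// Shape of the POST body the Discord webhook endpoint expects.
function buildDiscordPayload(item) {
  // Discord's webhook API reads the message text from the "content" field.
  return { content: item.json.message };
}

// Usage sketch: what effectively gets transmitted for our alert item.
const body = buildDiscordPayload({
  json: { alert: true, message: "🚨 CRITICAL: Ollama has fallen back to CPU inference." }
});
console.log(JSON.stringify(body));
```

Knowing this shape makes it trivial to swap Discord for any other webhook consumer later.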

Beyond the Canvas: The Node.js Execution Engine and V8 Sandbox

To truly master n8n, you must understand how it executes the JavaScript we just wrote.

n8n is built on Node.js. When the execution engine reaches a Code Node, it does not simply run eval() on your script—doing so would expose the entire host system to arbitrary command execution. Instead, n8n utilizes Node’s vm module to instantiate an isolated V8 JavaScript context.

🔵 Deep Dive: When the sandbox spins up, n8n injects a very specific API into it: the $input, $json, and $('node_name') methods. Your code operates entirely in a vacuum. It cannot access the TrueNAS filesystem, it cannot make external HTTP requests (unless explicitly enabled via NODE_FUNCTION_ALLOW_EXTERNAL), and it cannot read system environment variables. Furthermore, the engine expects the output to strictly adhere to the [{ json: { ... } }] structure. If your V8 script returns a raw string or an unformatted object, the pipeline will break, halting data flow to the downstream nodes.

Because execution states are stored in your PostgreSQL database, passing massive objects (like un-tailed logs) between nodes creates heavy database write load. A 5-minute cron job pulling a 50MB log file results in 14.4GB of database writes per day, eventually stalling ZFS performance. Filtering early via --tail and halting execution by returning [] prevents the n8n execution engine from persisting bloated payloads to PostgreSQL, preserving your ZFS array's I/O bandwidth and extending the lifespan of your underlying storage drives.
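The arithmetic behind that figure is straightforward:

```javascript
// Daily write volume if every 5-minute run persists a 50 MB payload.
const runsPerDay = (24 * 60) / 5;       // 288 executions per day
const dailyWritesMB = runsPerDay * 50;  // 14,400 MB ≈ 14.4 GB of writes
console.log(runsPerDay, dailyWritesMB);
```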

Fortifying the Watchtower: Resilience and Security Trade-offs

Building a monitoring pipeline is one thing; ensuring it survives the chaos of a live, self-hosted environment is another. When bridging container orchestration with host-level SSH access, several critical failure domains emerge.

  • Security Contexts and Principle of Least Privilege: Our workflow relies on SSH-ing into the TrueNAS Scale host, which is a significant attack surface. If a malicious actor were to compromise your Tailscale network and gain access to the n8n web UI, they could execute arbitrary host commands via the stored SSH credentials. 🔴 Danger: Never use the root user or your primary TrueNAS admin account for this connection. Instead, provision a dedicated TrueNAS user (e.g., n8n-monitor) and add it to the docker group. Be aware, however, that docker group membership is itself effectively root-equivalent (a member can bind-mount the host filesystem into a privileged container), so for true least privilege, pin the credential to a single action with a command="docker logs --tail 500 <container>" restriction in that user's authorized_keys file. That way, even a stolen key can do nothing but read logs.

  • Concurrency and Scaling Bottlenecks: As your home lab grows, you might be tempted to duplicate this workflow to monitor 20 different containers every 60 seconds. Because standard n8n deployments run on a single Node.js main thread, heavy concurrent SSH executions and regex parsing can cause event loop lag, delaying other mission-critical webhooks. If you scale beyond a dozen active workflows, you must transition n8n from its default configuration into Queue Mode, deploying Redis and dedicated n8n worker containers to distribute the execution load.

  • Handling Null Results and Silent Failures: What happens if the ollama container crashes entirely, or the network drops a packet during the SSH handshake? The Execute Command node will throw an error, halting the workflow before it ever reaches your Discord node. To engineer true resilience, attach an Error Trigger Node to a parallel workflow. This secondary workflow listens for node execution failures and fires a distinct “Pipeline Offline” alert, ensuring you aren’t lulled into a false sense of security by a broken monitoring script.
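Inside that parallel error workflow, a small Code Node can normalize the failure into the same alert shape our Discord node already understands. The field names below assume the payload shape the Error Trigger typically emits (workflow name and last executed node); treat this as a sketch and verify the fields against your n8n version:

```javascript
// Hypothetical helper mirroring a Code Node in the error-handling workflow.
function buildOfflineAlert(failure) {
  const workflowName = failure.workflow?.name ?? 'unknown workflow';
  const failedNode = failure.execution?.lastNodeExecuted ?? 'unknown node';
  return [{
    json: {
      alert: true,
      message: `🔧 Pipeline Offline: "${workflowName}" failed at node "${failedNode}".`
    }
  }];
}

// Usage sketch with a minimal failure payload.
const items = buildOfflineAlert({
  workflow: { name: 'AI Agent Diagnostics' },
  execution: { lastNodeExecuted: 'Execute Command' }
});
console.log(items[0].json.message);
```

Because the output matches the [{ json: { ... } }] contract, the same Discord node configuration can be reused verbatim in the error workflow.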

Mastering Sovereign Automation: From Hardware to Actionable Intelligence

You have successfully bridged the gap between raw hardware infrastructure and operational awareness. Rather than just deploying a container and hoping it works, you now possess the specific, senior-level skill of constructing an autonomous, self-hosted diagnostic pipeline.

You have learned how to deploy n8n persistently on a ZFS array, orchestrate remote host commands via SSH, sanitize and parse standard output within an isolated V8 JavaScript sandbox, and trigger real-time, event-driven webhooks.

By taking this product-minded approach to your infrastructure, you aren’t just saving money on cloud subscriptions. You have transformed a discarded nine-year-old gaming PC into a sovereign, observable ecosystem that actively guards its own reliability.
