STORM DAT: Automating Government Document Compliance So Analysts Can Focus on What Matters

A web-based application for automating government document compliance, reducing manual review time and improving accuracy.

Published

Thu Feb 19 2026

Technologies Used

Python · Flask · Docker · Whisper
View on GitHub

Live Demo


A Mission-Critical Tool in 500 Words or Less

STORM DAT (Document Analysis Tool) is a full-stack data analysis platform built for defense test engineers who need to process, categorize, and report on thousands of test results from Verification, Validation, and Accreditation (VV&A) campaigns. It ingests raw Excel workbooks and Word documents, runs them through a multi-stage analysis pipeline, and produces publication-ready charts, cross-referenced issue tables, and compliance-checked documentation — turning what was once days of manual spreadsheet work into a single upload-and-analyze workflow.

When a Single Misplaced Marking Can Halt a Deliverable

Anyone who has worked inside a government or defense environment knows the weight a document carries. A technical report destined for a program office is not just prose — it is a controlled artifact. Security markings must appear in precise locations. Every acronym must be defined before its first abbreviated use, never duplicated, never orphaned. Body text must conform to specific font families and point sizes. Tables follow their own typographic rules. Headers and footers carry classification banners that, if absent or incorrect, can delay an entire deliverable review cycle.

In large-scale defense programs, formal qualification testing generates an enormous volume of data. Test Summary files, Run for Record logs, and Comment Logs each arrive in slightly different Excel formats with inconsistent column headers, free-text failure comments, and issue tracker references buried in narrative prose. An analyst tasked with producing a VV&A report must manually sift through these files, extract every defect ID, cross-reference it against Azure DevOps for priority, categorize each failure by root cause, and compile pass/fail statistics — all while ensuring no issue slips through the cracks.

The problem compounds when comparing missile test data across multiple flight logs, where rows don’t align neatly and timing discrepancies hide in millisecond-level deltas. And when it comes time to review a 200-page test document for acronym compliance, font consistency, and security markings, even the most meticulous engineer will miss something on page 147. These aren’t hypothetical pain points — they are the daily reality of test teams operating under strict accreditation timelines.

The traditional approach to this problem is painfully manual: a senior analyst opens the document, visually scans every paragraph, cross-references an acronym list in a separate spreadsheet, and flags issues by hand. It is tedious, error-prone, and — critically — it pulls experienced professionals away from the analytical work they were hired to do. The cost is not just time; it is cognitive load spent on compliance mechanics instead of mission-relevant thinking.

Automated Compliance, Annotated Results, and AI Transcription

STORM DAT replaces that manual burden with a structured, repeatable analysis pipeline:

  • Security Marking Validation — The tool inspects every section header and footer against a configurable set of approved classification banners (CUI, UNCLASSIFIED, SECRET, TOP SECRET, and their variants), surfacing any deviations instantly.

  • Typographic Standards Enforcement — Each paragraph and table cell is checked against mandated font and sizing rules (12-point Arial for body text, 10-point Arial for tables), with non-conforming runs flagged at the exact character position.

  • Acronym Lifecycle Auditing — The engine cross-references document content against a provided acronym reference list, detecting duplicate definitions, undefined acronyms, abbreviations used before their full-form introduction, and potential new acronyms not yet cataloged.

  • AI-Powered Screen Recording and Transcription — A built-in media capture module records screen activity and audio, then routes the audio stream through OpenAI’s Whisper model to produce timestamped transcriptions — useful for documenting walkthroughs, test demonstrations, or review sessions without leaving the platform.

  • Automated Defect Extraction and Classification — The platform scans free-text failure comments using 29 distinct issue-type classifiers, automatically extracts Helix and ADO defect IDs via pattern recognition, and fetches each work item’s priority directly from Azure DevOps. The result is a fully cross-referenced issue table that would take hours to compile by hand.

  • Missile Test Data Alignment — When comparing two or three flight test logs side-by-side, the system uses similarity-scored fuzzy matching to align rows that represent the same test event, even when naming conventions differ between files. It then computes timing deltas down to the microsecond and flags any variance exceeding a defined threshold — surfacing synchronization anomalies that would be invisible in a manual comparison.

  • Color-Coded Compliance Output — Every finding from the document checks above — undefined acronyms, duplicate definitions, font and sizing violations, missing or incorrect classification markings — is highlighted directly in the output Word document, each finding type in its own color, for immediate visual review.

  • Speech-to-Text Test Narration — The same capture module supports live narration during test execution, turning verbal observations into searchable text records that would otherwise be lost.
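The flight-log alignment step described above can be sketched with the standard library's `difflib`. This is a minimal illustration of similarity-scored fuzzy matching, not the actual STORM DAT implementation; the function name, tuple layout, and 0.6 threshold are my assumptions.

```python
from difflib import SequenceMatcher

def align_rows(log_a, log_b, threshold=0.6):
    """Greedily pair events from two flight logs by name similarity.

    Each log is a list of (event_name, timestamp_us) tuples. Returns
    (name_a, name_b, delta_us) for every pair whose names score above
    `threshold`; unmatched rows are skipped rather than raised on.
    """
    pairs = []
    unused_b = list(log_b)
    for name_a, t_a in log_a:
        best, best_score = None, threshold
        for cand in unused_b:
            score = SequenceMatcher(None, name_a.lower(), cand[0].lower()).ratio()
            if score > best_score:
                best, best_score = cand, score
        if best is not None:
            unused_b.remove(best)                  # each row matches at most once
            pairs.append((name_a, best[0], best[1] - t_a))
    return pairs
```

Because matching is done on lowercased similarity rather than exact equality, rows like "IGNITION CMD" and "ignition_cmd" still pair up, and the returned delta exposes the timing discrepancy directly.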

Engineering a Compliance Engine: Architecture and the Reasoning Behind the Stack

The Stack

  • Backend: Python 3.12, Flask, Gunicorn
  • Data Processing: Pandas, NumPy, SciPy
  • Visualization: Matplotlib
  • Document I/O: python-docx, openpyxl, XlsxWriter
  • AI/ML: OpenAI Whisper, PyTorch
  • External Integration: Azure DevOps REST API
  • Infrastructure: Docker, HTTPS/SSL, GitLab CI

The Decision Matrix

Python + Flask over heavier frameworks — The application’s value lives entirely in its backend processing logic, not in complex client-side interactions. Flask’s minimal footprint keeps the dependency surface small and the deployment artifact lean — a meaningful advantage when the target environment may be an air-gapped network where every dependency must be vetted. Python’s ecosystem for document manipulation (python-docx, openpyxl, Pandas) is unmatched; no other language offers the same depth of library support for programmatic Word and Excel processing.

OpenAI Whisper as a local model, not an API call — Transcription runs on a self-hosted Whisper model loaded into memory at application startup, not through a cloud endpoint. This is a deliberate architectural decision driven by the operational environment: classified or controlled networks often cannot reach external APIs. By bundling the model locally and loading it once as a singleton, the application eliminates network dependency for transcription while amortizing the expensive model-load cost across all requests in a session.
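The load-once singleton described above can be sketched as classic double-checked locking. The `loader` injection is my addition so the pattern can be shown without the heavyweight dependency; in production it would default to something like `whisper.load_model("medium")`.

```python
import threading

_model = None
_lock = threading.Lock()

def get_model(loader):
    """Return the shared transcription model, loading it at most once.

    `loader` is a zero-argument callable that produces the model
    (e.g. lambda: whisper.load_model("medium") in the real deployment).
    """
    global _model
    if _model is None:          # fast path: no lock once loaded
        with _lock:             # double-checked locking for thread safety
            if _model is None:
                _model = loader()
    return _model
```

Every request after the first reuses the in-memory model, which is what amortizes the expensive load across a session.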

Gunicorn with extended timeouts — Audio transcription on a medium-sized Whisper model is computationally expensive. The deployment configuration sets a 600-second worker timeout — ten minutes — to accommodate long-running transcription jobs without prematurely killing the process. Combined with four workers and two threads, this configuration balances concurrency for lightweight document analysis requests against the heavier resource demands of media processing.
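The worker/thread/timeout numbers above translate directly into a Gunicorn configuration file. This is an illustrative `gunicorn.conf.py`; the bind address is my assumption, since the text does not state it.

```python
# gunicorn.conf.py -- illustrative values matching the deployment described above
bind = "0.0.0.0:8000"   # assumption: the actual bind/SSL setup is not given in the text
workers = 4             # concurrency for lightweight document-analysis requests
threads = 2             # per-worker threads for I/O-bound work
timeout = 600           # ten minutes, so long Whisper transcription jobs are not killed
```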

Pandas as the Core Data Engine — The entire analysis pipeline operates on DataFrames. Test data arrives as Excel, gets parsed into tabular structures, gets filtered and categorized through vectorized operations, and gets written back out as formatted Excel. Pandas is the natural lingua franca for this kind of columnar data transformation, and its tight integration with both openpyxl (for reading) and XlsxWriter (for writing with rich formatting) means the platform can round-trip Excel files without losing fidelity.
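A toy example of the vectorized categorization step that sits in the middle of that round-trip; the column names, categories, and keyword rules are illustrative assumptions, not the platform's 29 real classifiers.

```python
import pandas as pd

# Free-text failure comments arrive in one column; classification is a
# series of vectorized string matches rather than a per-row Python loop.
df = pd.DataFrame({
    "Comment": ["Timeout waiting on GPS fix",
                "UI button misaligned",
                "timeout on bus"],
})
df["Category"] = "Uncategorized"
df.loc[df["Comment"].str.contains("timeout", case=False), "Category"] = "Timing"
df.loc[df["Comment"].str.contains("UI|display", case=False), "Category"] = "Interface"
```

From here, `df.to_excel(..., engine="xlsxwriter")` would write the categorized table back out with formatting, completing the Excel-in, Excel-out loop.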

Matplotlib with the agg Backend — Visualization in this context is strictly generative: the server produces pie charts and bar charts as static PNG artifacts, with no interactive rendering required. Setting Matplotlib to its non-interactive agg backend eliminates any dependency on a display server, making it equally reliable in a Docker container, a CI pipeline, or a headless production server.
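A minimal sketch of headless chart generation under that backend choice; the function name and chart details are illustrative, not STORM DAT's actual code.

```python
import io

import matplotlib
matplotlib.use("Agg")            # non-interactive backend: no display server needed
import matplotlib.pyplot as plt

def pass_fail_pie(passed, failed):
    """Render a pass/fail pie chart and return it as PNG bytes."""
    fig, ax = plt.subplots()
    ax.pie([passed, failed], labels=["Pass", "Fail"], autopct="%1.0f%%")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)               # release the figure in a long-lived server process
    return buf.getvalue()
```

Because the backend is selected before `pyplot` is imported, the same code runs identically in a Docker container, a CI job, or a headless server.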

Splitting Runs: Solving the Granularity Problem in Word Document Annotation

The most intellectually demanding engineering challenge in the project is not the analysis itself — it is the annotation. Word documents represent text internally as “runs,” which are contiguous segments sharing identical formatting properties. A single paragraph may contain one run or dozens, and run boundaries rarely align with the boundaries of a finding.

Consider a paragraph where only a single acronym — three characters buried in the middle of a 200-character run — needs to be highlighted in yellow. The naive approach (highlight the entire run) would produce a visually misleading result. The correct approach requires surgically splitting the run into three segments (before, target, after), applying the highlight only to the target segment, preserving every formatting property (font family, size, color, bold, italic) across all three new runs, and then removing the original run from the document’s underlying XML tree — all without corrupting the document structure.

This run-splitting algorithm operates at the intersection of string position tracking and XML element manipulation. The system maintains a character-position counter as it walks through each run in a paragraph, identifies the exact run and offset where a finding begins and ends, reconstructs the paragraph’s run sequence with the new segments inserted, and transfers formatting attributes programmatically. The result is pixel-perfect annotation that looks as though a human reviewer placed each highlight by hand — but produced in milliseconds across hundreds of findings simultaneously.

Reflections: What Building a Compliance Tool Teaches You

Configuration-driven design pays compound interest. Early in the project, I made the decision to externalize security markings, allowed file types, size limits, and environment configurations into a dedicated configuration module rather than scattering constants throughout the codebase. That single choice simplified every subsequent feature addition — when a new classification banner needed to be supported, it was a one-line configuration change, not a code modification requiring regression testing.
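The shape of that configuration module might look like the following; the constant names and values are assumptions for illustration, not STORM DAT's actual configuration.

```python
# config.py -- externalized constants, so policy changes are data edits, not code edits
APPROVED_MARKINGS = {"CUI", "UNCLASSIFIED", "SECRET", "TOP SECRET"}
ALLOWED_EXTENSIONS = {".docx", ".xlsx"}
MAX_UPLOAD_MB = 100

def is_approved_marking(text):
    """Supporting a new classification banner is one added entry above."""
    return text.strip().upper() in APPROVED_MARKINGS
```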

Security is architecture, not a feature. In a tool that handles controlled documents, security cannot be an afterthought bolted on before deployment. Input validation, filename sanitization, HTML escaping, Content Security Policy headers, and restricted file-type whitelists are woven into the application’s middleware and utility layers from the ground up. This defense-in-depth approach means that no single layer’s failure exposes the system — a principle that applies far beyond government software.

Solve the workflow, not just the task. The decision to integrate screen recording and transcription alongside document analysis was not a technical exercise — it was a product insight. Analysts who review documents also conduct walkthroughs, record demonstrations, and produce meeting notes. By housing both capabilities in one platform, the tool reduces context-switching and consolidates the analyst’s digital workspace. The best software does not just automate a task; it reshapes the workflow around it.

Configuration-driven flexibility beats rigid schemas. Test data formats in defense programs are not standardized — they vary by contractor, by program phase, and sometimes by individual test engineer. Rather than enforcing a strict input schema and rejecting anything that doesn’t conform, the platform uses a dictionary of column header variants for each logical field. A “pass/fail” column might be labeled “Test Pass/Fail,” “Overall Pass Fail,” or “PrSM 103.4735 Test SIT Pass/Fail,” and the system will recognize all of them. This defensive design philosophy — match what you can, log what you can’t, never crash — is what makes the difference between a tool that works in a demo and one that works in production.
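That variant-dictionary approach can be sketched as a case-insensitive substring match over known header spellings. The field names and variant lists below are a small illustrative subset, not the platform's real mapping.

```python
COLUMN_VARIANTS = {
    # logical field -> header spellings seen in the wild (illustrative set)
    "pass_fail": ["test pass/fail", "overall pass fail", "pass/fail"],
    "comment":   ["failure comment", "comments", "comment"],
}

def resolve_columns(headers):
    """Map messy spreadsheet headers onto logical field names.

    Matching is case-insensitive and substring-based, so a header like
    'PrSM 103.4735 Test SIT Pass/Fail' still resolves. Anything that fails
    to match is simply left out for the caller to log -- never a crash.
    """
    resolved = {}
    for header in headers:
        h = header.lower()
        for field, variants in COLUMN_VARIANTS.items():
            if field not in resolved and any(v in h for v in variants):
                resolved[field] = header
    return resolved
```

This is the "match what you can, log what you can't" posture in code: an unrecognized column degrades gracefully to a logged gap instead of rejecting the whole file.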

Pipeline orchestration should merge as it goes. An early architectural decision was to merge multi-file results incrementally during processing rather than collecting all outputs and merging in a post-processing step. This means the system can handle an arbitrary number of input files without holding all intermediate results in memory simultaneously, and it produces a single unified output regardless of how many files were uploaded. It’s a subtle choice, but it keeps the pipeline linear and predictable.

Compliance checking is a product feature, not a nice-to-have. In regulated environments, a document with an incorrect security marking or an undefined acronym isn’t just sloppy — it’s a finding in an accreditation review. Building seven distinct compliance checks directly into the document analysis engine, each with its own visual color code, turns a tedious manual review into a one-click operation. The insight here is that understanding your user’s regulatory context is as important as understanding their functional requirements.

What Comes Next: The Roadmap Beyond Version One

Batch Processing and Comparative Analysis — The current architecture processes documents one at a time. A natural evolution is batch upload support, allowing analysts to sweep an entire deliverable package in a single action, with a consolidated findings dashboard that highlights cross-document inconsistencies (e.g., an acronym defined differently in two companion documents).

Custom Rule Authoring — The compliance rules are currently embedded in the analysis engine. Exposing a rule-builder interface — where users can define their own font requirements, marking formats, or acronym policies — would transform the tool from a single-purpose analyzer into a configurable compliance platform adaptable to any organization’s document standards.

Persistent History and Trend Reporting — Today, each analysis session is stateless; findings are generated, downloaded, and forgotten. Introducing a lightweight persistence layer would enable trend tracking over time — showing whether a team’s document quality is improving across successive drafts, which error categories are most common, and where targeted training might reduce recurring defects.

Vector-powered semantic search — The codebase already contains the scaffolding for a ChromaDB-backed vector store with sentence-transformer embeddings. The logical next step is connecting this to the analysis pipeline so that engineers can semantically search across historical test results, finding similar failure patterns and past resolutions without knowing the exact terminology.

Asynchronous processing for large uploads — The current architecture processes files synchronously within the request lifecycle, with Gunicorn’s 600-second timeout as the upper bound. Introducing a task queue would allow the platform to accept large multi-file uploads, process them in the background, and notify users upon completion — improving both the user experience and server resource utilization.

Structured reporting and export — While the platform generates Excel tables, HTML views, and annotated Word documents, there is an opportunity to produce a consolidated VV&A report artifact that combines charts, issue tables, and compliance findings into a single deliverable — reducing the final mile of manual report assembly that still exists downstream of the tool.

Try It Out

Check out the source code on GitHub.
