Your Flask App Accepts File Uploads — But Can It Survive a Malicious One?

Every File Upload is an Attack Surface

Every web application that accepts file uploads is a target. An attacker can upload a file named ../../../etc/passwd, inject a 10-gigabyte payload to exhaust server memory, or slip in an .exe disguised as a .docx. Most beginner tutorials teach you how to receive a file. Very few teach you how to distrust it.

In STORM DAT, the application accepts Word documents, Excel spreadsheets, audio recordings, and video files from users operating in sensitive government environments. A single validation gap could mean a path traversal exploit writing files to arbitrary server directories, or an oversized upload crashing a containerized deployment with limited disk space.

What You Need Before You Start: Python, Flask, and a Healthy Paranoia

Knowledge Base:

Basic Python: functions, classes, exception handling, os.path operations
Flask fundamentals: routes, request.files, Blueprint, @app.after_request
HTTP basics: what headers are and why browsers read them

Environment:

Python 3.12 (as specified in the project’s Dockerfile)
Flask 3.1.0
Werkzeug 3.1.3 — ships with Flask, provides secure_filename
Bleach 6.2.0 — HTML sanitization

pip install flask==3.1.0 werkzeug==3.1.3 bleach==6.2.0

Three Walls: A Mental Model for Layered Validation

Think of file upload security as a medieval castle with three concentric walls. Each wall stops a different class of attack. If an attacker breaches one wall, the next catches them.

Wall 1 — Validators (validators.py): Checks what the file claims to be (extension) and how large it actually is (byte measurement). Rejects anything outside the whitelist.

Wall 2 — Sanitizers (security.py): Strips the filename of any path traversal characters and neutralizes HTML content that might be rendered later. The file name itself is treated as user input.

Wall 3 — Security Headers (security_headers.py): Even after the file is accepted, the HTTP response is armored with headers that prevent the browser from doing anything unexpected — no MIME sniffing, no framing, no inline script execution outside the CSP policy.

Building Wall 1: Extension Whitelisting and Size Checks

Before writing any validation logic, STORM DAT establishes a domain-specific exception. Using a ValidationError instead of a generic ValueError lets calling code catch validation failures distinctly from other errors and return user-friendly messages without leaking internal details.

class ValidationError(Exception):
    """Custom exception for validation errors"""
    pass

Extension validation uses a whitelist, not a blacklist. The distinction matters. A blacklist tries to enumerate every dangerous extension (.exe, .sh, .bat, …) and will always miss something. A whitelist enumerates only what is explicitly allowed and rejects everything else by default.

import os

def validate_file_extension(filename, allowed_extensions):
    if not filename:
        raise ValidationError("No filename provided")

    # Extract extension and normalize to lowercase
    ext = os.path.splitext(filename)[1].lower()

    if ext not in allowed_extensions:
        allowed = ', '.join(sorted(allowed_extensions))
        raise ValidationError(
            f"File type '{ext}' not allowed. Allowed types: {allowed}"
        )
    return True

I chose os.path.splitext over string splitting because it correctly handles filenames like report.backup.docx — it returns .docx, not .backup. The extension is lowercased to prevent .DOCX or .Docx from bypassing the check. The allowed set is defined in config.py as {'.docx', '.xlsx'} for documents and {'.wav', '.webm'} for media.
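To make that behavior concrete, here is a small sketch of how os.path.splitext and the whitelist treat a few awkward filenames. The BLOCKED set, the helper names, and the sample filenames are illustrative assumptions, not part of STORM DAT; only the document whitelist mirrors the set quoted above.

```python
import os

# Whitelist quoted in the article for documents.
ALLOWED = {".docx", ".xlsx"}
# Hypothetical blacklist, included only for contrast.
BLOCKED = {".exe", ".sh", ".bat"}


def extension_of(filename):
    """Return the final extension, lowercased, the way the validator does."""
    return os.path.splitext(filename)[1].lower()


def whitelist_accepts(filename):
    return extension_of(filename) in ALLOWED


def blacklist_accepts(filename):
    return extension_of(filename) not in BLOCKED


samples = [
    "report.backup.docx",  # only the last suffix counts
    "REPORT.DOCX",         # case is normalized before the lookup
    "invoice.docx.exe",    # double extension: the real suffix is .exe
    "shell.php",           # missing from the blacklist, so a blacklist lets it through
    ".docx",               # a dotfile: splitext reports no extension at all
]

for name in samples:
    print(f"{name!r:24} ext={extension_of(name)!r:8} "
          f"whitelist={whitelist_accepts(name)} blacklist={blacklist_accepts(name)}")
```

Note the last sample: a file literally named .docx has an empty extension as far as splitext is concerned, so the whitelist rejects it by default.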

File size measurement without loading the file. This is where many beginners make a mistake: they read the entire file into memory to check its size. The seek-tell pattern measures the file without allocating a single byte of content memory.

def validate_file_size(file_obj, max_size_mb):
    if not file_obj:
        raise ValidationError("No file provided")

    # Move the cursor to the end of the file stream
    file_obj.seek(0, os.SEEK_END)

    # .tell() returns the cursor's byte position — which is now the file size
    size_bytes = file_obj.tell()

    # CRITICAL: reset the cursor so the file can still be read later
    file_obj.seek(0)

    size_mb = size_bytes / (1024 * 1024)
    if size_mb > max_size_mb:
        raise ValidationError(
            f"File size {size_mb:.1f}MB exceeds maximum {max_size_mb}MB"
        )
    return True

file_obj.seek(0, os.SEEK_END) moves the file cursor to an offset of 0 bytes from the end. file_obj.tell() then reports the cursor’s absolute position, which equals the total byte count. This works because Werkzeug’s FileStorage, the object Flask hands you through request.files, wraps a standard Python file-like object that supports seeking. If you forget file_obj.seek(0) afterward, every subsequent .read() returns empty bytes and .save() writes an empty file, which is one of the most common upload bugs in Flask applications.

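Under the hood, tell() on a disk-backed stream comes down to an lseek() call on the file descriptor (Python 3’s io module does not go through C’s ftell()). The kernel tracks the offset as a single integer, so no file data is copied into userspace. For small uploads that Werkzeug may keep in an in-memory buffer, tell() simply returns the buffer’s current position. Either way, seek-tell needs O(1) additional memory even for multi-gigabyte files, whereas the naive len(file.read()) approach loads the entire file into memory. Keep in mind that this check runs after the server has already received the upload; Flask’s MAX_CONTENT_LENGTH setting is the place to reject oversized requests before they are stored.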
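The following sketch shows both halves of the seek-tell pattern: the measurement itself, and what happens when the cursor is not rewound. It uses io.BytesIO as a stand-in for an uploaded stream, and the measure_size helper with its reset flag is a hypothetical name introduced only for this illustration.

```python
import io
import os


def measure_size(stream, reset=True):
    """Return the stream's size in bytes using seek/tell, without reading it."""
    stream.seek(0, os.SEEK_END)  # jump to the end of the stream
    size = stream.tell()         # cursor position now equals the total byte count
    if reset:
        stream.seek(0)           # rewind so later reads start at byte 0
    return size


payload = b"x" * 2048  # stand-in for uploaded content

good = io.BytesIO(payload)
size_good = measure_size(good)
after_reset = good.read()      # full content is still readable

bad = io.BytesIO(payload)
size_bad = measure_size(bad, reset=False)
without_reset = bad.read()     # cursor is still at the end, so nothing comes back

print(size_good, len(after_reset))    # 2048 2048
print(size_bad, len(without_reset))   # 2048 0
```

Both calls report the same size; only the stream that was rewound can still be read afterward.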

The size limits are driven by configuration:

MAX_FILE_SIZE_MB = {
    'document': 50,
    'media': 500
}

Composite validators bundle the checks for each file category so routes don’t repeat the validation calls:

def validate_document_upload(file_obj):
    if not file_obj or not file_obj.filename:
        raise ValidationError("No file provided")

    validate_file_extension(file_obj.filename, config.ALLOWED_UPLOAD_EXTENSIONS)
    validate_file_size(file_obj, config.MAX_FILE_SIZE_MB['document'])
    return True
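A media counterpart would presumably follow the same shape. The sketch below is an assumption about how that could look, not STORM DAT's actual code: validate_media_upload, ALLOWED_MEDIA_EXTENSIONS, the FakeUpload class, and the check helper are all hypothetical names, and the validators are trimmed-down copies so the block runs on its own.

```python
import io
import os
from types import SimpleNamespace


class ValidationError(Exception):
    """Plays the same role as the article's exception."""


# Stand-in for config.py; ALLOWED_MEDIA_EXTENSIONS is a hypothetical name.
config = SimpleNamespace(
    ALLOWED_MEDIA_EXTENSIONS={".wav", ".webm"},
    MAX_FILE_SIZE_MB={"document": 50, "media": 500},
)


class FakeUpload(io.BytesIO):
    """Minimal stand-in for an uploaded file: a seekable stream plus a filename."""

    def __init__(self, data, filename):
        super().__init__(data)
        self.filename = filename


def validate_file_extension(filename, allowed_extensions):
    ext = os.path.splitext(filename)[1].lower()
    if ext not in allowed_extensions:
        raise ValidationError(f"File type '{ext}' not allowed")
    return True


def validate_file_size(file_obj, max_size_mb):
    file_obj.seek(0, os.SEEK_END)
    size_bytes = file_obj.tell()
    file_obj.seek(0)
    if size_bytes > max_size_mb * 1024 * 1024:
        raise ValidationError(f"File exceeds maximum {max_size_mb}MB")
    return True


def validate_media_upload(file_obj):
    """Hypothetical media counterpart to validate_document_upload."""
    if not file_obj or not file_obj.filename:
        raise ValidationError("No file provided")
    validate_file_extension(file_obj.filename, config.ALLOWED_MEDIA_EXTENSIONS)
    validate_file_size(file_obj, config.MAX_FILE_SIZE_MB["media"])
    return True


def check(upload):
    """Run the composite validator and report the outcome as a string."""
    try:
        validate_media_upload(upload)
        return "accepted"
    except ValidationError as exc:
        return f"rejected: {exc}"


print(check(FakeUpload(b"RIFF....", "interview.wav")))
print(check(FakeUpload(b"MZ......", "interview.exe")))
print(check(FakeUpload(b"", "")))
```

The route only needs one call and one except clause per category, which is the point of bundling the checks.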

Building Wall 2: Path Traversal Prevention

A filename like ../../../../etc/shadow is a classic path traversal attack. STORM DAT applies two layers:

import os

from werkzeug.utils import secure_filename as werkzeug_secure_filename

def sanitize_filename(filename):
    if not filename:
        return None

    # Layer 1: Werkzeug strips path separators, special chars, and Unicode tricks
    safe_name = werkzeug_secure_filename(filename)

    # Layer 2: os.path.basename removes any remaining directory components
    safe_name = os.path.basename(safe_name)

    return safe_name if safe_name else None

Don’t rely on secure_filename alone. Werkzeug’s implementation handles most attack vectors, but adding os.path.basename as a second pass ensures that even if a future Werkzeug version introduces a regression, no directory component survives. Defense in depth means assuming every individual layer can fail.
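The same reasoning applies in the other direction: os.path.basename should not be trusted alone either. The sketch below looks only at Layer 2 and does not exercise secure_filename. It calls posixpath and ntpath directly (the modules behind os.path on POSIX and Windows respectively) so the result is the same on any machine; the sample filenames are illustrative.

```python
import ntpath
import posixpath

unix_style = "../../../etc/passwd"
windows_style = "..\\..\\windows\\system.ini"

# On a POSIX server, os.path is posixpath, which treats only "/" as a separator.
posix_on_unix = posixpath.basename(unix_style)        # directory parts removed
posix_on_windows = posixpath.basename(windows_style)  # backslashes are ordinary characters here

# ntpath (os.path on Windows) splits on both "\" and "/".
nt_on_windows = ntpath.basename(windows_style)
nt_on_unix = ntpath.basename(unix_style)

print(repr(posix_on_unix))
print(repr(posix_on_windows))
print(repr(nt_on_windows))
print(repr(nt_on_unix))
```

On a Linux container, basename leaves the backslash-style name untouched, so it is the first layer that has to neutralize it. Each layer covers a gap the other might leave.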

Building Wall 3: Security Headers on Every Response

The final layer doesn’t touch the file at all — it hardens every HTTP response. An @app.after_request hook injects headers before the response leaves the server:

@app.after_request
def set_security_headers(response):
    # Prevent MIME sniffing — browser must respect Content-Type
    response.headers['X-Content-Type-Options'] = 'nosniff'

    # Block framing — prevents clickjacking
    response.headers['X-Frame-Options'] = 'DENY'

    # Disable geolocation; allow microphone and camera for this origin only
    response.headers['Permissions-Policy'] = (
        'geolocation=(), '
        'microphone=(self), '
        'camera=(self)'
    )

    # HTTPS enforcement in production only — don't break the dev server
    if not app.config.get('DEBUG', False):
        response.headers['Strict-Transport-Security'] = (
            'max-age=31536000; includeSubDomains'
        )

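    # CSP: same-origin by default, scripts from this origin and code.jquery.com,
    # no plugins. Note that 'unsafe-inline' still permits inline scripts.
    csp_directives = [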
        "default-src 'self'",
        "script-src 'self' 'unsafe-inline' https://code.jquery.com",
        "object-src 'none'"
    ]
    response.headers['Content-Security-Policy'] = '; '.join(csp_directives)

    return response
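Header strings assembled by concatenation and join are easy to get subtly wrong, so it can help to look at the exact values that go on the wire. One way to do that without starting Flask is to factor the values into a pure function. This is a sketch of that idea, not how STORM DAT structures its code; build_security_headers is a hypothetical name.

```python
def build_security_headers(debug):
    """Return the header names and values from the hook above as a plain dict."""
    headers = {
        "X-Content-Type-Options": "nosniff",
        "X-Frame-Options": "DENY",
        "Permissions-Policy": "geolocation=(), microphone=(self), camera=(self)",
    }

    # HSTS only outside debug mode, mirroring the hook's condition.
    if not debug:
        headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"

    csp_directives = [
        "default-src 'self'",
        "script-src 'self' 'unsafe-inline' https://code.jquery.com",
        "object-src 'none'",
    ]
    headers["Content-Security-Policy"] = "; ".join(csp_directives)
    return headers


dev = build_security_headers(debug=True)
prod = build_security_headers(debug=False)

print(prod["Content-Security-Policy"])
print("Strict-Transport-Security" in dev, "Strict-Transport-Security" in prod)
```

With this split, the after_request hook would only need to copy the dict onto response.headers, and the header values can be asserted in a plain unit test.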

Concurrent Uploads, Error Leakage, and the `finally` Cleanup

Under Gunicorn, a Flask application runs as multiple worker processes. Each worker handles its own request independently, so two simultaneous uploads don’t share validation state. But if two users upload files with the same name, both writes target the same path in src/static/uploads/. The video upload route mitigates this by generating UUID-based filenames (a uuid.uuid4() value with a .webm extension), but the document upload route uses the sanitized original filename, which is a potential collision vector under concurrent load.
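One possible way to close that gap is to apply the same UUID approach to documents while keeping the already-validated extension. This is a minimal sketch under that assumption; unique_storage_name is a hypothetical helper, not something the document route currently does.

```python
import os
import uuid

# Document whitelist quoted earlier in the article.
ALLOWED = {".docx", ".xlsx"}


def unique_storage_name(sanitized_filename):
    """Keep the validated extension, replace the stem with a random UUID."""
    ext = os.path.splitext(sanitized_filename)[1].lower()
    if ext not in ALLOWED:
        # Should not happen if validation ran first; fail loudly if it does.
        raise ValueError(f"unexpected extension {ext!r}")
    return f"{uuid.uuid4().hex}{ext}"


first = unique_storage_name("report.docx")
second = unique_storage_name("report.docx")

print(first)
print(second)
```

Two uploads named report.docx now land on different paths. The original name can still be stored separately if it has to be shown back to the user.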

Error message leakage is a separate concern. The ValidationError messages include the rejected extension (“File type ‘.exe’ not allowed”) but never include internal paths, stack traces, or server configuration. In route handlers, the broad except Exception returns only a generic message: “An error occurred while processing the document.” Never include the reason a filename was sanitized in user-facing output. Telling an attacker that ../../etc/passwd was “stripped of path components” confirms that the traversal attempt was detected and reveals how the server handles it, which is exactly the feedback needed to refine the next attempt.
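The two-tier handling described here can be sketched without Flask. In the block below, handle_upload, the 400 and 500 status codes, and the /srv/app prefix in the simulated error are all assumptions made for illustration; only the two message strings come from the article.

```python
import logging

logger = logging.getLogger("upload")

GENERIC_MESSAGE = "An error occurred while processing the document."


class ValidationError(Exception):
    """Validation failures whose message is safe to show to the user."""


def handle_upload(process):
    """Run `process` and translate failures into (payload, status) pairs."""
    try:
        return {"result": process()}, 200
    except ValidationError as exc:
        # Validation messages are written for users, so they pass through.
        return {"error": str(exc)}, 400
    except Exception:
        # Details go to the server log only; the client sees a fixed string.
        logger.exception("Unexpected failure during upload processing")
        return {"error": GENERIC_MESSAGE}, 500


def rejected():
    raise ValidationError("File type '.exe' not allowed. Allowed types: .docx, .xlsx")


def crashed():
    # Simulated internal failure whose message contains a server path.
    raise FileNotFoundError("/srv/app/src/static/uploads/report.docx")


print(handle_upload(rejected))
print(handle_upload(crashed))
```

The internal path from the second failure appears in the log output but never in the payload returned to the client.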

In routes.py, uploaded files are deleted in a finally block after processing:

finally:
    try:
        if file_path and os.path.exists(file_path):
            os.remove(file_path)
    except Exception as cleanup_error:
        current_app.logger.warning(f"Failed to cleanup: {cleanup_error}")

This ensures temporary files don’t accumulate on disk even if analysis raises an exception. Leftover files are a security liability — they could be accessed by other routes or through directory traversal in a misconfigured server.
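To see the cleanup in context, here is a self-contained sketch of the whole try/finally shape with a failure injected on purpose. The process_upload and analyze functions are hypothetical stand-ins for the route body and the analysis step, and print replaces current_app.logger so the block runs outside Flask.

```python
import os
import tempfile

seen_paths = []  # records where the temporary file was written


def analyze(path):
    """Stand-in for document analysis that always fails."""
    seen_paths.append(path)
    raise RuntimeError("simulated analysis failure")


def process_upload(data):
    file_path = None
    try:
        fd, file_path = tempfile.mkstemp(suffix=".docx")
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return analyze(file_path)
    finally:
        # Runs whether analyze() returned or raised.
        try:
            if file_path and os.path.exists(file_path):
                os.remove(file_path)
        except OSError as cleanup_error:
            print(f"Failed to cleanup: {cleanup_error}")


try:
    process_upload(b"not really a docx")
except RuntimeError as exc:
    print(f"analysis failed: {exc}")

print(os.path.exists(seen_paths[0]))  # False: the file is gone despite the exception
```

Initializing file_path to None before the try block matters: if the failure happens before the file is created, the finally clause has nothing to remove and skips cleanly.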

Four principles make up this defense: whitelist rather than blacklist, measure without loading (seek-tell), sanitize the name as aggressively as the content, and harden the response independent of the request. No single layer is trusted to catch everything; each assumes the others have already failed.
