Purpose
The Problem
An ML inference API has a constraint that a standard REST API does not: its most expensive resource is not a database connection or a network socket — it is the model artifact itself. Loading a scikit-learn pipeline from disk is a blocking I/O operation followed by object deserialization and memory allocation. Do it inside a request handler and you are paying that cost on every single request, often adding hundreds of milliseconds of latency. VitalCheck loads eight models and two Parquet reference tables at startup. Naively reloading them per-request would make the API unusably slow.
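The difference between loading per request and loading once can be made concrete with a toy stand-in. This is a minimal sketch, not VitalCheck code: it uses pickle in place of joblib, and the artifact name toy_model.pkl, the handler functions, and the stats counter are hypothetical names chosen for illustration.

```python
import pickle
import tempfile
from pathlib import Path

stats = {"loads": 0}  # counts how many times the artifact is read from disk


def load_artifact(path):
    """Stand-in for joblib.load(): a blocking read plus deserialization."""
    stats["loads"] += 1
    with path.open("rb") as fh:
        return pickle.load(fh)


def handle_request_naive(path, x):
    model = load_artifact(path)  # pays the load cost on every call
    return model["weight"] * x


def make_handler(path):
    model = load_artifact(path)  # pays the load cost once, up front

    def handle_request(x):
        return model["weight"] * x

    return handle_request


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "toy_model.pkl"  # hypothetical file name
    artifact.write_bytes(pickle.dumps({"weight": 2.0}))

    for i in range(100):
        handle_request_naive(artifact, float(i))
    naive_loads = stats["loads"]

    stats["loads"] = 0
    handler = make_handler(artifact)
    results = [handler(float(i)) for i in range(100)]
    shared_loads = stats["loads"]

print(naive_loads, shared_loads)
```

One hundred naive calls trigger one hundred disk reads, while the pre-loaded handler triggers one. The rest of this tutorial is about giving that "load once" step a proper home in FastAPI.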
The solution requires three interlocking FastAPI patterns working together: the Lifespan Context Manager (the right place to run startup logic), the Singleton Registry (the right data structure to hold shared state), and Dependency Injection (the right mechanism to hand that state to route handlers without coupling them to global variables). Mastering this combination is a prerequisite for building any production-grade FastAPI service that manages expensive shared resources.
We will build an application-scoped singleton that loads all artifacts
once, stores them on app.state, and injects them into route handlers via FastAPI’s
Depends() system — without a single global variable or per-request I/O.
What You Need in Your Toolkit: Async Python and FastAPI’s Request Lifecycle
Knowledge Base:
- Python classes and __init__ methods
- What a context manager is (the with statement and __enter__/__exit__)
- Basic async/await syntax in Python
- What FastAPI route handlers look like (@router.post(...))
- Optional but helpful: understanding of what “application state” means in a web framework
Environment (from pyproject.toml):
Python >= 3.11
fastapi >= 0.115.0
uvicorn >= 0.34.0
joblib >= 1.4.0
onnxruntime >= 1.21.0
pandas >= 2.2.0
pydantic-settings >= 2.7.0
🔵 Deep Dive: FastAPI is built on Starlette, which runs on ASGI — the Asynchronous Server Gateway Interface. ASGI separates the web server (Uvicorn) from the application (FastAPI). The server manages the event loop; the application defines what runs on it. Lifespan events are the ASGI-level hook for running code before and after the application handles any requests.
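To see what the lifespan hook looks like underneath the framework, here is a rough sketch of a bare ASGI callable that handles only the lifespan protocol, driven by a fake server written for this example. The message types (lifespan.startup, lifespan.startup.complete, lifespan.shutdown, lifespan.shutdown.complete) come from the ASGI lifespan specification; the events list and fake_server function are hypothetical scaffolding, and FastAPI normally hides all of this.

```python
import asyncio

events = []  # records what the app did, for illustration only


async def app(scope, receive, send):
    """A bare ASGI application that only handles the lifespan protocol."""
    if scope["type"] != "lifespan":
        return
    while True:
        message = await receive()
        if message["type"] == "lifespan.startup":
            events.append("startup work")  # e.g. load models here
            await send({"type": "lifespan.startup.complete"})
        elif message["type"] == "lifespan.shutdown":
            events.append("shutdown work")  # e.g. release resources here
            await send({"type": "lifespan.shutdown.complete"})
            return


async def fake_server():
    """Plays the server's role: feeds lifespan messages, collects replies."""
    inbox = asyncio.Queue()
    sent = []

    async def send(message):
        sent.append(message["type"])

    await inbox.put({"type": "lifespan.startup"})
    await inbox.put({"type": "lifespan.shutdown"})
    await app({"type": "lifespan"}, inbox.get, send)
    return sent


replies = asyncio.run(fake_server())
print(events, replies)
```

A real server sends the startup message before accepting connections and waits for the "complete" reply, which is why work done in the startup phase is finished before the first request arrives.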
The Airport Analogy: Ground Crew, Terminal, and Passengers
Think of the VitalCheck startup process as an airport coming online before the first flight.
The Ground Crew (the lifespan function) sets everything up before passengers arrive —
loading fuel (models), stocking the terminal (reference data), and checking all systems.
The Terminal (app.state) is the shared facility that stores everything the ground crew
prepared. The Passengers (incoming requests) never interact with the ground crew directly;
they use what the terminal provides. The Check-in Desk (the get_registry dependency
function) is a fixed counter that passengers walk up to — it always hands them what they need
from the terminal.
sequenceDiagram
participant Uvicorn as Uvicorn (ASGI Server)
participant Lifespan as lifespan() context manager
participant Registry as ModelRegistry
participant AppState as app.state
participant Handler as Route Handler
participant Depends as get_registry()
Uvicorn->>Lifespan: startup signal
Lifespan->>Registry: ModelRegistry()
Lifespan->>Registry: .load_all(models_dir, reference_dir)
Note over Registry: Loads 8 .pkl files,<br/>1 ONNX session,<br/>2 Parquet tables
Lifespan->>AppState: app.state.registry = registry
Lifespan-->>Uvicorn: yield (ready to serve)
loop Per Request
Uvicorn->>Handler: POST /api/v1/risk/diabetes
Handler->>Depends: Depends(get_registry)
Depends->>AppState: request.app.state.registry
AppState-->>Depends: ModelRegistry instance (already loaded)
Depends-->>Handler: registry
Handler->>Registry: registry.diabetes.predict_proba(...)
Handler-->>Uvicorn: VitalCheckResponse[...]
end
Uvicorn->>Lifespan: shutdown signal
Lifespan-->>Uvicorn: cleanup complete
The Three-Part Machinery: Settings, Registry, and Injection
We will build through the code in three focused chunks that map directly to the three files
involved: config.py, dependencies.py, and the route handler in risk.py.
Chunk 1 — Settings: The Configuration Singleton (app/config.py)
Before we can load models, we need to know where they live. pydantic-settings provides
BaseSettings, a subclass of Pydantic’s BaseModel that populates its fields from environment
variables or a .env file automatically.
from pathlib import Path

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # env_file tells pydantic-settings which dotenv file to read in addition
    # to the process environment. extra="ignore" means entries in that file
    # that match no field are ignored rather than raising a ValidationError,
    # which is useful when the .env file is shared with other tools.
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        extra="ignore",
    )

    # Path fields: pydantic-settings converts the string "data/models"
    # from the env var or .env file into a pathlib.Path automatically.
    models_dir: Path = Path("data/models")
    reference_dir: Path = Path("data/reference")
    static_dir: Path = Path("static")
    log_level: str = "info"
    api_version: str = "1.0.0"


# Module-level singleton, created once per process.
_settings: Settings | None = None


def get_settings() -> Settings:
    global _settings
    if _settings is None:
        # First call reads from environment / .env file.
        # Every subsequent call returns the already-parsed object.
        _settings = Settings()
    return _settings
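This is the lazy (initialize-on-first-use) singleton pattern. The check-then-assign in
get_settings() is not a single atomic step, so two threads racing on the very first call could
each construct a Settings object. That is harmless here: both read the same environment, the
last assignment wins, and in this application the first call happens inside lifespan() before
any request is served. The pattern also avoids module-level side effects at import time.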
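Where strictly-once construction matters (for example, if building the object had side effects), the same getter can be guarded with a lock. The following is a minimal sketch under stated assumptions: ToySettings is a hypothetical dataclass standing in for the pydantic-settings class, MODELS_DIR is an assumed environment variable name, and the counter exists only to make the behavior observable.

```python
import os
import threading
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ToySettings:
    """Hypothetical stand-in for the pydantic-settings class."""
    models_dir: Path


_settings: Optional[ToySettings] = None
_settings_lock = threading.Lock()
construct_count = [0]  # how many times the settings object was built


def get_settings() -> ToySettings:
    global _settings
    if _settings is None:  # fast path: no lock once initialized
        with _settings_lock:
            if _settings is None:  # re-check: another thread may have won
                construct_count[0] += 1
                _settings = ToySettings(
                    models_dir=Path(os.environ.get("MODELS_DIR", "data/models"))
                )
    return _settings


# Hammer the getter from several threads; construction still happens once.
threads = [threading.Thread(target=get_settings) for _ in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(construct_count[0], get_settings() is get_settings())
```

The second check inside the lock is what prevents a thread that lost the race from building a duplicate. For a read-only settings object this is usually more ceremony than needed, which is a reasonable argument for keeping the simpler version.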
Chunk 2 — The Registry: Holding All Artifacts in One Place (app/dependencies.py)
ModelRegistry is a plain Python class that acts as a typed container for every loaded
artifact. All attributes start as None; load_all() populates them.
import logging
from pathlib import Path
from typing import Any

import joblib
import onnxruntime as ort
import pandas as pd

logger = logging.getLogger(__name__)


class ModelRegistry:
    """Holds all ML artifacts loaded once at startup."""

    def __init__(self) -> None:
        # All fields typed explicitly: None before load_all() is called,
        # populated after. Using typed attributes means IDE tooling can
        # autocomplete registry.diabetes without inspecting load_all().
        self.diabetes: Any = None
        self.heart: Any = None
        self.stroke: Any = None
        self.breast_cancer: Any = None
        self.sleep: Any = None
        self.life_expectancy: Any = None
        self.insurance: tuple[Any, Any, Any] | None = None  # (mean, q05, q95)
        self.brain_tumor_session: ort.InferenceSession | None = None
        self.fitbit_percentiles: pd.DataFrame | None = None
        self.hospital_analytics: pd.DataFrame | None = None
        self.registry_meta: dict[str, Any] = {}
The load_all method is where the blocking I/O happens — intentionally, exactly once:
    def load_all(self, models_dir: Path, reference_dir: Path) -> None:
        logger.info("Loading ML artifacts from %s", models_dir)

        # joblib.load() deserializes a pickle-compatible binary file.
        # Each .pkl here is a fitted sklearn Pipeline object.
        self.diabetes = joblib.load(models_dir / "diabetes_pipeline.pkl")
        self.heart = joblib.load(models_dir / "heart_pipeline.pkl")
        self.stroke = joblib.load(models_dir / "stroke_pipeline.pkl")
        self.breast_cancer = joblib.load(models_dir / "breast_cancer_pipeline.pkl")
        self.sleep = joblib.load(models_dir / "sleep_pipeline.pkl")
        self.life_expectancy = joblib.load(models_dir / "life_expectancy_pipeline.pkl")
        self.insurance = joblib.load(models_dir / "insurance_pipeline.pkl")

        # ONNX Runtime requires explicit session configuration.
        # Thread counts are set to 1. This is a deliberate memory/CPU tradeoff
        # on a single-worker, 2GB VPS deployment (covered in Tutorial 3).
        opts = ort.SessionOptions()
        opts.inter_op_num_threads = 1
        opts.intra_op_num_threads = 1
        opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
        # ASSUMPTION: the original excerpt never defines onnx_path; the file
        # name below is a placeholder, not the project's actual artifact name.
        onnx_path = models_dir / "brain_tumor.onnx"
        self.brain_tumor_session = ort.InferenceSession(
            str(onnx_path),
            sess_options=opts,
            providers=["CPUExecutionProvider"],
        )

        # Reference data loaded from Parquet, a columnar binary format that
        # is typically much faster to read than CSV and preserves dtypes.
        self.fitbit_percentiles = pd.read_parquet(reference_dir / "fitbit_percentiles.parquet")
        self.hospital_analytics = pd.read_parquet(reference_dir / "hospital_analytics.parquet")
Chunk 3 — The Lifespan and the Injection Point (app/dependencies.py, continued)
The lifespan context manager is where the registry is wired into the application:
from contextlib import asynccontextmanager

from fastapi import FastAPI, Request


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Everything BEFORE yield runs at startup, before the first request.
    settings = get_settings()
    registry = ModelRegistry()
    registry.load_all(settings.models_dir, settings.reference_dir)

    # app.state is Starlette's built-in key-value store for application-scoped
    # objects. It is attached to the ASGI app instance itself, not to any
    # specific request. Every request can reach it via request.app.state.
    app.state.registry = registry
    logger.info("VitalCheck API startup complete")

    yield  # Application is now live and serving requests.

    # Everything AFTER yield runs at shutdown.
    logger.info("VitalCheck API shutting down")
    # If registry objects held file handles or connections, you would close
    # them here. sklearn pipelines and ONNX sessions are in-memory only.
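For resources that do need explicit cleanup, a common shape is to wrap the yield in try/finally so teardown runs even if the serving phase ends with an error. The sketch below illustrates that shape only: ToyResource is a hypothetical class, and SimpleNamespace objects stand in for the FastAPI app and its state so the example runs with the standard library alone.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class ToyResource:
    """Hypothetical resource that must be closed, e.g. a connection pool."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


@asynccontextmanager
async def lifespan(app):
    resource = ToyResource()
    app.state.resource = resource  # setup: runs before serving starts
    try:
        yield  # the application would serve requests here
    finally:
        resource.close()  # teardown: runs on normal and abnormal exit


async def run_once(app):
    async with lifespan(app):
        open_while_serving = not app.state.resource.closed
    return open_while_serving, app.state.resource.closed


toy_app = SimpleNamespace(state=SimpleNamespace())
open_while_serving, closed_after_shutdown = asyncio.run(run_once(toy_app))
print(open_while_serving, closed_after_shutdown)
```

The same structure carries over to a real lifespan function unchanged; only the resource and the app object differ.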
The lifespan is passed to the FastAPI constructor in main.py:
app = FastAPI(
    title="VitalCheck API",
    version="1.0.0",
    lifespan=lifespan,  # <-- FastAPI calls this automatically at ASGI startup/shutdown
    ...
)
Finally, the dependency function that routes use to access the registry:
def get_registry(request: Request) -> ModelRegistry:
    # request.app is the FastAPI application instance.
    # request.app.state is the same Starlette State object we wrote to in lifespan().
    # This is a single attribute lookup, effectively free.
    return request.app.state.registry
And a route handler consuming it:
@router.post("/diabetes", response_model=VitalCheckResponse[DiabetesPrediction])
async def predict_diabetes(
    req: DiabetesRequest,
    # FastAPI resolves Depends(get_registry) by calling get_registry(request)
    # and injecting the return value as the 'registry' parameter.
    # The handler uses ModelRegistry only as a type annotation; it never
    # reads app.state or a module-level global itself.
    registry: ModelRegistry = Depends(get_registry),
) -> VitalCheckResponse[DiabetesPrediction]:
    # Building 'features' from req is elided in this excerpt.
    prob, contributors = predict_risk(registry.diabetes, features)
    ...
Why app.state Beats a Global Variable: Memory Layout and Thread Safety
The Global Variable Trap
The anti-pattern this design replaces looks like this:
# DO NOT DO THIS
_registry = None


def load_models():
    global _registry
    _registry = ModelRegistry()
    _registry.load_all(...)


# In route handler:
from app.some_module import _registry
This works in development but breaks in three ways in production:
- Import order dependency. from app.some_module import _registry copies the current binding into the importing module. If a route module is imported before load_models() is called, it captures None, and the later rebinding inside load_models() never reaches it. This class of bug is genuinely hard to reproduce outside production.
- No lifecycle guarantee. A global variable has no mechanism to ensure it is initialized before requests are served. The lifespan hook is a first-class ASGI guarantee.
- Testability. To test a route with a mock registry when using a global, you must monkeypatch the module’s global, which is a fragile approach. With Depends(get_registry), you can override the dependency in tests with app.dependency_overrides[get_registry] = lambda: mock_registry.
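The import-order problem is easy to reproduce in isolation. The sketch below builds a throwaway module in memory (state_mod is a hypothetical name, and a dict stands in for the real registry) so the whole demonstration fits in one file.

```python
import sys
import types

# Build a tiny in-memory module that mimics the anti-pattern.
state_mod = types.ModuleType("state_mod")
exec(
    "_registry = None\n"
    "def load_models():\n"
    "    global _registry\n"
    "    _registry = {'diabetes': 'fitted-pipeline'}\n",
    state_mod.__dict__,
)
sys.modules["state_mod"] = state_mod

# A "route module" imports the name before load_models() has run.
from state_mod import _registry as imported_early

state_mod.load_models()

# The importing side still holds the old binding (None); only attribute
# access through the module object sees the rebound value.
print(imported_early)
print(state_mod._registry)
```

Reading the value through request.app.state at call time sidesteps this entirely, because nothing is captured at import time.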
Memory Implications
ModelRegistry holds 26+ MB of in-memory model artifacts. Because it is stored on app.state
and passed by reference through the dependency system, this memory is allocated exactly once.
get_registry returns the same object on every call — not a copy. Route handlers hold a
reference to the registry for the duration of a single request and then release it. The garbage
collector never sees these objects as candidates for collection because app.state maintains
a live reference.
🔵 Deep Dive: Starlette’s State object is a thin wrapper around a plain Python
dictionary. app.state.registry = registry stores the object in that internal dictionary (a
private attribute named _state in current Starlette releases, not the instance __dict__), and
reading app.state.registry looks it up again.
The Request object exposes request.app as a reference to the ASGI application, not a copy.
This is why get_registry(request) amounts to two attribute lookups and a dictionary lookup,
with no real work done.
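The idea of an attribute-style wrapper over a dictionary, and the fact that every lookup returns the same object, can be sketched as follows. ToyState, ToyApp, and ToyRequest are simplified stand-ins written for this example; they are not Starlette's source code and omit most of what the real classes do.

```python
class ToyState:
    """Simplified stand-in for an attribute-style state container."""

    def __init__(self):
        # Bypass our own __setattr__ so the backing dict is a real attribute.
        object.__setattr__(self, "_state", {})

    def __setattr__(self, key, value):
        self._state[key] = value

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        try:
            return self._state[key]
        except KeyError:
            raise AttributeError(key) from None


class ToyApp:
    def __init__(self):
        self.state = ToyState()


class ToyRequest:
    def __init__(self, app):
        self.app = app  # a reference to the app, not a copy


def get_registry(request):
    return request.app.state.registry


app = ToyApp()
registry = object()  # stands in for the loaded ModelRegistry
app.state.registry = registry

first = get_registry(ToyRequest(app))
second = get_registry(ToyRequest(app))
print(first is registry, second is registry)
```

Two different request objects yield the identical registry object, which is why the memory cost is paid once regardless of request volume.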
When Startup Fails, When the Registry is Missing, and the Test Override Trick
What happens if load_all() raises an exception?
If any joblib.load() call fails (file missing, corrupted pickle, version mismatch), the
exception propagates up through lifespan(), which causes Uvicorn to log the error and
refuse to start. The application never enters a partially-initialized state where some models
are loaded and others are not. This is the correct behavior — a half-initialized registry
is more dangerous than a dead server, because it would serve some endpoints successfully and
fail others in unpredictable ways.
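One optional refinement, not shown in the original code, is a preflight check that reports every missing artifact at once instead of failing on the first one. The sketch below assumes hypothetical helper names (find_missing, preflight) and uses a temporary directory with one deliberately absent file.

```python
import tempfile
from pathlib import Path


def find_missing(base_dir, filenames):
    """Return the names that do not exist as files under base_dir."""
    return [name for name in filenames if not (Path(base_dir) / name).is_file()]


def preflight(base_dir, filenames):
    """Raise one error naming all missing artifacts, or return silently."""
    missing = find_missing(base_dir, filenames)
    if missing:
        raise FileNotFoundError(
            f"Missing artifacts in {base_dir}: {', '.join(missing)}"
        )


expected = ["diabetes_pipeline.pkl", "heart_pipeline.pkl"]

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "diabetes_pipeline.pkl").write_bytes(b"placeholder")
    missing_now = find_missing(tmp, expected)
    try:
        preflight(tmp, expected)
        preflight_error = None
    except FileNotFoundError as exc:
        preflight_error = str(exc)

print(missing_now)
```

Called at the top of load_all() or lifespan(), a check like this keeps the fail-fast behavior described above while producing a single, complete error message in the startup log.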
What if a model is None at request time?
Route handlers guard against this explicitly:
# From app/routers/imaging.py
if registry.brain_tumor_session is None:
raise HTTPException(status_code=503, detail="Brain tumor model not loaded")
This is the defensive check for the case where load_all() is called with a missing optional
model. In practice, the startup failure above prevents this from ever occurring in production,
but the 503 guard makes the failure mode explicit and debuggable rather than producing a
cryptic AttributeError inside inference code.
🔴 Danger: The @asynccontextmanager decorator on lifespan means the function must
yield exactly once per run. If a conditional skips the yield, contextlib raises a
RuntimeError at startup; if a loop or a stray second yield is reached, it raises a
RuntimeError at shutdown. Always keep the lifespan body to the simple pattern:
setup → yield → teardown.
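Both failure modes can be reproduced with contextlib alone, without starting a server. In this sketch the two lifespan functions are deliberately broken examples written for illustration, and try_lifespan is a hypothetical helper that plays the role of the framework entering and exiting the context manager.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def no_yield_lifespan(app, ready=False):
    if ready:  # the condition is False, so the yield is never reached
        yield


@asynccontextmanager
async def double_yield_lifespan(app):
    yield
    yield  # a second yield is reached during shutdown


async def try_lifespan(factory):
    """Enter and exit the context manager, returning any error message."""
    try:
        async with factory(None):
            pass
    except RuntimeError as exc:
        return str(exc)
    return "ok"


no_yield_error = asyncio.run(try_lifespan(no_yield_lifespan))
double_yield_error = asyncio.run(try_lifespan(double_yield_lifespan))
print(no_yield_error)
print(double_yield_error)
```

The first error surfaces on entry and the second on exit, which matches when each would appear in a running service: one at startup, one at shutdown.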
The Test Override Pattern
Because the registry is injected via Depends, tests can swap it without touching production code:
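# In your test file:
from fastapi.testclient import TestClient
from app.main import app
from app.dependencies import get_registry

mock_registry = MockModelRegistry()  # your test double
app.dependency_overrides[get_registry] = lambda: mock_registry

# Without a `with` block, TestClient does not run lifespan(),
# so no real models are loaded during this test.
client = TestClient(app)
response = client.post("/api/v1/risk/diabetes", json={...})

# Remove the override so it does not leak into other tests.
app.dependency_overrides.clear()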
This is the canonical FastAPI testing pattern. dependency_overrides is a dictionary on the
app instance; FastAPI checks it before resolving any Depends() call and substitutes the
override if one is present. No monkeypatching, no import manipulation.
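Conceptually, the override lookup behaves like a dictionary get with the original callable as the default. The sketch below is a deliberately simplified model of that idea, not FastAPI's actual resolver: ToyApp and resolve are hypothetical names, and the toy dependency takes no arguments, whereas the real get_registry receives the request.

```python
class ToyApp:
    """Minimal stand-in exposing only a dependency_overrides dict."""

    def __init__(self):
        self.dependency_overrides = {}


def resolve(app, dependency):
    # Use the override if one is registered, else the original callable.
    call = app.dependency_overrides.get(dependency, dependency)
    return call()


def get_registry():
    return "real registry"


app = ToyApp()

before = resolve(app, get_registry)

app.dependency_overrides[get_registry] = lambda: "mock registry"
during = resolve(app, get_registry)

app.dependency_overrides.clear()  # restore production wiring
after = resolve(app, get_registry)

print(before, during, after)
```

Because the key is the function object itself, the override must be registered against the same get_registry that the route references in Depends(), not a re-defined or wrapped copy.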
You Now Know How to Wire Expensive Shared State into FastAPI Without Global Variables
You have learned a complete three-layer pattern for managing application-scoped resources:
- pydantic-settings + singleton getter: type-safe, environment-aware configuration that reads once and caches for the life of the process.
- ModelRegistry class + lifespan context manager: a guaranteed-once initialization hook that stores artifacts on app.state before the first request and provides a natural teardown point.
- Depends(get_registry): a near-zero-cost dependency injection mechanism that decouples route handlers from the registry’s storage location, enabling clean test overrides.
The core skill transfer is this: app.state is the correct home for application-scoped
singletons in FastAPI, and Depends() is the correct mechanism to access them in handlers.
This pattern applies equally to database connection pools, HTTP clients, caches, feature flag
clients, and any resource that is expensive to create and safe to share across requests.