On this page
- A Clinical-Grade REST API for the Modern Health Stack
- The Fragmentation Problem in Digital Health Intelligence
- Twelve Endpoints. One Contract. Zero Ambiguity.
- The Architecture Beneath the Surface: Pragmatic Choices at Every Layer
- The Stack
- The Decision Matrix
- The Hardest Problem: Running Eight Models Without Running Out of Memory
- What Building This Taught Me About Production ML
- Where VitalCheck Goes Next
A Clinical-Grade REST API for the Modern Health Stack
VitalCheck is a production-deployed machine learning inference API that delivers real-time health risk assessments across ten medical and wellness domains — all through a single, unified REST interface. Designed for health-tech platforms, wellness applications, and research tools, it transforms raw clinical and biometric inputs into actionable probabilistic predictions: from diabetes and stroke risk to MRI-based brain tumor classification and WHO-derived life expectancy modeling. This is not a demo. It is a fully containerized, documented, and monitored API built to production standards.
The Fragmentation Problem in Digital Health Intelligence
The modern health-tech ecosystem is littered with siloed, single-purpose prediction tools. A startup building a wellness dashboard might integrate one vendor for cardiovascular risk, another for sleep analysis, and a third for insurance cost forecasting — each with its own schema, its own authentication model, inconsistent confidence outputs, and incompatible response envelopes. The engineering overhead of stitching these together is substantial, and the inconsistency in how “risk” is communicated across vendors creates a genuinely poor user experience. There is a meaningful gap between what clinical ML research produces and what a software team can practically consume through an API. VitalCheck was built to close that gap: a single API contract, a single deployment footprint, and a unified semantic for how health risk is expressed.
Twelve Endpoints. One Contract. Zero Ambiguity.
VitalCheck delivers health intelligence across a breadth of domains that would typically require multiple vendors or bespoke integrations:
- Multi-Disease Risk Profiling: Dedicated endpoints for diabetes, heart disease, stroke, and breast cancer malignancy — each returning a calibrated probability, a discrete risk level (Low / Moderate / High / Critical), the top contributing features, and evidence-based recommendations. A `/comprehensive` endpoint runs all three cardiovascular and metabolic risk models in a single request, returning a unified health profile with per-model inference timing.
- Medical Image Classification: A brain tumor endpoint accepts a base64-encoded T1-weighted MRI and classifies it into one of four categories (glioma, meningioma, pituitary tumor, or no tumor) using a fine-tuned MobileNetV2 model exported to ONNX with INT8 quantization. The result includes per-class confidence scores and a binary `tumor_detected` flag.
- Wellness and Lifestyle Analytics: A sleep disorder assessment endpoint predicts the likelihood of insomnia or sleep apnea from twelve biometric and lifestyle features, accompanied by a calculated lifestyle score on a 0–100 scale. A fitness analytics endpoint benchmarks a user’s FitBit activity data against a reference population, producing percentile rankings and a categorical fitness label.
- Economic and Population Health Modeling: An insurance cost estimator returns a predicted annual premium alongside a 90% prediction interval — not just a point estimate, but a statistically honest range. A life expectancy endpoint accepts nineteen WHO-aligned socioeconomic and health indicators and models country-level outcomes with similar interval reporting. A pre-aggregated hospital analytics endpoint provides population-level admission statistics queryable by condition and age group.
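To give a concrete sense of what a single contract looks like from the client side, here is a minimal sketch of building a request body for the diabetes endpoint. The field names and the endpoint path are illustrative guesses, not the API's actual schema; the authoritative contract is the generated OpenAPI specification.

```python
import json

def build_diabetes_request(age: int, bmi: float, glucose: float,
                           blood_pressure: float) -> str:
    """Assemble a JSON body for a hypothetical POST /predict/diabetes call.

    Field names here are placeholders for illustration only.
    """
    return json.dumps({
        "age": age,
        "bmi": bmi,
        "glucose": glucose,
        "blood_pressure": blood_pressure,
    })

body = build_diabetes_request(age=52, bmi=31.4, glucose=148, blood_pressure=85)
```

The point of the unified contract is that the equivalent call for stroke, heart disease, or sleep assessment differs only in the feature fields, never in the envelope that comes back.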
The Architecture Beneath the Surface: Pragmatic Choices at Every Layer
The Stack
| Layer | Technology |
|---|---|
| Runtime | Python 3.11, FastAPI, Uvicorn (ASGI) |
| Data Validation | Pydantic v2 |
| Tabular ML | scikit-learn 1.6, Gradient Boosting, Random Forest, SVM |
| Deep Learning Inference | ONNX Runtime 1.21 (CPU, INT8 quantized) |
| Data Processing | NumPy, Pandas, PyArrow (Parquet) |
| Image Handling | Pillow |
| Package Management | uv 0.6 |
| Containerization | Docker (multi-stage), docker-compose |
| Testing | pytest, httpx, pytest-asyncio |
The Decision Matrix
FastAPI over Flask or Django REST Framework: The choice is not arbitrary. FastAPI’s native integration with Pydantic v2 means that every request model doubles as a schema-validated, auto-documented contract. The OpenAPI specification is generated from the same type annotations that enforce runtime validation. For an ML inference API where input shape correctness is a prerequisite to meaningful predictions, this tight coupling between schema definition and runtime enforcement is architecturally sound — not a convenience feature.
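The schema-as-contract idea can be sketched with a small Pydantic v2 model. The field names and bounds below are assumptions for illustration, not VitalCheck's real schema; the mechanism (one class driving both runtime validation and the OpenAPI spec) is the point.

```python
from pydantic import BaseModel, Field

class DiabetesRequest(BaseModel):
    """Hypothetical request model: the same annotations that validate
    incoming JSON at runtime also generate the OpenAPI documentation."""
    age: int = Field(ge=0, le=120)
    bmi: float = Field(gt=0, lt=100)
    glucose: float = Field(ge=0, description="Plasma glucose, mg/dL")

# In a FastAPI app this model would be wired to a route, e.g.:
#   @app.post("/predict/diabetes")
#   async def predict(req: DiabetesRequest): ...
```

An out-of-range input (say, a negative age) is rejected with a structured 422 error before any model code runs, which is exactly the "input shape correctness as a prerequisite" property described above.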
ONNX Runtime for the Deep Learning Model: Training MobileNetV2 in PyTorch and exporting to ONNX decouples the training environment from the inference environment. The runtime image does not need PyTorch — a dependency that alone accounts for hundreds of megabytes. Paired with INT8 post-training quantization, this decision shrinks the brain tumor model from 8.9 MB to 2.4 MB, a 73% reduction, with negligible accuracy degradation. On a resource-constrained deployment (a 2GB VPS), this is not an optimization — it is a prerequisite.
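The principle behind the INT8 shrink can be illustrated in a few lines of NumPy. This is a toy symmetric-quantization sketch of what post-training quantization does to a weight tensor, not the actual ONNX Runtime tooling the project would use; the 4x storage reduction and small reconstruction error are the properties that make the 8.9 MB to 2.4 MB trade worthwhile.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Toy symmetric INT8 quantization: map float32 weights to int8
    plus a single scale factor."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 tensor."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes / w.nbytes)  # 0.25: int8 stores 1/4 the bytes of float32
```

The per-weight error is bounded by half the quantization step, which is why accuracy degradation stays negligible for a well-conditioned model like MobileNetV2.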
Quantile Regression Ensemble for Insurance Cost: Rather than returning a single point estimate for insurance cost — which gives users false precision — the insurance endpoint runs three separate Gradient Boosting models trained to predict the mean, 5th percentile, and 95th percentile respectively. The response surface communicates genuine uncertainty. This reflects a product philosophy: a good ML API is honest about what it does not know.
The Hardest Problem: Running Eight Models Without Running Out of Memory
The most technically interesting constraint in VitalCheck is not algorithmic — it is operational. Deploying eight pre-trained models (26+ MB of serialized artifacts) on a single 2GB VPS, where each Uvicorn worker can consume up to 750MB, creates a tight memory budget that shapes every architectural decision downstream.
The solution is a ModelRegistry singleton, initialized once during the FastAPI application’s lifespan startup event. All eight models — scikit-learn pipelines, ONNX sessions, and Parquet reference tables — are loaded into memory at boot time and held in a shared registry object that is injected into each request handler via FastAPI’s dependency injection system. No model is reloaded per request. No disk I/O occurs during inference. The registry is passed by reference, not copied.
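The registry pattern can be sketched as follows. The placeholder strings stand in for the real artifacts (scikit-learn pipelines, ONNX sessions, Parquet tables), and the FastAPI wiring is shown only in comments; treat this as an illustration of the load-once, share-by-reference idea rather than the project's exact code.

```python
import threading

class ModelRegistry:
    """Process-wide singleton holding all loaded model artifacts."""
    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        self.models = {}

    @classmethod
    def instance(cls):
        # Double-checked locking so the load happens exactly once.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
                cls._instance.load_all()
        return cls._instance

    def load_all(self):
        # In the real app this runs once, inside FastAPI's lifespan handler:
        #   @asynccontextmanager
        #   async def lifespan(app):
        #       ModelRegistry.instance()
        #       yield
        self.models["diabetes"] = "sklearn-pipeline-placeholder"
        self.models["brain_tumor"] = "onnx-session-placeholder"

def get_registry() -> ModelRegistry:
    """FastAPI-style dependency: every handler receives the same object
    by reference, so no artifact is ever copied or reloaded per request."""
    return ModelRegistry.instance()
```

Because the registry is the only owner of the 26+ MB of artifacts, the memory footprint is fixed at boot and does not grow with request volume.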
For the ONNX brain tumor model specifically, inference is CPU-bound and image preprocessing is non-trivial: a base64-encoded MRI must be decoded, converted to RGB, resized to 224×224 using Lanczos interpolation, normalized against ImageNet statistics, transposed to channel-first format, and batched before being passed to the runtime. To prevent this CPU-intensive preprocessing from blocking the async event loop, inference is offloaded to a thread pool executor, preserving the API’s ability to handle concurrent requests even under load. The constraint — single worker, limited cores, minimal memory headroom — became the forcing function for a clean, efficient inference pipeline.
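The preprocessing chain described above can be sketched directly with Pillow and NumPy. The normalization constants are the standard ImageNet statistics; exact details of the production pipeline may differ.

```python
import base64
import io

import numpy as np
from PIL import Image

# Standard ImageNet normalization statistics.
_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_mri(b64_image: str) -> np.ndarray:
    """Decode -> RGB -> 224x224 (Lanczos) -> normalize -> CHW -> batch."""
    img = Image.open(io.BytesIO(base64.b64decode(b64_image))).convert("RGB")
    img = img.resize((224, 224), Image.LANCZOS)
    arr = np.asarray(img, dtype=np.float32) / 255.0   # HWC in [0, 1]
    arr = (arr - _MEAN) / _STD                        # ImageNet statistics
    arr = arr.transpose(2, 0, 1)                      # channel-first (CHW)
    return arr[np.newaxis, ...]                       # batch of 1: NCHW

# Inside the async handler, this CPU-bound work is kept off the event
# loop, e.g.:
#   batch = await asyncio.to_thread(preprocess_mri, payload.image_b64)
```

Offloading via a thread pool (here sketched with `asyncio.to_thread`) is what lets a single worker keep accepting concurrent requests while one image is being decoded and resized.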
What Building This Taught Me About Production ML
Schema design is model design. The most consequential design decision in this project had nothing to do with hyperparameter tuning. It was the choice to define a single, shared VitalCheckResponse[T] generic envelope that wraps every prediction across every domain. Every response carries a request ID, an inference timestamp in milliseconds, a standardized disclaimer, and a list of data source citations. This consistency means that any client consuming the API can build generic error handling, logging, and latency monitoring against a single contract — regardless of which endpoint they are hitting. The schema is the product.
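A sketch of such a generic envelope in Pydantic v2 follows. The fields named in the text (request ID, timestamp, disclaimer, data sources) are kept; everything else, including field names and the example result type, is an assumption.

```python
from datetime import datetime, timezone
from typing import Generic, TypeVar

from pydantic import BaseModel

T = TypeVar("T")

class VitalCheckResponse(BaseModel, Generic[T]):
    """Shared envelope: every endpoint returns this shape, with the
    domain-specific payload slotted into `result`."""
    request_id: str
    timestamp_ms: int
    disclaimer: str = "Not a substitute for professional medical advice."
    data_sources: list[str] = []
    result: T

class DiabetesResult(BaseModel):
    probability: float
    risk_level: str

resp = VitalCheckResponse[DiabetesResult](
    request_id="req-123",
    timestamp_ms=int(datetime.now(timezone.utc).timestamp() * 1000),
    result=DiabetesResult(probability=0.42, risk_level="Moderate"),
)
```

A client can write one deserializer, one logger, and one latency monitor against `VitalCheckResponse` and reuse them across every endpoint, swapping only the type parameter.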
Calibration matters more than accuracy. Several of the models in VitalCheck were deliberately tuned for real-world clinical utility rather than raw benchmark performance. The stroke model, for instance, operates on a dataset where only 4.8% of records are positive cases. A naive model trained to maximize accuracy would predict “no stroke” for every input and achieve 95.2% accuracy while being completely useless. The decision to lower the classification threshold from 0.5 to 0.30 — accepting more false positives in exchange for fewer false negatives — is a product decision masquerading as a technical one. Understanding that distinction is, perhaps, the core competency of applied ML engineering.
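The threshold shift is simple to see in code. With a fixed set of model scores (the values below are invented for illustration), reading them at 0.5 versus 0.30 trades false positives for false negatives without retraining anything.

```python
import numpy as np

def classify(probabilities: np.ndarray, threshold: float) -> np.ndarray:
    """Turn raw model scores into binary predictions at a given cutoff."""
    return (probabilities >= threshold).astype(int)

scores = np.array([0.12, 0.31, 0.47, 0.08, 0.55, 0.36])  # illustrative outputs

at_default = classify(scores, 0.5)    # flags only the 0.55 case
at_clinical = classify(scores, 0.30)  # also flags the borderline 0.31-0.47 cases
```

For a screening context, missing a true stroke risk (a false negative) is far costlier than an unnecessary follow-up (a false positive), which is the product reasoning behind the 0.30 cutoff.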
Honesty is a feature. The prediction interval on the insurance endpoint, the percentile context on the fitness endpoint, the explicit disclaimer on every response — these are not legal boilerplate or afterthoughts. They are the product reflecting an honest understanding of what probabilistic models can and cannot claim. Users who understand the limits of a tool use it correctly. An API that obscures uncertainty does not make better predictions; it just makes worse decisions easier.
Where VitalCheck Goes Next
Authentication and Rate Limiting: VitalCheck currently trusts all callers. The logical next layer is an API key management system — or integration with an API gateway like Kong or AWS API Gateway — to enable per-consumer rate limiting, usage analytics, and tiered access. Without this, the service cannot be safely exposed to public traffic at scale.
Model Retraining Pipeline: All eight models are currently trained offline and deployed as static artifacts. The natural evolution is a lightweight CI/CD-triggered retraining pipeline: when new labeled data becomes available, retrain, evaluate against a held-out validation set, and promote to production only if cross-validated metrics improve. Connecting the training scripts (currently isolated in the scripts/ directory) to an artifact registry and a deployment hook would close the loop from research to production.
Expanded Reference Populations: The fitness analytics endpoint is currently benchmarked against a 33-user FitBit convenience sample from 2016 — a limitation that is honestly disclosed in the API response. Integrating a larger, more demographically representative reference dataset would make the percentile comparisons genuinely meaningful and open the door to stratified benchmarks by age group, sex, or activity level. The architecture already supports this pattern through Parquet-backed reference lookups; only the data needs to improve.
VitalCheck API is open source under the MIT License. The full codebase, model registry, and API documentation are available in the repository.
Try It Out
Check out the live demo or explore the source code on GitHub.