On this page
- A Clinical-Grade REST API for the Modern Health Stack
- The Fragmentation Problem in Digital Health Intelligence
- Twelve Endpoints. One Contract. Zero Ambiguity.
- The Architecture Beneath the Surface: Pragmatic Choices at Every Layer
- The Stack
- The Decision Matrix
- The Hardest Problem: Running Eight Models Without Running Out of Memory
- What Building This Taught Me About Production ML
- Where VitalCheck Goes Next
A Clinical-Grade REST API for the Modern Health Stack
VitalCheck is a production-deployed machine learning inference API that delivers real-time health risk assessments across ten medical and wellness domains — all through a single, unified REST interface. Designed for health-tech platforms, wellness applications, and research tools, it transforms raw clinical and biometric inputs into actionable probabilistic predictions: from diabetes and stroke risk to MRI-based brain tumor classification and WHO-derived life expectancy modeling. This is not a demo. It is a fully containerized, documented, and monitored API built to production standards.
The Fragmentation Problem in Digital Health Intelligence
The modern health-tech ecosystem is littered with siloed, single-purpose prediction tools. A startup building a wellness dashboard might integrate one vendor for cardiovascular risk, another for sleep analysis, and a third for insurance cost forecasting — each with its own schema, its own authentication model, inconsistent confidence outputs, and incompatible response envelopes. The engineering overhead of stitching these together is substantial, and the inconsistency in how “risk” is communicated across vendors creates a genuinely poor user experience. There is a meaningful gap between what clinical ML research produces and what a software team can practically consume through an API. VitalCheck was built to close that gap: a single API contract, a single deployment footprint, and a unified semantic for how health risk is expressed.
Twelve Endpoints. One Contract. Zero Ambiguity.
VitalCheck delivers health intelligence across a breadth of domains that would typically require multiple vendors or bespoke integrations:
- Multi-Disease Risk Profiling: Dedicated endpoints for diabetes, heart disease, stroke, and breast cancer malignancy — each returning a calibrated probability, a discrete risk level (Low / Moderate / High / Critical), the top contributing features, and evidence-based recommendations. A `/comprehensive` endpoint runs all three cardiovascular and metabolic risk models in a single request, returning a unified health profile with per-model inference timing.
- Medical Image Classification: A brain tumor endpoint accepts a base64-encoded T1-weighted MRI and classifies it into one of four categories (glioma, meningioma, pituitary tumor, or no tumor) using a fine-tuned MobileNetV2 model exported to ONNX with INT8 quantization. The result includes per-class confidence scores and a binary `tumor_detected` flag.
- Wellness and Lifestyle Analytics: A sleep disorder assessment endpoint predicts the likelihood of insomnia or sleep apnea from twelve biometric and lifestyle features, accompanied by a calculated lifestyle score on a 0–100 scale. A fitness analytics endpoint benchmarks a user’s FitBit activity data against a reference population, producing percentile rankings and a categorical fitness label.
- Economic and Population Health Modeling: An insurance cost estimator returns a predicted annual premium alongside a 90% prediction interval — not just a point estimate, but a statistically honest range. A life expectancy endpoint accepts nineteen WHO-aligned socioeconomic and health indicators and models country-level outcomes with similar interval reporting. A pre-aggregated hospital analytics endpoint provides population-level admission statistics queryable by condition and age group.
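To give a concrete sense of what a single contract looks like from the client side, here is a minimal sketch of building a request body for the diabetes endpoint. The field names and the endpoint path are illustrative guesses, not the API's actual schema; the authoritative contract is the generated OpenAPI specification.

```python
import json

def build_diabetes_request(age: int, bmi: float, glucose: float,
                           blood_pressure: float) -> str:
    """Assemble a JSON body for a hypothetical POST /predict/diabetes call.

    Field names here are placeholders for illustration only.
    """
    return json.dumps({
        "age": age,
        "bmi": bmi,
        "glucose": glucose,
        "blood_pressure": blood_pressure,
    })

body = build_diabetes_request(age=52, bmi=31.4, glucose=148, blood_pressure=85)
```

The point of the unified contract is that the equivalent call for stroke, heart disease, or sleep assessment differs only in the feature fields, never in the envelope that comes back.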
The Architecture Beneath the Surface: Pragmatic Choices at Every Layer
The Stack
| Layer | Technology |
|---|---|
| Runtime | Python 3.11, FastAPI, Uvicorn (ASGI) |
| Data Validation | Pydantic v2 |
| Tabular ML | scikit-learn 1.6, Gradient Boosting, Random Forest, SVM |
| Deep Learning Inference | ONNX Runtime 1.21 (CPU, INT8 quantized) |
| Data Processing | NumPy, Pandas, PyArrow (Parquet) |
| Image Handling | Pillow |
| Package Management | uv 0.6 |
| Containerization | Docker (multi-stage), docker-compose |
| Testing | pytest, httpx, pytest-asyncio |
The Decision Matrix
FastAPI over Flask or Django REST Framework: The choice is not arbitrary. FastAPI’s native integration with Pydantic v2 means that every request model doubles as a schema-validated, auto-documented contract. The OpenAPI specification is generated from the same type annotations that enforce runtime validation. For an ML inference API where input shape correctness is a prerequisite to meaningful predictions, this tight coupling between schema definition and runtime enforcement is architecturally sound — not a convenience feature.
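The schema-as-contract idea can be sketched with a small Pydantic v2 model. The field names and bounds below are assumptions for illustration, not VitalCheck's real schema; the mechanism (one class driving both runtime validation and the OpenAPI spec) is the point.

```python
from pydantic import BaseModel, Field

class DiabetesRequest(BaseModel):
    """Hypothetical request model: the same annotations that validate
    incoming JSON at runtime also generate the OpenAPI documentation."""
    age: int = Field(ge=0, le=120)
    bmi: float = Field(gt=0, lt=100)
    glucose: float = Field(ge=0, description="Plasma glucose, mg/dL")

# In a FastAPI app this model would be wired to a route, e.g.:
#   @app.post("/predict/diabetes")
#   async def predict(req: DiabetesRequest): ...
```

An out-of-range input (say, a negative age) is rejected with a structured 422 error before any model code runs, which is exactly the "input shape correctness as a prerequisite" property described above.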
ONNX Runtime for the Deep Learning Model: Training MobileNetV2 in PyTorch and exporting to ONNX decouples the training environment from the inference environment. The runtime image does not need PyTorch — a dependency that alone accounts for hundreds of megabytes. Paired with INT8 post-training quantization, this decision shrinks the brain tumor model from 8.9 MB to 2.4 MB, a 73% reduction, with negligible accuracy degradation. On a resource-constrained deployment (a 2GB VPS), this is not an optimization — it is a prerequisite.
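The principle behind the INT8 shrink can be illustrated in a few lines of NumPy. This is a toy symmetric-quantization sketch of what post-training quantization does to a weight tensor, not the actual ONNX Runtime tooling the project would use; the 4x storage reduction and small reconstruction error are the properties that make the 8.9 MB to 2.4 MB trade worthwhile.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Toy symmetric INT8 quantization: map float32 weights to int8
    plus a single scale factor."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 tensor."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes / w.nbytes)  # 0.25: int8 stores 1/4 the bytes of float32
```

The per-weight error is bounded by half the quantization step, which is why accuracy degradation stays negligible for a well-conditioned model like MobileNetV2.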
Quantile Regression Ensemble for Insurance Cost: Rather than returning a single point estimate for insurance cost — which gives users false precision — the insurance endpoint runs three separate Gradient Boosting models trained to predict the mean, 5th percentile, and 95th percentile respectively. The response surface communicates genuine uncertainty. This reflects a product philosophy: a good ML API is honest about what it does not know.
The Hardest Problem: Running Eight Models Without Running Out of Memory
The most technically interesting constraint in VitalCheck is not algorithmic — it is operational. Deploying eight pre-trained models (26+ MB of serialized artifacts) on a single 2GB VPS, where each Uvicorn worker can consume up to 750MB, creates a tight memory budget that shapes every architectural decision downstream.
The solution is a ModelRegistry singleton, initialized once during the FastAPI application’s lifespan startup event. All eight models — scikit-learn pipelines, ONNX sessions, and Parquet reference tables — are loaded into memory at boot time and held in a shared registry object that is injected into each request handler via FastAPI’s dependency injection system. No model is reloaded per request. No disk I/O occurs during inference. The registry is passed by reference, not copied.
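The registry pattern can be sketched as follows. The placeholder strings stand in for the real artifacts (scikit-learn pipelines, ONNX sessions, Parquet tables), and the FastAPI wiring is shown only in comments; treat this as an illustration of the load-once, share-by-reference idea rather than the project's exact code.

```python
import threading

class ModelRegistry:
    """Process-wide singleton holding all loaded model artifacts."""
    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        self.models = {}

    @classmethod
    def instance(cls):
        # Double-checked locking so the load happens exactly once.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
                cls._instance.load_all()
        return cls._instance

    def load_all(self):
        # In the real app this runs once, inside FastAPI's lifespan handler:
        #   @asynccontextmanager
        #   async def lifespan(app):
        #       ModelRegistry.instance()
        #       yield
        self.models["diabetes"] = "sklearn-pipeline-placeholder"
        self.models["brain_tumor"] = "onnx-session-placeholder"

def get_registry() -> ModelRegistry:
    """FastAPI-style dependency: every handler receives the same object
    by reference, so no artifact is ever copied or reloaded per request."""
    return ModelRegistry.instance()
```

Because the registry is the only owner of the 26+ MB of artifacts, the memory footprint is fixed at boot and does not grow with request volume.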
For the ONNX brain tumor model specifically, inference is CPU-bound and image preprocessing is non-trivial: a base64-encoded MRI must be decoded, converted to RGB, resized to 224×224 using Lanczos interpolation, normalized against ImageNet statistics, transposed to channel-first format, and batched before being passed to the runtime. To prevent this CPU-intensive preprocessing from blocking the async event loop, inference is offloaded to a thread pool executor, preserving the API’s ability to handle concurrent requests even under load. The constraint — single worker, limited cores, minimal memory headroom — became the forcing function for a clean, efficient inference pipeline.
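The preprocessing chain described above can be sketched directly with Pillow and NumPy. The normalization constants are the standard ImageNet statistics; exact details of the production pipeline may differ.

```python
import base64
import io

import numpy as np
from PIL import Image

# Standard ImageNet normalization statistics.
_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_mri(b64_image: str) -> np.ndarray:
    """Decode -> RGB -> 224x224 (Lanczos) -> normalize -> CHW -> batch."""
    img = Image.open(io.BytesIO(base64.b64decode(b64_image))).convert("RGB")
    img = img.resize((224, 224), Image.LANCZOS)
    arr = np.asarray(img, dtype=np.float32) / 255.0   # HWC in [0, 1]
    arr = (arr - _MEAN) / _STD                        # ImageNet statistics
    arr = arr.transpose(2, 0, 1)                      # channel-first (CHW)
    return arr[np.newaxis, ...]                       # batch of 1: NCHW

# Inside the async handler, this CPU-bound work is kept off the event
# loop, e.g.:
#   batch = await asyncio.to_thread(preprocess_mri, payload.image_b64)
```

Offloading via a thread pool (here sketched with `asyncio.to_thread`) is what lets a single worker keep accepting concurrent requests while one image is being decoded and resized.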
What Building This Taught Me About Production ML
Schema design is model design. The most consequential design decision in this project had nothing to do with hyperparameter tuning. It was the choice to define a single, shared VitalCheckResponse[T] generic envelope that wraps every prediction across every domain. Every response carries a request ID, an inference timestamp in milliseconds, a standardized disclaimer, and a list of data source citations. This consistency means that any client consuming the API can build generic error handling, logging, and latency monitoring against a single contract — regardless of which endpoint they are hitting. The schema is the product.
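A sketch of such a generic envelope in Pydantic v2 follows. The fields named in the text (request ID, timestamp, disclaimer, data sources) are kept; everything else, including field names and the example result type, is an assumption.

```python
from datetime import datetime, timezone
from typing import Generic, TypeVar

from pydantic import BaseModel

T = TypeVar("T")

class VitalCheckResponse(BaseModel, Generic[T]):
    """Shared envelope: every endpoint returns this shape, with the
    domain-specific payload slotted into `result`."""
    request_id: str
    timestamp_ms: int
    disclaimer: str = "Not a substitute for professional medical advice."
    data_sources: list[str] = []
    result: T

class DiabetesResult(BaseModel):
    probability: float
    risk_level: str

resp = VitalCheckResponse[DiabetesResult](
    request_id="req-123",
    timestamp_ms=int(datetime.now(timezone.utc).timestamp() * 1000),
    result=DiabetesResult(probability=0.42, risk_level="Moderate"),
)
```

A client can write one deserializer, one logger, and one latency monitor against `VitalCheckResponse` and reuse them across every endpoint, swapping only the type parameter.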
Calibration matters more than accuracy. Several of the models in VitalCheck were deliberately tuned for real-world clinical utility rather than raw benchmark performance. The stroke model, for instance, operates on a dataset where only 4.8% of records are positive cases. A naive model trained to maximize accuracy would predict “no stroke” for every input and achieve 95.2% accuracy while being completely useless. The decision to lower the classification threshold from 0.5 to 0.30 — accepting more false positives in exchange for fewer false negatives — is a product decision masquerading as a technical one. Understanding that distinction is, perhaps, the core competency of applied ML engineering.
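The threshold shift is simple to see in code. With a fixed set of model scores (the values below are invented for illustration), reading them at 0.5 versus 0.30 trades false positives for false negatives without retraining anything.

```python
import numpy as np

def classify(probabilities: np.ndarray, threshold: float) -> np.ndarray:
    """Turn raw model scores into binary predictions at a given cutoff."""
    return (probabilities >= threshold).astype(int)

scores = np.array([0.12, 0.31, 0.47, 0.08, 0.55, 0.36])  # illustrative outputs

at_default = classify(scores, 0.5)    # flags only the 0.55 case
at_clinical = classify(scores, 0.30)  # also flags the borderline 0.31-0.47 cases
```

For a screening context, missing a true stroke risk (a false negative) is far costlier than an unnecessary follow-up (a false positive), which is the product reasoning behind the 0.30 cutoff.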
Honesty is a feature. The prediction interval on the insurance endpoint, the percentile context on the fitness endpoint, the explicit disclaimer on every response — these are not legal boilerplate or afterthoughts. They are the product reflecting an honest understanding of what probabilistic models can and cannot claim. Users who understand the limits of a tool use it correctly. An API that obscures uncertainty does not make better predictions; it just makes worse decisions easier.
Where VitalCheck Goes Next
Authentication and Rate Limiting: VitalCheck currently trusts all callers. The logical next layer is an API key management system — or integration with an API gateway like Kong or AWS API Gateway — to enable per-consumer rate limiting, usage analytics, and tiered access. Without this, the service cannot be safely exposed to public traffic at scale.
Model Retraining Pipeline: All eight models are currently trained offline and deployed as static artifacts. The natural evolution is a lightweight CI/CD-triggered retraining pipeline: when new labeled data becomes available, retrain, evaluate against a held-out validation set, and promote to production only if cross-validated metrics improve. Connecting the training scripts (currently isolated in the scripts/ directory) to an artifact registry and a deployment hook would close the loop from research to production.
Expanded Reference Populations: The fitness analytics endpoint is currently benchmarked against a 33-user FitBit convenience sample from 2016 — a limitation that is honestly disclosed in the API response. Integrating a larger, more demographically representative reference dataset would make the percentile comparisons genuinely meaningful and open the door to stratified benchmarks by age group, sex, or activity level. The architecture already supports this pattern through Parquet-backed reference lookups; only the data needs to improve.
VitalCheck API is open source under the MIT License. The full codebase, model registry, and API documentation are available in the repository.
Try It Out
Check out the live demo or explore the source code on GitHub.