---
title: AI Inference Architecture for Healthcare
emoji: 🧠
colorFrom: indigo
colorTo: green
sdk: static
sdk_version: 1.0.0
app_file: index.html
pinned: true
tags:
- healthcare
- ai-inference
- mlops
- kubernetes
- triton-inference-server
- fastapi
- hipaa-compliance
- deep-learning
- cloud-architecture
- monitoring
description: >-
Scalable, production-ready AI inference architecture for healthcare and pharma
using Triton, FastAPI, and Kubernetes.
---

# AI Inference Architecture for Healthcare
This project provides a scalable, production-ready AI inference architecture designed for healthcare and pharmaceutical applications. It integrates Triton Inference Server, FastAPI, and Kubernetes to support high-throughput model inference.
## 🚀 Key Features
- Modular container-based architecture with FastAPI gateway
- Supports NLP and CV models with optional preprocessing
- Inference via Triton Inference Server using ONNX or TorchScript models
- GitHub Actions-powered CI/CD pipeline to auto-deploy model updates
- Kubernetes-based pod management, autoscaling, and volume mounting
- Full observability stack: Prometheus + Grafana for metrics and monitoring
- HIPAA-aligned controls: secure APIs, audit logging, encryption
## 🧱 Architecture Overview
```
Healthcare/Pharma Clients → FastAPI Gateway → Optional Preprocessor → Triton Pod
          ↓                       ↓                    ↓                  ↓
   Model Registry  ←  GitHub CI/CD Pipeline  ←  Kubernetes  ←  Monitoring (Prometheus + Grafana)
```
## ⚙️ Deployment Options

### ▶️ Local (Docker Compose)

```bash
docker compose up --build
```

### ☸️ Kubernetes (Production)

```bash
kubectl apply -f k8s.yaml
kubectl apply -f preprocessor.yaml
kubectl apply -f hpa.yaml
```
## 📦 Model Lifecycle

1. Train the model locally or in a pipeline (e.g., PyTorch/ONNX; see the export sketch after this list)
2. Push the model to the GitHub repository
3. GitHub Actions CI/CD triggers and pushes the model to the Model Registry
4. Kubernetes mounts the model volume into the Triton pod
5. Triton automatically reloads the model
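As a minimal sketch of step 1, assuming PyTorch and the versioned repository layout described under Versioning, Promotion, Rollback below, the snippet exports a trained model to ONNX at `/models/<name>/<version>/model.onnx`; the stand-in model, names, and paths are illustrative:

```python
# Illustrative export step: convert a trained PyTorch model to ONNX
# and drop it into Triton's versioned repository layout.
from pathlib import Path

import torch

model = torch.nn.Linear(128, 2)  # stand-in for your trained network
model.eval()

version_dir = Path("models/ner/1")  # /models/<name>/<version>
version_dir.mkdir(parents=True, exist_ok=True)

dummy_input = torch.randn(1, 128)
torch.onnx.export(
    model,
    dummy_input,
    str(version_dir / "model.onnx"),
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
)
```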
## 🔍 Monitoring and Observability
- Metrics via a Prometheus sidecar scraping port 8002 on the Triton pod (a quick manual check is sketched below)
- Dashboards in Grafana track latency, throughput, failures
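Outside Prometheus, the same endpoint can be checked by hand; a minimal sketch, assuming Triton's default metrics port 8002 is reachable on localhost:

```python
# Read Triton's Prometheus metrics endpoint directly (default port 8002).
import urllib.request

with urllib.request.urlopen("http://localhost:8002/metrics") as resp:
    text = resp.read().decode("utf-8")

# Triton's inference counters are prefixed with nv_inference_.
for line in text.splitlines():
    if line.startswith("nv_inference_"):
        print(line)
```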
## 🧪 Sample Inference Request

```bash
curl -X POST http://localhost:8000/infer \
  -H "Content-Type: application/json" \
  -d '{"input": "Patient data or image here"}'
```
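The same request in Python, using `requests` (the `/infer` path and payload mirror the curl example):

```python
# Python equivalent of the curl example above.
import requests

resp = requests.post(
    "http://localhost:8000/infer",
    json={"input": "Patient data or image here"},
    timeout=10,  # seconds
)
resp.raise_for_status()
print(resp.json())
```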
## Enhancements Based on Peer Technical Review

### Preprocessing Execution Model
The NLP/CV preprocessing stage runs as an independent Kubernetes microservice for isolation and scale. The FastAPI Gateway performs conditional routing (see the sketch after this list):

- `content_type=image/*` → CV preprocessor → Triton
- `content_type=text/*` → NLP preprocessor → Triton
- Already-normalized inputs → direct to Triton

A lightweight schema-validation step remains in the gateway.
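A minimal sketch of that conditional routing, assuming hypothetical in-cluster service URLs for the preprocessors and Triton, and the `ner` model name used elsewhere in this README:

```python
# Hypothetical content-type routing in the FastAPI gateway.
# Service URLs are illustrative; in-cluster DNS names would replace them.
import httpx
from fastapi import FastAPI, Request

app = FastAPI()

CV_PREPROCESSOR = "http://cv-preprocessor:8080/preprocess"
NLP_PREPROCESSOR = "http://nlp-preprocessor:8080/preprocess"
TRITON_INFER = "http://triton:8000/v2/models/ner/infer"

@app.post("/infer")
async def infer(request: Request):
    content_type = request.headers.get("content-type", "")
    body = await request.body()
    async with httpx.AsyncClient(timeout=10.0) as client:
        if content_type.startswith("image/"):
            body = (await client.post(CV_PREPROCESSOR, content=body)).content
        elif content_type.startswith("text/"):
            body = (await client.post(NLP_PREPROCESSOR, content=body)).content
        # Already-normalized JSON goes straight to Triton.
        resp = await client.post(TRITON_INFER, content=body)
    return resp.json()
```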
### Model Lifecycle: Versioning, Promotion, Rollback

- Models are versioned under `/models/<name>/<version>` (e.g., `/models/ner/1`).
- CI/CD publishes to staging; promotion updates a release tag (e.g., `current -> 2`) for Triton to hot-reload (sketched after this list).
- Rollback re-points the tag to the last known-good version (`current -> 1`).
- Supports blue-green (two deployments, Service selector switch) and canary (a small percentage of traffic routed to a second Triton deployment).
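A promotion/rollback sketch under stated assumptions: the release tag is a `current` symlink inside the mounted model repository (a convention, not a Triton built-in), and Triton runs in explicit model-control mode so its repository `load` API triggers the hot-reload:

```python
# Illustrative promotion/rollback: re-point a `current` symlink in the
# model repository, then ask Triton to reload via its repository API.
from pathlib import Path

import requests

TRITON_HTTP = "http://triton:8000"  # assumed in-cluster endpoint
MODEL_REPO = Path("/models")        # assumed mount point of the registry volume

def point_release_tag(name: str, version: int) -> None:
    """Re-point /models/<name>/current at the given version directory."""
    tag = MODEL_REPO / name / "current"
    if tag.is_symlink() or tag.exists():
        tag.unlink()
    tag.symlink_to(MODEL_REPO / name / str(version))

def reload_model(name: str) -> None:
    """Triton repository API (explicit model-control mode) reloads the model."""
    requests.post(f"{TRITON_HTTP}/v2/repository/models/{name}/load").raise_for_status()

point_release_tag("ner", 2)  # promotion: current -> 2
reload_model("ner")

point_release_tag("ner", 1)  # rollback: current -> 1 (last known-good)
reload_model("ner")
```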
### Scalability & Resilience
- HPA scales Triton pods based on CPU (extensible to custom latency metrics).
- Readiness/liveness probes guard rollouts and enable auto-healing.
- Gateway uses per-request timeouts and retries transient 5xx responses (sketched below). If a pod fails its readiness probe, traffic shifts to healthy pods.
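A minimal sketch of the gateway's timeout-and-retry behavior with `requests` and `urllib3`'s `Retry`; the retry budget, backoff, and status list are illustrative:

```python
# Gateway-side resilience: bounded retries on transient 5xx plus a
# hard timeout per attempt. Values are illustrative.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(
    total=3,                           # at most 3 retries
    backoff_factor=0.2,                # 0.2s, 0.4s, 0.8s between attempts
    status_forcelist=[502, 503, 504],  # transient upstream errors only
    allowed_methods=["POST"],          # inference calls are POSTs
)

session = requests.Session()
session.mount("http://", HTTPAdapter(max_retries=retry))

resp = session.post(
    "http://triton:8000/v2/models/ner/infer",
    json={"inputs": []},  # placeholder payload
    timeout=5,            # seconds per attempt
)
```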
### Security, Compliance & Audit

- TLS in transit; optional mTLS inside the cluster.
- OAuth2/JWT at the gateway with per-route scopes (sketched after this list).
- Audit logs (structured JSON with `request_id`) across gateway, preprocessors, and Triton; logs ship to ELK/Loki.
- Optional PHI de-identification in preprocessors; strict schema validation; data minimization and retention controls aligned to HIPAA/GDPR.
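A minimal sketch of per-route scope enforcement at the gateway using PyJWT; the shared-secret HS256 setup and the space-delimited `scope` claim are assumptions (production would typically verify RS256 tokens against the issuer's JWKS):

```python
# Hypothetical per-route scope check at the FastAPI gateway.
import jwt  # PyJWT
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import OAuth2PasswordBearer

app = FastAPI()
oauth2 = OAuth2PasswordBearer(tokenUrl="token")

SECRET = "change-me"  # demo shared secret; production: issuer JWKS + RS256

def require_scope(scope: str):
    def checker(token: str = Depends(oauth2)) -> dict:
        try:
            claims = jwt.decode(token, SECRET, algorithms=["HS256"])
        except jwt.PyJWTError:
            raise HTTPException(status_code=401, detail="invalid token")
        if scope not in claims.get("scope", "").split():
            raise HTTPException(status_code=403, detail="missing scope")
        return claims
    return checker

@app.post("/infer")
async def infer(claims: dict = Depends(require_scope("infer:write"))):
    # ...route to preprocessor/Triton as in the gateway sketch above...
    return {"status": "authorized", "sub": claims.get("sub")}
```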
### Data Flow & Validation

- Gateway enforces MIME/JSON schema validation and rejects malformed or unauthorized requests (see the schema sketch after this list).
- Preprocessors normalize inputs (e.g., tokenize text, resize/normalize images).
- Triton returns prediction JSON; the gateway maps it to a domain response schema and may redact fields per policy.
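A minimal request/response schema sketch with Pydantic; the field names, limits, and the shape assumed for Triton's prediction JSON are illustrative:

```python
# Illustrative gateway schemas: strict input validation plus a domain
# response model that omits fields not meant to leave the cluster.
from pydantic import BaseModel, Field, ValidationError

class InferRequest(BaseModel):
    input: str = Field(min_length=1, max_length=10_000)

class DomainResponse(BaseModel):
    label: str
    confidence: float
    # No raw model internals or PHI echoes in the response schema.

def to_domain(triton_json: dict) -> DomainResponse:
    """Map Triton's prediction JSON to the domain schema (shape assumed)."""
    out = triton_json["outputs"][0]["data"]
    return DomainResponse(label=str(out[0]), confidence=float(out[1]))

try:
    req = InferRequest.model_validate({"input": "Patient data or image here"})
except ValidationError as err:
    print(err)  # gateway would return 422 here
```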
📌 See `SECURITY.md` for detailed security, compliance, and audit logging implementation.
📄 See `preprocessor.yaml` for deployment details of the NLP/CV preprocessing microservice.
📄 See `hpa.yaml` for Triton autoscaling configuration.
## 📂 File Reference

- `k8s.yaml` → Triton deployment
- `preprocessor.yaml` → NLP/CV preprocessing service
- `hpa.yaml` → Horizontal Pod Autoscaler for Triton