Edge AI Inference for Bandwidth-Constrained Clinical Environments
Sankara Healthcare Foundation runs diabetic retinopathy screening camps in rural India, where connectivity is unreliable and the latency of a cloud round-trip is unacceptable. Kannoli AI was designed specifically for this constraint.
The constraint profile
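A typical camp deployment has 2G/3G connectivity averaging 512 kbps with 40% packet loss, shared between 3–5 devices. Even before retransmission overhead, that leaves each device with at best roughly 60 to 100 kbps of usable throughput (512 kbps × 0.6, divided across 5 or 3 devices), so cloud inference is not viable. The model must run locally, with results syncing opportunistically.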
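One way to picture "syncing opportunistically" is a durable local queue that is drained whenever a connection happens to be available. The sketch below is illustrative only and is not taken from the Kannoli AI codebase: the `ResultQueue` class, the `pending` table, and the `upload` callback are hypothetical names, and SQLite is an assumed storage choice.

```python
import json
import sqlite3
from typing import Callable


class ResultQueue:
    """Local queue of screening results awaiting upload (hypothetical design)."""

    def __init__(self, path: str = ":memory:") -> None:
        # A file path would make the queue survive restarts; ":memory:" is for demos.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, result: dict) -> None:
        """Store a result locally; no network access is needed."""
        self.db.execute(
            "INSERT INTO pending (payload) VALUES (?)", (json.dumps(result),)
        )
        self.db.commit()

    def pending_count(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM pending").fetchone()[0]

    def flush(self, upload: Callable[[dict], bool]) -> int:
        """Upload oldest-first and stop at the first failure.

        `upload` is an assumed callback that returns True on success.
        A row is deleted only after its upload succeeds, so a dropped
        connection never loses a result. Returns the number uploaded.
        """
        sent = 0
        rows = self.db.execute(
            "SELECT id, payload FROM pending ORDER BY id"
        ).fetchall()
        for row_id, payload in rows:
            if not upload(json.loads(payload)):
                break
            self.db.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent
```

In this sketch the device would call `enqueue` after every local inference and call `flush` whenever a connectivity check succeeds; anything not uploaded simply stays queued for the next attempt.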
Model quantization choices
We evaluated INT8 and INT4 quantization of our EfficientNet-B3 backbone. INT8 with per-channel quantization delivered a 3.8x size reduction (from 22 MB to 5.8 MB) with less than 0.3% accuracy degradation on our validation set. INT4 showed unacceptable accuracy loss for clinical use.
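To illustrate what "per-channel" means here: each output channel of a weight tensor gets its own scale factor, so a channel with small weights is not forced onto the coarse grid needed by a channel with large weights. The following is a minimal pure-Python sketch of the idea, not the toolchain used for the model above. It assumes a symmetric scheme (no zero point), and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_per_channel(
    weights: List[List[float]],
) -> Tuple[List[List[int]], List[float]]:
    """Symmetric INT8 quantization with one scale per output channel (row)."""
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Map the largest magnitude in this channel to 127; guard all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Recover approximate float weights from INT8 values and per-channel scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


# Two channels whose weight ranges differ by a factor of 40.
weights = [[0.5, -0.2, 0.1], [20.0, -8.0, 4.0]]
q, scales = quantize_per_channel(weights)
restored = dequantize(q, scales)
```

With a single per-tensor scale, the first channel above would share the step size dictated by the second channel's 20.0 and lose most of its resolution; with per-channel scales, the rounding error in each channel is bounded by half of that channel's own step.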
The quantized model runs at 340ms per image on a Snapdragon 660 — well within clinical workflow requirements.
Cloudflare Workers as the inference boundary
For the subset of camps with reliable connectivity, we route inference through Cloudflare Workers AI, which runs inference at the nearest edge node rather than a central data center. This reduces round-trip latency from ~800ms (central cloud) to ~120ms (regional edge) for the majority of Indian camp locations.
The offline-first architecture treats Cloudflare Workers as an acceleration layer, not a dependency — the device degrades gracefully when connectivity drops.
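The "acceleration layer, not a dependency" behaviour can be sketched as a small routing function: try the edge only when the device believes it is online, and fall back to the on-device model on any network failure. This is an assumed structure, not the production code. `local_infer`, `edge_infer`, and `is_online` are hypothetical callables, and the edge call is assumed to enforce its own timeout and raise `OSError` (which includes `TimeoutError` and `ConnectionError`) when it fails.

```python
from typing import Callable, Tuple


def classify(
    image: bytes,
    local_infer: Callable[[bytes], str],
    edge_infer: Callable[[bytes], str],
    is_online: Callable[[], bool],
) -> Tuple[str, str]:
    """Return (label, source), where source is "edge" or "local".

    The local model is always available, so a failed or skipped edge
    call never blocks the screening workflow.
    """
    if is_online():
        try:
            return edge_infer(image), "edge"
        except OSError:
            # Timeouts and dropped connections fall through to the local model.
            pass
    return local_infer(image), "local"
```

Returning the source alongside the label lets the device record which path produced each result, which may be useful when comparing the two models later.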