Charles Cozad

Async image segmentation with SAM 3

An end-to-end async pipeline: a React frontend uploads an image plus a free-text concept prompt, a FastAPI service stores the raw image and dispatches work over NATS, and a Python worker running SAM 3 + OpenCV segments instances of that concept and returns the annotated result. MinIO in dev, real S3 in prod. Source on GitHub.

SAM 3 segmenting the concept 'cars' in a NYC street scene — 9 instances highlighted in the annotated view next to the original

Upload an image and a free-text concept prompt; the worker highlights matching instances. Here the prompt "cars" finds 9 instances, segmented in about 5 seconds on an NVIDIA T4.

The text prompt is the demo

Because the prompt is free text, the same image with a different concept picks out an entirely different set of instances. Here's the same street scene with the prompt "buildings":

SAM 3 segmenting the concept 'buildings' in the same street scene — 13 building instances highlighted

Architecture

The API persists the upload and publishes a request on NATS; the worker segments and reports status and results back over NATS; the API applies those events to the job row. The worker never touches the database directly — communication is NATS-only, which keeps it swappable and replicable. Storage is env-switched: MinIO in dev, real AWS S3 in prod. The same API and worker images run in both; only env changes.
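To make the "API applies those events to the job row" step concrete, here is a minimal sketch of folding segment.status and segment.result messages into a job record. The subject names come from the architecture above; the Job fields, the status names, and the payload keys (job_id, status, result_key, instance_count, error) are assumptions for illustration, not the repo's actual schema. The sketch ignores duplicate and out-of-order messages, which is one way to stay safe under redelivery.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical status names and ordering; the real schema may differ.
STATUS_ORDER = {"pending": 0, "processing": 1, "completed": 2, "failed": 2}
TERMINAL = {"completed", "failed"}


@dataclass(frozen=True)
class Job:
    id: str
    status: str = "pending"
    result_key: Optional[str] = None      # object-storage key of the annotated image
    instance_count: Optional[int] = None  # number of segmented instances
    error: Optional[str] = None


def apply_event(job: Job, subject: str, payload: dict) -> Job:
    """Fold one event into the job; return the same object if the event is stale."""
    if payload.get("job_id") != job.id or job.status in TERMINAL:
        return job  # wrong job, or a redelivery after the job already finished
    if subject == "segment.status":
        new_status = payload.get("status")
        if new_status not in STATUS_ORDER:
            return job  # unknown status: ignore rather than corrupt the row
        if STATUS_ORDER[new_status] <= STATUS_ORDER[job.status]:
            return job  # duplicate or out-of-order update
        return replace(job, status=new_status, error=payload.get("error"))
    if subject == "segment.result":
        return replace(
            job,
            status="completed",
            result_key=payload["result_key"],
            instance_count=payload["instance_count"],
        )
    return job  # unrelated subject
```

In a real service the returned Job would be written back to Postgres inside the NATS message handler. Keeping the transition logic as a pure function like this makes it easy to unit-test without a broker or a database.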

Architecture: a React SPA uploads to and polls a FastAPI service over HTTP; the API persists jobs to Postgres, stores raw images in MinIO/S3, and publishes segment.request on NATS; the GPU worker consumes the request, writes the annotated image to MinIO/S3, and publishes segment.status and segment.result back on NATS

For the conceptual background — when async queues earn their complexity, how to think about publishers, consumers, and brokers, and the design decisions that come with them (payload content, idempotency, retry policy) — see Increase System Responsiveness and Reliability With Queueing on my Substack.

Tech stack

  • SAM 3 (facebook/sam3.1 checkpoint) for text-prompted concept segmentation, served from a CUDA worker
  • OpenCV 4 for mask overlay rendering on the original image (OpenCV 5 has no PyPI wheels yet — the recipe in the repo explains the fallback)
  • FastAPI service with Alembic-managed Postgres for job state, persisting raw images and presigning URLs from object storage
  • NATS JetStream as the async message bus (segment.request, segment.status, segment.result)
  • MinIO for local dev object storage, AWS S3 in prod — env-switched, same code path
  • React + Vite + TypeScript frontend with optimistic UI on upload and ~2s polling for status
  • Docker Compose for local dev (make up) and prod (docker-compose.prod.yml + Caddy TLS reverse proxy)
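The "env-switched, same code path" storage setup in the list above could look roughly like the sketch below. The environment variable names (S3_ENDPOINT_URL, S3_ACCESS_KEY, S3_SECRET_KEY, S3_REGION) and the helper name are hypothetical, not taken from the repo. The idea is that an endpoint override points the client at MinIO in dev, while its absence leaves the client on AWS defaults in prod.

```python
import os
from typing import Mapping, Optional


def s3_client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for an S3-compatible client from the environment.

    All variable names here are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    kwargs = {"region_name": env.get("S3_REGION", "us-east-1")}
    endpoint = env.get("S3_ENDPOINT_URL")
    if endpoint:
        # Dev: talk to MinIO at an explicit endpoint with static credentials.
        kwargs["endpoint_url"] = endpoint
        kwargs["aws_access_key_id"] = env["S3_ACCESS_KEY"]
        kwargs["aws_secret_access_key"] = env["S3_SECRET_KEY"]
    # Prod: no endpoint override, so the client uses real AWS S3 and its
    # default credential chain (instance role, environment, and so on).
    return kwargs
```

Assuming boto3 is the client library, the result can be passed straight through as `boto3.client("s3", **s3_client_kwargs())`, so the API and worker code that uploads images or presigns URLs does not need to know which backend it is talking to.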

Run it yourself

The infra and API run on any machine: clone the repo, copy .env.example to .env, and run make up. Without a GPU worker, uploads queue but stay pending. For real segmentation, request access to the SAM 3 checkpoint on Hugging Face, add an HF_TOKEN to your env, and run make up-gpu on an NVIDIA host. No local GPU? The repo's test-on-AWS guide walks through running the dev stack on a rented GPU instance over an SSH tunnel.