Async image segmentation with SAM 3

An end-to-end async pipeline: a React frontend uploads an image plus a free-text concept prompt, a FastAPI service stores the raw image and dispatches work over NATS, and a Python worker running SAM 3 + OpenCV segments instances of that concept and returns the annotated result. MinIO in dev, real S3 in prod. Source on GitHub.


SAM 3 segmenting the concept 'cars' in a NYC street scene — 9 instances highlighted in the annotated view next to the original

Upload an image and a free-text concept prompt; the worker highlights matching instances. Here the prompt 'cars' finds 9 instances, segmented in about 5 seconds on an NVIDIA T4.

The text prompt is the demo

Because the prompt is free text, the same image with a different concept picks out an entirely different set of instances. Here's the same street scene with the prompt 'buildings':

SAM 3 segmenting the concept 'buildings' in the same street scene — 13 building instances highlighted

Architecture

The API persists the upload and publishes a request on NATS; the worker segments and reports status and results back over NATS; the API applies those events to the job row. The worker never touches the database directly — communication is NATS-only, which keeps it swappable and replicable. Storage is env-switched: MinIO in dev, real AWS S3 in prod. The same API and worker images run in both; only env changes.
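JetStream delivers at least once: if an ack is lost, the same message can arrive again, and events can be seen out of order. That means the API has to apply status events to the job row in a way that tolerates duplicates and stale events. Below is a minimal sketch of one way to do it. It is an illustration, not the repo's code: sqlite3 stands in for Postgres, and the status names other than pending, the `stage` column, and the `apply_event` helper are all hypothetical. The idea is a conditional UPDATE that can only move a job forward in its lifecycle.

```python
import sqlite3
from typing import Optional

# Hypothetical lifecycle: only "pending" is named in the write-up.
# A job may only move to a strictly higher stage; "done" and "failed" are terminal.
STAGE = {"pending": 0, "processing": 1, "done": 2, "failed": 2}


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " stage INTEGER NOT NULL,"
        " result_key TEXT)"
    )


def create_job(conn: sqlite3.Connection, job_id: str) -> None:
    # The API writes the row before publishing on segment.request.
    conn.execute(
        "INSERT INTO jobs (id, status, stage) VALUES (?, 'pending', 0)", (job_id,)
    )


def apply_event(
    conn: sqlite3.Connection, job_id: str, status: str, result_key: Optional[str] = None
) -> bool:
    """Apply one segment.status / segment.result event; return True if the row changed.

    The stage check lives in the WHERE clause, so duplicates, stale events, and
    events for unknown jobs update zero rows instead of corrupting state.
    """
    stage = STAGE[status]
    cur = conn.execute(
        "UPDATE jobs SET status = ?, stage = ?, result_key = COALESCE(?, result_key)"
        " WHERE id = ? AND stage < ?",
        (status, stage, result_key, job_id, stage),
    )
    return cur.rowcount == 1


# Usage: a redelivered "processing" event after "done" is ignored.
conn = sqlite3.connect(":memory:")
init_db(conn)
create_job(conn, "job-1")
apply_event(conn, "job-1", "processing")                 # True
apply_event(conn, "job-1", "done", "results/job-1.png")  # True
apply_event(conn, "job-1", "processing")                 # False: stale redelivery
```

Putting the check inside the UPDATE itself, rather than reading the row and then writing it, keeps each transition a single atomic statement if more than one consumer applies events concurrently.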

For the conceptual background — when async queues earn their complexity, how to think about publishers, consumers, and brokers, and the design decisions that come with them (payload content, idempotency, retry policy) — see Increase System Responsiveness and Reliability With Queueing on my Substack.
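In this pipeline the payload-content question has a natural answer: the API has already written the raw image to object storage, so the request message can carry a reference to it instead of the bytes. A minimal sketch of such a message, assuming JSON encoding; the `SegmentRequest` type and its field names are hypothetical, not the repo's schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SegmentRequest:
    """Hypothetical body of a segment.request message (a reference, not the image)."""

    job_id: str
    image_key: str  # object key in MinIO (dev) or S3 (prod); the worker downloads it
    prompt: str     # free-text concept, e.g. "cars" or "buildings"

    def to_bytes(self) -> bytes:
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, data: bytes) -> "SegmentRequest":
        # Unknown or missing fields raise TypeError, so schema drift fails loudly.
        return cls(**json.loads(data))
```

Keeping image bytes out of the message also keeps it far below NATS's per-message size limit (1 MB by default), and a redelivered request simply points the worker at the same stored object again.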

Tech stack

SAM 3 (facebook/sam3.1 checkpoint) for text-prompted concept segmentation, served from a CUDA worker
OpenCV 4 for mask overlay rendering on the original image (OpenCV 5 has no PyPI wheels yet — the recipe in the repo explains the fallback)
FastAPI service with Alembic-managed Postgres for job state; it persists raw images to object storage and presigns URLs for stored objects
NATS JetStream as the async message bus (segment.request, segment.status, segment.result)
MinIO for local dev object storage, AWS S3 in prod — env-switched, same code path
React + Vite + TypeScript frontend with optimistic UI on upload and ~2s polling for status
Docker Compose for local dev (make up) and prod (docker-compose.prod.yml + Caddy TLS reverse proxy)
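The "env-switched, same code path" storage setup can come down to one function that decides whether to override the S3 endpoint. A sketch under assumptions: the services are assumed to use boto3, the environment variable names are hypothetical, and the function only builds keyword arguments for `boto3.client("s3", ...)`, which accepts `region_name` and `endpoint_url`. In dev the endpoint points at MinIO; in prod it is left unset so boto3 resolves the AWS endpoint on its own.

```python
import os
from typing import Mapping, Optional


def s3_client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build kwargs for boto3.client("s3", **kwargs) from the environment.

    Hypothetical variables:
      S3_ENDPOINT_URL  set to the MinIO URL in dev, unset in prod
      AWS_REGION       region name (defaults to us-east-1)
    Credentials are not handled here; boto3 reads AWS_ACCESS_KEY_ID and
    AWS_SECRET_ACCESS_KEY from the environment by itself.
    """
    env = os.environ if env is None else env
    kwargs = {"region_name": env.get("AWS_REGION", "us-east-1")}
    endpoint = env.get("S3_ENDPOINT_URL", "").strip()
    if endpoint:
        # Dev: talk to MinIO through its S3-compatible API.
        kwargs["endpoint_url"] = endpoint
    return kwargs
```

Because only the endpoint changes, the upload, download, and presign calls stay identical in both environments, which is what lets the same API and worker images run in dev and prod.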

Run it yourself

The infra and API run on any machine — clone, copy .env.example to .env, and make up. Uploads queue but stay pending without a GPU worker. For real segmentation, request access to the SAM 3 checkpoint on Hugging Face, drop an HF_TOKEN into your env, and make up-gpu on an NVIDIA host. No local GPU? The repo's test-on-AWS guide walks through running the dev stack on a rented GPU instance over an SSH tunnel.
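Without a running GPU worker, uploads just stay pending, so it can help to confirm the Hugging Face token is in place before running make up-gpu. The sketch below is a hypothetical helper, not part of the repo: it parses simple KEY=VALUE lines from the text of a .env file and reports whether HF_TOKEN is set either there or in the process environment.

```python
import os
from typing import Mapping, Optional

REQUIRED_FOR_GPU = ("HF_TOKEN",)  # the gated SAM 3 checkpoint needs a Hugging Face token


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def missing_for_gpu(env_text: str, environ: Optional[Mapping[str, str]] = None) -> list:
    """Return required keys that are empty both in the .env text and in the environment."""
    environ = os.environ if environ is None else environ
    values = parse_env(env_text)
    return [key for key in REQUIRED_FOR_GPU if not (values.get(key) or environ.get(key))]
```

For example, passing it the contents of .env (read with `pathlib.Path(".env").read_text()`) and getting back `["HF_TOKEN"]` means the token still needs to be added.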