Async image segmentation with SAM 3
An end-to-end async pipeline: a React frontend uploads an image plus a free-text concept prompt, a FastAPI service stores the raw image and dispatches work over NATS, and a Python worker running SAM 3 + OpenCV segments instances of that concept and returns the annotated result. MinIO in dev, real S3 in prod. Source on GitHub.

Upload an image and a free-text concept prompt; the worker highlights matching instances. Here the prompt "cars" finds 9 instances, segmented in ~5 seconds on an NVIDIA T4.
The text prompt is the demo
Because the prompt is free text, the same image with a different concept picks out an entirely different set of instances. Here's the same street scene with the prompt "buildings":

Architecture
The API persists the upload and publishes a request on NATS; the worker segments and reports status and results back over NATS; the API applies those events to the job row. The worker never touches the database directly — communication is NATS-only, which keeps it swappable and replicable. Storage is env-switched: MinIO in dev, real AWS S3 in prod. The same API and worker images run in both; only env changes.
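
As a rough illustration of the last step, here is a minimal sketch of how the API might apply worker events to a job row. The subject names come from this README; the Job fields, the status values, and the payload keys (job_id, status, error, result_key) are assumptions made for the example, not the repo's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical status values; the repo's real names may differ.
TERMINAL = {"done", "failed"}


@dataclass
class Job:
    """Stand-in for the job row the API keeps in Postgres."""
    id: str
    status: str = "pending"
    result_key: Optional[str] = None  # object-storage key of the annotated image
    error: Optional[str] = None


def apply_event(job: Job, subject: str, payload: dict) -> Job:
    """Apply one worker event to the job row; safe to call again on redelivery."""
    if payload.get("job_id") != job.id:
        return job  # event belongs to a different job
    if job.status in TERMINAL:
        return job  # finished jobs ignore late or duplicate events
    if subject == "segment.status" and payload.get("status") in {"processing", "failed"}:
        job.status = payload["status"]
        job.error = payload.get("error")
    elif subject == "segment.result":
        job.status = "done"
        job.result_key = payload["result_key"]
    return job
```

Because finished jobs ignore further events, a redelivered message leaves the row unchanged, which is one simple way to get the idempotency that a queue-based design needs.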

For the conceptual background — when async queues earn their complexity, how to think about publishers, consumers, and brokers, and the design decisions that come with them (payload content, idempotency, retry policy) — see Increase System Responsiveness and Reliability With Queueing on my Substack.
Tech stack
- SAM 3 (facebook/sam3.1 checkpoint) for text-prompted concept segmentation, served from a CUDA worker
- OpenCV 4 for mask overlay rendering on the original image (OpenCV 5 has no PyPI wheels yet; the recipe in the repo explains the fallback)
- FastAPI service with Alembic-managed Postgres for job state, persisting raw images and presigning URLs from object storage
- NATS JetStream as the async message bus (segment.request, segment.status, segment.result)
- MinIO for local dev object storage, AWS S3 in prod; env-switched, same code path
- React + Vite + TypeScript frontend with optimistic UI on upload and ~2s polling for status
- Docker Compose for local dev (make up) and prod (docker-compose.prod.yml + Caddy TLS reverse proxy)
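
To make "env-switched, same code path" concrete, here is one way the switch could look, sketched under assumptions: the variable names S3_ENDPOINT_URL, S3_ACCESS_KEY, and S3_SECRET_KEY are hypothetical, and the repo may wire this differently. The returned keys are real boto3 client parameters.

```python
import os
from typing import Mapping


def s3_client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for boto3.client("s3", ...) from the environment."""
    kwargs = {"region_name": env.get("AWS_REGION", "us-east-1")}
    endpoint = env.get("S3_ENDPOINT_URL")  # hypothetical variable name
    if endpoint:
        # Dev: point the S3 client at MinIO and pass its credentials explicitly.
        kwargs["endpoint_url"] = endpoint
        kwargs["aws_access_key_id"] = env["S3_ACCESS_KEY"]
        kwargs["aws_secret_access_key"] = env["S3_SECRET_KEY"]
    # Prod: no endpoint override, so boto3 talks to AWS S3 and uses its
    # default credential chain.
    return kwargs


# Usage (requires boto3): client = boto3.client("s3", **s3_client_kwargs())
```

With this shape, the calling code is identical in dev and prod; only the presence of the endpoint variable decides whether requests go to MinIO or AWS.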
Run it yourself
The infra and API run on any machine: clone the repo, copy .env.example to .env, and run make up. Without a GPU worker, uploads are queued but stay pending. For real segmentation, request access to the SAM 3 checkpoint on Hugging Face, add an HF_TOKEN to your env, and run make up-gpu on an NVIDIA host. No local GPU? The repo's test-on-AWS guide walks through running the dev stack on a rented GPU instance over an SSH tunnel.