I build production AI systems.
I also build the teams that ship them.
Twenty years in technology. Fifteen shipping mission-critical systems. Current focus: large language models, agentic AI, and the engineering enablement that lets a team move twice as fast next quarter.
About
I am a Principal Software Engineer based in California. Most recently I led engineering on a 0-to-1 multi-tenant AI communication insights platform. Now looking for the next interesting project and a long-term team. I want to solve hard problems people care about.
What I have shipped
- Production AI built on Graph RAG, LLMs, and agentic workflows
- Event-driven, serverless systems on AWS and Azure
- A React-based instant claim payout system that moved well over $1M
- Confidential compute on Intel SGX for medical AI use cases
- Multi-sensor data capture for Reality Labs at Meta
How I work with teams
- Six years as a people manager. Five-plus as a technical lead
- Grew an architecture practice from four to eleven engineers in two years
- Hundreds of engineers and aspiring engineers mentored over the years. A few examples:
  - Mentored a security professional through the move to independent consulting
  - Helped a senior engineer retool for the AI era
  - Coached a new engineer into becoming a celebrated lead
  - Introduced a teenager to game development. They went on to major in computer science
- Repeat speaker at high school career days
Where I show up publicly
- LinkedIn: 79K impressions and 27K members reached over the last 12 months (linkedin.com/in/charlescozad)
- Substack: long-form essays for readers who want the detail. 4.3K views, 47 subscribers, both climbing (charlescozad.substack.com)
- Podcast: guest on the Build Different Podcast, episode 7, “What is Simplicity”, on why simple solutions are the hardest to build and how young engineers should approach AI tools
- Open-source reference designs at github.com/ccozad/ml-reference-designs
What I am working on
Three problem shapes I keep coming back to:
- Production AI systems. Graph RAG. Agentic workflows. Evaluation frameworks that grade output by code, by model, and by human, each catching a different class of mistake.
- Async offload for long-running operations. The pattern that accepts a user request in under a hundred milliseconds and runs a five-minute job behind it without losing the thread. This one comes up in conversations more than any other backend topic right now.
- Engineering enablement. Pairing on architecture. Code review with intent. Test-driven development. Clean statement-of-work hygiene. The unglamorous habits that compound across a team.
Below are three demos that show this in motion. Each runs in your browser.
The async-offload reference (in progress)
Almost every team I have spoken with this year is in some stage of figuring out how to make their AI features feel snappy without dropping the long-running work on the floor.
I am writing this one up as a public reference. Code first. Posts second.
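Until that write-up lands, here is a minimal sketch of the general shape, not the reference itself. It assumes an in-memory job table and a thread pool in place of a durable queue and worker fleet. The names `JobStore`, `slow_summary`, and `wait_for` are hypothetical and exist only for this illustration. The idea: record the job, hand it to a worker, return a job ID immediately, and let the caller poll for the result.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobStore:
    """In-memory job registry (assumption: a real system would use a durable store)."""

    def __init__(self, max_workers=4):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}
        self._lock = threading.Lock()

    def submit(self, work, *args):
        """Record the job, hand it to a worker, and return a job ID right away."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "queued", "result": None, "error": None}
        self._executor.submit(self._run, job_id, work, args)
        return job_id

    def _run(self, job_id, work, args):
        """Runs on a worker thread; captures either the result or the failure."""
        self._update(job_id, status="running")
        try:
            result = work(*args)
        except Exception as exc:
            self._update(job_id, status="failed", error=str(exc))
        else:
            self._update(job_id, status="succeeded", result=result)

    def _update(self, job_id, **fields):
        with self._lock:
            self._jobs[job_id].update(fields)

    def status(self, job_id):
        """Return a snapshot copy so callers never see a half-updated record."""
        with self._lock:
            return dict(self._jobs[job_id])


def slow_summary(text):
    """Hypothetical stand-in for the long-running job."""
    time.sleep(0.2)
    return text.upper()


def wait_for(store, job_id, timeout=5.0, interval=0.02):
    """Poll until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        snapshot = store.status(job_id)
        if snapshot["status"] in ("succeeded", "failed"):
            return snapshot
        time.sleep(interval)
    raise TimeoutError(job_id)
```

In a web service, `submit` would sit behind the request handler and `status` behind a polling endpoint, so the handler's response time stays independent of how long the job runs. Swapping the dictionary for a database table and the thread pool for a queue keeps the same contract.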
Contact
- LinkedIn: linkedin.com/in/charlescozad
- GitHub: github.com/ccozad
- Substack: charlescozad.substack.com
- Etsy: etsy.com/shop/CozadFamilyShop (laser-engraved cutting boards, awards, and memorials; the machines and the process are part of the fun)
A question
What is the long-running operation in your stack right now that users wait on more than they should? I would like to hear how you are thinking about it.
