The Actual Builder's Toolbox

A few months ago I was deep into MemoriA, an AI companion platform I have been building on an all-TypeScript, all-Cloudflare stack. I had a feature in my head: persistent memory with semantic recall, tied to a small UI for testing how recall behaved across sessions. In the older version of my working life, that idea would have spent two weeks in a Notion doc, another week as a sketch, and a third as a vague meeting with myself before I touched a keyboard.
Instead, I described the feature to Codex, let it draft the schema and storage layer in one thread, opened a second worktree to prototype the UI in Claude Code, and by that evening I had something I could click.
It did not work well. The recall logic was confused, the embeddings layer was over-engineered for what I needed, and the UI was uglier than I would admit in print. But it was real, and being real meant I could see what was wrong instead of arguing with myself about what might be wrong. That experience changed how I think about tools, which is why this chapter stays close to actual projects.
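To make "semantic recall" concrete: the core mechanic is ranking stored memories by embedding similarity to a query. The sketch below is a minimal, self-contained illustration of that idea; the `Memory` shape and the assumption of precomputed embedding vectors are mine for illustration, not MemoriA's actual schema or storage layer.

```typescript
// A memory with a precomputed embedding vector (illustrative shape).
interface Memory {
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Recall: rank stored memories by similarity to the query embedding
// and return the top-k matches.
function recall(queryEmbedding: number[], memories: Memory[], topK = 3): Memory[] {
  return [...memories]
    .sort((x, y) => cosine(queryEmbedding, y.embedding) - cosine(queryEmbedding, x.embedding))
    .slice(0, topK);
}
```

A real version adds persistence and session scoping, which is where the interesting failures in the prototype actually lived; the ranking itself is the easy part.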
Most chapters about builder tools read like a parade of vendor logos. I want to do something more useful: describe the stack I actually use, the projects that pushed me toward each piece, and the moments where I changed my mind about how the pieces fit together. The functions matter more than the brand names. The brand names will change. The shape of the work, if I get this right, should outlast any one logo.

The stack as I actually use it: exploration, memory, review, and deployment in one chain.
The Bet Underneath the Stack
My stack reflects one bet: the value of any tool depends on what the next tool in the chain can do with its output.
A code generator that produces beautiful diffs but drops them into a context where nothing reviews them is not powerful. It is loud. A deployment platform that ships fast but cannot keep the work alive after the demo is not infrastructure. It is theater. The pieces only matter as part of a sequence.
I think about the toolbox as handoffs rather than as a list. An idea has to travel from the moment it is still vague through to something a real user can hit, and every step in between is a place where the work can either gain clarity or lose it. The right tool at each step is the one that hands its output to the next step in a form the next step can use.
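One way to picture the handoff constraint is as type-checked composition: a stage is only useful if its output type is something the next stage accepts. The sketch below is a toy model of that idea, not any real tool's API; the stage names and payload shapes are illustrative assumptions.

```typescript
// A stage consumes one shape and hands off another.
type Stage<In, Out> = (input: In) => Out;

// Compose two stages; the types enforce that the handoff fits.
function chain<A, B, C>(first: Stage<A, B>, next: Stage<B, C>): Stage<A, C> {
  return (input) => next(first(input));
}

// Toy stand-ins for exploration -> implementation.
// Exploration turns a vague idea into an idea plus evidence.
const explore: Stage<string, { idea: string; evidence: string[] }> = (idea) => ({
  idea,
  evidence: [`prototype of ${idea}`, `failed test for ${idea}`],
});

// Implementation consumes the scoped idea and produces a reviewable diff.
const implement: Stage<{ idea: string; evidence: string[] }, { diff: string }> = (scoped) => ({
  diff: `diff implementing ${scoped.idea} (${scoped.evidence.length} receipts)`,
});

const pipeline = chain(explore, implement);
```

If `implement` expected a shape `explore` does not produce, the composition would not compile. That is the whole bet in miniature: a brilliant stage whose output nothing downstream can consume is a type error, not a tool.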
That sounds obvious until you realize how many tools in this space are designed to be impressive in isolation.
I will walk through my stack roughly in the order the work moves: exploration, implementation, memory, workflow packaging, artifacts, public surface, and durable infrastructure. At each step I want to be clear about what I use, what I tried and dropped, and where I am still uncertain.
Exploration: Where the Idea Earns Its First Receipt
The thing I want from an exploration tool is not speed. It is evidence.
Prose lets an idea flatter itself. A workflow that touches code, opens a browser, and produces a screenshot or a failed test makes the idea answer for itself.
Google Antigravity earns its place in the stack because it sits between the conversation and the codebase in a way most tools do not. I can hand it a soft direction - "see if this kind of recall behavior is even worth building" - and it comes back with a plan, a small implementation, a browser walkthrough, and a list of things that did not work. That last part is the one I care about. A tool that only returns successes is not exploring; it is performing.
I caught myself mistrusting Antigravity for a while. The first walkthroughs felt too clean, and I worried I was being seduced by the artifact. So I made a rule: anything Antigravity produces in exploration mode does not get to harden into a decision until I have rebuilt at least one piece of it manually, in Codex or Claude Code, with full review. The artifact is permission to investigate further, not permission to commit.
That rule has saved me real time. When I was scoping the hf-papers-trends pipeline - classification, trend aggregation, forecasting on Hugging Face Daily Papers - Antigravity produced a plausible-looking pipeline in an afternoon. It would have shipped. It was also wrong about the shape of the input data in a way that would have cost me a week of corrections downstream. The artifact was useful precisely because it surfaced the wrongness early, on a small scale, where I could see it.
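The class of bug that sank that first draft, a wrong assumption about input shape, is exactly the kind of thing a cheap runtime check catches at the boundary. Here is a minimal sketch of failing fast on shape; the field names (`title`, `abstract`, `date`) are assumptions for illustration, not the actual Hugging Face Daily Papers schema.

```typescript
// Illustrative record shape; not the real Daily Papers schema.
interface PaperRecord {
  title: string;
  abstract: string;
  date: string; // ISO date string, e.g. "2024-05-01"
}

// Type guard: checks one unknown value against the expected shape.
function isPaperRecord(value: unknown): value is PaperRecord {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === "string" &&
    typeof v.abstract === "string" &&
    typeof v.date === "string" &&
    !Number.isNaN(Date.parse(v.date))
  );
}

// Reject a whole batch loudly instead of letting a wrong shape
// flow silently into classification and aggregation downstream.
function validateBatch(raw: unknown[]): PaperRecord[] {
  const bad = raw.filter((r) => !isPaperRecord(r));
  if (bad.length > 0) {
    throw new Error(`unexpected input shape in ${bad.length} of ${raw.length} records`);
  }
  return raw as PaperRecord[];
}
```

A check like this at the top of the pipeline is the programmatic version of the rule above: the generated artifact does not get to harden until the data it assumed has actually answered for itself.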