Last month I wrote about context drift: the quiet way AI sessions lose the thread of what you decided. I described two tools I was using to fight it. A re-entry block that loads at the start of every session. Session recaps that capture what happened before you close the thread.
I said I was still testing whether those tools would hold up under real pressure. They didn’t hold up. Not completely.
The re-entry block and the recap were good at preserving what was built and why. But I kept running into a specific failure mode that neither tool addressed well enough. When I came back to a build after a few days off, the new session would confidently suggest an approach I had already tried and ruled out.
That is the most expensive kind of drift. Not losing a decision. Repeating a failure.
. . .
Where It Broke Down
I noticed the pattern during GigOps development. I spent a full session debugging the email intake routing. Replies from clients were arriving as new inquiries instead of being matched to the existing conversation. The problem was that Gmail thread IDs did not reliably match between the original message and the reply, so I had to build a sender-based fallback. That work touched multiple files, involved detailed console logging to trace the mismatch, and took real time to get right.
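The article doesn’t show the actual GigOps code, but the fallback idea can be sketched in a few lines. Everything here is illustrative: the names `Conversation`, `IncomingMessage`, and `matchConversation` are my own, not the real functions in routes/intake.ts.

```typescript
// Hypothetical sketch of thread-ID matching with a sender-based fallback.
interface Conversation {
  id: string;
  gmailThreadId: string | null;
  senderEmail: string;
}

interface IncomingMessage {
  gmailThreadId: string | null;
  from: string;
}

// Primary: match on Gmail thread ID. Fallback: match on sender, because
// thread IDs do not reliably agree between an original message and its reply.
function matchConversation(
  msg: IncomingMessage,
  conversations: Conversation[]
): Conversation | null {
  if (msg.gmailThreadId) {
    const byThread = conversations.find(
      (c) => c.gmailThreadId === msg.gmailThreadId
    );
    if (byThread) return byThread;
  }
  // Sender-based fallback: a conversation from the same address.
  return (
    conversations.find(
      (c) => c.senderEmail.toLowerCase() === msg.from.toLowerCase()
    ) ?? null
  );
}
```

The point of the fallback is exactly the kind of reasoning a graveyard entry has to carry: thread-ID matching alone was tried and ruled out, and the code shape reflects that.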
Two sessions later, a new thread started suggesting thread-ID-based matching as if it were a fresh idea. The AI wasn’t broken. My handoff just didn’t make the failed approach visible enough to prevent it.
My re-entry block had the architecture constraints. It had the current state. It had a “What NOT To Do Next” list at the bottom. But that list was a short summary, not a detailed record. It said things like “don’t rely on thread ID matching alone” without explaining what happened when I tried it. The next agent would see the prohibition but not the reasoning, and reasoning is what holds across sessions. Without it, the constraint looks arbitrary, and arbitrary constraints get questioned and overridden.
What I was missing was a proper graveyard.
. . .
What the Graveyard Is
The graveyard is the record of what you tried that did not work, what specifically went wrong, and why it is off the table. Not a list of errors. A list of ruled-out approaches with enough context that the next agent understands the reasoning, not just the outcome.
I formalized it as a section in my re-entry block template, between “What Was Done This Session” and “Files Touched.” Every entry follows a three-field format:
The approach: what you tried.
What happened: the specific failure. An error message, unexpected behavior, or the reasoning that ruled it out.
Why it’s off the table: the constraint or tradeoff that makes this approach wrong for this project, not just wrong in the moment.
That difference is the whole point. “Tried n8n, switched to Next.js API routes” tells you nothing. “Tried n8n for the automation backend. The free trial expired and deleted all workflow projects twice, with no recovery option. Rebuilt from scratch both times. Switched to self-contained Next.js API routes to eliminate external platform dependency” tells you everything. It also prevents the next session from suggesting n8n as if it were a fresh idea, because the reasoning makes clear the constraint is structural, not incidental.
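If it helps to see the three-field format as a data shape, here is one way to sketch it. The field names are mine, and the entry simply restates the n8n example above; the real re-entry block is plain text, not code.

```typescript
// Sketch of a three-field graveyard entry as a data shape.
interface GraveyardEntry {
  approach: string; // what was tried
  whatHappened: string; // the specific failure
  whyOffTable: string; // the structural constraint, not just the outcome
}

const n8nEntry: GraveyardEntry = {
  approach: "n8n for the automation backend",
  whatHappened:
    "Free trial expired and deleted all workflow projects twice, with no recovery option; rebuilt from scratch both times",
  whyOffTable:
    "External platform dependency is a structural risk; self-contained Next.js API routes eliminate it",
};

// Render an entry as it would appear in the re-entry block.
function renderEntry(e: GraveyardEntry): string {
  return [
    `The approach: ${e.approach}`,
    `What happened: ${e.whatHappened}`,
    `Why it's off the table: ${e.whyOffTable}`,
  ].join("\n");
}
```

The structure forces the reasoning field to exist, which is the whole mechanism: an entry without `whyOffTable` would not compile.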
“An empty graveyard is a red flag, not a sign of a clean session.”
The same discipline applies to design-phase decisions, not just implementation failures. Early in GigOps I chose Google Sheets over Supabase as the database layer. That was not a failed experiment. It was a deliberate tradeoff: the band leader who uses the system needs to open the spreadsheet and read the data directly. Row-level security and a SQL query layer would have added complexity without serving the actual user. That reasoning belongs in the decision log, not the graveyard. That distinction is worth enforcing. The graveyard is for things that were tried and broke. The decision log is for things that were weighed and chosen. Mixing them up weakens both.
The graveyard is required every session. If nothing failed, I still document what alternatives were considered and why they were not chosen. There is something that was weighed and set aside in every session. If you are not capturing that, the next agent has no way to distinguish between “we haven’t tried this yet” and “we tried this and it broke.”
The graveyard also carries forward. Entries from previous sessions stay unless the constraint that made them fail has changed. This is what makes it different from “What NOT To Do Next,” which is the quick-scan summary at the bottom of the re-entry block. Both exist. The graveyard has the full reasoning. “What NOT To Do Next” is the short list you glance at before starting work. They feed each other, but they are not the same thing.
. . .
How It Fits Into the Handoff
The re-entry block now has a clear structure that I think of in four layers, even though the template does not label them that way.
The first layer is the code snapshot. “Files Touched” and “Exact Changes Applied” in the template. What files were modified, what changed at the function or schema level, before and after state. What’s usually missing is specificity. “Updated the routing logic” is not a code snapshot. “Added sender-based fallback to handleIntakeRouting in routes/intake.ts because Gmail thread IDs do not reliably match between original and reply messages” is.
The second layer is the decision log. “What Was Done This Session” and “PM Decisions Made” in the template. Why each choice was made, what the constraints were, what the tradeoffs looked like. The rule I enforce is that reasoning is required, not optional. A decision without its reasoning is a fact that invites the next agent to second-guess it.
The third layer is the graveyard. The failed approaches with full context. This was the missing piece. It is now a required section in the template, and my handoff preflight catches it: question eight asks whether failure modes are documented, which includes both paths already ruled out and anticipated failure points for the current approach. If the answer is no, the handoff does not pass.
The fourth layer is the active task state. “Current Blocker / Active Work” and “Recommended Next Prompt” in the template. Where you stopped, what you were in the middle of, and what the exact next step is. The recommended next prompt has to be complete enough to paste into a new session without editing. It includes current state, exact errors, file paths, architecture constraints, what “done” looks like, and what not to do. If the next agent would need to ask a clarifying question before starting, the prompt is not ready.
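One way to make “complete enough to paste without editing” mechanical is a section check. This is a loose sketch under my own assumptions: the section names paraphrase the list above, and a real prompt would be judged on substance, not string matching.

```typescript
// Hypothetical gate: a next prompt is "ready" only if every required
// section appears. Section names paraphrase the article's list.
const REQUIRED_SECTIONS = [
  "Current state",
  "Exact errors",
  "File paths",
  "Architecture constraints",
  "Definition of done",
  "What not to do",
];

function promptIsReady(prompt: string): { ready: boolean; missing: string[] } {
  const missing = REQUIRED_SECTIONS.filter(
    (s) => !prompt.toLowerCase().includes(s.toLowerCase())
  );
  return { ready: missing.length === 0, missing };
}
```

The test it encodes is the one from the paragraph above: if the next agent would need a clarifying question before starting, the prompt is not ready.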
. . .
The Preflight
I run a 10-question preflight before finalizing any handoff. It started as a sanity check and became the thing that catches the gaps I would otherwise miss.
The questions are concrete. Is current state clear? Are recent changes listed with specific file paths? Are architecture constraints explicit? Are failure modes documented? Is the next task narrowly bounded, meaning completable in one session? Is verification defined with explicit checkboxes including runtime verification, not just “it works”?
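The preflight is a pass/fail gate, and that gate is easy to sketch. The question wording below paraphrases the six questions quoted above (the full list is ten); the data shape and the `runPreflight` function are my own framing, not the author’s actual tool.

```typescript
// Sketch: the preflight as data plus a pass/fail gate.
interface PreflightQuestion {
  id: number;
  text: string;
}

// A subset of the ten questions, paraphrased from the article.
const PREFLIGHT: PreflightQuestion[] = [
  { id: 1, text: "Is current state clear?" },
  { id: 2, text: "Are recent changes listed with specific file paths?" },
  { id: 3, text: "Are architecture constraints explicit?" },
  { id: 8, text: "Are failure modes documented?" },
  { id: 9, text: "Is the next task narrowly bounded?" },
  { id: 10, text: "Is verification defined, including runtime verification?" },
];

// The handoff ships only if every answer is "yes"; otherwise report
// exactly which questions are unanswered or answered "no".
function runPreflight(answers: Map<number, boolean>): {
  pass: boolean;
  missing: string[];
} {
  const missing = PREFLIGHT.filter((q) => answers.get(q.id) !== true).map(
    (q) => q.text
  );
  return { pass: missing.length === 0, missing };
}
```

An unanswered question counts the same as a “no,” which matches the rule in the article: the handoff does not ship incomplete.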
The distinction between “fixed in code” and “confirmed working in runtime” is one I enforce throughout. Those are not the same thing. Code that looks right is not verified. Verification means type check passes, no console errors, and the specific user-facing behavior was observed in a running application.
If any preflight question gets a “no,” I note what is missing and revise before finalizing. The handoff does not ship incomplete.
. . .
What Took Me the Longest to Formalize
The graveyard was the last piece I added to my handoff system. I had code snapshots early. Decision logs came naturally from years of documentation work. But writing down what failed, with enough detail to prevent repeating it, felt like overhead until I watched the same failure come back twice.
I think it gets skipped because it feels like recording a loss. The other sections document progress. The graveyard documents what didn’t work. It’s less satisfying to write, and it’s easy to convince yourself you’ll remember. You won’t. The next agent definitely won’t.
. . .
What I Am Still Figuring Out
The framework works well for structured dev sessions where I am building features and making architectural decisions. I have not yet stress-tested it against messier workflows: exploratory research, content drafts that go through multiple revisions, projects where the “architecture” is more conceptual than code-level.
I also have a separate validation tool for when I switch between AI models entirely, which runs a longer checklist focused on cross-layer dependencies. That is a different problem from session-to-session continuity, and I am still working out where the two tools overlap and where they diverge.
For now, the system is working. The re-entry block loads the full picture. The graveyard prevents repeated failures. The preflight catches gaps before they become bugs. And every session starts from a deliberate handoff instead of a reconstruction from memory.
If you read the last post and started using re-entry blocks, the next step is the graveyard. Document what you tried that did not work. Use the three-field format: what you tried, what happened, why it is off the table. Make it required, not optional. That one section will save you more time than anything else in your handoff.
“Not losing a decision. Repeating a failure.”
The handoff is where most AI work quietly falls apart. The graveyard is where you stop it from happening twice.
What is the most expensive bug you have introduced by retrying something that had already failed?
This is Latina-in-the-Loop — a running exploration of what it means to build, question, and steward intelligent systems in real time. Follow along.