The Process Storage Gap
You ask Claude Code to refactor your authentication system from basic auth to OAuth2. It responds brilliantly:
“I’ll approach this in 5 phases. First, extract interfaces to decouple the authentication provider. Second, implement the OAuth2Provider class. Third, update the user model for external provider IDs. Fourth, migrate existing sessions. Fifth, update API endpoints and remove deprecated code.”
Excellent. Systematic thinking. Clear dependencies. You’re impressed.
Phase 1 executes perfectly. The agent extracts clean interfaces, maintains backwards compatibility, writes comprehensive tests. You commit the changes. Phase 1 complete.
Phase 2 starts well. The OAuth2Provider implementation follows the interfaces precisely. But midway through, the agent suggests adding a caching layer that contradicts the stateless design principle you established in phase 1. When you point this out, it apologises and pivots to a different approach that would require changing the interfaces you just carefully extracted.
By phase 3, the degradation accelerates. The agent updates the user model but forgets the decision from phase 2 about how to handle users with both old and new authentication methods during migration. The implementation technically works but doesn’t align with the transition strategy it proposed in the original plan.
In phase 4, the agent suggests a data migration that contradicts the phased rollout approach from the initial plan. By phase 5, you’re spending more time correcting course than you would have spent implementing it yourself.
This isn’t a Claude Code problem. It’s a pattern I’ve observed across AI agent implementations: Cursor, GitHub Copilot, and Cline all exhibit the same degradation on complex multi-step work. The agent that brilliantly planned the approach can’t maintain that plan whilst executing it.
In my article on the database blind spot, I documented how AI agents default to text-based storage (markdown files, CSV) when relational databases would be 20 to 100 times faster. The capability exists: agents write excellent SQL when prompted. What’s missing is the instinct to recognise structural data problems.
This is the same pattern, but worse. Agents don’t just fail to recognise when they need structured storage for data. They fail to recognise when they need structured storage for process state. Just as they default to parsing frontmatter when they should use databases, they default to conversation context when they should use process control infrastructure.
The result? Predictable degradation on any work complex enough to require coordinating multiple steps across time. And just like the database gap, the workarounds reveal the solution: n8n workflows, Temporal orchestration, PROJECT_STATE.md files. Users are manually building the process storage infrastructure that agents should maintain natively.
Why This Happens
The root cause isn’t that agents can’t handle complexity. It’s that they’re using the wrong storage architecture for process control. Conversation context works brilliantly for single-turn interactions. It fails predictably for multi-step workflows.
Think about what happens in that authentication refactoring. The agent proposes a 5-phase plan. Excellent start. Then it begins executing phase 1 whilst holding the entire plan in context. Phase 1 completes successfully, adding implementation details, test results, and integration notes to the conversation. Phase 2 begins, and now the context contains: the original plan, all of phase 1’s implementation details, the current phase 2 work, and awareness of phases 3 through 5 still pending.
By phase 3, the context holds: original plan, phase 1 implementation archaeology, phase 2 implementation details, current phase 3 work, plus vague awareness of phases 4 and 5. The agent is trying to maintain everything simultaneously in an undifferentiated chronological stream. It’s using conversation context like a text file when it needs a database.
The Missing Storage Layers
What agents actually need are four distinct storage layers, each optimised for different aspects of process control:
- Process definition storage: The workflow structure itself. Which steps exist, what depends on what, what constraints apply to all phases. This should persist unchanged whilst execution progresses. Instead, it gets buried under implementation chatter and gradually becomes inaccessible.
- State tracking storage: Where we are right now. What’s complete, what’s in progress, what’s blocked, what’s next. This should update cleanly as work progresses: “Phase 2 complete. Phase 3 active.” Instead, determining current state requires scanning through accumulated conversation history.
- Decision log storage: Why we made specific choices. “In phase 1, we chose stateless design because of X. This constrains all subsequent phases.” These decisions should remain queryable: “What constraints are active?” Instead, they’re scattered through conversation history, becoming invisible as new implementation details accumulate.
- Scoped context storage: What’s relevant for the current work. Phase 4 needs specific outputs from phases 1 and 2, but not their implementation details. It needs active constraints from earlier decisions. It doesn’t need the debugging conversation from phase 2 or the edge cases discovered in phase 1. This scoping should happen automatically. Instead, everything accumulates together.
Why Conversation Context Fails as Process Storage
Conversation context is chronological. It accumulates everything in the order it happened. Process control needs structured, queryable storage.
Try asking an agent midway through a complex workflow: “What architectural constraints are still active?” The information exists somewhere in the conversation history, but there’s no mechanism to query for it. The agent has to scan chronologically through implementation details, discussions, and decisions, hoping to extract the relevant constraints without missing any or including outdated ones.
Or: “Show me just the incomplete steps.” Again, the information exists, but it’s interwoven with completed work details, current implementation context, and future planning. There’s no filter mechanism. The agent reconstructs state by replaying history rather than querying structured storage.
Or: “What does phase 4 specifically need from previous phases?” The dependencies exist, but they’re implicit in conversation flow rather than explicit in structured storage. The agent can’t scope context appropriately because everything is equally present in the chronological stream.
Worst of all: “Phase 2 is complete. I don’t need its implementation details anymore, just its key outcomes.” Conversation context has no archive mechanism. Completed work stays in context with the same weight as active work. The signal-to-noise ratio degrades with every completed phase.
The Cognitive Load Problem
Research on human task-switching has found that switching between tasks can cost up to 40% of productive time. Each switch requires actively maintaining goal-relevant information whilst disengaging from one context and engaging with another. When people try to track too many concurrent concerns, working memory overloads and performance degrades measurably.
AI agents face the same fundamental constraint, just implemented differently. They’re trying to hold simultaneously: the overall plan structure, the current implementation details, all previous phase outcomes, active architectural constraints, discovered edge cases, pending work awareness, and integration requirements. All of this accumulates in one undifferentiated context stream.
Phase 1 works perfectly because the context contains just: the plan and phase 1 details. Phase 2 adds its details whilst trying to maintain phase 1 awareness. By phase 4, the context contains detailed information from phases 1, 2, and 3, current phase 4 work, and awareness of upcoming phases. The cognitive load doesn’t increase linearly. It compounds.
The agent can technically access all this information. Context windows of 200,000 tokens handle it comfortably. But processing massive undifferentiated context degrades reasoning quality. The right information drowns in a sea of formerly relevant information that should have been archived or summarised.
This is exactly the same problem I documented in the database blind spot. Agents default to storing data in text files, then parsing those files to answer queries. It technically works for small datasets. At scale, it collapses. You need structured storage with indexes, schemas, and query capabilities.
Process state is the same. Conversation context works for simple tasks. For complex multi-step workflows, you need structured process storage. The capability exists in agents to work with structured information. What’s missing is recognising when process complexity demands it.
The Database Parallel
In the database blind spot, I documented how AI agents default to text-based storage when relational databases would be 20 to 100 times faster. Ask an agent to track 200 product features across pricing tiers with dependency relationships, and it suggests creating markdown files with frontmatter metadata. The capability to design database schemas exists. Agents write excellent SQL when prompted. What’s missing is the instinct to recognise structural data problems.
The process storage gap is the same pattern applied to a different domain:
| Database Blind Spot | Process Storage Gap |
|---|---|
| Default: Text files, CSV, markdown | Default: Conversation context |
| Should use: Relational databases with schemas, indexes, queries | Should use: Process storage with state tracking, scoped context, decision logs |
| Missing: Recognition of structural data problems | Missing: Recognition of process control problems |
| Result: 20-100x slower queries, O(n) scans, maintenance nightmare | Result: Degradation on multi-step work, lost coherence, context pollution |
| Workaround: Users explicitly prompt “use a database” | Workaround: Users build external process infrastructure (n8n, PROJECT_STATE.md) |
| When it breaks: At approximately 1,000 records, performance collapses | When it breaks: At approximately 8-10 steps, coherence collapses |
Both gaps stem from the same root cause. In the database article, I wrote:
“This decision-making chain is rarely documented in the code that LLMs train on. What’s documented is the resulting schema and queries. The ‘why I chose this approach’ reasoning is lost.”
The same applies to process storage. Agents have seen thousands of code implementations. They’ve seen multi-step workflows execute. They’ve probably seen orchestration platform configurations. What they haven’t seen is the architectural thinking that precedes those choices: “I recognised this workflow has 15 steps with complex dependencies. That means I need structured process storage with state tracking, not just sequential execution in conversation context.”
The training data shows the resulting n8n workflow diagram or the Temporal code. It doesn’t show the senior engineer’s thought process: “This is complex enough that conversation context will fail. I need to separate process definition from state tracking from execution context. Let me structure this appropriately from the start.”
The Same Capability, Different Recognition
Here’s what makes both gaps particularly frustrating. The capability exists in both cases.
For databases: agents write sophisticated SQL queries, design normalised schemas, create proper indexes, implement complex joins. They have the technical knowledge. What’s missing is recognising: “This is a database problem” before being explicitly told.
For process storage: agents can track state when you manually create PROJECT_STATE.md files and prompt them to update it. They can maintain decision logs when you establish the structure. They can scope context when you explicitly create fresh conversation threads. The capability is there. What’s missing is recognising: “This is complex enough to need process storage” and initiating that structure autonomously.
Both gaps reveal the same limitation: agents excel at implementation once the approach is decided. They struggle with the meta-level question of which approach to choose. The architectural instinct, the pattern recognition that says “I’ve seen this category of problem before, and here’s the appropriate structure,” that’s what’s missing from current agent capabilities.
And in both cases, the workarounds reveal the solution. When users consistently build external scaffolding to compensate for missing agent capabilities, that demonstrates both the reality of the gap and the viability of the solution pattern. People wouldn’t invest in building n8n workflows or maintaining PROJECT_STATE.md files if conversation context were adequate. The fact that sophisticated users converge on these structured approaches independently validates that this is the right pattern agents should learn natively.
What Process Storage Should Look Like
To understand what’s missing, let’s examine what proper process storage would provide. This is what platforms like n8n and Temporal offer externally, and what agents should maintain natively. Using our authentication refactoring example, here’s what each storage layer should contain:
Process Definition Layer
The workflow structure itself: steps, dependencies, constraints. This persists throughout execution without being buried under implementation details.
```
Workflow: OAuth2 Migration

Steps:
1. Extract interfaces [depends: none]
2. Implement OAuth2Provider [depends: step 1]
3. Update user model [depends: step 1]
4. Migrate sessions [depends: steps 2, 3]
5. Update API endpoints [depends: step 4]

Architectural constraints:
- Maintain backwards compatibility (all steps)
- Stateless design (established step 1)
- Phased rollout (step 4 onwards)
```
This definition remains accessible regardless of how much implementation work has been completed. The agent can query: “What are the dependencies for step 4?” and get a precise answer without scanning conversation history. The constraints stay visible even when you’re deep in step 3’s implementation details.
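To make that concrete, here’s a minimal sketch in Python (my own illustration, not any particular platform’s API) of a process definition stored as an explicit structure, where “what are the dependencies for step 4?” is a direct lookup rather than a conversation scan:

```python
# Hypothetical sketch: the OAuth2 migration workflow held as a queryable
# structure instead of buried in chronological conversation text.
WORKFLOW = {
    "name": "OAuth2 Migration",
    "steps": {
        1: {"title": "Extract interfaces", "depends_on": []},
        2: {"title": "Implement OAuth2Provider", "depends_on": [1]},
        3: {"title": "Update user model", "depends_on": [1]},
        4: {"title": "Migrate sessions", "depends_on": [2, 3]},
        5: {"title": "Update API endpoints", "depends_on": [4]},
    },
    "constraints": [
        "Maintain backwards compatibility (all steps)",
        "Stateless design (established step 1)",
        "Phased rollout (step 4 onwards)",
    ],
}

def dependencies_of(workflow, step_id):
    """Answer 'what are the dependencies for step N?' with a direct lookup."""
    return [workflow["steps"][dep]["title"]
            for dep in workflow["steps"][step_id]["depends_on"]]

print(dependencies_of(WORKFLOW, 4))
# ['Implement OAuth2Provider', 'Update user model']
```

The point isn’t the data structure itself; it’s that the definition stays intact and answerable no matter how much implementation chatter accumulates elsewhere.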
State Tracking Layer
Current progress: what’s complete, what’s active, what’s next. This updates cleanly as work progresses without accumulating implementation archaeology.
```
Current status: Step 3 in progress

Completed:
- Step 1: Interfaces extracted ✓
- Step 2: OAuth2Provider implemented ✓

In progress:
- Step 3: User model updates
  - Fields added ✓
  - Migration script: in progress

Blocked: None
Next: Step 4 (waiting on step 3 completion)
```
The agent maintains clean state: “We’re at step 3. Steps 1 and 2 are done. Step 4 waits on step 3.” No need to replay the implementation conversations from steps 1 and 2 to determine current status. State tracking answers “where are we?” without requiring historical context reconstruction.
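Sketched the same way (again a hypothetical illustration in Python), state tracking makes “show me just the incomplete steps” a filter over a small structure rather than a replay of history:

```python
# Hypothetical sketch: per-step status as a small mapping, so current
# state is a lookup, not a reconstruction from conversation history.
state = {
    1: "complete",
    2: "complete",
    3: "in_progress",
    4: "pending",
    5: "pending",
}

def incomplete_steps(state):
    """'Show me just the incomplete steps' becomes a filter, not a scan."""
    return [step for step, status in state.items() if status != "complete"]

print(incomplete_steps(state))  # [3, 4, 5]
```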
Decision Log Layer
Why specific choices were made and what they constrain going forward. Decisions stay queryable without being lost in implementation chatter.
```
Decision [Step 1, 2025-12-10]: Use generic IAuthProvider interface
  Rationale: Enables multiple auth providers in future
  Impact: All auth implementations must implement this interface
  Active constraint: Yes

Decision [Step 2, 2025-12-12]: Stateless token handling
  Rationale: Aligns with existing session management
  Impact: No persistent token storage, refresh logic handles expiration
  Active constraint: Yes

Decision [Step 3, 2025-12-14]: Dual auth support during migration
  Rationale: Zero-downtime rollout requirement
  Impact: User model must track auth provider type
  Active constraint: Until step 5 complete
```
When the agent reaches step 4, it can query: “What architectural constraints are active?” and receive precisely the relevant decisions. It doesn’t need to remember or scan through the debugging conversation from step 2 where the stateless design was discussed. The decision log maintains queryable architectural context.
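As a sketch of that query (Python, with field names of my own invention), a decision log stored as records lets “what constraints are active?” return exactly the relevant decisions:

```python
from datetime import date

# Hypothetical sketch: decisions as records, so active constraints are
# a query result rather than something recovered from old conversation.
decisions = [
    {"step": 1, "date": date(2025, 12, 10),
     "choice": "Use generic IAuthProvider interface",
     "active_until": None},   # permanent constraint
    {"step": 2, "date": date(2025, 12, 12),
     "choice": "Stateless token handling",
     "active_until": None},
    {"step": 3, "date": date(2025, 12, 14),
     "choice": "Dual auth support during migration",
     "active_until": 5},      # lapses once step 5 completes
]

def active_constraints(log, completed_through):
    """Return choices still constraining work after `completed_through` steps."""
    return [d["choice"] for d in log
            if d["active_until"] is None or d["active_until"] > completed_through]

# At step 4 (three steps complete), all three constraints still apply.
print(active_constraints(decisions, completed_through=3))
```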
Scoped Context Layer
What’s relevant for current work: dependencies from previous steps, active constraints, specific technical context. Not everything that ever happened.
```
Context for Step 3:

Required from previous steps:
- IAuthProvider interface definition (step 1)
- OAuth2Provider token structure (step 2)
- Stateless design constraint (step 1)

NOT needed:
- Step 1 implementation details (archived)
- Step 2 error handling edge cases (archived)
- Future step planning details (not yet relevant)

Active focus:
- User model schema changes
- Migration script for existing users
- Testing backwards compatibility
```
This scoping is critical. Step 3 needs to know about the IAuthProvider interface structure from step 1, but it doesn’t need the conversation about why we chose async methods or the edge cases we discovered during step 1 implementation. Those details are archived. Step 3 gets rich context for its specific work without the cognitive load of unrelated details.
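The scoping itself can be sketched as a selection over stored artifacts (a hypothetical Python illustration; the artifact names mirror the example above): each step declares what it needs, and everything else stays archived rather than carried forward.

```python
# Hypothetical sketch: step context assembled from declared needs,
# instead of inheriting the entire chronological history.
artifacts = {
    "IAuthProvider interface definition": {"step": 1, "kind": "outcome"},
    "Step 1 implementation details":      {"step": 1, "kind": "detail"},
    "OAuth2Provider token structure":     {"step": 2, "kind": "outcome"},
    "Step 2 error handling edge cases":   {"step": 2, "kind": "detail"},
}

def scoped_context(artifacts, needed):
    """Pull only the declared dependencies; implementation details stay archived."""
    return {name: meta for name, meta in artifacts.items() if name in needed}

step3_context = scoped_context(
    artifacts,
    needed={"IAuthProvider interface definition",
            "OAuth2Provider token structure"},
)
print(sorted(step3_context))
# ['IAuthProvider interface definition', 'OAuth2Provider token structure']
```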
Key Properties of Structured Process Storage
What makes this different from conversation context:
- Queryable: “What constraints are active?” returns a specific list of current architectural constraints, not a chronological conversation scan.
- Filterable: “What steps are incomplete?” returns steps 3 through 5 explicitly, without requiring the agent to reconstruct state from conversation history.
- Scoped: “What does step 4 need from previous steps?” returns specific dependencies and their current values, not the entire conversation history from steps 1 through 3.
- Archivable: Step 2 is complete. Its implementation details are stored but not in active context. The agent maintains “Step 2 outcome: OAuth2Provider implemented with stateless token handling” without holding all the debugging conversations and edge case discoveries that occurred during step 2 execution.
This is exactly what databases provide for data: structured storage with query capabilities, indexes for efficient access, schemas that enforce consistency. Process storage needs the same architectural properties, just applied to workflow state rather than data records.
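To make the parallel explicit, the same properties map directly onto ordinary database tables. Here’s a hedged sketch using Python’s built-in sqlite3 module (the schema and table names are my own invention, not a prescribed design):

```python
import sqlite3

# Hypothetical sketch: process state in SQLite, the direct analogue of
# the database parallel drawn above.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE steps (
        id INTEGER PRIMARY KEY,
        title TEXT,
        status TEXT   -- 'complete', 'in_progress', 'pending'
    );
    CREATE TABLE decisions (
        step INTEGER REFERENCES steps(id),
        choice TEXT,
        active INTEGER  -- 1 while the constraint still applies
    );
""")
con.executemany("INSERT INTO steps VALUES (?, ?, ?)", [
    (1, "Extract interfaces", "complete"),
    (2, "Implement OAuth2Provider", "complete"),
    (3, "Update user model", "in_progress"),
])
con.executemany("INSERT INTO decisions VALUES (?, ?, ?)", [
    (1, "Stateless design", 1),
    (2, "No persistent token storage", 1),
])

# Queryable: active constraints come back as a result set, not a scan.
rows = con.execute("SELECT choice FROM decisions WHERE active = 1").fetchall()
print([r[0] for r in rows])
# ['Stateless design', 'No persistent token storage']
```

Indexes, schemas, and queries do for process state exactly what they do for data records; the storage engine is incidental.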
Current Workarounds and What They Reveal
The sophistication of user-developed workarounds demonstrates both the reality of the process storage gap and the viability of structured solutions. When teams independently converge on similar compensating patterns, that reveals genuine capability boundaries rather than user misunderstanding.
External Orchestration Platforms
Teams building production AI workflows increasingly adopt dedicated orchestration infrastructure. n8n provides visual workflow design with explicit state management. Temporal handles complex distributed coordination with durable execution guarantees. LangGraph offers agent-specific orchestration with built-in state tracking.
These platforms provide exactly what conversation context lacks. Workflow definitions persist as queryable structures: “Here are the steps, here are the dependencies, here’s what constrains each phase.” State tracking updates cleanly: “Step 3 is in progress, steps 1 and 2 are complete, step 4 waits on step 3.” Clean handoffs between phases prevent context pollution: each step gets focused context without accumulated historical debris. Error recovery and retry logic work because state is structured, not buried in chronological conversation.
The investment these platforms require validates the gap’s significance. Setup and configuration demand substantial overhead. Teams must learn new systems, integrate them with existing workflows, and maintain platform-specific implementations. For straightforward multi-step development work, this shouldn’t be necessary. Yet teams accept the cost because conversation context fails predictably.
What these platforms demonstrate is what good process storage looks like in practice. The workflow diagrams aren’t just visual aids: they’re queryable process definitions. The state tracking isn’t just status dashboards: it’s structured storage that enables precise queries. The handoff mechanisms aren’t just UI conveniences: they’re architectural separation between coordination and execution.
The sophistication of these solutions reveals that users aren’t working around minor inconveniences. They’re building the storage infrastructure that agents should maintain natively.
Manual State Files
In Helping AI Agents Remember, I documented systematic approaches users develop for maintaining project state across sessions. PROJECT_STATE.md files track current goals, active phases, recent decisions, and next steps. DECISIONS.md provides chronological logs of architectural choices. Session handoff notes capture mental state between work periods.
These files serve as externalised process storage. They separate “what I’m tracking” (process state) from “what I’m implementing” (execution context). Agents reference these files to reload context efficiently: “Read PROJECT_STATE.md and tell me what we’re working on” provides clean recovery without replaying entire conversation histories.
The pattern works. Teams using systematic state file approaches maintain coherence across complex multi-phase projects that would otherwise collapse under context pollution. The slim state in external files combined with rich execution context in conversation enables sustained progress.
But the awkwardness reveals the gap. Users must manually maintain these files. Agents don’t naturally recognise when state files need updating without explicit prompting. Discipline is required to keep state current and accurate. The entire scaffolding represents cognitive work users do that agents should handle autonomously.
What makes this particularly telling: sophisticated users independently converge on remarkably similar patterns. The four-layer structure I described (process definition, state tracking, decision log, scoped context) matches what experienced practitioners build manually. They’ve discovered through painful experience what structured process storage requires. Agents should learn these patterns natively rather than requiring every user to rediscover them.
Manual Context Management
Experienced AI users develop systematic context management practices that work around agent limitations. Starting fresh conversation threads for major phase transitions prevents context pollution from previous phases. Using separate agent instances for different concerns manually implements what should be automatic context scoping. Explicit context reloading protocols (“Read these three files before we continue”) compensate for missing agent memory. Copy-pasting state between contexts distils what’s relevant from polluted contexts into clean ones.
These aren’t prompting techniques. They’re architectural compensations. Users are manually implementing the context segmentation and scope management that agents should initiate autonomously. The fact that systematic users develop these protocols independently validates that the underlying approach is sound.
What this reveals: users recognise context pollution when it happens and know how to address it through fresh contexts. They understand that “what I’m tracking” should separate from “what I’m implementing.” They’ve learned through experience when accumulated context degrades reasoning quality. This is the cognitive work agents should learn to do: recognise context pollution, initiate context segmentation, maintain slim state whilst enabling rich execution contexts.
The investment users make in these practices demonstrates the gap’s impact. Developing systematic protocols, training team members on approaches, maintaining discipline across projects: these represent substantial organisational overhead. Teams wouldn’t sustain this investment if agents naturally handled process storage appropriately.
Anthropic’s Approach
Anthropic’s research on long-running agents demonstrates both recognition of the problem and early steps toward native solutions. Their published work acknowledges the core challenge directly: “Getting agents to make consistent progress across multiple context windows remains an open problem.”
Their article on effective harnesses for long-running agents describes the fundamental issue: agents must work in discrete sessions, and each new session begins with no memory of what came before. The solution they developed reveals the pattern clearly: external process storage because agents can’t maintain it natively.
The twofold approach uses an initialiser agent to set up the environment on first run, then a coding agent that makes incremental progress in each session whilst leaving clear artefacts. The key insight: a claude-progress.txt file for state tracking. This external file serves exactly the function I’ve described: process state storage that persists across sessions without consuming the context window with accumulated detail.
This validates the process storage gap. If agents naturally maintained workflow state, Anthropic wouldn’t need to build external file-based tracking. The claude-progress.txt pattern is the same solution sophisticated users discover independently: externalise process state so agents can reference it without holding all history in active context.
More recently, Anthropic introduced platform capabilities that move toward native process storage. Context Editing automatically clears stale tool calls and results from the context window when approaching token limits, removing completed work whilst preserving conversation flow. In a 100-turn web search evaluation, this enabled agents to complete workflows that would otherwise fail due to context exhaustion whilst reducing token consumption by 84%.
The Memory tool enables Claude to store and consult information outside the context window through a file-based system that persists across conversations. This provides exactly the external storage layer that process control requires: decisions and state accessible without consuming active context.
These are infrastructure steps in the right direction. They acknowledge that conversation context alone fails for sustained work. They provide mechanisms for external storage and automatic context management. But they still require explicit user intervention. Agents don’t recognise when they need process storage and initiate it autonomously. They don’t structure state tracking without prompting. They don’t scope context appropriately without user-created fresh threads. They don’t archive completed work automatically.
The gap between Anthropic’s infrastructure capabilities and native agent behaviour reveals what’s still missing: the recognition layer. Agents need to learn: “This work is complex enough that I should initiate structured process storage. Let me create workflow definition, maintain state tracking, log decisions, and scope context appropriately.” The capability to use these tools exists. What’s missing is the instinct to reach for them without being explicitly directed.
What Needs to Evolve
Closing the process storage gap requires coordinated evolution across model builders, platform providers, and tool developers. Each contributes different capabilities toward agents that naturally maintain structured process state.
For Model Builders
Training data needs to shift from showing only implementations to including architectural decision-making. Agents have seen thousands of code examples. They need to see process control examples: workflow definitions, state management patterns, decision logging structures. More critically, they need exposure to the recognition moment: “I recognised this workflow has 15 steps with complex dependencies, so I structured it with process storage rather than attempting sequential execution.”
The architectural thinking that precedes structured approaches is what’s missing from training data. Not just the resulting n8n workflow diagram, but the engineer’s thought process: “Conversation context will fail here. I need to separate process definition from state tracking from execution context.”
Capability development should focus on recognition and initiation. Agents need to learn to recognise complexity that demands process control, initiate structured storage for multi-step work autonomously, maintain state separately from execution context, archive completed work whilst retaining outcomes, and query process state efficiently.
Current benchmarks measure the wrong capabilities. “Can you complete this task?” tests execution. “Does your code work?” tests implementation quality. What we need: “Can you recognise this needs process storage?” and “Can you maintain coherence across 10 steps without external scaffolding?”
For Platform Providers
Native process storage infrastructure becomes foundational rather than optional. Platforms should enable agents to create workflow definitions with queryable structure, maintain state storage that supports filters and queries, initiate scoped context management, and automatically archive completed work.
The critical shift: agent-initiated capabilities. Platforms currently provide tools that users must explicitly invoke. Agents should recognise: “This needs process control” and create workflow structures autonomously. They should maintain state tracking automatically, not when prompted. They should scope context per step appropriately without requiring users to manually create fresh conversation threads.
Visibility and control matter for trust and refinement. Show users the process structures agents create. Make state tracking visible so users understand what agents are maintaining. Let users review and adjust workflow definitions when agent judgement needs correction. Provide oversight without requiring micromanagement of every decision.
For Tool Developers
Integration with existing practices bridges current workarounds and emerging capabilities. Connect agent process storage to project management tools teams already use. Update decision logs automatically rather than requiring manual synchronisation. Integrate with documentation systems to maintain continuity. These bridges enable teams to adopt better agent capabilities without abandoning proven workflows.
Workflow visualisation makes agent process control tangible. Show the process structure agents create: steps, dependencies, constraints. Display current state clearly: what’s complete, what’s active, what’s blocked. Highlight dependencies and blockers so teams understand coordination requirements. Make agent organisation visible rather than opaque.
What This Means Practically
Understanding the process storage gap matters most in how it affects current work, evaluation decisions, and strategic planning. These implications apply whether you’re building with AI agents today, evaluating agent capabilities, or planning adoption strategies.
For Current Work
Recognise the degradation pattern when it appears. Agent losing coherence by step 6 of a 10-step plan? That’s missing process storage causing context pollution. Agent forgetting architectural decisions made in earlier phases? No decision log storage to maintain queryable constraints. Context feels bloated with irrelevant details? No scoping or archiving mechanism separating active concerns from completed work.
Compensate systematically rather than fighting agent limitations. Build external state files using the PROJECT_STATE.md approach I documented in Helping AI Agents Remember. Use orchestration platforms like n8n or Temporal for production-critical multi-step workflows. Employ manual context management: fresh conversation threads for major phase transitions, explicit state reloading protocols.
These aren’t workarounds for broken tools. They’re systematic recognition of current capability boundaries. The gap will close, but understanding it now enables effective work whilst waiting for evolution.
Know when to invest in infrastructure versus using lightweight approaches. Complex multi-phase work with production requirements justifies n8n or Temporal overhead. Production-critical processes where reliability matters more than elegance benefit from external orchestration. Internal tools and development work often succeed with manual state files and systematic practices. Simple tasks work fine with current agent capabilities without additional scaffolding.
For Evaluation
When choosing between agent tools or evaluating new capabilities, these questions reveal process storage sophistication:
Does the agent recognise complexity before attempting implementation? Give it a multi-step problem. Agents that propose structured approaches demonstrate better instincts than those jumping straight to monolithic implementation.
Can it maintain coherence across 10 steps? Run the agent through a realistic multi-phase workflow. At what step does degradation begin? Collapse by step 3 indicates severe limitations; coherence through step 8 shows stronger state management, even if imperfect.
Does it lose earlier decisions or maintain them? Watch for contradictions. Agents suggesting approaches that violate constraints established in earlier phases lack decision log capability. Those maintaining consistency show better architectural awareness.
Can it recover cleanly when pointed to external state? “Read PROJECT_STATE.md and tell me what we’re working on” should provide effective context reload. Agents that can reference state and resume coherently demonstrate better external memory integration.
What to look for: agents proposing structured approaches rather than diving straight into implementation, explicit state tracking visible in responses, reference to earlier decisions without replaying all conversational details, clean phase transitions that don’t lose thread.
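As a rough sketch of that last check, a reload can be scored mechanically: parse the decisions out of the state file and count how many the agent’s resume summary actually references. The substring matching here is deliberately crude and purely illustrative:

```python
import re

def parse_sections(state_text):
    """Split a PROJECT_STATE.md-style document into {heading: body}."""
    parts = re.split(r"^## ", state_text, flags=re.M)
    sections = {}
    for part in parts[1:]:
        heading, _, body = part.partition("\n")
        sections[heading.strip()] = body.strip()
    return sections

def reload_score(state_text, agent_summary):
    """Fraction of recorded decisions the agent's resume summary
    references -- a rough proxy for a clean context reload."""
    decisions = [
        line.lstrip("- ").strip()
        for line in parse_sections(state_text).get("Decisions", "").splitlines()
        if line.startswith("- ")
    ]
    if not decisions:
        return 1.0
    hits = sum(1 for d in decisions if d.lower() in agent_summary.lower())
    return hits / len(decisions)

state = """## Goal
Migrate to OAuth2

## Decisions
- keep the provider stateless
- support dual-auth users during migration
"""
summary = "Resuming phase 2; we keep the provider stateless and will handle token refresh."
print(f"{reload_score(state, summary):.0%} of decisions referenced")
# 50% of decisions referenced
```

A real evaluation would use semantic matching rather than exact substrings, but even this crude version makes the pass/fail question concrete: did the reload carry the constraints forward or not?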
For Planning
Strategic decisions about AI adoption should account for capability evolution trajectories.
Production workflows justify external orchestration platform investment today. When failure costs are high and reliability matters more than elegance, n8n, Temporal, or LangGraph provide guarantees that conversation context doesn’t. This investment makes sense for business-critical processes.
Development work and internal tools can often wait. Agents will improve. The process storage gap will close. Investing heavily in compensating infrastructure today creates technical debt when agent capabilities advance. Use lightweight approaches: manual state files, systematic practices, portable patterns.
Don’t over-invest in current workarounds, but do build transferable capabilities. Systematic state management, clear task boundaries, explicit handoff protocols: these practices work today and translate directly to better agent capabilities tomorrow. They’re not wasted effort; they’re organisational learning that compounds.
Competitive advantage comes from understanding the gap, not just working around it. Teams that recognise process storage limitations can build effective AI-augmented processes now whilst maintaining flexibility for future capabilities. Teams that miss the pattern either avoid complex multi-step work (limiting AI leverage) or build elaborate compensations that become obsolete (creating technical debt).
The Path Forward
The process storage gap parallels the database blind spot: same root cause, same solution path. Agents defaulting to conversation context when they need process storage mirrors agents defaulting to text files when they need databases. Both gaps stem from training data showing implementations without the architectural thinking that recognises problem patterns.
In the database article, I showed how agents write excellent SQL when prompted but don’t recognise structural data problems before being told. The capability exists; the instinct is missing. Process storage exhibits the identical pattern. Agents maintain state effectively when users build PROJECT_STATE.md scaffolding. They track decisions when given explicit logging structures. They scope context when prompted to use fresh conversation threads. The capability is there. The autonomous recognition isn’t.
The workarounds validate both the problem and the solution direction. Users building n8n workflows, maintaining state files, developing systematic context management: they’re constructing the process storage infrastructure agents should maintain natively. The fact that sophisticated users independently converge on similar patterns demonstrates this is the right architectural approach. Agents need to learn it.
What changes when agents develop native process storage: the complexity ceiling rises dramatically. Currently, coherence collapses around 8 to 10 steps as context pollution overwhelms coordination. With structured process storage, 50 to 100 step workflows become maintainable. The constraint shifts from “what fits in undifferentiated context” to “what can be coherently planned and tracked.”
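In miniature, structured process storage might look like this: a store that keeps decisions queryable as durable constraints whilst archiving completed steps out of the active working set. This is a hypothetical sketch of the idea, not any agent’s actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class ProcessStore:
    """Minimal structured process storage: decisions stay queryable,
    completed steps are archived out of the active working set."""
    decisions: list = field(default_factory=list)   # durable constraints
    active: list = field(default_factory=list)      # steps in flight
    archive: list = field(default_factory=list)     # completed detail

    def record_decision(self, text):
        self.decisions.append(text)

    def complete(self, step):
        self.active.remove(step)
        self.archive.append(step)

    def working_context(self):
        """What the agent holds 'in mind': constraints plus active
        steps -- archived detail stays out of context."""
        return {"constraints": list(self.decisions), "active": list(self.active)}

store = ProcessStore(active=["implement OAuth2Provider", "migrate sessions"])
store.record_decision("provider must stay stateless")
store.complete("implement OAuth2Provider")
print(store.working_context())
```

The design choice is the point: working context scales with active steps and constraints, not with total history, which is why 50-to-100-step workflows become plausible under this model where undifferentiated context collapses at 8 to 10.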
Reliability improves fundamentally. Degradation from context pollution disappears. Earlier architectural decisions remain active constraints rather than fading into implementation archaeology. Requirements don’t get buried under accumulated detail. Phase transitions happen cleanly without losing thread.
The human role shifts from building scaffolding to validating structure. Instead of manually maintaining PROJECT_STATE.md files, we review process structures agents create. Instead of prompting “read this file” for context reload, we validate that agents have correctly identified dependencies and constraints. The collaboration becomes more symmetric: agents handle process storage natively whilst humans provide architectural judgment and strategic direction.
Maturity indicators will be clear. External state files transition from essential to optional. The degradation pattern that currently appears around step 6 disappears. Complex projects sustain across sessions and phases without elaborate user-maintained scaffolding. Teams stop investing in external orchestration infrastructure for straightforward multi-step development work.
Strategic implications vary by stakeholder. For organisations, current investment in systematic practices isn’t wasted. Teams developing state management discipline, clear task boundaries, and systematic handoff protocols build capabilities that compound as agent sophistication increases. These aren’t temporary workarounds; they’re transferable skills.
For tool vendors, the window for external orchestration platforms may be shorter than expected. When agents maintain process storage natively, value shifts from compensation to enhancement. Platforms that help agents orchestrate effectively will outlast platforms that orchestrate for agents. Integration and visualisation matter more than providing capabilities agents lack.
For the ecosystem broadly, process storage capability represents competitive advantage. Whoever solves autonomous process storage recognition gains leverage on complex multi-step work. But over-investment in current workarounds creates technical debt. Simple, portable compensations whilst waiting for capability evolution position teams better than elaborate infrastructure that becomes obsolete.
The parallel to the database blind spot provides confidence about trajectory. That gap is closing as agents see more architectural decision-making examples. The same will happen for process storage. Training data evolution, capability development, and platform support will converge on agents that recognise: “This workflow is complex enough to need structured process storage” and initiate appropriate architecture autonomously.
Until then, understand the gap. Work systematically with current limitations. Build portable practices that translate to future capabilities. The teams succeeding now are those who recognise process storage as missing infrastructure rather than agent failure, compensate systematically whilst preparing for evolution, and avoid both paralysis (waiting for perfect agents) and over-investment (building elaborate workarounds for temporary gaps).
About the Author
Tim Huegdon is the founder of Wyrd Technology, a consultancy focused on helping engineering teams achieve operational excellence through strategic AI adoption. With more than 25 years of experience in software engineering and technical leadership, Tim specialises in AI agent architectural patterns, process control infrastructure, and strategic evaluation of AI capabilities. His approach combines deep technical expertise with practical observation of how engineering practices evolve under AI assistance, helping organisations develop sustainable AI workflows whilst maintaining the quality standards that enable long-term velocity.