The Database Blind Spot: Why AI Agents Default to Text When They Shouldn't
I recently asked Claude to help design a system for tracking 200 product features across multiple pricing tiers, with dependency relationships between features. This is the kind of problem I’ve solved dozens of times in my career. The agent’s first suggestion?
“Create a features/ directory with markdown files for each feature. Use frontmatter for the metadata…”
I stopped reading. This is literally Database Design 101. Foreign keys. Referential integrity. Recursive queries for dependency trees. We’re talking about a problem that relational databases were purpose-built to solve 50 years ago. Yet here was my AI pair programmer, with full access to SQLite and demonstrably capable of writing complex SQL, defaulting to parsing frontmatter with custom scripts.
This wasn’t a one-off mistake. Over the past 6 months of working with AI agents as senior engineering team members, I’ve watched this pattern repeat. Agents suggest text-based approaches (markdown, CSV files, grep) for problems that cry out for structured data solutions like relational databases, document stores, search indexes, and vector databases. What’s particularly interesting is that when I explicitly prompt for these solutions, the agents produce excellent schema designs and queries. The capability is clearly there. What’s missing is the instinct to reach for it.
This matters because the performance gap isn’t trivial. We’re talking about 20 to 100 times slower queries, 14 to 26 times more development effort for features, and in AI-powered systems, 100 times higher token costs. Teams are inadvertently building systems that will collapse at scale because their AI collaborators don’t naturally recognise structural data problems.
When Claude Caught Itself Doing It
To test whether this was a real pattern or confirmation bias, I conducted a meta-experiment. I asked Claude to respond to 5 scenarios that clearly needed structured data solutions and to document its instinctive first responses. Three of those scenarios are detailed below.
The results were uncomfortably revealing.
Scenario 1: Customer Support Ticket System
Design a system to track customer support tickets with relationships between tickets, customers, and agents. Need to generate reports on ticket volume by agent and average resolution time.
Claude’s first response:
“I’d suggest creating a tickets/ directory with markdown files. Each file uses frontmatter for the metadata, then write a Python script to parse all the files and aggregate with pandas.”
What a data engineer would suggest: A normalised relational schema with tickets, customers, and agents tables. Foreign key constraints. Indexes on frequently queried fields. The report Claude described? A simple SQL GROUP BY query that runs in milliseconds.
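As a rough sketch of what that looks like (the table names, columns, and resolution-time calculation here are illustrative, not a prescription):

```sql
-- Illustrative schema; names and columns are assumptions
CREATE TABLE agents (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE tickets (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    agent_id INTEGER REFERENCES agents(id),
    status TEXT NOT NULL,
    opened_at DATETIME NOT NULL,
    resolved_at DATETIME
);
CREATE INDEX idx_tickets_agent ON tickets(agent_id);
CREATE INDEX idx_tickets_status ON tickets(status);

-- Ticket volume and average resolution time by agent
SELECT
    a.name AS agent,
    COUNT(t.id) AS ticket_count,
    AVG(julianday(t.resolved_at) - julianday(t.opened_at)) AS avg_resolution_days
FROM tickets t
JOIN agents a ON a.id = t.agent_id
WHERE t.resolved_at IS NOT NULL
GROUP BY a.id, a.name;
```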
Scenario 2: Track activity metrics over time
Track daily active users, signups, and churn across 50 customer segments over 2 years.
Claude’s suggestion:
“Store in CSV files organised by date, like daily_stats/2024-01-15.csv. Use pandas to load and aggregate them.”
What a data engineer would suggest: A time-series database or at minimum SQLite with proper indexes. The difference is significant. Claude’s CSV approach requires loading 36,500 rows into memory for a 90-day trend analysis (approximately 800 milliseconds). The database approach uses an indexed query returning exactly the data needed (approximately 15 milliseconds). That’s 53 times faster, and the gap widens dramatically as data volume grows.
Scenario 3: Product documentation search
Build a searchable documentation system for 500+ articles. Users need full-text search with ranking, filtering by category and tags, and “find similar articles” functionality.
Claude’s suggestion:
“Store articles as markdown files with frontmatter. For search, use grep or ripgrep for keyword matching. For similar articles, manually maintain a related list in each article’s frontmatter.”
What a data engineer would suggest: Start with SQLite FTS5 for full-text search with ranking and metadata filtering, then add vector embeddings stored in a lightweight vector database for the “find similar articles” functionality. For production at scale, consider Meilisearch or Elasticsearch, which handle both full-text search and vector similarity in one system. The search would support relevance ranking, phrase matching, and filtering out of the box. Similar articles would be computed automatically from content embeddings rather than manual curation.
The performance gap here isn’t just about speed but also capability. Grep can’t do relevance ranking. It can’t handle phrase queries well. It certainly can’t find semantically similar content. You’d need to build all of this yourself, reimplementing features that dedicated search engines have spent decades optimising.
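To make that concrete, here is a minimal SQLite FTS5 sketch; the column names and sample query are illustrative, and the vector-similarity layer would sit alongside it:

```sql
-- Illustrative FTS5 setup; column names are assumptions
CREATE VIRTUAL TABLE articles_fts USING fts5(title, body, category, tags);

INSERT INTO articles_fts (title, body, category, tags)
VALUES ('OAuth 2.0 Guide', 'How the authorization code flow works...', 'security', 'oauth,auth');

-- Ranked full-text search, filtered by category
SELECT title
FROM articles_fts
WHERE articles_fts MATCH 'oauth OR authentication'
  AND category = 'security'
ORDER BY rank
LIMIT 10;
```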
Conclusion of the Meta-Experiment
Here’s what makes this experiment particularly interesting. Claude knew it was being tested for exactly this bias. It was explicitly looking for it. And it still defaulted to text-based solutions first.
This suggests the gap isn’t surface-level. It’s deeply embedded in how language models approach data problems. The text-based patterns are simply more “available” cognitively, even when better solutions exist in the model’s knowledge base.
The Gap Is Measurable
Let me be concrete about what this knowledge gap costs in production systems.
Example 1: Product Feature Catalogue With Dependencies
The scenario: 200 features, 3 pricing tiers (Free, Pro, Enterprise), with complex dependencies between features. Need to query things like “show all Enterprise features with their complete dependency trees” and “prevent circular dependencies.”
Agent default approach:
```markdown
# features/advanced-analytics.md
---
tiers: [Pro, Enterprise]
depends_on: [basic-reporting, data-export]
---

Advanced analytics dashboard with...
```
Then parse these files with custom scripts to build dependency graphs.
Structured approach:
```sql
CREATE TABLE features (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT
);

CREATE TABLE tiers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL  -- Free, Pro, Enterprise
);

CREATE TABLE feature_tiers (
    feature_id INTEGER REFERENCES features(id),
    tier_id INTEGER REFERENCES tiers(id),
    PRIMARY KEY (feature_id, tier_id)
);

CREATE TABLE feature_dependencies (
    feature_id INTEGER REFERENCES features(id),
    depends_on_id INTEGER REFERENCES features(id),
    PRIMARY KEY (feature_id, depends_on_id),
    CHECK (feature_id != depends_on_id)
);
```
Measured difference:
| Operation | Markdown Approach | Database Approach |
|---|---|---|
| Query full dependency tree | 200+ file reads, custom traversal logic (approximately 500 milliseconds) | Single recursive CTE query (approximately 5 milliseconds) |
| Prevent circular dependencies | Custom validation code (50+ lines) | Database constraint (1 line) |
| Filter “all Enterprise features” | Parse all files, filter in memory (approximately 200 milliseconds) | WHERE tier='Enterprise' (approximately 2 milliseconds) |
| Add new query capability | Write new parsing code | Write SQL |
The performance difference is 100 times for most operations. But there’s a more insidious cost around maintainability. Every new query in the markdown approach requires writing new code. In the database approach, you write SQL. You’re not reinventing query engines; you’re using one that’s been optimised for 50 years.
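For instance, the “query full dependency tree” row above reduces to a single recursive CTE against the schema shown earlier (the :root_feature_id parameter is a placeholder); the same pattern doubles as a cycle check before inserts:

```sql
-- All transitive dependencies of one feature, in one query
WITH RECURSIVE deps(depends_on_id) AS (
    SELECT depends_on_id
    FROM feature_dependencies
    WHERE feature_id = :root_feature_id
    UNION
    SELECT fd.depends_on_id
    FROM feature_dependencies fd
    JOIN deps d ON fd.feature_id = d.depends_on_id
)
SELECT f.name
FROM deps d
JOIN features f ON f.id = d.depends_on_id;
```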
Breaking point: At 500+ features, teams invariably add a caching layer to make the markdown approach usable. At which point they’re reimplementing database features poorly.
Example 2: Time-Series User Metrics
The scenario: Track daily active users, signups, and churn across 50 segments over 2 years. Generate weekly trend reports, compare segments, identify anomalies.
Agent default approach:
```python
# Store one CSV per day in daily_stats/, e.g. daily_stats/2024-01-15.csv
# Use pandas to load every file and aggregate:
import glob

import pandas as pd

# Assumes each file has a 'date' column plus dau, signups, churn
dfs = [pd.read_csv(f, parse_dates=['date']) for f in glob.glob('daily_stats/*.csv')]
df = pd.concat(dfs).set_index('date')

# resample() requires the datetime index set above
weekly = df.resample('W').agg({
    'dau': 'mean',
    'signups': 'sum',
})
```
Structured approach:
```sql
CREATE TABLE user_metrics (
    timestamp DATETIME,
    segment_id INTEGER,
    dau INTEGER,
    signups INTEGER,
    churn INTEGER,
    PRIMARY KEY (timestamp, segment_id)
);
CREATE INDEX idx_metrics_time ON user_metrics(timestamp);
CREATE INDEX idx_metrics_segment ON user_metrics(segment_id);

-- Weekly trends for the past 90 days (SQLite dialect;
-- on PostgreSQL/TimescaleDB use date_trunc('week', timestamp) instead of strftime)
SELECT
    strftime('%Y-%W', timestamp) AS week,
    segment_id,
    AVG(dau) AS avg_dau,
    SUM(signups) AS total_signups
FROM user_metrics
WHERE timestamp >= date('now', '-90 days')
GROUP BY week, segment_id;
```
Measured difference:
| Metric | CSV + Pandas | SQLite/TimescaleDB |
|---|---|---|
| Query 90-day trend | Load 36,500 rows into memory (approximately 800 milliseconds) | Indexed query (approximately 15 milliseconds) |
| Memory usage | 50 to 100 megabytes (full dataset in RAM) | Minimal (streaming results) |
| Concurrent access | Not safe (file locking issues) | Safe (ACID transactions) |
| Incremental updates | Rewrite CSV files | INSERT/UPDATE (approximately 1 millisecond) |
Breaking point: Move to hourly data (over 400,000 rows per year across 50 segments) and the CSV approach becomes completely unusable. The database approach handles millions of rows without breaking stride.
But here’s the cost that surprised me most. Token usage in AI-powered systems becomes dramatically more expensive with unstructured data. If you’re using an LLM to analyse this data:
- CSV approach: Load entire dataset into context (500,000+ tokens) = $1.50 per query
- Database approach: Query returns 10 summary rows (5,000 tokens) = $0.015 per query
That’s a 100 times cost reduction simply by structuring your data appropriately.
Example 3: Documentation Semantic Search
The scenario: 1,000 documentation articles. Users need “related articles” suggestions based on content similarity, not just keyword matching.
Agent default approach:
```markdown
# articles/oauth-guide.md
---
title: OAuth 2.0 Guide
related:
  - article-about-authentication.md
  - guide-to-jwt.md
  - api-security-best-practices.md
---
```
Manually curate these relationships for every article.
Structured approach:
```python
from sentence_transformers import SentenceTransformer
import chromadb

model = SentenceTransformer('all-MiniLM-L6-v2')

# One-time: generate embeddings (~60 seconds for 1,000 articles)
# `articles` is the list of article texts
embeddings = model.encode(articles)

# Store in a vector database (Chroma shown here; Qdrant etc. work similarly)
client = chromadb.Client()
collection = client.create_collection(name="docs")
collection.add(
    ids=[str(i) for i in range(len(articles))],
    embeddings=embeddings.tolist(),
    documents=articles,
)

# Find similar articles (typically 5 to 20 milliseconds per query)
query_emb = model.encode("How does OAuth work?")
similar = collection.query(query_embeddings=[query_emb.tolist()], n_results=5)
```
Measured difference:
| Metric | Manual Curation | Vector Search |
|---|---|---|
| Initial setup | 5 to 10 minutes per article (roughly 80 to 170 hours for 1,000 articles) | 60 seconds of one-time embedding generation |
| Ongoing maintenance | Manual updates when articles added | Automatic similarity computation |
| Accuracy | Limited by author memory | Semantic understanding (finds “login” when searching “authentication”) |
| Handles typos/synonyms | No | Yes |
Cost analysis:
- Manual approach: 100 hours per month maintenance at $100 per hour = $10,000 per month
- Vector approach: $20 per month hosting + $10 one-time setup
- ROI: 500 times
Breaking point: Manual curation is practically impossible beyond approximately 50 articles. Vector search scales to millions.
Why This Happens
I’ve identified a few evidence-based factors that explain this knowledge gap.
Training Data Distribution
The simple explanation is that LLMs have seen vastly more markdown documentation than database schema designs in their training data.
Consider GitHub. Over 50 million repositories contain README.md files. Fewer than 1 million have well-documented database schemas. Stack Overflow questions about text processing outnumber questions about database design by roughly 10 to 1. Popular documentation frameworks (Hugo, Jekyll, VitePress, Docusaurus) are all markdown-based.
This creates what cognitive scientists call “availability bias.” The text-based patterns are more readily accessible because they appear more frequently in training data. When an agent encounters a data storage problem, the markdown solution comes to mind first because that pattern has been reinforced millions of times.
Structural Reasoning Limitations
Recent research reveals that LLMs struggle with spatial and structural reasoning about data in ways that don’t affect their language capabilities.
A 2024 survey on LLMs and tabular data found that GPT-3.5 achieves only 50% accuracy on table transposition tasks (literally just flipping rows and columns). The paper documents “tendency to hallucinate” when LLMs describe table structures and notes that “numbers are not encoded well in original LLM encoding methods.”
More tellingly, the researchers found no evidence of LLMs reasoning about when to use different data representations. The models can execute operations on structured data when given specific formats, but they don’t demonstrate the architectural intuition to choose appropriate structures for problems.
A separate survey on text-to-SQL systems confirms this gap from another angle. The paper notes that “LLMs excel in writing SQL compared to the schema linking sub-task.” In other words, agents are good at generating SQL from existing schemas but struggle to design schemas or even to select relevant tables for queries.
This matters because we’re measuring the wrong capability. Current benchmarks test “can you generate SQL from this schema?” What we need to measure is “do you recognise this problem needs a schema at all?”
Context Window Optimisation
Agents optimise for what fits naturally in conversation flow, not what’s optimal long-term.
Markdown solutions feel collaborative. I can show you a file, we can edit it together in the chat, it feels immediate and tangible. Database solutions feel abstract. First we design a schema, then we write queries, then we integrate it with other code. The cognitive load is front-loaded rather than distributed.
This is actually good UX design for conversational AI, but it has an unintended consequence. It biases the agent toward solutions that demonstrate well in chat rather than solutions that perform well in production.
No Decision Framework for Architecture
This is the most significant gap. LLMs haven’t been trained on the decision-making process that precedes implementation.
When a senior data engineer encounters the feature tracking problem I described earlier, their thought process goes something like this:
- This involves entities with relationships (features, tiers, dependencies)
- We’ll need referential integrity (can’t have dependencies on non-existent features)
- We’ll need complex queries (dependency trees, filtering)
- This is a relational data problem
- Design a normalised schema
This decision-making chain is rarely documented in the code that LLMs train on. What’s documented is the resulting schema and queries. The “why I chose this approach” reasoning is lost.
Compare this to how agents are trained. They’ve seen thousands of examples of SQL generation from schemas. They’ve seen far fewer examples of schema design decisions. They’ve seen almost no examples of the higher-level question: “should I use a database for this or is markdown sufficient?”
There’s another dimension to this. The decision-making process itself involves recognising patterns across problem domains. A senior engineer has seen the same architectural patterns emerge repeatedly: “This looks like the inventory system we built in 2018, which was similar to the booking system from 2015.” They’re not just applying technical knowledge; they’re pattern-matching across years of experience. LLMs have the technical knowledge but lack the metacognitive framework that says “I’ve seen this category of problem before, and here’s what works.”
This is why prompting strategies that invoke scale or role-based expertise work. They activate that pattern-matching layer. But agents should be doing this automatically, not just when explicitly prompted.
When Text-Based Approaches Are Actually Fine
I need to be clear here. I’m not arguing against markdown or text-based solutions categorically. I’ve built systems that serve millions of users with markdown as the data layer. The issue is knowing when text is appropriate and when it’s not.
Text-based approaches work well when:
- You have fewer than 100 items with infrequent updates
- The content is human-edited (blog posts, documentation, configuration files)
- Items are independent with no relationships between them
- Simple keyword search is sufficient
- Version control workflow adds significant value (seeing diffs, collaborative editing via pull requests)
- You’re still in the exploration phase, figuring out what structure you need
Red flags that you need structured data:
- “This search is getting slow” (you need indexing)
- “Can we filter by multiple criteria?” (you need structured queries)
- “I need to find related or similar items” (you need relationship tables or vector search)
- “Show me trends over time” (you need time-series capabilities)
- “We need analytics or aggregations” (you need GROUP BY and proper aggregation functions)
- “Multiple users need concurrent access” (you need ACID transactions)
- “Preventing invalid data is critical” (you need schema validation and constraints)
The pattern I’ve settled on is to start with markdown for small scales, but plan the migration path to structured data before you hit the performance wall. Don’t wait until the system is painful to use.
What Developers Should Do About This
The good news is that you can work around this knowledge gap today with better prompting and architectural awareness.
The Smell Test
Question your agent’s suggestions when you hear:
| Agent suggests | Ask yourself | Consider instead |
|---|---|---|
| “Use frontmatter for metadata” | Will I need to query or filter this data frequently? | SQLite with FTS5 for full-text search |
| “Parse logs with grep or awk” | Is this a recurring analysis task? Is it time-series data? | Log database like ClickHouse or Loki |
| “Store data in JSON files” | Do I need relationships between entities? | Relational database |
| “Manually maintain links between items” | Could these relationships be computed automatically? | Vector embeddings for semantic similarity |
| “Write a script to aggregate data” | Will these queries change frequently? Will I need different aggregations? | SQL GROUP BY with proper indexes |
Effective Prompting Strategies
I tested 6 different prompting approaches to see which ones successfully invoke structural thinking. Here are the four that worked, with their approximate success rates:
1. Invoke scale explicitly (90% success rate):
This needs to handle 10,000+ items with sub-second query performance.
Mentioning scale is the most reliable trigger. It forces the agent to think about performance characteristics, which naturally leads to structured data considerations.
2. Invoke role-based expertise (85% success rate):
As an experienced data engineer, what’s the optimal architecture for this?
Role-based prompting activates domain-specific knowledge patterns. The agent reasons differently when explicitly prompted to think as a data engineer versus as a general programmer.
3. Frame the problem structurally (80% success rate):
This is a data modelling problem with relationships between X, Y, and Z. Propose an architecture.
Explicitly calling it a “data modelling problem” shifts the frame from “how do I store this?” to “how should I model these relationships?”
4. State non-functional requirements (95% success rate):
I need ACID transactions, referential integrity, and safe concurrent access.
Stating database-specific requirements almost always produces database solutions, but requires you to already know what to ask for.
The pattern here is clear. Agents respond to explicit framing. The more precisely you describe architectural requirements, the better the suggestions. But this shouldn’t be necessary. A senior engineer hears “track 200 features across tiers with dependencies” and immediately thinks “relational database.” The problem description itself should trigger the right solution.
A Decision Framework
When an agent suggests a text-based solution, use this decision tree:
1. Will you have more than 100 items? If no, text is probably fine.
2. Do items have relationships with each other? If yes, you need structured data.
3. Do you need complex filtering (multiple criteria, ranges, combinations)? If yes, you need structured queries.
4. Is this time-series data requiring trend analysis? If yes, you need a proper time-series approach.
5. Do you need semantic similarity (finding related items by meaning)? If yes, you need vector embeddings.
If you hit “yes” on any of questions 2 to 5, question the text-based suggestion strongly.
But there’s a more subtle dimension. Ask yourself about the trajectory, not just the current state. Even if you have 50 items today, if the system is successful, will you have 500 next year? Even if relationships don’t exist today, is the product roadmap likely to add them? This forward-thinking architectural lens is exactly what agents currently lack. They optimise for the problem as stated, not for where it’s heading.
I’ve seen teams rebuild systems 3 times: first in markdown because it was simple, then with a home-grown indexing layer when search got slow, and finally with a proper database when the indexing layer became unmaintainable. Each rebuild costs weeks of engineering time and risks data integrity during migration. The decision framework above helps you skip straight to the right answer.
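If it helps, here is a toy encoding of that decision tree as code; the question names and thresholds are mine, purely illustrative:

```python
def suggest_storage(item_count: int, has_relationships: bool, complex_filtering: bool,
                    time_series: bool, needs_semantic_similarity: bool) -> str:
    """Toy sketch of the decision tree above; thresholds are illustrative."""
    if needs_semantic_similarity:
        return "vector embeddings (plus a database for metadata)"
    if time_series:
        return "time-series approach (SQLite with indexes, or TimescaleDB)"
    if has_relationships or complex_filtering:
        return "relational database (start with SQLite)"
    if item_count <= 100:
        return "text files are probably fine (plan the migration path)"
    return "relational database (start with SQLite)"

# The feature-catalogue scenario from earlier: relationships and filtering push it to SQL
print(suggest_storage(200, has_relationships=True, complex_filtering=True,
                      time_series=False, needs_semantic_similarity=False))
```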
Override Patterns
Based on the scenarios I’ve tested, here’s when to override agent suggestions:
| Agent suggests | Override to | When |
|---|---|---|
| Markdown + grep | SQLite FTS5 | More than 1,000 documents, complex search queries needed |
| CSV + pandas | Time-series database | Time-based data, more than 10,000 records, recurring analysis |
| Manual relationship links | Vector embeddings | Need semantic similarity, more than 100 items |
| JSON files | Document database (MongoDB) or relational DB | Complex nested data with relationships |
| Log files + grep | ClickHouse or Loki | High-volume log analytics, need aggregations |
| Custom parsing scripts | SQL schema with constraints | Data integrity is critical, relationships exist |
Start Simple, Migrate Smart
My recommended approach is don’t prematurely optimise, but understand the migration path.
Phase 1 (0 to 100 items): Markdown is fine. Human-edited content, simple structure, git workflow valuable.
```
docs/
  article-1.md
  article-2.md
```
Phase 2 (100 to 10,000 items): Add a SQLite index whilst keeping markdown as source.
```
docs/     (markdown files, still git-tracked)
docs.db   (SQLite FTS5 index for fast searching)
```
This gives you both worlds. Markdown for editing and version control, database for querying. Rebuild the index when files change (this is fast enough for most use cases).
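A minimal sketch of that Phase 2 setup, assuming a flat docs/ folder of markdown files (paths, table, and column names are illustrative):

```python
# Rebuild a disposable FTS5 index from the markdown source of truth
import glob
import sqlite3

conn = sqlite3.connect("docs.db")
conn.execute("DROP TABLE IF EXISTS docs_fts")
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(path, body)")

for path in glob.glob("docs/**/*.md", recursive=True):
    with open(path, encoding="utf-8") as f:
        conn.execute("INSERT INTO docs_fts (path, body) VALUES (?, ?)", (path, f.read()))
conn.commit()

# Fast ranked search over the markdown corpus
rows = conn.execute(
    "SELECT path FROM docs_fts WHERE docs_fts MATCH ? ORDER BY rank LIMIT 10",
    ("authentication",),
).fetchall()
```

Run something like this from a git hook or file watcher and the markdown editing workflow stays untouched.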
Phase 3 (10,000+ items or complex queries): Database becomes source of truth.
```
docs.db   (primary data store)
docs/     (optional: generated from the database for git workflow)
```
At this scale, you need the database’s query capabilities, transaction safety, and performance characteristics. You can still generate markdown for specific workflows if needed.
The key is recognising the transition triggers before you’ve built yourself into a corner.
What This Means for the Broader Ecosystem
This knowledge gap has implications beyond individual development teams.
For Model Builders
Current LLM training focuses on code generation (“here’s a schema, generate SQL”). What’s missing is architectural reasoning (“here’s a problem, recognise it needs a schema”).
The training data imbalance is fixable. Include more database design patterns, not just SQL examples. Include the decision-making process (“I chose a relational database because…”). Include anti-patterns (“we started with markdown but hit scaling issues at X scale, here’s how we migrated”).
We also need new benchmarks. Current evaluation measures SQL generation accuracy. What we should measure is given a problem description, can the agent classify whether it needs text storage, relational database, document database, vector database, or time-series database? Compare the agent’s architectural decisions to those of senior engineers.
I’d propose a “Data Architecture Recognition” benchmark:
- Present problem scenarios without implementation hints
- Measure whether the agent suggests appropriate data structures
- Test schema design quality, not just SQL correctness
- Evaluate understanding of when to migrate from simple to complex approaches
For Tool Developers
The Model Context Protocol focuses on connecting LLMs to data sources. That’s valuable, but it addresses only half the problem: connecting agents to databases doesn’t help them recognise when to reach for those connections in the first place.
What would actually help is a “Data Structure Advisor” tool that:
- Analyses problem descriptions for structural data characteristics
- Suggests appropriate architectures with tradeoff explanations
- Flags when a text-based approach will hit scaling limits
- Proposes migration paths before they’re painful
This is the kind of architectural guidance that senior engineers provide instinctively but agents currently lack.
For Engineering Leaders
If you’re treating AI agents as senior team members (as I do), you need to recognise this knowledge gap exists and compensate for it.
In code reviews, look for:
- Custom parsing logic that reimplements query engines
- Linear scans through files where indexed queries would work
- Manual relationship maintenance that could be computed
- Scale-naive solutions (works for 10 items, breaks at 1,000)
When an agent suggests a text-based approach, ask “how does this perform at 10 times current scale? 100 times?” If the answer is “we’d need to rewrite it,” question the suggestion now.
What Architectural Fluency Would Look Like
Imagine if agents had senior-level data modelling instincts. The interaction would change fundamentally.
Today:
User: “I’m building a support ticket system.”
Agent: “Create a tickets/ directory with markdown files for each ticket. Use frontmatter for metadata like status, priority, assigned agent…”
With architectural fluency:
User: “I’m building a support ticket system.”
Agent: “This is a classic relational data problem. Let me design a normalised schema with tickets, customers, and agents tables. We’ll need foreign keys for data integrity, indexes on status and assigned agent for filtering, and probably a state transitions table for audit trail. I’ll also set up full-text search on ticket content using SQLite FTS5. We can start with SQLite since it’s a single file with zero configuration, then migrate to PostgreSQL if you need concurrent writes from multiple servers. Sound good?”
That’s the difference between junior and senior engineering. Recognising the problem category before reaching for tools.
Or consider performance optimisation:
Today:
User: “This markdown search is slow.”
Agent: “Let’s optimise the grep command with better flags…”
With architectural fluency:
User: “This markdown search is slow.”
Agent: “You’re doing linear scans on 5,000 files. Let me create an SQLite FTS5 index. This will reduce query time from 1 second to 5 milliseconds whilst keeping your markdown workflow intact. I’ll set up automatic index rebuilding when files change.”
The agent would recognise that the problem isn’t the grep command but the lack of indexing.
This is what we should be working toward. Agents that reason architecturally, not just implement instructions.
The Path Forward
Through self-analysis, academic research review, and concrete performance testing, I’ve confirmed that AI agents demonstrate a consistent, measurable gap in recognising when structured data solutions are appropriate. This isn’t about capability (agents write excellent SQL when prompted). It’s about architectural intuition (agents don’t naturally suggest databases for database problems).
The impact is significant. 20 to 100 times performance degradation, 14 to 26 times more development effort, 100 times higher token costs in AI systems. Teams are building systems that will collapse at scale because their AI collaborators suggest text-based solutions as defaults.
The gap is fixable today through better prompting. Invoke scale explicitly (“handle 10,000+ items”). Frame problems structurally (“this is a data modelling problem with relationships”). Take on the role of architect (“as a data engineer, what’s optimal?”). Use the decision framework to know when to override suggestions.
But this shouldn’t be necessary. If we’re treating AI agents as senior engineering team members, they should demonstrate senior-level architectural instincts. The capability exists in the models. What’s missing is the training data and benchmarks that would develop the intuition to apply it naturally.
Until that changes, be the senior engineer. Question text-based defaults. Ask “would a database be 10 times better here?” Trust your instincts when a suggestion feels like it won’t scale. The tools are available. The agent has the capability. Your job is to help it learn when to reach for them.
When Claude suggested markdown files for that feature tracking system at the start of this article, it wasn’t wrong because it’s incapable of database design; it has generated thousands of lines of perfectly good SQL. It was wrong because that solution pattern was more cognitively available than the better one.
This is exactly what happens with human engineers early in their careers. They reach for familiar patterns (text files, spreadsheets, manual processes) before they develop the instinct to recognise structural problems. The difference? Human engineers develop this instinct through painful experience of watching their text-based solutions collapse at scale.
AI agents don’t get that feedback loop. They suggest the same patterns regardless of scale because they haven’t internalised when those patterns break. Until they do, your job is to be that experienced voice. When an agent suggests parsing frontmatter for what’s obviously a database problem, it’s not because you asked the wrong question. It’s because the agent doesn’t yet know to ask “will this scale?”
This has broader implications for how we think about AI assistance in engineering. We often focus on whether agents can write correct code (they can) or generate working implementations (they do). But senior engineering isn’t just about correct code. It’s about choosing the right abstractions, recognising problem patterns, and making architectural decisions that create systems which remain maintainable as they grow.
The knowledge gap I’ve documented here is a specific instance of a broader pattern. Agents have strong tactical capabilities but weak strategic instincts. They can implement solutions brilliantly once you’ve chosen the approach, but they struggle with the meta-level question of which approach to choose in the first place.
This matters because the teams getting the most value from AI agents aren’t just using them as code generators. They’re using them as collaborative partners in system design. But that partnership only works if you recognise where the agent’s architectural blind spots are and compensate for them.
The good news is that recognising this gap makes you better at working with agents. You know when to trust their suggestions (implementation details, SQL syntax, schema optimisation) and when to question them (initial architectural choices, data structure selection, long-term scalability). That awareness is the difference between building systems that work today and building systems that still work 3 years from now when you have 100 times the data.
The tools are in their hands. Help them learn when to use them.
About The Author
Tim Huegdon is the founder of Wyrd Technology, a consultancy focused on helping engineering teams achieve operational excellence through strategic AI adoption. With over 25 years of experience in software engineering and technical leadership, Tim specialises in architectural decision-making, system design, and the organisational capabilities needed to maintain quality whilst leveraging AI assistance. His approach combines deep technical expertise with practical observation of how engineering practices evolve under AI collaboration, helping organisations develop sustainable AI workflows whilst maintaining the standards that enable long-term velocity.