AI Will Write All Our Code… and Other Fairy Tales

Part 1 of a two-part “AI Will Write All Our Code” series

So… I’ve Been Thinking About AI

You’ve probably seen the posts.

“By the end of the year, 90% of all code will be written by AI.”

“Software engineers will become prompt whisperers.”

“We trained a GPT model on our Slack history and now it’s running product strategy.”

It’s an exciting time to be in tech—if only because watching the hype train barrel down the tracks without brakes is strangely exhilarating. If you believe the internet, we’re just months away from replacing delivery teams with a well-tuned prompt and a generous OpenAI bill.

But then I open my laptop, hop on calls with clients, and reality gently taps me on the shoulder.

  • One is trying to add AI-powered search to an internal knowledge base.
  • Another is replacing a brittle Excel rules engine with something less… medieval.
  • A third is experimenting with LLMs in customer support, and mostly discovering new ways to confuse customers faster.

These aren’t moonshots. They’re grounded, thoughtful, and often very messy. They remind me that integrating AI into real-world systems is more “duct tape and caution signs” than “silver bullets and unicorns.”

And that’s what this article is about. Not the fantasy of AI as a mythical engineer sent to save us from Jira boards and code reviews—but the practical, sometimes strange, often delightful process of integrating AI into traditional systems.

Over the next few sections, I’ll walk you through what I’ve seen—warts and all—as teams move from bolt-on experiments to deeper architectural changes, and what might lie ahead if we keep going down this path.

Spoiler: Your AI agent probably isn’t writing 90% of your code. But it might be helping you rethink what you need to build in the first place.

And no, it’s not coming for your job.

At least, not unless your job involves pasting Stack Overflow answers into legacy codebases without testing them—which, let’s be honest, is already half the internet’s backend strategy. Never mind vibe-coding—we’ve been doing SODD (Stack Overflow Driven Development) for years already.

In truth, the most successful uses of AI I’ve seen so far feel less like replacement and more like pairing. It’s like having an unusually helpful (if slightly erratic) junior developer sitting next to you—happy to scaffold boilerplate, draft documentation, or take the first stab at a function. You still have to steer, review, and (often) undo its work—but it gets you moving faster.

The future of software isn’t engineer vs. AI. It’s engineer plus AI.

And right now, that partnership is still figuring out which way is up.

What I’m Seeing on the Ground

Bold predictions might dominate LinkedIn, but on the ground, the vibe is very different. The teams I’m working with aren’t chasing breakthroughs—they’re chasing small, survivable wins.

What’s interesting isn’t the tools they’re using—it’s the patterns emerging in how they think about applying them:

  • They start with low-stakes use cases: documentation, triage, internal search.
  • They test in isolation before even thinking about integration.
  • They expect mess—and are relieved when it’s only slightly chaotic.

The real story isn’t about which company did what. It’s how these small trials are shaping internal conversations: raising expectations, provoking better questions, and building the confidence teams need to take the next step. No one’s replacing their dev team with a fleet of GPT-powered agents just yet. What they are doing is poking at the edges of their systems to see where AI might stick.

In every case I’ve been involved with recently, the process is messy. Prompts are brittle. Edge cases pop up like weeds. Stakeholders get a little too excited. And we all have to remind each other that hallucinating an answer with confidence doesn’t make it correct—even if parts of Business Development have been modelling that behaviour successfully for decades.

But despite the chaos, there’s a through-line: these companies aren’t using AI to avoid engineering—they’re using it to support it. They’re not handing over the keys. They’re building copilots, scaffolds, and shortcuts.

It’s not magical. It’s not even particularly elegant. But it’s real. And for now, that’s the most useful kind of AI there is.

Phase 1 – The Add-On Era

Once the hype fog clears and teams decide to try something tangible, most land here first: adding AI at the edges. It’s low risk, fast to ship, and just credible enough to get through an architecture review (if no one asks too many questions).

These are experiments that don’t aim to transform the system—they aim to test the water. Can we add a chatbot? Can we generate ticket summaries? Will anyone notice?

This isn’t architectural innovation yet. But it is where innovation starts to get negotiated across teams—and where real learning begins.

I call this “AI as Clippy, but with more compute”: You’re not rethinking your product. You’re just giving it a clever hat.

What It Looks Like

  • A chatbot on your homepage that does slightly more than redirect to your FAQ.
  • An “Ask AI” button hovering over your internal documentation.
  • Auto-generated summaries for customer support tickets (that are occasionally accurate and occasionally poetry).
  • LLMs awkwardly grafted onto your search bar like a third arm.

At this stage, AI isn’t part of the system’s soul—it’s a helpful (if slightly untrustworthy) assistant whispering over your shoulder. It can’t make real decisions, and you wouldn’t trust it with anything mission-critical. But it’s fun, it’s flashy, and it demos well in front of leadership.

How It’s Wired (Now With Bonus Jank)

  • AI is typically integrated via third-party APIs like OpenAI, Anthropic, or Google’s PaLM/Gemini stack. If you’re fancy, you might have set up a vector database too (or at least said you did on LinkedIn).
  • Most implementations involve prompt engineering disguised as architecture. You’ll find YAML files full of capitalised sentences like “You are a helpful assistant who NEVER lies” sitting uncomfortably close to production.
  • No training pipelines, no fine-tuning, no data feedback loops—just stateless prompts, a network call, and a growing sense of dread when it returns something “creative.”

You’re not building a system around intelligence—you’re building a bridge to a giant statistical guess engine and praying for coherence.
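
To make this concrete, here’s roughly all the “architecture” there is at this stage: one stateless prompt, one POST request, and a helper function. A minimal sketch, assuming OpenAI’s public chat-completions endpoint; the model name is illustrative, and any provider’s equivalent works the same way.

```python
# A minimal sketch of the Phase 1 pattern: a stateless prompt, a single HTTP
# call, and no memory of anything that came before. Assumes OpenAI's public
# chat-completions API; the model name is illustrative.
import os
import requests

SYSTEM_PROMPT = "You are a helpful assistant who NEVER lies."  # the capitalised classic

def ask_ai(question: str) -> str:
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-4o-mini",  # pin whichever model you actually depend on
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question},
            ],
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

# No training pipeline, no feedback loop, no state -- just a POST request and hope.
```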

So why does everyone start here? Because the upsides are hard to argue with:

  • Fast to deploy: You can ship an MVP in a day. Two, if you want to add a loading spinner.
  • Low risk: If it breaks, you can just call it “beta” and pretend it was intentional.
  • Low effort, high optics: No data science team required. Just wire up a POST request and update the pitch deck.
  • Board slide gold: “Strategic investment in generative AI to elevate customer experience.” Boom—budget approved.

Let’s not pretend this doesn’t matter. Having something visual and vaguely futuristic to show stakeholders is often what unlocks time, money, and internal permission to explore deeper uses. Sometimes, you have to ride the hype wave to get real work done.

So if the chatbot on your landing page helps you secure funding for a serious architecture rethink? Bless that chatbot.

But Here’s the Thing…

This phase is like buying a gym membership in January. You feel innovative. You’re doing AI. But most of the value is performative unless you dig deeper.

The outputs are shallow, the reliability is questionable, and every engineering team ends up having the same conversation:

“What exactly did we ask it?”
“Why did it respond with that?”
“Can we stop it from recommending our competitors?”

However, this phase is still useful. It gets teams familiar with the behaviour of models in the wild. It forces people to grapple with latency, prompt fragility, and the existential horror of unstructured input.

More importantly, it kicks off an internal conversation.

As soon as something AI-driven lands in front of users—or executives—it forces the rest of the business to pay attention. People start engaging with the technology, not just the idea of it. Expectations get recalibrated. Curiosity gets channelled into actual discussion. It creates a kind of shared exposure therapy. Legal asks about model outputs, product starts thinking about edge cases, and someone in marketing inevitably asks if it can write the blog.

(Spoiler: It can. In fact, I collaborated with one to write this very post. It suggested three puns and tried to add a unicorn emoji. I said no. We’re working on boundaries.)

It’s messy, but it’s movement. And that movement helps demystify the technology, sharpen the right questions, and build the organisational muscle you’ll need later on.

And—crucially—it lays the groundwork for the next stage: where AI starts moving from nice-to-have to actually doing things.

Eventually, someone will ask: “Can we hook this into our workflows?”

Which is where things start to get weird…

Phase 2 – The Frankenstack

This is the phase where things start to get… ambitious. AI stops being a bolt-on and starts showing up inside the machine. Not just in the UI, but in workflows, decision points, and logic layers where you used to have conditionals, spreadsheets, or—if we’re being honest—Phil from Ops with a checklist. You begin to replace brittle parts of the system with LLMs. Sometimes it’s magical. Sometimes it’s a weird, unstable mess held together by duct tape and “TODO” comments.

This is The Frankenstack: part old-school app, part generative model, stitched together with vector embeddings and a prayer.

What It Looks Like

  • LLMs replacing rules engines or heuristics: Instead of 300 if/else statements, you now have a prompt that says “Decide how to route this support ticket based on tone, content, and urgency.” It works surprisingly well—until someone sends an email in emoji. (There’s a rough sketch of this pattern just after this list.)
  • RAG (Retrieval-Augmented Generation) pipelines everywhere: The model isn’t just guessing anymore—it’s looking things up. It fetches documents from your knowledge base, indexes your policies, pulls in CRM data… and then still sometimes says, “I’m not sure.” (Also sketched after this list.)
  • Prompt templates sprawling across the codebase: You started with one prompt. Then came the variation for angry customers. Then the one for enterprise users. Then the A/B test. Now your prompts/ folder is a forest, and you’re debating version control strategies for plain English.
  • LLMs generating structured responses: Want to summarise a support ticket? Extract key fields from unstructured text? Draft the first pass of a weekly report? That’s now the AI’s job—and it’s mostly getting it right. Until someone swears in a way that rhymes with a product name.
  • Middleware turning into AI orchestration glue: You now have a layer between the user and your backend services that interprets intent, chooses a prompt, fetches context, makes a decision, then sends a result. You’ve effectively built a pseudo-agent… you’re just not calling it that yet.
  • Non-engineers tuning prompts in production: Somewhere, someone in marketing has access to a prompt editor and is now “experimenting with tone.” No one told security. Everyone finds out when the bot says, “Absolutely slay, queen 💅” in response to a refund request.
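
To ground a couple of those bullets: here’s roughly what “a prompt instead of a rules engine” looks like in practice. This is a rough sketch, not a recommendation; the prompt wording, route names, and the call_llm helper are illustrative stand-ins for whatever you already have.

```python
# Sketch: routing a support ticket with a prompt instead of 300 if/else branches.
# call_llm() stands in for your actual provider call (see the Phase 1 snippet);
# the routes and fields here are purely illustrative.
import json

ROUTING_PROMPT = """You are a support-ticket router.
Read the ticket below and reply with ONLY a JSON object of this form:
{{"route": "billing" | "technical" | "complaints", "urgency": "low" | "normal" | "high"}}

Ticket:
{ticket}"""

VALID_ROUTES = {"billing", "technical", "complaints"}

def route_ticket(ticket_text: str, call_llm) -> dict:
    raw = call_llm(ROUTING_PROMPT.format(ticket=ticket_text))
    try:
        decision = json.loads(raw)
    except json.JSONDecodeError:
        # The model got "creative" -- fall back to a human queue.
        return {"route": "manual_review", "urgency": "normal"}
    if decision.get("route") not in VALID_ROUTES:
        return {"route": "manual_review", "urgency": "normal"}
    return decision
```

And the RAG plumbing behind the second bullet, reduced to its simplest loop: embed the question, fetch the nearest chunks, stuff them into the prompt. Here embed() and vector_search() are placeholders for whatever embedding model and vector store you actually run (or claimed to on LinkedIn).

```python
# Sketch: the RAG loop at its simplest. embed() and vector_search() are
# placeholders for your real embedding model and vector store.
ANSWER_PROMPT = """Answer the question using ONLY the context below.
If the context does not contain the answer, say "I'm not sure."

Context:
{context}

Question:
{question}"""

def answer_with_rag(question: str, embed, vector_search, call_llm, top_k: int = 4) -> str:
    query_vector = embed(question)                    # 1. embed the question
    documents = vector_search(query_vector, k=top_k)  # 2. fetch the nearest chunks
    context = "\n\n---\n\n".join(doc["text"] for doc in documents)
    prompt = ANSWER_PROMPT.format(context=context, question=question)
    return call_llm(prompt)                           # 3. generate, grounded (we hope)
```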

At this stage, AI starts doing real work. It’s not just answering questions—it’s making decisions. Triage. Routing. Classification. Even automation of multistep processes.

And that’s when the engineering team starts sweating.

What Breaks First

  • Reliability: Traditional systems give you consistent behaviour. LLMs do not. You quickly discover that “same input, same output” is more of a polite suggestion than a guarantee. One day your model returns a perfectly sensible response. The next, it answers in haiku.
  • Testing: You want to write a unit test, but how do you assert that the model’s answer is “correct enough”? You could snapshot the output, but good luck when the model updates and your CI fails because the wording changed from “Sorry, we can’t help with that” to “Unfortunately, that is not possible at this time.” Technically identical. Functionally frustrating. (One coping strategy is sketched after this list.)
  • Observability: Your metrics say the service is fine. The logs say the requests went through. But now someone’s cat has been enrolled in your loyalty programme, and no one knows why. Logging the prompt and response helps—until you have to sift through 40MB of “conversation context” to figure out what went wrong.
  • Explainability: Product wants to know why the model recommended Option C. You’re not sure. You check the prompt history. You check the retrieval context. You rerun it with the same inputs and get Option A. Congratulations: you’re now gaslit by your own software.
  • Versioning: Models evolve. APIs change. And unless you’re pinning models with the kind of paranoia reserved for nuclear launch codes, one day something will break and your only clue will be: “We upgraded to gpt-4.5-turbo-vision-xl-plus.”
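
The testing problem, at least, has a partial workaround: stop asserting on the model’s wording and start asserting on the structured decision you actually act on. A rough sketch, reusing the hypothetical route_ticket() from the routing example above, with a canned model response so the test stays deterministic.

```python
# Sketch: test the decision, not the prose. route_ticket() is the illustrative
# helper from the earlier routing sketch; fake_llm keeps the test deterministic.

def fake_llm(prompt: str) -> str:
    # Stand-in for the real model call.
    return '{"route": "billing", "urgency": "high"}'

def test_refund_requests_go_to_billing():
    decision = route_ticket("I was charged twice and I want a refund NOW.", fake_llm)
    assert decision["route"] == "billing"             # the bit we actually act on
    assert decision["urgency"] in {"normal", "high"}  # tolerant, not word-for-word

def test_creative_output_falls_back_to_humans():
    decision = route_ticket("hello???", lambda prompt: "Sure! Happy to help 😊")
    assert decision["route"] == "manual_review"
```

It doesn’t solve non-determinism against the live model, but it does stop your CI failing every time the phrasing drifts.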

Engineering Feels

We’re not vibe coding. We’re running vibe ops.

  • “We shipped a feature, but now no one knows how it works.”
  • “Can we make it… less clever?”
  • “Why is the model recommending refunds to everyone in New South Wales?”

The Good News

This phase, despite the chaos, can actually reduce complexity. You can offload legacy logic to a model and let it learn from examples instead of writing brittle code that covers every edge case. In the right use case, it’s incredibly powerful.

But you have to build guardrails. Fallbacks. Manual overrides. And you’ll probably need to design a whole new kind of monitoring system—one that doesn’t just check for uptime, but for believability.
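
In practice, that guardrail layer tends to be a thin wrapper: validate the output, log enough to reconstruct the decision later, and keep a human in the loop for anything consequential. Another rough sketch, again built around the hypothetical route_ticket() from earlier; the pinned model name and logged fields are illustrative.

```python
# Sketch of the guardrail pattern: validate, log enough to debug later, and
# keep a boring fallback for when the model gets imaginative. Names and fields
# are illustrative.
import logging
import time

logger = logging.getLogger("ai.guardrails")

PINNED_MODEL = "gpt-4o-mini-2024-07-18"  # pin a snapshot so "upgrades" are deliberate

def route_with_guardrails(ticket_text: str, call_llm) -> dict:
    started = time.monotonic()
    decision = route_ticket(ticket_text, call_llm)

    # Log what was asked, what was decided, and which model did the deciding --
    # the "believability" monitoring that uptime checks will never give you.
    logger.info(
        "ticket routed",
        extra={
            "model": PINNED_MODEL,
            "route": decision["route"],
            "urgency": decision["urgency"],
            "latency_ms": int((time.monotonic() - started) * 1000),
        },
    )

    # Manual override: anything the model marks as high urgency still gets a
    # human pair of eyes before the system acts on it.
    if decision["urgency"] == "high":
        decision["requires_human_review"] = True
    return decision
```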

And once the AI starts calling your services, someone’s going to ask why it’s not just running the whole show.

Wrapping Up: From Bolting On to Breaking Through

So far, we’ve looked at the early curve—where teams are cautiously experimenting with AI, layering it onto existing systems, and finding both surprising wins and new kinds of complexity. At this stage, AI is still something external. It answers questions, routes tickets, writes helpful blurbs, but it hasn’t started driving.

That shift—from assistant to actor—is where things start to get strange. And powerful. And slightly terrifying.

Coming Up Next…

In Part 2 of this series, “From Agents to Architectures: The Future of AI-Native Systems”, we’ll explore what happens when AI stops waiting for permission and starts orchestrating your workflows. From agent-led systems to AI-native architectures, we’ll walk through what breaks, what works, and what the future of “software systems” might really look like.


About the Author

Tim Huegdon is the founder of Wyrd Technology, a consultancy specialising in helping teams achieve operational excellence through pragmatic, value-driven Agile and engineering practices. With deep experience in software architecture and technical leadership, he works with organisations to integrate emerging technologies—like generative AI—without falling into hype-driven traps. His focus is on building systems that are adaptable, resilient, and grounded in real-world value.

Tags: AI, Architecture, Hype, Software Engineering