Building Career Ladders for AI-Era Engineering Teams
Ask an engineering leader who has been navigating AI adoption for the past year to describe their team’s operating model, and most can give you a reasonable answer. AI agents produce; engineers specify and evaluate. The quality of direction and the reliability of evaluation are now the primary determinants of team output. Ask the same leader to describe their career ladder, and the answer is almost always less clear. Some have a framework they inherited or copied from a larger organisation’s public engineering blog. Some have been meaning to revise the framework for the past six months. A few have decided it can wait until things settle down.
None of those positions is as safe as it feels. The absence of an updated framework does not mean the question of progression is unanswered. It means it is being answered implicitly, by a framework built for different work, against which the most important capabilities in an AI-augmented team are invisible.
The structural problem this creates is not theoretical. Engineers who are developing genuine evaluation depth, learning to write specifications that produce correct AI output, and building the governance that their organisation will depend on are being assessed against criteria that do not name any of those things. Managers who can see the difference between an engineer who is progressing and one who is not cannot articulate it in terms that their organisation’s current framework validates. Hiring processes are selecting for attributes that describe the previous generation of the role. The gap between what the organisation says it values and what it actually rewards widens quietly until it surfaces as something else.
The Quiet Cost of Ambiguity
The problems that emerge from the absence of a clear career framework are not immediately visible. They are the kind that compound slowly, going underground until they appear as something more legible and more urgent.
The compounding works through three costs in sequence.
- Engineers cannot self-assess. Without a framework that names the competencies that matter, engineers cannot accurately assess their own position or trajectory. That is not because they lack self-awareness. It is because self-assessment is a cognitive process that requires something external to work against. An engineer who is genuinely developing better specification and evaluation skills than she had two years ago has no shared vocabulary to describe that development, and no way of knowing whether her organisation sees it as progression rather than the expected baseline. The uncertainty is not neutral; it accumulates as disengagement, as the quiet calculation that somewhere else would reward her more clearly.
- Managers cannot give meaningful feedback. The feedback a manager can offer is bound by the framework available. Without a framework that describes specification depth, evaluation reliability, or governance responsibility, feedback defaults to what the framework does describe, which in most inherited frameworks is something close to production output. The engineer receives feedback that is technically accurate about what the framework measures but does not address the capabilities that will actually determine whether she progresses. She hears that her work is good. She does not learn whether the things that will actually move her forward are developing.
- Organisations cannot plan. Without a shared language for what seniority means in the team’s actual operating model, organisations cannot identify where capability gaps are forming. Hiring criteria remain anchored to what the role previously required, which is increasingly not what it requires now. Development investments are made toward the wrong targets. Promotion decisions become inconsistent because different managers work from different implicit models of what progression means, and none of those models is right for the work being done.
These costs do not typically announce themselves. They surface, eventually, as retention problems that are hard to diagnose, as promotion decisions contested without a clear basis for resolution, or as performance conversations that get stuck because neither party can make the disagreement precise. By the time any of those become visible, the underlying condition has usually been developing for some time.
AI adoption accelerates this dynamic in a specific way. When the primary competencies of a role change faster than the framework that describes them, the gap between what the organisation says it values and what it actually rewards widens quickly. Informal consensus fills the gap in the absence of a formal framework, but it is anchored in what people already understand and agree on. In a stable environment, that lag is manageable. In an environment where the nature of the work is changing materially, it is not.
The Wrong Foundation
Reach for a career ladder from a major technology company’s public engineering blog, and you will almost certainly find a framework built around production. Who writes the most complex code? Who delivers the most independently? Who ships the most features without requiring oversight? Who solves the hardest technical problems under the least guidance? These are the dimensions that have historically organised seniority in engineering organisations, and for understandable reasons: for most of the history of software engineering, production was the primary differentiator. Complex code, delivered independently, was tangible evidence of depth.
That framework was always a proxy. What it was actually trying to measure was judgment, technical depth, and independence of thinking. Complex production was a reasonable proxy for those things, so long as it required genuine understanding to achieve. The proxy became the measure, and the measure became the framework.
The proxy fails in an AI-augmented team for a straightforward reason: AI now handles production. The complexity of what an AI agent generates is no longer correlated with the depth of the engineer directing it. An entry-level engineer with adequate prompt technique can instruct an AI to produce code whose complexity would previously have indicated senior-level capability. The output complexity signals the AI’s capability and the engineer’s ability to direct it. It is no longer a reliable signal of the engineer’s understanding of what “correct” looks like or of whether she can detect when the AI has got it wrong.
This creates a specific and compounding problem for borrowed frameworks. An organisation that copies a production-centric ladder and applies it to an AI-augmented team will not simply have a framework that fails to capture new capabilities. It will have a framework that actively rewards the wrong things and makes the right ones invisible. The engineer who produces the most complex output, most likely by directing AI agents widely without rigorous evaluation, will appear most senior. The engineer who is developing the precision and rigour of genuine evaluation depth will be less visible, possibly less rewarded, because she is doing less production and more assessment.
The compounding matters here. An organisation that makes this category error early in its AI adoption journey may not feel the consequences immediately. The framework misaligns gradually as AI adoption deepens. By the time the misalignment becomes clearly visible, it has likely already shaped promotion decisions, compensation bands, and the informal model of what excellent engineering looks like in the organisation. Revising a framework that has been embedded for two or three years and used in performance reviews and levelling conversations is substantially harder than starting correctly.
Copying a framework from Google, Stripe, or any comparable organisation is not a shortcut. Those frameworks were written to describe the engineering work those organisations were actually doing at the time. Some of them predate the widespread use of AI agents by several years. Applying a description of work from one operating model to structurally different work is a category error, and the fact that the original framework was written by excellent engineers at respected organisations does not change that. Starting from a framework that describes the wrong thing and trying to adapt it is more difficult, not less difficult, than building from the dimensions the actual work requires.
What Advancement Actually Means Now
A previous piece in this series introduced the direction/evaluation loop as the primary frame for understanding how AI-era engineering teams operate. AI agents produce. Humans specify what should be produced and evaluate whether what was produced is correct, appropriate, and safe to proceed with. That loop, running continuously at every level of the organisation, is the model within which career progression now needs to be defined.
In that model, seniority is primarily about three things: specification depth, evaluation reliability, and governance responsibility.
- Specification depth is the ability to translate ambiguous requirements into precise, complete specifications that produce correct AI output. The word “complete” carries more weight here than it might initially appear. A human implementer encountering an underspecified requirement can exercise judgement: ask a clarifying question, make a reasonable assumption, flag the ambiguity during review. An AI agent encountering an underspecified requirement will typically produce something plausible, without flagging the gap. The specification has to be right before the AI acts, not corrected after. Engineers who have developed genuine specification depth have learned, through experience, where requirements tend to break down. They know what ambiguity looks like in a business rule, where interface contracts fail when left implicit, and which constraints are assumed in conversation but need to be written down before an AI agent will handle them correctly. This is not a prompt engineering technique. It is a form of domain and systems understanding that is built gradually through the experience of seeing incomplete specifications fail and the work of understanding why.
- Evaluation reliability is the ability to assess AI-generated output accurately: to know when it is correct, when it is subtly wrong, and when it is wrong in ways that look superficially correct. That last category is the hard one, and it is the one that distinguishes genuine evaluation depth from prompt fluency. AI agents can produce output that passes all the stated tests and satisfies the formal requirements, yet is still incorrect. The error only becomes apparent when the output is deployed, or when an engineer with sufficient depth reviews it carefully enough to notice that something is slightly off. That kind of evaluation cannot be achieved through process rigour alone. It requires having built things, having been wrong about them, and, through that experience, having developed an internal model of what correct looks like that no test specification can fully capture. Prompt fluency is a real skill; it is not a substitute for this.
- Governance responsibility is the ability to define and maintain the constraints within which AI agents operate: architectural boundaries, security requirements, business rules, quality standards, the scope of what agents are and are not permitted to do in a given context. “The AI-Era Engineering Org in Practice” covers the governance layer in more detail. What matters here is that governance responsibility scales with seniority. At the individual level, it is an understanding of which constraints apply to a specific piece of work. At the team level, it is the ability to consistently set and communicate those constraints. At the staff and principal level, it is an ability to maintain architectural coherence across multiple teams and a codebase that AI agents are continuously modifying, which requires knowing which constraints are load-bearing and which can safely vary.
These three dimensions need to be explicitly present in a career framework, not bolted on as annotations to a production-centric structure. A framework that adds “uses AI tools effectively” as a note at each level is not the same thing as one that makes specification depth, evaluation reliability, and governance responsibility the primary axes of progression. The former names an activity; the latter describes the capability that activity requires, making that capability the thing the organisation is committed to developing and recognising.
The difficulty is that these capabilities are harder to observe than production output. Production output is tangible: pull requests, commit histories, shipped features. Specification depth is not visible until a specification fails, and even then, the attribution may not be clean. Evaluation reliability is most visible when an evaluator catches something that would otherwise have shipped; it is invisible when the evaluation is done well, and no one notices that a subtle error was caught. Governance responsibility is most visible in its absence, when a constraint that should have been in place was not. None of this is an argument against making these the primary dimensions of a framework. It is an argument for building the observation mechanisms, the structured practices and review processes that make assessing these capabilities possible. The framework creates the expectation; the organisation then needs to put in place the practices that make it visible when the expectation is being met.
The Entry-Level Question
The efficiency argument for reducing entry-level hiring in an AI-augmented team is not difficult to construct. AI agents can produce what an entry-level engineer would produce at higher velocity, without the supervision overhead that entry-level engineers require. If the bottleneck in an AI-era team is evaluation depth, and entry-level engineers do not yet have that depth, where is the organisational case for hiring them?
The argument is wrong about the time horizon, and the error is consequential.
Entry-level roles exist for capability pipeline reasons, not for efficiency. Organisations need a continuous supply of engineers who have developed genuine evaluation depth, and that depth cannot be acquired without building things, being wrong about them, and working through why. The understanding a Staff or Principal Engineer brings to governance decisions cannot be absorbed from documentation or prompt-level observation. It is built through years of production experience, including instances where things go wrong in instructive ways. The evaluation capacity that makes AI-augmented delivery reliable at scale is downstream of a pipeline that begins at the entry level. Interrupting that pipeline to capture a short-term efficiency gain has consequences that appear several years later, not immediately.
An organisation that eliminates entry-level hiring today will not notice the problem in the next quarterly review. The senior engineers it already has will continue to provide depth of evaluation over the next several years. What it will notice, five to seven years from now, is that its senior pipeline is thinning and cannot be replenished from within. Engineers who would have become the evaluators and governance owners of the next generation were never hired. The depth they would have built was never developed. The organisation becomes dependent on hiring experienced engineers from outside, in a market where those engineers are increasingly scarce, because other organisations have made the same calculation.
The organisations that have thought carefully about this question do not simply preserve entry-level hiring unchanged. They redesign the entry-level role. The traditional model, giving entry-level engineers implementation tickets and letting them learn through doing, is not simply inefficient in an AI-era team. It is no longer teaching the thing it used to teach. The repetition that used to build evaluation depth through production is not available in the same form when AI is producing. What the entry-level role needs to be, in an AI-augmented team, is something closer to a structured apprenticeship in evaluation and specification:
- Deliberate exposure to the governance layer early, so that engineers develop an understanding of the constraints within which AI operates before they are responsible for setting them.
- Structured practice in reviewing AI-generated output against specifications, with explicit feedback on whether the evaluation was accurate.
- Explicit mentorship toward the depth that the operating model will require, rather than an implicit assumption that depth will develop naturally through unconstrained implementation work.
Designing that role well is harder than the traditional entry-level engineering position. It requires thought about what constitutes adequate structured practice in evaluation at the early stages of a career, and what progression toward genuine depth looks like from a starting point where production repetition is no longer the primary vehicle. These are questions that most organisations are only beginning to work through. They are answerable and worth answering, because the alternative is not an honest efficiency gain. It is a deferred cost, invisible until it is very much not.
Levels, Titles, and the Politics of Both
Career ladders require internal level identifiers to function. Whether they are numbers, letters, or some other notation, they serve a practical purpose: making seniority comparable across tracks, anchoring compensation bands, and giving progression a precision that job titles alone cannot provide. A Senior Engineer in the software track and a Senior Quality Engineer in the quality track both have a level that places them in relation to each other and to the rest of the organisation, regardless of whether their titles map cleanly onto a shared external hierarchy.
The trouble is that level identifiers and job titles serve different purposes and different audiences, and organisations that conflate them produce frameworks that function poorly at both.
- Internal comparability vs external legibility. A level number has no meaning outside the organisation. Engineers carry their job title externally, not their level number. “L5 at Acme Corp” means nothing to a prospective employer or to an engineer at a different company. “Senior Software Engineer” means something specific and portable. Titles carry industry meaning; level numbers carry internal meaning. Both are necessary, and neither substitutes for the other. A framework that treats them as the same thing, or that uses level codes as external-facing titles, either limits the utility of levels internally or leaves engineers without titles they can use meaningfully in the industry.
- Cross-track comparability vs within-track hierarchy. A level scheme that places a Staff Software Engineer and a Senior Engineering Manager at the same level is internally coherent. Both roles operate at cross-team scope; both are compensated equivalently; both represent the same level of organisational seniority. But the titles corresponding to those levels in their respective tracks carry different connotations of seniority when read against industry conventions. Both people will compare their title to peers at other companies and draw conclusions that the internal level scheme was not designed to produce.
- The political reality of title seniority. When a level number is mapped to job family titles across multiple tracks, someone will almost always notice that the title that sounds most senior at a given level is not theirs. A Principal Engineer sounds more senior than a Senior Engineering Manager to some external readers; a Director of Engineering sounds more senior than a Principal Engineer to others. The mapping will always produce at least one person who feels that their title undersells their seniority relative to a colleague at the same level in a different track. This is not a design flaw that can be engineered away by selecting better titles. It is a structural property of any multi-track ladder that maps different tracks to industry-standard titles with different seniority connotations. Attempting to eliminate it by inflating titles destroys external legibility. Attempting to eliminate it by stripping titles down to level codes or removing domain context leaves engineers without identifiers that carry meaning in the industry.
The recommendation is to accept the tension as a human reality rather than treat it as an unsolved design problem. Be transparent about the level-to-title mapping and what each serves. Anchor internal conversations about seniority to the level and its descriptors rather than to the title. Accept that the job family title is the external-facing identifier, carrying the industry meaning that matters for how people represent themselves and are received, while the level is the internal truth that governs compensation, promotion eligibility, and organisational standing. This is not a compromise; it is a correct partitioning of what each identifier is for.
- Organisational intent vs individual autonomy. This tension compounds the other three: the gap between what the organisation intends and what individuals do independently. People put their job title on LinkedIn, whether the organisation has approved the wording or not. An organisation that has not worked through how internal levels map to externally meaningful job family titles will find that engineers fill the gap themselves, inconsistently, in ways that may not reflect the organisation’s intent. This is not primarily a compliance problem; it is an information problem. An organisation that has a coherent, clearly communicated mapping between levels and job family titles gives its engineers a clear answer to the question they are trying to answer when they update their public profile. Inconsistency in external representation is usually a symptom of internal ambiguity, not of individual non-compliance.
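The partitioning described in this section can be made concrete as a small lookup. What follows is a minimal Python sketch under stated assumptions, not a recommended scheme: the level numbers, track names, and the placement of each title at a given level are hypothetical, chosen only to illustrate the idea. The level is the internal key used for comparison, and the job family title is a separate, external-facing value derived from level and track.

```python
from dataclasses import dataclass

# Hypothetical mapping for illustration only: the level numbers, track
# names, and title placements are assumptions, not a recommended scheme.
TITLE_BY_LEVEL_AND_TRACK = {
    (5, "software"): "Senior Software Engineer",
    (5, "quality"): "Senior Quality Engineer",
    (6, "software"): "Staff Software Engineer",
    (6, "management"): "Senior Engineering Manager",
}


@dataclass(frozen=True)
class Role:
    level: int  # internal identifier: anchors compensation and promotion
    track: str  # job family the role belongs to

    @property
    def external_title(self) -> str:
        """Title the person carries outside the organisation."""
        return TITLE_BY_LEVEL_AND_TRACK[(self.level, self.track)]


def same_internal_seniority(a: Role, b: Role) -> bool:
    """Compare seniority by level only, never by how the title sounds."""
    return a.level == b.level


staff = Role(level=6, track="software")
manager = Role(level=6, track="management")
print(staff.external_title, "|", manager.external_title)
print(same_internal_seniority(staff, manager))
```

Keeping the title as a derived value rather than a stored one means there is a single published mapping to point to when someone asks what to put on a public profile.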
The Calibration Problem
A career framework built around specification depth, evaluation reliability, and governance responsibility describes where AI-augmented engineering work is heading, not where every organisation is today. Most engineering organisations are at some point along the adoption curve. AI tools are in use; the direction/evaluation loop is beginning to take shape, but the operating model is not yet fully settled, and the distribution of work between AI production and human production varies significantly from team to team and context to context.
Applying a framework that assumes fully settled AI augmentation to an organisation that is six months into adoption creates confusion rather than clarity. Engineers who were hired and developed against a production-centric standard cannot be assessed fairly against descriptors that assume specification and evaluation as primary competencies when the daily work does not yet reflect that. The framework becomes aspirational in the wrong sense: rather than describing where the organisation is going and giving people a clear path toward it, it describes a world that does not yet exist in this context and produces levelling conversations that feel disconnected from reality.
The structure of a well-designed framework can be stable across different stages of adoption: the three dimensions described above apply regardless of where an organisation is in its AI journey. What requires calibration are the specific descriptors. What does adequate specification depth look like for a mid-level engineer at this organisation, today, given the operating model the team is actually running? The answer to that question will differ substantially between an organisation that has been running AI-augmented delivery for three years and one that is still in the early stages of integrating AI tools into its workflow.
Getting calibration wrong in either direction destroys the framework’s utility. Overstating AI adoption, by writing descriptors that assume a settled AI operating model when the organisation is still in transition, produces a framework that engineers read as aspirational rather than operational. They recognise that the work being described is not the work they are doing, and the framework becomes a statement of intent that cannot be used for honest self-assessment or substantive feedback. Downplaying adoption, by keeping the production-centric frame to avoid difficult conversations about changing expectations, produces a framework that actively validates the wrong things. It tells engineers that what matters is still production output, at exactly the moment when the organisation needs them to be developing specification and evaluation capabilities. The capabilities the operating model actually requires then develop more slowly than they would if the framework named and incentivised them.
The right approach is to hold the structure and the calibration separately. Describe the target state explicitly: this is where the organisation is heading, this is what seniority means in a fully AI-augmented operating model, and the framework is built around those dimensions. Then calibrate the current descriptors to reflect an honest assessment of where the organisation actually is: what those dimensions look like in the team’s current context, and what “adequate” means given the work happening today rather than the work that will be happening in three years. The calibration is not a one-time decision; it is a live question that should be revisited as AI adoption progresses. A career framework that is not reviewed as the operating model evolves will drift from the work it describes at the same quiet pace as the work itself changes.
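One way to picture holding the structure and the calibration separately is as two pieces of data with different lifetimes. The Python sketch below is illustrative only: the three dimension names come from this article, while the level name, descriptor wording, and review date are hypothetical placeholders. The dimensions are fixed; a calibration is a dated set of descriptors that can be checked against them and replaced when it is reviewed.

```python
from dataclasses import dataclass, field
from datetime import date

# Stable structure: the three dimensions named in this article.
DIMENSIONS = (
    "specification depth",
    "evaluation reliability",
    "governance responsibility",
)


@dataclass
class Calibration:
    """Descriptors for one level, as calibrated at one point in time."""
    level_name: str
    reviewed_on: date
    descriptors: dict[str, str] = field(default_factory=dict)

    def missing_dimensions(self) -> list[str]:
        """Return the dimensions this calibration has not yet described."""
        return [d for d in DIMENSIONS if d not in self.descriptors]


# Hypothetical calibration for an organisation early in adoption;
# the wording and date are placeholders, not suggested descriptors.
mid_level_today = Calibration(
    level_name="Mid-level Engineer",
    reviewed_on=date(2025, 1, 1),
    descriptors={
        "specification depth": "Writes specifications for well-scoped tasks.",
        "evaluation reliability": "Reviews AI output against stated tests.",
    },
)

print(mid_level_today.missing_dimensions())
```

Recording the review date alongside the descriptors makes it easy to see when a calibration has gone unrevisited while the operating model has moved on.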
The calibration question to start with is not technical. It is the honest one: where is this organisation actually in its AI adoption journey? Not where public communications say it is, not where leadership would like it to be, but where the team’s daily work places it. That question, answered honestly, determines how far the current descriptors need to be from the target and where the framework needs to be explicit about transition rather than assuming completion.
Pulling It Together
The argument of this piece can be stated plainly. Existing career ladders describe production seniority. In an AI-augmented team, production has moved to the AI agent, which makes production-centric frameworks not just incomplete but actively misleading. Specification depth, evaluation reliability, and governance responsibility need to be the primary axes of progression, not footnotes to a framework built for different work.
Entry-level roles need to be preserved but redesigned. The efficiency argument for reducing or eliminating them is understandable, but wrong about the time horizon. The evaluation depth that makes AI-augmented delivery reliable at scale is built over years, and the pipeline begins at entry level. The role needs to change: less unconstrained implementation, more structured practice in evaluation and specification, explicit apprenticeship toward the depth the operating model will require.
The structural questions of levels and titles do not resolve themselves. The tensions between internal comparability and external legibility, between cross-track coherence and within-track hierarchy, between organisational intent and what individuals put on LinkedIn, are real and recurring. They are best managed with transparency: anchor seniority conversations to the level and its descriptors, and accept that the job family title serves external representation rather than internal truth.
And the framework needs to be calibrated to where the organisation actually is, not where it hopes to be. The structure can be stable; the descriptors need to reflect the team’s current reality while pointing toward where it is heading.
This is genuinely difficult work. Career frameworks are political artefacts as well as structural ones. Building one surfaces disagreements about what the organisation values and what seniority means that are easier to leave implicit. Naming those things precisely, which is what a good framework requires, is uncomfortable in predictable ways. People who built their careers against production-centric definitions of seniority will find aspects of the shift difficult to accept. Organisations that have been promising entry-level engineers a career path built on implementation work will need to have honest conversations about how that path is changing.
The discomfort is not a reason to defer the work. The implicit framework that currently governs progression in most organisations navigating AI adoption is already producing the wrong signals, and the cost of that misalignment compounds quietly. The moment to replace it with something designed for the actual work is before those costs become acute, not after.
If you are at the point of sitting down to build or revise a career framework for an AI-augmented engineering team, Wyrd Technology can help. We have developed foundational frameworks built around these dimensions, which we calibrate to each organisation’s specific context, stage of AI adoption, and team structure. The foundation exists; the work is in making it fit. If that is where you are, get in touch.
About the Author
Tim Huegdon is the founder of Wyrd Technology, a consultancy that helps engineering leaders design organisations fit for the AI era. With over 25 years of experience in software engineering and technical leadership, he works with CTOs and engineering directors to rethink operating models, redefine role structures, and build the governance practices that allow AI-augmented teams to deliver at pace without sacrificing quality or architectural coherence.