Who Writes the Code? A Practical Guide to Agentic AI in Software Development

25 Jun 2026 | 10 min. read

Six months ago, a senior engineer at AWS decided to try rebuilding a piece of infrastructure he’d written by hand nearly 20 years ago. He used an AI agent. He and a colleague had originally spent four months on it. He figured the agent would knock it out in a night.

Five nights later, he was still at it. Babysitting every step. Getting output that looked right but wasn’t.

On the fifth night, he figured out the problem. He hadn’t given the agent the tools to test its own output. Without that, the agent was producing confident, fast, and wrong results — and had no way to know it.

That engineer was Swami Sivasubramanian, VP of Agentic AI at AWS. His conclusion, published in GeekWire: “The bottleneck is not the time it takes to build something. It’s crafting the right specification and the tests.”

If the VP of Agentic AI at Amazon needed five nights to learn that lesson — what does it mean for the rest of us figuring out agentic AI in software development right now?

From “Suggest the next line” to “Here’s the feature. I tested it.” How Agentic AI in Software Development Actually Changed

AI in software development isn’t new. Developers have used autocomplete and code suggestions for years. The question was always: useful shortcut, or distraction?

What changed in the last 18 months isn’t the tools. It’s what the tools can actually do. And understanding that shift is the starting point for anyone making decisions about agentic AI in software development today.

According to Stack Overflow’s 2025 Developer Survey — 49,000+ developers across 177 countries — 84% now use or plan to use AI in their workflow. But the number isn’t the point. The nature of the work is.

A year ago, an AI coding tool suggested your next line. You stayed in control of every decision. You prompted, it suggested, you accepted or rejected. One step at a time. Today, you give an agent a goal — “implement the payment retry logic from this ticket” — and walk away. The agent reads your codebase, plans the approach, writes across multiple files, runs the tests, sees what fails, fixes it and comes back with a working implementation.

You weren’t in the loop for any of that middle part. Engineers have already moved from faster autocomplete to a fundamentally different way of assigning work.

Levels of autonomy of Agentic AI in Software Development

Not all “agentic AI” is equal. The market is full of tools claiming to be autonomous that are really just slightly smarter autocomplete. Before evaluating any vendor’s approach to agentic AI software development, it helps to know what level you’re actually buying.

Level	What the AI actually does	Where humans come in
L1	Suggests the next line or function	You approve every single change
L2	Generates multi-line blocks, completes whole functions	You review before accepting
L3	Executes multi-step tasks, fixes its own errors	You review the finished output
L4	Plans and builds entire features autonomously	You set the goal, review the result
L5	Runs independently across full projects	You define the outcome

The honest picture: most production teams in 2026 are working at L2–L3. L4 exists in controlled, well-specified environments. L5 is on roadmaps and keynote slides, not in your codebase.

This matters because a lot of vendors are selling L4–L5 language while actually running L2. Knowing the difference helps you ask the right questions and avoid paying for something that doesn’t exist yet. At JetSoftPro, we run at L2–L3 for most workflows and reserve L4 for tasks where the spec is tight, the test coverage is strong, and the scope is clearly bounded. That’s not a limitation but a deliberate choice while the tooling is changing this fast.
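To make the levels concrete, here is a minimal sketch of how a team might encode an "L4 only when the conditions are met" rule. The `Task` fields, the `max_autonomy` function, and the 0.8 coverage threshold are hypothetical choices for illustration, not a description of any vendor's actual tooling.

```python
from dataclasses import dataclass
from enum import IntEnum


class Autonomy(IntEnum):
    """Autonomy levels L1 to L5 from the table above."""
    L1 = 1  # suggests the next line; human approves every change
    L2 = 2  # generates blocks and functions; human reviews before accepting
    L3 = 3  # executes multi-step tasks; human reviews the finished output
    L4 = 4  # plans and builds features; human sets the goal, reviews the result
    L5 = 5  # runs independently across projects; human defines the outcome


@dataclass
class Task:
    spec_is_tight: bool
    test_coverage: float  # fraction between 0.0 and 1.0
    scope_is_bounded: bool


def max_autonomy(task: Task, min_coverage: float = 0.8) -> Autonomy:
    """Allow L4 only when all three conditions hold; otherwise cap at L3.

    The 0.8 default is an assumed threshold, chosen for illustration.
    """
    if task.spec_is_tight and task.scope_is_bounded and task.test_coverage >= min_coverage:
        return Autonomy.L4
    return Autonomy.L3


well_specified = Task(spec_is_tight=True, test_coverage=0.9, scope_is_bounded=True)
vague = Task(spec_is_tight=False, test_coverage=0.9, scope_is_bounded=True)
print(max_autonomy(well_specified).name, max_autonomy(vague).name)
```

Writing the rule down, even this crudely, gives you something to ask a vendor about: what are their conditions for stepping up a level, and who checks them?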

Where AI agents in software development are actually working right now, across the delivery pipeline

If you’re still asking whether a team uses AI, you’ll get an answer that doesn’t reflect the real state of affairs and doesn’t move your own AI adoption forward. The more useful question is “which parts of the work are agents actually handling?”

Here’s where AI agents can bring the most productivity to your team (when properly configured and used, of course):

Before a line of code is written
Agents read requirement briefs, flag missing edge cases, spot contradictions between user stories, and generate structured acceptance criteria. The PM still owns the decision, but the gaps appear before the sprint starts, not after it ends.
Architecture and design decisions
Agents cross-reference new feature proposals against the existing codebase, identify potential conflicts, and flag where a proposed approach has caused problems in similar contexts before. The tech lead still decides. They just decide faster and with fewer blind spots.
Writing the code itself
AI now generates 41% of all code globally, according to Gartner. Agents handle multi-file implementation, respect existing conventions, manage dependencies, and iterate on compiler feedback without anyone pressing a button between cycles.
Testing — and this one surprises people
Most people assume AI saves the most time in code generation. Cisco’s engineering team found the opposite: their agentic system cut total execution time by 65%, with the biggest gains in testing, not writing code. Agents generate test suites, run them, read the failures, fix the underlying code, and repeat.
Review, CI/CD, and documentation
Agents embedded in CI pipelines flag regressions before a human reviewer opens a PR. Documentation gets generated from code context and commit history — not written six weeks after the feature shipped, by someone who wasn’t there.
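The test-and-fix cycle described under testing above (run the tests, read the failures, fix, repeat) can be sketched as a bounded loop. This is a simplified illustration: `run_tests` and `propose_fix` are hypothetical callables standing in for a real test runner and a real agent, and the two `fake_` functions exist only so the sketch runs on its own.

```python
from typing import Callable, List, Tuple


def fix_until_green(
    code: str,
    run_tests: Callable[[str], List[str]],
    propose_fix: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> Tuple[str, bool]:
    """Run tests, feed failures back to the fixer, and repeat.

    Returns the latest code and whether its tests pass. The loop is
    bounded so a stuck agent cannot iterate (and spend) forever.
    """
    for _ in range(max_rounds):
        failures = run_tests(code)
        if not failures:
            return code, True
        code = propose_fix(code, failures)
    # Out of rounds: report whether the last fix happened to pass.
    return code, not run_tests(code)


# Stand-ins so the sketch runs without a real agent or test runner.
def fake_run_tests(code: str) -> List[str]:
    return [] if "a + b" in code else ["add(2, 3) returned -1, expected 5"]


def fake_propose_fix(code: str, failures: List[str]) -> str:
    return code.replace("a - b", "a + b")


fixed, green = fix_until_green(
    "def add(a, b): return a - b", fake_run_tests, fake_propose_fix
)
print(green)  # prints True
```

The important part is the first argument to `propose_fix`: the failures. An agent that never sees test output has nothing to correct against, which is the situation from the opening story.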

The practical result of all of this: Amazon rebuilt a core Bedrock inference engine — originally scoped for 30 engineers over 12–18 months — with 6 engineers in 76 days. AWS teams that restructured their workflows around agents saw a median 4.5x productivity gain. Teams that just added the tools to their existing process saw much less.

So why are 40% of agentic AI software development projects getting canceled?

Gartner’s experts predicted that more than 40% of agentic AI projects will be canceled by the end of 2027. The reason isn’t that the technology doesn’t work. It’s that teams are deploying it wrong.

Three failure patterns show up more than any others:

The spec problem
An agent doesn’t read between the lines. It executes on what you give it. Vague tickets, loose acceptance criteria, undocumented edge cases — the agent will produce code that is technically plausible and functionally wrong. Fast, confident, and wrong.
This is exactly what happened to Sivasubramanian at AWS. And if it can happen to the VP of Agentic AI at Amazon, it can happen to any team that treats agentic AI as a shortcut around good engineering practice.
The “we’re measuring the wrong thing” problem
Most teams track AI adoption by usage: how many developers are using the tool, how many prompts were sent, how many tokens were consumed. None of that tells you whether the output is any good.
The teams getting real results measure output quality — independently written tests, architectural review, security checks — before AI-generated code reaches a human reviewer. That requires building an evaluation layer. Most teams skip it entirely and then wonder why they’re spending more time fixing AI mistakes than they saved in generation.
The governance vacuum
Only 21% of companies globally have a mature governance model for AI agents, according to Deloitte. That means when an agent does something unexpected — makes a decision outside its intended scope, racks up unexpected infrastructure costs, introduces a vulnerability that slipped past review — nobody has a clear answer to: who approved this? Under what conditions? What’s the rollback?
One specific surprise most teams don’t anticipate: MCP token costs can run 160x higher than equivalent CLI operations. That’s not a footnote. That’s a budget line that can blow up quietly over a few weeks if nobody’s watching it. Governance for agentic systems isn’t the same as traditional software governance because you’re managing autonomous decisions.

What actually changes for you when your engineering partner runs on Agentic AI

Most writing on agentic AI in software development talks about what it does for developers. Almost none of it talks about what it means for the product owner, the CTO, or the client.

Here’s the honest version.

You get more speed — if you invest in better input

An agentic engineering partner can move faster than a traditional team. But that speed only materializes if the specifications are good. Fuzzy requirements don’t slow agents down. They generate confident, fast output in the wrong direction. Expect your partner to push harder for clarity upfront. That’s not process overhead. That’s how you actually get the speed.

Code volume goes up. Your review work changes shape.

When an agent delivers a feature implementation in hours instead of days, your review capacity becomes the constraint. The skill shifts from writing code to evaluating it — catching what the agent got subtly wrong, validating that it meets the actual goal, not just the literal spec. Your team needs to develop that muscle. Good partners help you build it.

The questions you ask vendors need to change

Don’t just ask “do you use AI?” Ask instead:

At what autonomy level do you operate for this type of task?
How do you evaluate AI-generated output before it reaches review?
What does human oversight look like at each stage?
How do you govern agent infrastructure costs?

Partners who answer these precisely are running agentic AI properly. Partners who give you a tool list and a productivity stat are using the word “agentic” as a marketing badge.

Your team structure and your vendors’ teams look different

Gartner predicts that by the end of 2026, 75% of developers will spend more time on orchestration and architecture than on writing code directly. An engineering partner already operating this way ships with smaller, more senior teams focused on goals and validation, not headcount.

What we see at JetSoftPro: The clients who get the most from AI-augmented delivery stopped asking “how many developers are assigned to my project” and started asking “what’s the delivery velocity and what’s the defect rate.” That reframe changes everything about how you measure whether the engagement is working.

What’s already being built for 2027 and why your architecture decisions today matter

Single agents handling one task at a time is where most teams are now. What’s moving into production next is more interesting and more complex.

Multi-agent systems. An orchestrator agent coordinates a network of specialist agents — one for code generation, one for testing, one for security scanning, one for documentation, running in parallel, passing results between each other, completing work that used to require a whole sprint in hours.
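As a rough illustration of the orchestrator pattern, the sketch below fans one task out to several specialist agents in parallel and collects their results. It uses only Python's standard `asyncio`; the `orchestrate` function, the `make_stub_agent` helper, and the role names are hypothetical, and real specialists would call models and tools instead of returning a string. It covers only the parallel fan-out, not agents passing results to one another.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Specialist = Callable[[str], Awaitable[str]]


async def orchestrate(task: str, specialists: Dict[str, Specialist]) -> Dict[str, str]:
    """Run every specialist on the same task concurrently and collect results by role."""
    roles = list(specialists)
    results = await asyncio.gather(*(specialists[role](task) for role in roles))
    return dict(zip(roles, results))


def make_stub_agent(role: str) -> Specialist:
    """Build a stand-in specialist; a real one would call a model or tool."""
    async def run(task: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real I/O call would
        return f"{role} finished: {task}"
    return run


specialists = {
    role: make_stub_agent(role)
    for role in ("codegen", "testing", "security", "docs")
}
report = asyncio.run(orchestrate("payment retry logic", specialists))
print(report["testing"])
```

`asyncio.gather` returns results in the order the awaitables were passed in, which is why zipping them back to the role names is safe here.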

Gartner tracked a 1,445% surge in enterprise inquiries about multi-agent systems. IBM research shows multi-agent architecture reduces process handoffs by 45% and accelerates decision cycles threefold.

Two protocols are becoming the infrastructure layer that makes this possible: MCP (Model Context Protocol), developed by Anthropic and now co-governed with OpenAI and Google through the Linux Foundation, and A2A (Agent-to-Agent) for cross-system communication. These define how agents plug into your existing tools, databases, and APIs.

If you’re making software architecture decisions right now, these protocols are relevant. Not because you need to build with them today, but because building in ways that conflict with them means a painful migration in 18 months.

The governance gap is also worth building toward now. Only 21% of organizations have mature agent governance in place. The ones who build it early won’t just avoid Gartner’s predicted failures. They’ll scale when competitors are still arguing about who’s responsible for what the agent did last Tuesday.
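One small piece of agent governance can be sketched directly: a spending cap with an audit trail, so that the questions "who approved this?" and "what did it cost?" have an answer. Everything here is an assumption for illustration, including the `AgentBudget` class, its fields, and the log format; it is not taken from any specific governance framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentBudget:
    """Hypothetical token budget that records every allowed or blocked action."""
    token_limit: int
    tokens_used: int = 0
    audit_log: List[str] = field(default_factory=list)

    def charge(self, action: str, tokens: int, approved_by: str) -> bool:
        """Record the action and return True if it fits in the budget, else block it."""
        if self.tokens_used + tokens > self.token_limit:
            self.audit_log.append(f"BLOCKED {action}: {tokens} tokens would exceed the limit")
            return False
        self.tokens_used += tokens
        self.audit_log.append(f"ALLOWED {action}: {tokens} tokens, approved by {approved_by}")
        return True


budget = AgentBudget(token_limit=10_000)
budget.charge("refactor billing module", 6_000, approved_by="tech lead")
budget.charge("regenerate all docs", 7_000, approved_by="tech lead")
print(budget.tokens_used, len(budget.audit_log))
```

In this run the second action is blocked because it would push usage past the limit, and both decisions end up in the log. A real system would also need rollback procedures and scope limits, which this sketch leaves out.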

Agentic AI in software development works. It’s in production. The productivity numbers are real.

It also breaks in predictable ways — vague specs, no evaluation, no governance — and the teams that skip those fundamentals are exactly the ones Gartner is counting in that 40% cancellation rate.

The difference between a team that ships faster with agentic AI in software development and a team that creates a mess faster isn’t the tools. It’s the engineering discipline around the tools.

That’s what we focus on at JetSoftPro. And it’s what you should be looking for in any engineering partner making agentic AI claims in 2026.

JetSoftPro builds AI-native software products and has been an engineering partner to companies across the US, UK, and EU for over 20 years. Want to know how agentic AI actually fits into your delivery process without the marketing layer? Let’s talk.