How We Deliver Software Faster With an Agent Team

The Headline

We build production software with a team of AI agents organized like a real Scrum team. The split is 70% AI, 30% human. The reason to care: delivery is dramatically faster — without cutting compliance, quality, or the audit trail.

Industry benchmarks back the speed claim. Cognition’s Devin now reports a 67% PR merge rate in production across customers like Goldman Sachs, Santander, and Nubank. Independent 2026 estimates suggest agent-led teams can compress 4–8 month projects into roughly two weeks. We’re on that curve.

What we are not claiming: full autonomy. The 100%-autonomous pitch you see elsewhere is marketing. Our 30% human is where the irreplaceable work happens — judgment calls, stakeholder conversations, compliance sign-off, orchestration design. The 70% AI is everything else, and it moves at machine speed.

This post is the honest tour: the team, the orchestration, the stack, and the four real failures we’ve already hit in Sprint 0 — which is in flight as this goes up.

Why a Team Is What Makes It Fast

One big agent doesn’t scale. Loaded with the entire backlog, every module, every constraint, every review comment — it runs out of attention long before it runs out of tokens. Outputs drift. Decisions get re-litigated. The same bug ships twice. That’s the actual bottleneck on speed.

A team of specialized agents removes it. Each one gets a narrow mandate, a clean system prompt, and the right toolset. The Backend Lead never thinks about Tailwind. The QA Automation agent never has to remember the auth middleware. They run in parallel — frontend, backend, tests, infra, docs — against a shared contract. Work that used to be sequenced by headcount now happens simultaneously.

The pattern is mainstream in 2026 — Scrum.org publishes official guidance, and MetaGPT, CrewAI, LangGraph, and AutoGen all ship production tooling. We are not claiming novelty. We are claiming we run it carefully, with humans where humans still matter.

The Roster

Fourteen agents, mirroring a real org chart:

Role	Responsibility
Product Owner	Defines backlog, prioritizes features, stakeholder liaison
Scrum Master	Facilitates sprints, removes blockers, process guardian
Tech Lead / Solution Architect	System design, tech decisions, code review
Backend Developer (×2–3)	API, database, business logic, integrations
Frontend Developer (×2–3)	Nuxt 4 UI, components, state, accessibility
DevOps Engineer	CI/CD, deployment, Docker, monitoring
QA Engineer (×2)	Manual testing, automation, regression suites
UI/UX Designer	Wireframes, design system, prototypes
Business Analyst	Requirements, process mapping, documentation

Each agent lives in its own Markdown file with a complete system prompt — role, scope, modules it owns, modules it must not touch, escalation paths. The whole team is legible from the repo layout:

hospital-system/
├── ITT_Hospital_Management_System.md
├── AGENT_TEAM_STRUCTURE.md
├── apps/
│   ├── api/          # Hono + TypeScript + Drizzle ORM
│   └── web/          # Nuxt 4 + TypeScript
├── packages/         # Shared types
├── agents/
│   ├── po-agent.md         arch-agent.md
│   ├── sm-agent.md         be-lead.md   be-dev1.md   be-dev2.md
│   ├── fe-lead.md          fe-dev1.md   fe-dev2.md
│   ├── devops-agent.md     qa-lead.md   qa-auto.md
│   └── ux-agent.md         ba-agent.md
├── sprints/
│   ├── sprint-0-foundation.md
│   ├── sprint-1-user-management.md
│   └── sprint-2-patient-management.md
├── tasks/            # Drop task files here to activate agents
└── orchestration/
    ├── sprint-kickoff.md
    └── agent-spawn-commands.md

System prompts live in version control. Sprint plans live in version control. Every decision an agent makes is traceable back to the brief it was given.
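
As a minimal sketch of what "modules it owns, modules it must not touch" could look like once parsed out of a brief, the TypeScript below models one agent and checks a file path against it. The AgentBrief shape, its field names, and the example paths under apps/ are assumptions for illustration; the real brief format is Markdown and is not shown in this post.

```typescript
// Hypothetical shape for what one agents/*.md brief declares.
// Field names are illustrative, not the real file format.
interface AgentBrief {
  id: string;              // e.g. "be-dev1"
  role: string;
  owns: string[];          // path prefixes the agent may modify
  mustNotTouch: string[];  // path prefixes that are off limits
  escalatesTo: string;     // id of the agent that receives blockers
}

// True when `path` is equal to or nested under `prefix`.
function isUnder(path: string, prefix: string): boolean {
  const withSlash = prefix.endsWith("/") ? prefix : prefix + "/";
  return path === prefix || path.startsWith(withSlash);
}

// Deny rules win over allow rules; paths listed nowhere are denied.
function mayModify(brief: AgentBrief, path: string): boolean {
  if (brief.mustNotTouch.some((prefix) => isUnder(path, prefix))) return false;
  return brief.owns.some((prefix) => isUnder(path, prefix));
}

// Example brief with made-up paths.
const beDev1: AgentBrief = {
  id: "be-dev1",
  role: "Backend Developer",
  owns: ["apps/api/src/routes/patients"],
  mustNotTouch: ["apps/web", "apps/api/src/middleware/auth"],
  escalatesTo: "sm-agent",
};
```

A check like this could run in CI against the files an agent's PR touches, which keeps the "must not touch" rule enforceable rather than advisory.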

How Agents Actually Run — Ralph Loop

The orchestration layer is Ralph Loop: a long-running process that watches the tasks/ directory, picks up task files, dispatches them to the right agent, captures output, and reports status. A cron job emits a sprint status update every five minutes — agent state, git status, story progress, blockers — straight into our team channel.
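
To make the dispatch step concrete, here is a sketch of a single pass over pending tasks. It is not Ralph Loop's actual code: the TaskFile shape, the agent field, and the one-task-per-agent rule are assumptions, and the filesystem watching is left out so the logic stays testable in memory.

```typescript
// Hypothetical task-file shape; the real Ralph Loop format is not shown in this post.
interface TaskFile {
  name: string;   // file name under tasks/
  agent: string;  // id of the agent that should pick it up
  goal: string;
}

type AgentState = "idle" | "busy";

interface DispatchResult {
  dispatched: { agent: string; task: string }[];
  waiting: string[];  // tasks whose agent is busy or unknown
}

// One pass of the loop: hand each pending task to its agent if that agent is idle.
function dispatchOnce(tasks: TaskFile[], agents: Map<string, AgentState>): DispatchResult {
  const result: DispatchResult = { dispatched: [], waiting: [] };
  for (const task of tasks) {
    if (agents.get(task.agent) === "idle") {
      agents.set(task.agent, "busy");  // assumption: one task per agent at a time
      result.dispatched.push({ agent: task.agent, task: task.name });
    } else {
      result.waiting.push(task.name);
    }
  }
  return result;
}
```

With an empty task list the function dispatches nothing and every agent stays idle, which is the behavior a directory-watching loop would show when tasks/ has not been seeded.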

To activate an agent, we drop a task file describing the goal and the context it needs:

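// One entry per agent; the entries run in parallel.
delegate_task({
  tasks: [
    {
      goal: "Build patient registration API",
      // Placeholder: the system prompt from agents/be-dev1.md plus project context
      context: "[From be-dev1.md] + project context",
      toolsets: ["terminal", "file"]
    },
    {
      goal: "Build patient registration UI",
      // Placeholder: the system prompt from agents/fe-dev1.md plus project context
      context: "[From fe-dev1.md] + project context",
      // The UI agent also gets a browser
      toolsets: ["terminal", "file", "browser"]
    }
  ]
})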

Two agents, one ticket, running in parallel against a shared API contract. The Tech Lead agent reviews both before merge. Humans approve.
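
A shared contract of this kind might live in packages/ as a type plus a validator that both sides import. The sketch below is an assumption about shape only: the field names (fullName, dateOfBirth, phone) and the validation rules are invented for illustration and are not the project's real schema.

```typescript
// Hypothetical contract for the patient registration ticket.
// Field names are illustrative; the real schema is not shown in this post.
interface RegisterPatientRequest {
  fullName: string;
  dateOfBirth: string;  // ISO date, YYYY-MM-DD
  phone: string;
}

type ValidationResult =
  | { ok: true; value: RegisterPatientRequest }
  | { ok: false; errors: string[] };

// Checks an untrusted payload against the contract and collects every error.
function validateRegisterPatient(input: unknown): ValidationResult {
  const errors: string[] = [];
  const body = (typeof input === "object" && input !== null ? input : {}) as Record<string, unknown>;
  const { fullName, dateOfBirth, phone } = body;

  if (typeof fullName !== "string" || fullName.trim() === "") {
    errors.push("fullName is required");
  }
  if (typeof dateOfBirth !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(dateOfBirth)) {
    errors.push("dateOfBirth must be YYYY-MM-DD");
  }
  if (typeof phone !== "string" || phone.trim() === "") {
    errors.push("phone is required");
  }

  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, value: { fullName, dateOfBirth, phone } as RegisterPatientRequest };
}
```

Because the backend and frontend agents would import the same type and validator, a change to the contract shows up as a compile error on both sides instead of as a mismatch discovered in review.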

The Stack

Layer	Choice
Frontend	Nuxt 4 + TypeScript
Backend	Hono + TypeScript
ORM	Drizzle
Auth	JWT + RBAC middleware
Infra	Docker Compose for local, GitHub Actions for CI
Tooling	ESLint + Prettier + Husky

Boring on purpose. Healthcare doesn’t reward novelty in the stack — it rewards traceability, audit logs, and a small surface for compliance review. We’re chasing speed in delivery, not in tech-debt accumulation.
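
For the "JWT + RBAC middleware" row, the core of the RBAC half can be a small lookup. The sketch below assumes the token has already been verified upstream and only shows the permission check; the role names, permission strings, and Claims shape are hypothetical, and the wiring into Hono is left out.

```typescript
// Hypothetical roles and permissions; the post does not list the real ones.
type Role = "admin" | "doctor" | "receptionist";
type Permission = "patients:read" | "patients:write" | "users:manage";

const ROLE_PERMISSIONS: Record<Role, readonly Permission[]> = {
  admin: ["patients:read", "patients:write", "users:manage"],
  doctor: ["patients:read", "patients:write"],
  receptionist: ["patients:read"],
};

// Claims as they might look after the JWT signature and expiry have been checked.
interface Claims {
  sub: string;
  roles: Role[];
}

// A request is allowed when any of the caller's roles grants the permission.
function can(claims: Claims, permission: Permission): boolean {
  return claims.roles.some((role) => ROLE_PERMISSIONS[role].indexOf(permission) !== -1);
}
```

Keeping the role-to-permission table as plain data gives a compliance reviewer one place to read the access rules.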

The Sprint Cadence

Standard two-week Agile sprints. The cadence is familiar; what changes is who’s in the room.

Sprint	Focus
0	Foundation: monorepo, schema, auth, RBAC, Docker, CI
1	User management + RBAC roles
2	Patient management
3–4	Billing, Pharmacy, Lab integration
5–6	Ward management, Reports, advanced RBAC
7	Integration testing, UAT prep
8	Bug fixes, performance, docs
Release	Production deploy, training, go-live

Inside each sprint, all the usual ceremonies fire — planning, standups, code review, testing, demo, retro — with the agents in the seats humans would normally occupy. The unlock is the escalation protocol, which is explicit and short:

Blocker        → SM-Agent  → (escalate if needed) → PO-Agent / Architect
Technical      → Architect
Scope change   → PO-Agent
Quality issue  → QA-Lead

Sounds bureaucratic for AI. It’s the opposite — it keeps each agent’s prompt clean. The Backend Developer doesn’t reason about scope; it forwards scope questions to PO and gets back a one-line directive. Less context per agent means cleaner outputs and fewer wrong turns.
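
The protocol is small enough to write down as a lookup table. This sketch encodes the four rules above; the agent ids are taken from the file names in agents/, while the IssueKind labels and the EscalationRoute shape are assumptions made for the example.

```typescript
type IssueKind = "blocker" | "technical" | "scope-change" | "quality";
type AgentId = "sm-agent" | "po-agent" | "arch-agent" | "qa-lead";

interface EscalationRoute {
  first: AgentId;           // who receives the issue first
  thenIfNeeded: AgentId[];  // possible second hop, empty when there is none
}

// Direct transcription of the escalation rules.
const ESCALATION: Record<IssueKind, EscalationRoute> = {
  blocker: { first: "sm-agent", thenIfNeeded: ["po-agent", "arch-agent"] },
  technical: { first: "arch-agent", thenIfNeeded: [] },
  "scope-change": { first: "po-agent", thenIfNeeded: [] },
  quality: { first: "qa-lead", thenIfNeeded: [] },
};

function routeIssue(kind: IssueKind): EscalationRoute {
  return ESCALATION[kind];
}
```

An agent that hits a scope question would call routeIssue("scope-change") and forward the question to po-agent without reasoning about scope itself.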

Where We Are Right Now — Sprint 0, Week 1

This is not a hypothetical. Sprint 0 is in flight as this post goes up. Here’s the live snapshot:

Scaffolding done: monorepo created, Nuxt 4 web initialized, Hono + Drizzle API initialized, shared types package up, Docker Compose configured, Husky + Prettier + ESLint wired in.
Schema + auth + RBAC middleware present; routes for auth, users, patients scaffolded.
Initial commit on sprint/0-foundation branch — just merged.
Stories complete: 0 of 5. Story points: 0 of 40. Week 1 of 2.

We’re behind a hypothetical “all green” track, and that’s fine. Sprint 0 is supposed to be the messy one.
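
One way to put a number on "behind" is to compare points done against a straight-line burndown. The sketch below is illustrative arithmetic only: it assumes a linear ideal and a 10-working-day sprint, neither of which is stated in this post.

```typescript
// Points that would be done by now on a straight-line (ideal) burndown.
function idealPointsDone(totalPoints: number, elapsedDays: number, sprintDays: number): number {
  if (sprintDays <= 0) throw new RangeError("sprintDays must be positive");
  const fraction = Math.min(Math.max(elapsedDays / sprintDays, 0), 1);
  return totalPoints * fraction;
}

// A positive result means the sprint is behind the ideal line.
function pointsBehind(
  totalPoints: number,
  donePoints: number,
  elapsedDays: number,
  sprintDays: number,
): number {
  return idealPointsDone(totalPoints, elapsedDays, sprintDays) - donePoints;
}

// Assumption: 40 points planned, none done, snapshot taken at the midpoint (day 5 of 10).
const midpointGap = pointsBehind(40, 0, 5, 10); // 20
```

A gap like this matters less in a foundation sprint, where most early work is scaffolding that closes no stories on its own.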

What’s Already Going Wrong (Honest List)

Four real failures in Week 1, all instructive:

Agent timeouts. The FE-Lead and DevOps agents both timed out during long delegate_task calls. Long-running tool calls (project init, dependency install) push past the orchestrator’s timeout budget. Fix in flight: split kickoff tasks into smaller chunks; never let one agent own both “init the project” and “configure CI” in one call.
Empty tasks/ directory = idle agents. Ralph Loop watches tasks/ for work. With no task files written, every agent sat at 🟡 Idle for the first hour of the sprint. Obvious in hindsight, invisible in the moment. We now seed tasks/ as part of sprint kickoff, not as a post-kickoff step.
Missing initial commit blocked everything. The sprint branch was created without an initial commit. Downstream agents couldn’t open PRs, couldn’t run CI, couldn’t push. A 30-second human action that nobody had explicitly assigned. Now it’s in the kickoff checklist.
Typosquat false positive. Our security scanner flagged npx nuxt prepare as a typosquatting risk because nuxt is edit-distance-1 from next. A real, blocking false positive on a legitimate command. We had to allowlist nuxt explicitly. Worth flagging if you’re running any package-reputation tool in an agent pipeline.
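
The typosquat item is easy to reproduce. The sketch below computes Levenshtein edit distance and applies a naive "distance 1 from a popular package" rule with an allowlist override. The rule, the POPULAR list, and the function names are assumptions; the post does not describe how the actual scanner works.

```typescript
// Levenshtein distance using two rolling rows.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical list of well-known names a scanner might protect.
const POPULAR = ["next", "react", "express"];

// Flags names one edit away from a popular package unless explicitly allowlisted.
function isSuspicious(pkg: string, allowlist: Set<string>): boolean {
  if (allowlist.has(pkg)) return false;
  if (POPULAR.indexOf(pkg) !== -1) return false; // the popular package itself
  return POPULAR.some((name) => editDistance(pkg, name) === 1);
}
```

Under this rule, isSuspicious("nuxt", new Set()) returns true because "nuxt" and "next" differ by one substitution, and adding "nuxt" to the allowlist clears it.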

These are the kinds of failures you only learn about by actually running this on a real project. None of them are model failures; they’re orchestration, process, and tooling failures. The agents do what they’re told. The work is in telling them precisely.
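
The first three failures could be caught by a preflight check before kickoff. The sketch below is one possible shape for that check, not the actual checklist: the KickoffState fields and the "one long-running step per task" threshold are assumptions.

```typescript
// Hypothetical snapshot of the repo and task queue taken just before kickoff.
interface KickoffState {
  taskFileCount: number;            // files currently in tasks/
  branchCommitCount: number;        // commits on the sprint branch
  maxLongRunningStepsPerTask: number; // e.g. project init and CI setup in one task = 2
}

// Returns a list of problems; an empty list means kickoff can proceed.
function preflight(state: KickoffState): string[] {
  const problems: string[] = [];
  if (state.taskFileCount === 0) {
    problems.push("tasks/ is empty: agents will sit idle");
  }
  if (state.branchCommitCount === 0) {
    problems.push("sprint branch has no initial commit: PRs and CI are blocked");
  }
  if (state.maxLongRunningStepsPerTask > 1) {
    problems.push("a task bundles several long-running steps: split it to avoid timeouts");
  }
  return problems;
}
```

Running a check like this as the first step of kickoff turns three lessons from Week 1 into a gate that fails loudly instead of an hour of idle agents.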

Where the 30% Human Effort Lives

The 30% is senior engineers, not juniors-with-LLMs. That’s what makes the model work — agents amplify experienced judgment, they don’t substitute for it. Honest accounting of what those humans still do:

Judgment calls. “Should this be one screen or two?” “Is the RBAC boundary at the org level or the department level?” The UX and Architect agents propose; we decide.
Stakeholder conversations. The PO agent drafts agendas and summaries. A live call with a hospital admin about shift handover is still ours.
Compliance and consent. Anything touching PII, audit retention, or data subject rights gets a human approver in the loop before merge. Non-negotiable for Indian DPDPA and parallel global regimes.
Orchestration design. Every time an agent times out, an agent sits idle, or a task file gets dropped wrong — we fix the protocol, not the agent. The 30% is mostly here.
Code review on critical paths. Auth, RBAC, payment flows, and audit logging get human eyes regardless of what the Architect agent signed off on.
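
The critical-path rule can be enforced mechanically by mapping changed files to areas that always need a human reviewer. In the sketch below the path prefixes are invented for illustration; only the four area names (auth, RBAC, payment flows, audit logging) come from the list above.

```typescript
// Hypothetical path prefixes for the critical areas named above.
const CRITICAL_PREFIXES: Record<string, string> = {
  "apps/api/src/auth/": "auth",
  "apps/api/src/rbac/": "RBAC",
  "apps/api/src/billing/": "payment flows",
  "apps/api/src/audit/": "audit logging",
};

// Returns the critical areas a change touches; any entry means a human must review.
function humanReviewReasons(changedFiles: string[]): string[] {
  const reasons = new Set<string>();
  for (const file of changedFiles) {
    for (const prefix of Object.keys(CRITICAL_PREFIXES)) {
      if (file.startsWith(prefix)) reasons.add(CRITICAL_PREFIXES[prefix]);
    }
  }
  return Array.from(reasons);
}

function requiresHumanReview(changedFiles: string[]): boolean {
  return humanReviewReasons(changedFiles).length > 0;
}
```

A merge gate built on this would block any PR where requiresHumanReview is true until a named human approves, regardless of what the Architect agent signed off on.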

Why the Hospital Build Is the Proof

We picked a Hospital Management System as the case study because it’s the most demanding thing we have on the floor — RBAC at every layer, audit logs, multi-tenant isolation, integrations with pharmacy and lab systems, real Indian DPDPA compliance pressure. If the 70/30 agent-team model holds up here, it holds up almost anywhere we ship for a client.

The principle generalizes: specialized agents with narrow prompts, structured Agile ceremonies, explicit escalation, and humans on the judgment calls ship dramatically faster than either a lone agent or a traditional human-only team, and without the headcount tax of either. You get the speed of automation and the structure of Scrum, with humans kept exactly where humans are still better. That’s the delivery curve worth caring about.

What’s Next

We’ll keep publishing what we learn — including the failures. The follow-up post covers what broke as the model met real client work: the consistency drift between agents, the context-poisoning hazard, and the structural pivot we made to fix both. Read it here.

At OLabs, we deliver production software faster with agent-augmented engineering teams — without trading away compliance, quality, or the audit trail. Indian-market expertise, healthcare-grade rigor, 70/30 split. Talk to us to see what this looks like for your product.