High Output: The Future of Engineering

Why AI Productivity Gains Are Context-Dependent | With Raju Matta



Some engineering teams are seeing real, measurable AI productivity gains. Cursor is transforming how frontend developers build React apps. AI-assisted code review is catching bugs before deployment. Prototypes that took weeks now take days.

But not everyone’s seeing the same results.

Raju Matta runs engineering for Cambridge Mobile Telematics: 200+ engineers across three countries, processing petabytes of real-time sensor data for driver safety. Six months ago, he formed a tiger team to systematically track AI tool adoption. Status reports every two weeks. Multiple tools tested: Copilot, Cursor, PR review bots.

His finding? “I’ve not seen the measurable velocity increase that people are saying out in the market—but that doesn’t mean I have totally written off LLMs yet.”

This isn’t skepticism. It’s measured evaluation. And the pattern Raju’s seeing reveals something important about when AI tools deliver and when they don’t.

Where AI Tools Excel

As part of their evaluation, CMT ran an internal hackathon to see what AI tools could do in practice. The results told a clear story. Eighteen projects, all using AI. Teams built fully working web apps—complete with datasets—in 2-4 hours.

“For that purpose, it’s great. It’s not bad at all,” he says.

The pattern: AI coding tools work brilliantly for rapid prototyping with established patterns, web development using well-documented frameworks, mechanical coding tasks like boilerplate and test generation, and quick experiments to validate product ideas.

These are real productivity gains. The people claiming 2x-3x aren’t exaggerating—they’re working in contexts where AI capabilities align perfectly with task requirements. When your bottleneck is writing React components or generating CRUD endpoints, AI tools deliver measurable acceleration.

But CMT’s production systems are different.

The Complexity Multiplier

They’re processing petabytes of data from gyroscopes, accelerometers, GPS sensors, video streams. They’re distinguishing potholes from crashes, sharp corners from reckless driving. They’ve been using AI and machine learning for this work for 13 years—long before LLMs became everyone’s productivity obsession.

The engineering challenge isn’t writing code. It’s architecting systems that handle sensor fusion at scale, debugging why clusters fail under load, ensuring accuracy when lives depend on your classifications, and managing tech debt across distributed teams in six countries.

“You can outsource your engineering and coding with AI tools, but not your thinking,” Raju explains.

In complex production systems, the thinking is where the time goes. Code generation helps, but it’s not the bottleneck. The productivity multiplier drops from 3x to “incrementally helpful” because the constraint isn’t in the typing—it’s in the architectural decisions, the system design, the understanding of how everything fits together.

This doesn’t make AI tools useless. They still catch bugs in PRs. They still help prototype solutions. They still accelerate certain tasks. But the overall velocity gain is modest because code generation often isn’t the long pole.

The Tiger Team Approach

Here’s what makes Raju’s perspective valuable: he’s not guessing. Six months ago, CMT’s CTO gathered the engineering leaders. “How are you guys thinking of AI?” The response: treat it like a first-class citizen.

They formed a dedicated tiger team. Three people producing status reports every two weeks on tool adoption, usage patterns, and measurable impact. “We have about three or four tools that we are using all the way from PR review tools to tools like Copilot, Cursor.”

This is systematic evaluation, not anecdotal impressions. And the data shows results that differ from the market narrative: “My general experience is that it’s good, it’s doing its job, but I haven’t seen the measurable velocity increase as much as what people are saying out in the market.”

His peer conversations confirm the pattern isn’t unique to CMT: “Even other leaders and my peers that I speak with, who are working at big tech companies, have said similar things. So it’s not uncommon.”

But Raju’s not dismissing the technology. “The tools are progressing at a very fast pace. I wouldn’t be surprised if it’s another six months or a year where we get to exhaust more pieces of the tool and get more done.”

That “yet” matters. He’s still tracking, still evaluating, still expecting improvement.

When Mistakes Have Consequences

When Raju says “we have to save people’s lives,” he’s not being dramatic. CMT’s technology directly impacts driver safety. Their telematics platform processes sensor data to detect dangerous driving, assess risk, and potentially prevent accidents.

This creates a different bar for “move fast and break things.”

“We are a little bit more diligent because at the end of the day, we have to save people’s lives. So for us, we’d rather spend the time beforehand than reactively trying to address it.”

The stakes are high—both financially and ethically. When your technology directly impacts human safety, you can’t afford to ship fast and fix later.

The constraint isn’t just technical complexity—it’s consequence of failure. “AI tools can take you north, but with the same speed, they can take you south.”

In safety-critical systems, the review time, the testing time, the verification time doesn’t compress even if code generation does. You can’t ship and iterate rapidly when mistakes could harm people. The overall productivity gain shrinks accordingly because the non-coding portions of the development cycle remain unchanged.

This applies beyond telematics. Financial systems. Healthcare platforms. Infrastructure control. Any domain where errors have serious consequences faces the same limitation: AI can accelerate code generation, but it can’t compress the necessary validation and testing cycles.

Where AI Struggles

AI’s limitations show up in unexpected places. CMT uses AI to filter thousands of resumes for each job opening. The results? “50% makes sense. And 50% don’t make sense.”

This split illustrates a broader pattern. AI works brilliantly for well-defined, repeatable tasks. It struggles with judgment calls, context-dependent decisions, and situations requiring nuanced understanding.

The tool saves time on mechanical filtering. But the judgment about who’s actually right for the role? Still human. And critically, the humans can immediately spot when AI recommendations miss the mark—they don’t trust it blindly.

This mirrors the coding experience. AI generates boilerplate quickly. But understanding whether the generated code fits the broader system architecture, handles edge cases properly, and follows team conventions? That requires human judgment that doesn’t compress.

Where This Leaves Engineering Leaders

The mistake isn’t believing AI tools work—they demonstrably do in many contexts. The mistake is assuming your context will see the same gains as someone in a completely different situation.

Raju’s systematic evaluation reveals the variables that matter:

Your problem domain determines gains. Web apps and prototypes with established patterns can see significant productivity improvements. Complex distributed systems with unique requirements tend to see incremental improvements. The difference isn’t the tool quality—it’s how much of your bottleneck typically sits in code generation versus system design.

Your constraint defines the impact. If implementing features is your rate-limiting step, AI delivers massive value. If architectural decisions and system design are your constraint, AI helps less. Most production systems fall into the second category after the initial prototyping phase.

Your risk tolerance changes the math. If you can ship and iterate rapidly, AI accelerates that cycle. If mistakes have serious consequences, the review and testing time doesn’t compress proportionally. The overall velocity gain depends heavily on how much of your process can safely be accelerated.

Your system complexity matters. Greenfield projects with established patterns see huge gains. Legacy systems with unique constraints and interconnected dependencies see modest gains. The complexity of your codebase directly impacts how useful AI-generated code becomes.

The Honest Assessment

Raju isn’t claiming AI tools are overhyped. He’s providing the nuanced reality: they work extremely well for specific contexts and deliver modest improvements in others.

His 6-month tiger team experiment with dedicated tracking hasn’t found a productivity revolution. They’ve found incremental gains with clear constraints. That’s the honest number engineering leaders need for planning.

“LLMs can help us experiment and prototype features faster. They can help developers catch mistakes in our pull requests. They can help us find answers faster, and we are constantly evaluating,” he explains. “But I’ve not seen the impact that people are saying out there.”

This doesn’t mean ignore AI tools. It means understand your context, measure systematically, and set realistic expectations.

For rapid prototyping and web development? The 2x-3x gains are real. For complex production systems with safety requirements? The gains exist but are much more modest. Both can be true simultaneously—the difference is context.

What This Means for You

First, measure systematically rather than relying on anecdotes. Set up dedicated tracking like Raju’s tiger team—assign ownership, establish regular reporting, and gather actual usage data. The hype cycle around AI tools means everyone has an opinion, but data reveals what actually works in your specific context.
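Raju's biweekly reporting cadence suggests a simple shape for the data behind it. As a minimal sketch (the `UsageRecord` fields and `adoption_report` helper are illustrative assumptions, not anything CMT has described), a tracking report might aggregate per-tool usage like this:

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class UsageRecord:
    engineer: str   # who used the tool in this reporting period
    tool: str       # e.g. "Copilot", "Cursor", "PR review bot"
    sessions: int   # sessions logged in the period

def adoption_report(records, team_size):
    """Summarize per-tool adoption for a biweekly status report:
    distinct engineers using each tool, that count as a share of
    the team, and total logged sessions."""
    users = defaultdict(set)
    sessions = defaultdict(int)
    for r in records:
        users[r.tool].add(r.engineer)
        sessions[r.tool] += r.sessions
    return {
        tool: {
            "active_users": len(u),
            "adoption_pct": round(100 * len(u) / team_size, 1),
            "sessions": sessions[tool],
        }
        for tool, u in users.items()
    }

# Hypothetical export from tool usage logs
records = [
    UsageRecord("alice", "Copilot", 12),
    UsageRecord("bob", "Copilot", 8),
    UsageRecord("alice", "Cursor", 5),
]
print(adoption_report(records, team_size=10))
```

The point isn't the code itself but the discipline it represents: a fixed schema, a fixed cadence, and numbers a leadership team can compare period over period instead of trading anecdotes.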

Second, understand where your bottleneck actually sits. If architectural decisions and system design consume most of your time, AI tools will help less than if code generation is your constraint. Be honest about what’s actually slowing you down before expecting AI to solve it.

Third, adjust expectations based on risk profile. If your domain allows rapid iteration and tolerable failure rates, AI tools can deliver significant acceleration. If mistakes have serious consequences, the non-compressible validation cycles will limit overall gains regardless of how fast code gets generated.

Fourth, keep evaluating as tools improve. Raju expects capabilities to expand significantly over the next 6-12 months. Today’s limitations may not be tomorrow’s. But base your current planning on current capabilities, not projected future states.

The question every engineering leader should ask: What’s actually constraining my team’s velocity—code generation or everything else? Because if it’s everything else, AI coding tools will help incrementally, not transformationally. And that’s okay—incremental gains compound over time.

Raju’s measured approach provides the reality check the market needs. AI tools deliver real value, but the magnitude depends entirely on your specific context. Understanding that context is how you set realistic expectations and make smart adoption decisions.

High Output is brought to you by Maestro AI. Raju talked about forming a tiger team to systematically track AI tool adoption with biweekly status reports—but that measurement challenge extends beyond just AI tools. When your 200+ person engineering team is distributed across multiple countries and multiple tools, it becomes impossible to see what’s actually happening without systematic tracking. Maestro cuts through that complexity with automated reporting and metrics that show where your team’s time and energy actually go, so you can spot patterns and make data-driven decisions about everything from AI adoption to resource allocation.

Visit https://getmaestro.ai to see how we help engineering leaders get actually useful insights into their teams.

Running systematic evaluations of new tools and processes? We’d love to hear your approach. Schedule a chat with our team → https://getmaestro.ai/book



This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit maestroai.substack.com