TL;DR

A recent Google whitepaper argues that in AI-assisted software development, the model itself accounts for only about 10% of an agent’s behavior. The remaining 90% comes from the harness and context engineering around it. That shifts the focus from model upgrades to configuration, verification, and system design.

A new whitepaper from Google, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, states that the AI model is only about 10% of the system in modern AI-driven development. The core message is that the harness and context engineering—not the model itself—drive most of the system’s behavior and performance. This insight challenges common assumptions and has significant implications for how organizations invest in AI tools.

The whitepaper highlights that, contrary to popular belief, the model’s capabilities are only a small part of what determines an AI system’s success. Instead, configuration, tooling, and context management—collectively called the harness—are responsible for approximately 90% of the system’s behavior. This includes prompts, rule sets, tools, and observability mechanisms that shape how the model functions within a larger framework.

The paper cites evidence that changing the harness alone, for example rewriting prompts or adding tools, can dramatically improve performance. In one case, a team moved a coding agent from outside the top 30 to the top 5 on Terminal Bench 2.0 by adjusting only the harness, with no change to the underlying model. In another, tweaking middleware raised an agent’s benchmark score by nearly 14 points.

The authors argue that the focus should shift from chasing newer, larger models to developing and owning the harness infrastructure—since this is where the real control and competitive advantage lie. They also emphasize that most failures in AI agents are due to configuration errors, missing tools, or poor context management, not the model’s inherent limitations.

At a glance
When: published early 2026
The development: Google’s new whitepaper on the SDLC emphasizes that AI models are only a small component, with most of the system’s effectiveness coming from the harness and context management.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified
Vibe Coding
Casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted
Detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering
Formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.
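
The split between tests and evals can be sketched in a few lines of Python. This is a minimal illustration, not the whitepaper's own tooling: `slugify`, `eval_summaries`, `fake_summarize`, the keyword rubric, and the 0.8 pass-rate threshold are all hypothetical names and values chosen for the example. A test asserts one exact answer for deterministic code; an eval scores model output against a rubric and gates on a pass rate.

```python
from __future__ import annotations

from typing import Callable


def slugify(title: str) -> str:
    """Deterministic helper: exactly one correct output per input."""
    return "-".join(title.lower().split())


def test_slugify() -> None:
    # A test: an exact assertion on deterministic behavior.
    assert slugify("The New SDLC") == "the-new-sdlc"


def eval_summaries(
    summarize: Callable[[str], str],
    cases: list[dict],
    threshold: float = 0.8,  # assumed gate value, not from the paper
) -> tuple[float, bool]:
    """An eval: score free-form output against a rubric, gate on pass rate."""
    passed = 0
    for case in cases:
        output = summarize(case["input"]).lower()
        # Rubric (assumed): every required term must appear in the output.
        if all(term.lower() in output for term in case["must_mention"]):
            passed += 1
    rate = passed / len(cases)
    return rate, rate >= threshold


def fake_summarize(text: str) -> str:
    # Hypothetical stand-in for a model call so the sketch runs offline.
    return "Summary: " + text[:60]
```

In a CI gate, the test would fail the build on any mismatch, while the eval would fail it only when the pass rate drops below the chosen threshold.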
The idea worth building your strategy around
Agent = Model + Harness
MODEL: ~10%
HARNESS: ~90% (prompts · tools · context · hooks · sandboxes · observability)
The ~90% is your surface area, not the provider’s.
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.
“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.
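
One way to picture "Agent = Model + Harness" is the sketch below. It is an illustrative assumption rather than the paper's architecture: the `Harness` and `Agent` classes, the `CALL name: arg` tool-call convention, and `echo_model` are hypothetical. The point is that the same model produces a failure or a result depending only on what the harness provides.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

Model = Callable[[str], str]  # hypothetical interface: prompt in, text out


@dataclass
class Harness:
    system_prompt: str
    rules: list[str] = field(default_factory=list)
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    trace: list[str] = field(default_factory=list)  # minimal observability

    def build_context(self, task: str) -> str:
        # Assemble everything the model sees besides its own weights.
        lines = [self.system_prompt]
        lines += [f"RULE: {rule}" for rule in self.rules]
        lines += [f"TOOL: {name}" for name in sorted(self.tools)]
        lines.append(f"TASK: {task}")
        return "\n".join(lines)


@dataclass
class Agent:
    model: Model
    harness: Harness

    def run(self, task: str) -> str:
        prompt = self.harness.build_context(task)
        self.harness.trace.append(prompt)
        reply = self.model(prompt)
        # The "CALL name: arg" format is an assumption of this sketch.
        if reply.startswith("CALL "):
            name, _, arg = reply[5:].partition(":")
            tool = self.harness.tools.get(name.strip())
            if tool is None:
                return f"config failure: missing tool {name.strip()!r}"
            return tool(arg.strip())
        return reply


def echo_model(prompt: str) -> str:
    # Stand-in model: always asks for the test runner.
    return "CALL run_tests: tests/"
```

Running `Agent(echo_model, Harness("You are a coding agent."))` on any task reports a missing tool; adding `tools={"run_tests": ...}` to the harness fixes the run without touching the model.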
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding
Low CapEx · High OpEx
Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering
High CapEx · Low OpEx
Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing — cheap models for the easy work.
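
The routing lever can be sketched as a small rule-based router. Everything here is an assumption made for illustration: the tier names, the per-token prices, the keyword list, and the 4,000-token cutoff are placeholders, not figures from the whitepaper or any provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_1k_tokens: float  # placeholder price, not a real quote


CHEAP = ModelTier("small-model", 0.001)
STRONG = ModelTier("large-model", 0.03)

# Assumed heuristic: these words signal work that needs the stronger model.
HARD_HINTS = ("refactor", "architecture", "migrate", "security")


def route(task: str, context_tokens: int, token_limit: int = 4000) -> ModelTier:
    """Send easy, small-context tasks to the cheap tier; escalate the rest."""
    text = task.lower()
    if context_tokens > token_limit or any(hint in text for hint in HARD_HINTS):
        return STRONG
    return CHEAP


def estimated_cost(tier: ModelTier, tokens: int) -> float:
    """Rough cost in USD for a call of the given size."""
    return tier.usd_per_1k_tokens * tokens / 1000
```

A production router would more likely use a classifier or past eval results than keywords, but the shape is the same: decide per task, then pay the stronger model's price only where it is needed.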
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

Implications for AI Development Strategies

This shift in understanding has major implications for organizations deploying AI. Instead of investing heavily in acquiring or developing the latest models, companies should prioritize building robust harnesses: the tools, prompts, and verification processes that shape and control AI behavior. Because a team owns and can customize its harness, improving it is more manageable than chasing each new model release, and it can yield significant cost savings and better reliability.

Furthermore, recognizing that most failures are configuration-related underscores the importance of expertise in context engineering and system design. This reorientation could democratize AI development, making it accessible to teams that focus on system architecture rather than solely on model training or fine-tuning.

Overall, this perspective encourages a more strategic, system-oriented approach to AI, emphasizing control, verification, and cost-efficiency over raw model power.

Background on AI System Design and Recent Findings

The common narrative in AI development has been that larger, more sophisticated models are the key to better performance. However, the recent Google whitepaper challenges this by presenting evidence that the model itself accounts for only about 10% of the overall system behavior.

This insight aligns with ongoing industry observations: many AI failures stem from poor configuration, missing tools, or inadequate context management. The paper builds on earlier discussions about the importance of system design, testing, and verification—highlighting that these aspects are often overlooked in favor of model size and complexity.

The industry had already seen rapid model improvements, yet practical deployment issues persisted, which suggests the bottleneck is less the model than how it is integrated and controlled within systems. The whitepaper formalizes this understanding and provides concrete examples demonstrating the outsized influence of harness design.

“The model is only 10% of what determines AI system behavior; the harness and context are the other 90%.”

— Addy Osmani

Remaining Uncertainties in AI System Optimization

While the whitepaper presents compelling evidence that harness and context dominate, beyond the individual examples it cites it does not give a general measure of how much performance improvement can be achieved through system configuration alone. It remains unclear how these principles scale across different domains or with future model advancements. The long-term impact of this shift on AI development costs and organizational structures is also still being evaluated.

Next Steps for AI Development and Organizational Adoption

Organizations are likely to reevaluate their AI investment strategies, focusing more on developing sophisticated harnesses, verification processes, and system architecture. Industry leaders may prioritize training in system design and context engineering. Further research and case studies are expected to validate and refine these insights, potentially leading to new standards in AI deployment practices.

Additionally, tool vendors and AI platform providers may offer more configurable frameworks, emphasizing harness components and verification tools to capitalize on this paradigm shift.

Key Questions

Why is the model only 10% of the system?

The whitepaper shows that most of the AI system’s behavior depends on how the model is configured, controlled, and integrated—collectively called the harness—which includes prompts, tools, rules, and observability mechanisms.

How can organizations improve AI performance according to this new insight?

By focusing on building and refining the harness—such as better prompts, tools, verification, and context management—rather than solely investing in larger or newer models.

Does this mean smaller models are better?

Not necessarily. The insight is that the model’s size is less critical than how it is used and controlled within the system. Effective harness design can significantly enhance performance even with smaller models.

What skills should AI teams develop now?

Focus on system architecture, context engineering, verification, and tooling—skills that enable effective harness design and management.

Will this change how AI products are built?

Yes. Emphasis will shift from model development to system configuration, testing, and verification, leading to more reliable and cost-effective AI solutions.

Source: ThorstenMeyerAI.com
