When GPT-4 first topped benchmark leaderboards, the AI community celebrated a new milestone in capability. Yet people seemed to keep using base GPT-3.5 to triple-check their emails for typos. When Claude 3 Opus launched as the most capable model, Wall Street soared at the promise of future productivity gains. But students everywhere seemed to stick with Claude’s cheaper, faster Sonnet to panic-study before their tests.
Time and time again, we see that the “best” model on paper isn’t always the most widely adopted by everyday people, nor does it always deliver the transformative value its benchmark performance would suggest.
This observation points to a fundamental tension in how we think about progress in AI. In 2019, Rich Sutton, the godfather of reinforcement learning, wrote “The Bitter Lesson.” He argued that general methods leveraging compute and data ultimately outperform approaches built on human comprehension and judgement. He was right, and modern scaling laws show that scale alone can yield incredible qualitative results. But there’s another bitter lesson emerging: the same human qualities that lost to computation and data in building such capable AI models are precisely what’s needed to drive lasting adoption and meaningful change in the real world.
We tend to think of foundation models as something new, but history offers surprising lessons about which foundational frameworks have actually stuck around. In broad terms, a foundation model is a coherent explanatory framework that guides how people understand the world and act within it. The Greek geocentric cosmos, ancient religious doctrine, Galenic medicine: these were all the top foundation models of their time, offering systematic ways for entire societies to run inference on reality and, supposedly, make informed decisions.
Take the ancient Greeks and their view of the cosmos. They famously embraced geocentrism, the idea that the Earth sits at the center of the universe with planets and stars rotating around it, because the model made intuitive sense. Look up: the sun rises, crosses overhead, and sets. The stars wheel around us. Of course we’re at the center. Who would think otherwise?
Then came Aristarchus of Samos, who around 270 BCE proposed heliocentrism, the idea that the Earth and other planets orbit the sun. In modern terms, his model leveraged better data, used more compute and claimed top benchmark performance. Yet it didn’t stick. For nearly 2,000 years, heliocentrism remained largely a footnote, a silly hypothesis overshadowed by the unquestionable geocentric foundation model, the GPT (Geocentric Planetary Theory) of its time.
Galenic medicine, too, with its questionable practices of bloodletting and purging, remained standard well into the 1800s, long after William Harvey’s 1628 discovery of blood circulation undermined its theoretical foundations. In fact, President George Washington died in 1799 from extreme bloodletting, decades after evidence began mounting against the practice.
In both cases, the more accurate foundation model, built on better data and more rigorous methods, ultimately lost to the one that was more comprehensible to everyday people.
Here’s how this relates to modern AI: we may have already crossed the capability threshold where further optimizing for accuracy matters less than optimizing for use and adoption.
Transformative technologies like the internet and the iPhone haven’t fundamentally changed in their core functionality since their introduction. The internet still moves packets. The iPhone still makes calls and runs apps. What made them essential to modern life was relentless iteration on the application layer: the interfaces, the integrations and the thousands of small decisions that made them delightful and indispensable, even in the face of “better” alternatives.
Modern foundation models are already very good. GPT-5, Claude 4, Gemini 3: they can write coherently, reason thoughtfully, generate code and understand nuanced instructions. For most real-world applications, the difference between two models that score, say, 75% versus 80% on some obscure technical benchmark is imperceptible to everyday people. What they actually notice is much more fundamental: Is it reliable? Is it easy to use? Is it actually solving my problems?
This is the crucial shift we’re witnessing in AI: the quantifiable compute and data race for top benchmark performance delivered today’s impressively capable models, but the amorphous, human-centered mission for real, lasting impact will determine tomorrow’s winners.
We’re likely still in the early innings of AI development and adoption, much like the early dot-com era. Back then, “[insert service] but on the internet” filled pitch decks, and about 50% of those startups failed. Today, “[insert service] but with AI” fills YC batches, and roughly 95% of enterprise AI pilots have failed to deliver a return on investment. As the internet did, AI may take years to evolve from wrapping around existing processes to transforming entire domains in the way that AI evangelists promise on X.
The mode of thinking and research that got us to modern AI (Sutton’s insight that computation and data outperform human judgement in building capable models) answered a key question: How do we build capable foundation models? But this same mode of inquiry has led us to new questions we can’t answer in the same way: What makes a foundation model good for everyday people? How do we integrate AI into our lives in ways that actually stick? These aren’t problems that more data reliably solves.
The distinction between building with AI and building for AI matters enormously. Foundation models are improving rapidly, but capability doesn’t automatically translate to real-world value. In the case of the internet, success didn’t come from simply putting existing businesses online, but from fundamentally rethinking how value could be created in a connected world. The same will be true for AI. The most impactful applications won’t be those that merely wrap around what already exists, but those that fundamentally rethink how value can be created in a world where cognition is cheap, fast and accessible. What matters now is understanding people deeply, designing thoughtful, human-centered experiences, and iterating relentlessly.
Ben Gao ’25 M.S. ’25 studied math and Management Science & Engineering. He is a Bay Area native.