Apple's reality check on AGI: For now, data reigns supreme for AI progress
Published on Jun 12, 2025
By Shalini Kurapati

Apple's recent publication, "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity", provides a timely and critical commentary on the capabilities of Large Reasoning Models (LRMs).

The Rise of LRMs

LRMs rose rapidly in popularity precisely because they offered the promise of structured reasoning, breaking problems down into intermediate logical steps rather than merely reproducing outputs from training data. This capability set them apart from standard Large Language Models (LLMs), whose performance depends heavily on their training data, and fueled optimism about the potential for achieving so-called Artificial General Intelligence (AGI).

Apple's Reality Check

Apple's recent study, however, dampens expectations around LRMs' capabilities. The researchers tested frontier LRMs including Claude-3.7-Sonnet, DeepSeek-R1, and OpenAI's o3-mini, comparing them directly to their "non-thinking" counterparts (Claude-3.7-Sonnet without thinking and DeepSeek-V3).

Through controlled experiments with puzzles such as the Tower of Hanoi and River Crossing, the research reveals that while LRMs perform admirably on tasks of moderate complexity, their effectiveness diminishes sharply as problem complexity increases. Notably, beyond a certain threshold, these models not only fail to solve the problems but actually scale back their reasoning effort, despite having ample computational budget available.

The Data Dependency Problem

LRMs seem to be fundamentally limited by their reliance on training data, exhibiting sophisticated pattern matching rather than true flexible reasoning or logical deduction. This dependency causes consistent struggles with novel, complex scenarios not extensively represented in their datasets.

Their "reasoning" proves inconsistent across puzzle types, indicating a lack of generalized problem-solving. For example, models might execute numerous correct moves in the Tower of Hanoi but fail early in the River Crossing puzzle, suggesting performance hinges on prior data exposure, not robust skills.

Furthermore, LRMs exhibit surprising limitations in exact computation: the precise, step-by-step execution of logical operations. Even when the exact solution algorithm for a puzzle like the Tower of Hanoi is provided in the prompt, performance still collapses at roughly the same complexity level as when the models must devise the solution themselves.
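
For reference, that "precise solution algorithm" for the Tower of Hanoi is only a few lines of recursion, sketched below in Python (the exact prompt format used in the paper is not reproduced here). The catch is scale: the optimal move sequence has length 2^n - 1, so every added disk doubles the number of steps a model must execute without a single error.

```python
def hanoi(n, source, target, auxiliary, moves):
    """Append the optimal move sequence for n disks to `moves`."""
    if n == 0:
        return
    hanoi(n - 1, source, auxiliary, target, moves)  # clear the n-1 smaller disks
    moves.append((source, target))                  # move the largest disk
    hanoi(n - 1, auxiliary, target, source, moves)  # restack the smaller disks on top

for n in range(1, 11):
    moves = []
    hanoi(n, "A", "C", "B", moves)
    print(f"{n} disks -> {len(moves)} moves")       # always 2**n - 1
```

Solving 10 disks flawlessly means emitting 1,023 correct moves in order, which is where exact computation, rather than pattern recall, becomes the bottleneck.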

Looking Forward: The Future of AGI

While the Apple paper is a timely reality check and has sparked much-needed discussion on AI advancements, it does not diminish the significant strides made in AI. LRMs have indeed enhanced performance across various applications, from mathematical problem-solving to strategic planning. Their ability to handle more structured tasks marks a notable advancement over previous models.

LRMs do represent a significant leap forward; however, their limitations remind us that data is still the moat. Achieving AGI will likely require a paradigm shift beyond data-dependent approaches. For now, it looks like data reigns supreme.

Tags: blogpost
Shalini Kurapati, PhD, is an expert in data governance, privacy, and responsible AI. As co-founder and CEO of Clearbox AI, she focuses on building transparent and compliant data solutions.