
The rapid rise of large reasoning models (LRMs) has sparked an intense discussion in the AI community: can these systems genuinely think, or are they simply sophisticated pattern-matching machines?
The debate gained momentum after Apple published the research paper “The Illusion of Thinking,” which argues that large reasoning models struggle with increasingly complex algorithmic puzzles, such as the Tower of Hanoi, and therefore may not be engaging in true reasoning. According to the paper, once problem complexity passes a certain threshold, these models often fail to keep executing structured procedures accurately, which suggests limits to their reasoning capabilities.
However, this conclusion may not be as straightforward as it appears.
Why Failure on Complex Tasks Doesn’t Prove the Absence of Thought
Consider a human being who understands the rules of the Tower of Hanoi puzzle. While most people can solve smaller versions of the puzzle, very few could successfully complete a version involving twenty or more discs entirely in their heads.
Does that inability mean humans are incapable of thinking?
Of course not.
It simply demonstrates that reasoning systems—whether biological or artificial—have limitations related to memory, attention, and computational capacity. Therefore, showing that an AI system fails at extremely large versions of a task does not automatically prove it lacks reasoning abilities.
Instead, it suggests that the system has practical constraints, much like humans do.
While this observation alone doesn’t prove that LRMs think, it weakens the argument that their limitations are evidence against genuine reasoning.
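The rule of the Tower of Hanoi is simple, but the workload is not: the optimal solution for n discs takes 2^n − 1 moves, so every extra disc roughly doubles the length of the procedure a solver must carry out without a single slip. A minimal recursive sketch in Python makes the growth concrete (the function names hanoi and move_count are ours, chosen for illustration):

```python
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the optimal list of (disc, from_peg, to_peg) moves for n discs."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    # Move the n-1 smaller discs out of the way, onto the spare peg.
    hanoi(n - 1, source, spare, target, moves)
    # Move the largest disc directly to the target peg.
    moves.append((n, source, target))
    # Stack the n-1 smaller discs back on top of it.
    hanoi(n - 1, spare, target, source, moves)
    return moves


def move_count(n):
    """Closed form for the optimal solution length: 2**n - 1."""
    return 2 ** n - 1


if __name__ == "__main__":
    for n in (3, 10, 20):
        print(n, move_count(n))
    # The recursive solver agrees with the closed form on small cases.
    assert all(len(hanoi(n)) == move_count(n) for n in range(1, 12))
```

Printing move_count for 3, 10, and 20 discs gives 7, 1,023, and 1,048,575. The recursion contains only three steps of logic; what overwhelms a human, or a model writing out every move, is the sheer length of the execution, not the difficulty of the rule.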
Defining Thinking Before Evaluating AI
Before determining whether AI systems can think, it is important to define what “thinking” actually means.
If we focus specifically on problem-solving, human cognition typically involves several interconnected processes:
1. Building a Mental Representation of the Problem
When humans encounter a challenge, the brain first constructs an internal model of the situation.
The prefrontal cortex plays a major role here, helping us maintain focus, organize information, and break complex tasks into manageable parts. Meanwhile, regions of the parietal cortex contribute to symbolic and mathematical reasoning.
This mental representation forms the foundation for all subsequent thinking.
2. Running Internal Simulations
Human reasoning often involves an internal dialogue.
Many people effectively “talk to themselves” when solving problems, evaluating possibilities step by step. This verbal reasoning process closely resembles the chain-of-thought (CoT) techniques used by modern AI systems.
Humans also employ visual imagination, mentally manipulating objects and spatial relationships to understand concepts and solve challenges.
3. Retrieving Knowledge and Recognizing Patterns
Reasoning rarely starts from scratch.
The brain constantly retrieves memories, concepts, and prior experiences to guide decision-making. The hippocampus and temporal lobes help access stored knowledge, allowing us to identify familiar structures and patterns.
Similarly, neural networks rely heavily on information acquired during training to interpret and respond to new situations.
4. Monitoring and Correcting Errors
Humans continuously evaluate their own reasoning.
The anterior cingulate cortex helps detect inconsistencies, mistakes, and dead ends. When we realize a strategy is failing, we adjust our approach.
This ongoing self-monitoring is a critical component of effective thinking.
5. Reframing Problems and Generating Insights
Sometimes solutions emerge only after stepping back.
During moments of reflection, the brain may reorganize information and uncover entirely new perspectives. These “aha” moments often occur when conventional approaches fail.
Reasoning models trained with reinforcement learning can show something similar: during training, they learn to revisit earlier steps, explore alternative paths, and develop new strategies when a first approach fails.
Comparing Human Thought and AI Reasoning
Modern reasoning models do not replicate every aspect of human cognition.
For instance, most LRMs lack robust visual imagination capabilities. They generally do not create detailed internal images while generating chain-of-thought explanations.
However, does the absence of visual reasoning mean they cannot think?
Not necessarily.
Some humans experience a condition known as aphantasia, which limits or eliminates mental imagery. Despite this, many individuals with aphantasia excel at mathematics, logic, and abstract reasoning.
This suggests that visual simulation is not a mandatory requirement for thought.
At a higher level, both human reasoning and AI reasoning appear to rely on several shared principles:
- Pattern recognition and knowledge retrieval
- Temporary storage of intermediate information
- Step-by-step evaluation of reasoning paths
- Error detection and correction
- Exploration of alternative solutions when progress stalls
These similarities make it difficult to dismiss AI reasoning as mere pattern matching.
Chain-of-Thought Reasoning Resembles Internal Dialogue
One of the strongest arguments supporting AI reasoning comes from chain-of-thought processes.
When humans solve difficult problems, they often verbalize their reasoning internally:
- “If this is true, then that must follow.”
- “That assumption doesn’t seem right.”
- “Let’s try another approach.”
Chain-of-thought prompting encourages AI systems to engage in a comparable process, generating intermediate reasoning steps before arriving at an answer.
In many cases, reasoning models even demonstrate forms of backtracking. When a line of reasoning appears unproductive, they may abandon it and pursue alternative paths.
Interestingly, this behavior was observed in experiments where models attempted increasingly complex puzzles. Rather than blindly executing procedures, some models appeared to recognize that direct approaches would exceed their effective working memory limits and began searching for alternative strategies.
Such behavior resembles problem-solving more than simple pattern retrieval.
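Classical backtracking search offers a concrete picture of what “abandoning an unproductive line of reasoning” means. It is offered as an analogy only, not as a description of what happens inside an LRM: extend a partial solution one step at a time, and when a step leads to a contradiction, undo it and try the next option. The Python sketch below applies this to the four-queens puzzle, which we chose purely for illustration; solve_queens and its dead_ends counter are hypothetical names.

```python
def solve_queens(n):
    """Place n non-attacking queens by depth-first search with backtracking.

    Returns (solution, dead_ends): solution lists the column of the queen in
    each row (or None if no placement exists), and dead_ends counts partial
    placements that were abandoned.
    """
    placement = []  # placement[r] = column of the queen in row r
    dead_ends = 0

    def safe(col):
        row = len(placement)
        # A new queen conflicts if it shares a column or a diagonal.
        return all(
            c != col and abs(c - col) != row - r
            for r, c in enumerate(placement)
        )

    def extend():
        nonlocal dead_ends
        if len(placement) == n:
            return True  # every row has a queen: solved
        for col in range(n):
            if safe(col):
                placement.append(col)  # tentatively commit to this step
                if extend():
                    return True
                placement.pop()  # this branch failed: backtrack
        # No column in this row leads to a solution: abandon this partial placement.
        dead_ends += 1
        return False

    found = extend()
    return (list(placement) if found else None), dead_ends


if __name__ == "__main__":
    print(solve_queens(4))
```

For four queens, the search hits several dead ends before it finds the placement [1, 3, 0, 2]; for three queens it exhausts every option and reports that no solution exists. The useful point for the analogy is that failure of one path is detected and used to redirect the search, rather than being blindly repeated.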
Why Would a Next-Token Predictor Learn to Reason?
A common criticism of large language models is that they are “just predicting the next word.”
While technically true, this description can be misleading.
Predicting the next token accurately often requires extensive knowledge of the world.
For example, consider the incomplete sentence:
“The tallest mountain on Earth is Mount…”
To correctly predict “Everest,” the model must possess factual knowledge about geography.
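A deliberately crude next-word predictor shows where that knowledge has to come from. The Python sketch below only counts which word followed which in its training text (a bigram model). Real LLMs use neural networks over subword tokens and condition on far longer contexts, so treat this as a toy under that simplifying assumption; the tiny corpus and the function names are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent continuation seen in training, or None."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# A tiny, made-up corpus standing in for web-scale training text.
corpus = [
    "the tallest mountain on earth is mount everest",
    "climbers attempt mount everest every spring",
    "mount fuji is the tallest mountain in japan",
]

follows = train_bigrams(corpus)
print(predict_next(follows, "Mount"))  # everest
```

The toy predicts “everest” after “Mount” only because its corpus encodes that association; change the corpus and the prediction changes with it. Whatever a next-token predictor gets right, it gets right because the relevant knowledge is captured somewhere in what it learned.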
Now imagine a far more complex scenario involving mathematics, logic, or multi-step planning. To generate the correct next token, the system may need to maintain intermediate calculations, evaluate relationships, and track dependencies across many reasoning steps.
In other words, successful next-token prediction frequently requires internal reasoning processes.
The model may only output one token at a time, but producing that token often depends on computations occurring beneath the surface.
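Multi-digit addition is a small, concrete case of this (our illustration, not one drawn from the original argument). When a sum is written in the usual left-to-right order, its very first digit can depend on carries propagated from every column to its right, so a system that emits one digit at a time must already have resolved the whole calculation before committing to the first one. The Python sketch below makes that hidden work explicit; the function name is hypothetical, and we make no claim that an LLM computes sums this way internally, only that the dependency exists.

```python
def sum_digits_left_to_right(a: str, b: str) -> list[str]:
    """Add two non-negative decimal strings, returning the answer as the
    sequence of digits a left-to-right writer would emit."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    # Hidden work: sweep right to left, tracking the carry between columns.
    carry = 0
    digits = []
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    # Only now can the digits be "emitted" in reading order.
    return digits[::-1]


print("".join(sum_digits_left_to_right("999999", "1")))  # 1000000
```

For 999999 + 1, the leading 1 of 1000000 exists only because a carry rippled through six columns; nothing in the leftmost column alone (9 + 0) reveals it.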
Natural Language as a Powerful Knowledge Representation System
Another important consideration is the expressive power of language itself.
Formal systems such as symbolic logic are highly precise but often limited in scope. Natural language, by contrast, can represent facts, abstractions, emotions, hypothetical situations, and even discussions about language itself.
Because language can encode virtually any kind of information, a system trained to model language at scale inevitably absorbs vast amounts of world knowledge.
To accurately predict language, it must develop internal representations of:
- Facts
- Relationships
- Concepts
- Causal structures
- Problem-solving patterns
This does not guarantee consciousness or human-like understanding, but it does create conditions under which sophisticated reasoning can emerge.
Evidence From Reasoning Benchmarks
The most practical test of reasoning is performance on unfamiliar problems.
If a system consistently solves new challenges that require logic, deduction, planning, or mathematical reasoning, it becomes increasingly difficult to argue that no reasoning is taking place.
Open-source reasoning models have demonstrated strong performance across a variety of benchmark evaluations involving:
- Mathematical problem-solving
- Logical deduction
- Code generation
- Multi-step reasoning tasks
- Scientific question answering
While these models still make mistakes and often fall short of expert human performance, they frequently outperform average untrained individuals in specific reasoning domains.
Their success suggests that something more sophisticated than simple memorization is occurring.
The Bigger Picture
The question of whether AI truly “thinks” may ultimately depend on how thinking is defined.
If thinking requires consciousness, self-awareness, emotions, or subjective experience, today’s reasoning models likely do not qualify.
However, if thinking is understood as the ability to represent problems, manipulate information, evaluate alternatives, correct mistakes, and generate solutions, then modern reasoning models increasingly exhibit many of those characteristics.
Their limitations mirror some of the limitations found in human cognition: finite memory, imperfect reasoning, and occasional failure under extreme complexity.
Final Thoughts
Current large reasoning models are far from perfect, and many unanswered questions remain about the nature of intelligence in artificial systems.
Nevertheless, the evidence increasingly suggests that these models do more than simply regurgitate patterns. Their ability to perform multi-step reasoning, adapt strategies, retrieve relevant knowledge, and solve previously unseen problems points toward genuine computational reasoning.
Whether we ultimately label that process as “thinking” is partly a philosophical question. But from a functional perspective, large reasoning models appear to be exhibiting many of the core behaviors that humans associate with thought itself.
As research continues, the debate will undoubtedly evolve. For now, however, the claim that large reasoning models are incapable of thinking seems far less certain than some critics suggest.