54/100 · Developing · Frontal Lobe (anterior)

Prefrontal Cortex

From the brain to the algorithm. The prefrontal cortex holds the world in mind while you work on it. Its AI counterpart — reasoning-trained models with extended thinking and agentic computer use — is the fastest-moving frontier in artificial intelligence in 2026.

What the biology does

The prefrontal cortex sits at the front of the frontal lobe and is the last part of the brain to mature in development. It is responsible for working memory, planning, abstract reasoning, executive control, metacognition and theory of mind. Three sub-regions specialise:

  • Dorsolateral PFC — working memory, planning, problem decomposition.
  • Ventromedial PFC — value-based decision-making, emotional regulation.
  • Orbitofrontal cortex — context-sensitive inhibition, reward expectation.

It integrates context across long time-spans, inhibits prepotent responses and rewrites plans on the fly. Damage produces the classic Phineas Gage profile: intact perception and memory, devastated judgement.

What we have built

Inference-time reasoning is the defining AI research arc of 2024–2026. The full sequence:

  • January 2022 — Chain-of-Thought prompting. Wei et al. show that prompting a large model to write out intermediate reasoning steps sharply improves multi-step reasoning.
  • July 2024 — DeepMind AlphaProof + AlphaGeometry 2. Silver-medal score at IMO 2024 (28/42).
  • September 2024 — OpenAI o1 preview. First model RL-trained on chain-of-thought; full release December.
  • October 2024 — Anthropic Computer Use. Claude reads screenshots and issues mouse and keyboard input to operate a desktop, making it the canonical computer-use demo.
  • December 2024 — o3 announcement, Project Mariner preview.
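  • January 2025 — DeepSeek-R1 and Operator. DeepSeek releases an open-weights reasoning model trained with GRPO (Group Relative Policy Optimization) and reinforcement learning from verifiable rewards (RLVR); OpenAI launches Operator, its first general-purpose browser agent.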
  • April 2025 — o3 GA. OpenAI promotes o3 and o4-mini to general availability alongside reasoning-aware tool use.
  • July 2025 — Gemini 2.5 Deep Think + ChatGPT agent. Deep Think reaches a gold-medal score at IMO 2025 (35/42) using parallel-thinking RL; ChatGPT agent supersedes Operator.
  • February 2026 — Gemini 3.1 Pro. 2× reasoning boost; #1 on twelve of eighteen benchmarks.
  • Q1 2026 — ARC-AGI-2 surge. Claude Opus climbs from 8.6% (May 2025) to 68.8% (Feb 2026); OpenAI's models climb from 6.5% (o3, Apr 2025) to 73.3% (GPT-5.4, Mar 2026). Current figures are on the live ARC Prize leaderboard.
  • May 2026 — Project Mariner sunset. Google retires the standalone Mariner brand and folds the capability into Gemini Agent and AI Mode.
  • May 2026 — GPT-5.5 leads ARC-AGI-2 at 85%, well above the 66% human average.
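The "GRPO + RLVR" recipe behind DeepSeek-R1 rests on rewards a program can check. GRPO's distinctive step is its advantage estimate: sample a group of answers to the same prompt, score each one, and normalise every reward against the group's own mean and standard deviation, so no separate value network is needed. The sketch below shows only that step. The exact-match verifier, the use of the population standard deviation, and the function names are illustrative assumptions; the clipped policy-gradient update and KL penalty that complete GRPO are omitted.

```python
import statistics

def verifiable_reward(answer: str, reference: str) -> float:
    """Hypothetical checker: 1.0 if the final answer matches the reference, else 0.0.
    Real verifiers normalise answers, check proofs, or run unit tests."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (reward - group mean) / (group std + eps).
    The group itself is the baseline, so no learned critic is required."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption, implementations vary
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled completions for one prompt whose reference answer is "24".
samples = ["24", "23", "24", "twenty"]
rewards = [verifiable_reward(s, "24") for s in samples]
for s, r, a in zip(samples, rewards, group_relative_advantages(rewards)):
    print(f"{s!r:10} reward={r:.0f} advantage={a:+.2f}")
```

Correct answers get a positive advantage and wrong ones a negative advantage of equal size, so the policy update pushes probability toward the answers that passed the check. If every sample in a group scores the same, all advantages are zero and that prompt contributes no learning signal.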

The conceptual leap was articulated in the Tree of Thoughts paper a year before o1 shipped:

"ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. […] In Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%." — Yao et al., 2023 (arXiv:2305.10601)

Every commercial reasoning model since 2024 (the o-series, Claude Extended Thinking, Deep Think, Gemini 3.1 Pro) can be read as a production-engineered descendant of that idea, with reinforcement learning replacing prompt engineering. The International AI Safety Report 2026 tracks the safety implications; the Stanford AI Index 2026 tracks the capability gains.
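Game of 24, the task in the Tree of Thoughts quote, makes the search structure concrete: each "thought" combines two of the remaining numbers with one operation, looking ahead means exploring where that choice leads, and a dead end forces a backtrack. This minimal sketch, using only the Python standard library, enumerates every thought exhaustively by depth-first search. In Tree of Thoughts a language model instead proposes candidate thoughts and rates partial states so that only promising branches are expanded; that LM-driven pruning is not reproduced here, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def solve24(numbers, target=24):
    """Return an arithmetic expression over `numbers` equal to `target`, or None.

    Each search step is one 'thought': pick two remaining values and combine
    them with +, -, * or /. Fractions keep division exact (e.g. 8 / (3 - 8/3)).
    """
    state = [(Fraction(n), str(n)) for n in numbers]
    return _search(state, Fraction(target))

def _search(state, target):
    # Leaf: a single value is left, so this path either hits the target or fails.
    if len(state) == 1:
        value, expr = state[0]
        return expr if value == target else None
    for i, j in combinations(range(len(state)), 2):
        (a, ea), (b, eb) = state[i], state[j]
        rest = [s for k, s in enumerate(state) if k not in (i, j)]
        thoughts = [(a + b, f"({ea} + {eb})"), (a * b, f"({ea} * {eb})"),
                    (a - b, f"({ea} - {eb})"), (b - a, f"({eb} - {ea})")]
        if b != 0:
            thoughts.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            thoughts.append((b / a, f"({eb} / {ea})"))
        for value, expr in thoughts:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found  # look-ahead succeeded: commit to this path
        # every thought from this pair dead-ended: backtrack and try another pair
    return None

print(solve24([4, 9, 10, 13]))  # one expression that evaluates to 24
```

Exhaustive search is affordable here because four numbers give a small tree; the point of ToT's learned evaluation is to keep the same propose, evaluate, backtrack loop tractable on problems where the tree cannot be enumerated.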

What is still missing

The bar for "reasoning" has moved fast, but the gap to a working prefrontal cortex remains wide.

  1. Calibration. Reasoning models score high on benchmarks while being overconfident on items they get wrong. They do not know what they do not know.
  2. Novel insight. Scores on Humanity's Last Exam and Epoch AI's FrontierMath remain around 40–50%, far below expert humans. Models excel at problems resembling ones already solved and stall on genuinely new ones.
  3. Long-horizon autonomy. Agentic tasks lasting more than an hour still fail silently, get stuck in loops, or lose track of their goal. The prefrontal cortex does not.
  4. Cost. Reasoning quality scales with the number of thinking tokens, and so does the bill: a deep-think run costs an order of magnitude more than fast inference.
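Expected calibration error (ECE) is one common way to put a number on the overconfidence in item 1: group answers by the confidence the model stated, then measure how far each group's average confidence sits from its actual accuracy. A minimal sketch, assuming ten equal-width bins and hypothetical input data; published evaluations differ in binning and in how confidence is elicited from the model.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between stated confidence and observed accuracy.

    confidences: model's stated probability of being right, each in [0, 1]
    correct:     1 if the answer was right, 0 otherwise
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        last = b == n_bins - 1
        # Equal-width bins; a confidence of exactly 1.0 goes into the last bin.
        members = [i for i, c in enumerate(confidences)
                   if lo <= c < hi or (last and c == hi)]
        if not members:
            continue
        avg_conf = sum(confidences[i] for i in members) / len(members)
        accuracy = sum(correct[i] for i in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece

# Hypothetical data: 90% stated confidence on ten answers, six of them right.
print(round(expected_calibration_error([0.9] * 10, [1] * 6 + [0] * 4), 3))  # → 0.3
```

A perfectly calibrated model scores 0; the example's 0.3 means the model claims 90% certainty on answers that are right only 60% of the time, which is the "does not know what it does not know" pattern in numeric form.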

How we read the verdict

We rate the AI counterpart Developing. The rate of improvement is the highest of any region in this atlas, but the absolute capability still trails educated humans on the open-ended problems that matter. Prefrontal-equivalent AI is plausibly the most important question of the decade.

Concrete examples

  • Gemini 2.5 Deep Think: achieved a gold-medal score at IMO 2025 (35/42) using parallel thinking, exploring multiple solution paths before committing.
  • ARC-AGI-2 leaderboard: GPT-5.5 leads at 85%, well above the 66% human average.
  • Anthropic Computer Use: Claude reads a screenshot, decides what to click, and moves the mouse; the canonical computer-use demo.
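The Computer Use description is, structurally, an observe, decide, act loop. The minimal sketch below shows that loop with two guards against the long-horizon failures listed under "What is still missing": a hard step budget and a check for the same action repeated several times in a row. The `Action` type, its action kinds, and the `observe`/`decide`/`act` callables are hypothetical stand-ins, not Anthropic's API; in a real agent `observe` would return a screenshot and `decide` would call the model.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Action:
    kind: str        # hypothetical action set: "click", "type" or "done"
    target: str = ""

def run_agent(observe: Callable[[], str],
              decide: Callable[[str, list], Action],
              act: Callable[[Action], None],
              max_steps: int = 20,
              max_repeats: int = 3) -> tuple[str, list]:
    """Observe -> decide -> act loop with a step budget and a repeated-action guard."""
    if max_repeats < 2:
        raise ValueError("max_repeats must be at least 2")
    history: list[Action] = []
    for _ in range(max_steps):
        screen = observe()                # stands in for a screenshot
        action = decide(screen, history)  # stands in for the model's choice
        if action.kind == "done":
            return "done", history
        recent = history[-(max_repeats - 1):]
        if len(recent) == max_repeats - 1 and all(a == action for a in recent):
            # Same action max_repeats times in a row: report it instead of looping.
            return "stuck", history + [action]
        act(action)
        history.append(action)
    return "budget_exhausted", history    # goal not reached within max_steps

# Toy desktop: a form that shows a confirmation page once "Submit" is clicked.
state = {"submitted": False}

def observe() -> str:
    return "confirmation page" if state["submitted"] else "form with Submit button"

def decide(screen: str, history: list) -> Action:
    # Scripted stand-in for a model policy.
    return Action("done") if "confirmation" in screen else Action("click", "Submit")

def act(action: Action) -> None:
    if action == Action("click", "Submit"):
        state["submitted"] = True

print(run_agent(observe, decide, act))  # → ('done', [Action(kind='click', target='Submit')])
```

Returning an explicit "stuck" or "budget_exhausted" status turns a silent failure into a visible one; it does not make the agent hold its goal any better, which is the gap the section describes.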

Milestones

  • Jan 2022: Chain-of-Thought prompting (Wei et al.) shows that writing out intermediate steps improves multi-step reasoning in large models
  • Jul 2024: DeepMind AlphaProof + AlphaGeometry 2, silver-medal score at IMO 2024 (28/42)
  • Sep 2024: OpenAI o1 preview, first model RL-trained on chain-of-thought; full release Dec 2024
  • Oct 2024: Anthropic Computer Use, Claude controls a desktop via screenshots + key/mouse
  • Dec 2024: OpenAI announces o3 (frontier reasoning); Google previews Project Mariner
  • Jan 2025: DeepSeek-R1 open-sources reasoning via GRPO + RLVR; OpenAI Operator launches
  • Apr 2025: OpenAI o3 reaches general availability alongside o4-mini
  • Jul 2025: Gemini 2.5 Deep Think, gold-medal score at IMO 2025 (35/42); ChatGPT agent supersedes Operator
  • Feb 2026: Gemini 3.1 Pro, 2× reasoning boost; #1 on 12/18 benchmarks
  • Q1 2026: ARC-AGI-2 surge, with Claude Opus 8.6% (May 2025) → 68.8% (Feb 2026) and OpenAI o3 6.5% (Apr 2025) → GPT-5.4 73.3% (Mar 2026)
  • May 2026: Project Mariner sunset, folded into Gemini Agent / AI Mode
  • May 2026: GPT-5.5 leads ARC-AGI-2 at 85%; humans average 66%
