Prefrontal Cortex
From the brain to the algorithm. The prefrontal cortex holds the world in mind while you work on it. Its AI counterpart — reasoning-trained models with extended thinking and agentic computer use — is the fastest-moving frontier in artificial intelligence in 2026.
What the biology does
The prefrontal cortex sits at the front of the frontal lobe and is among the last brain regions to mature during development. It is responsible for working memory, planning, abstract reasoning, executive control, metacognition and theory of mind. Three sub-regions specialise:
- Dorsolateral PFC — working memory, planning, problem decomposition.
- Ventromedial PFC — value-based decision-making, emotional regulation.
- Orbitofrontal cortex — context-sensitive inhibition, reward expectation.
It integrates context across long time-spans, inhibits prepotent responses and rewrites plans on the fly. Damage produces the classic Phineas Gage profile: intact perception and memory, devastated judgement.
What we have built
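Inference-time reasoning is the defining AI research arc of 2024–2026, with roots in prompting work from 2022. The sequence:
- January 2022 — Chain-of-Thought prompting. Wei et al. show that prompting large models to write out intermediate reasoning steps improves their performance on arithmetic, commonsense and symbolic tasks.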
- July 2024 — DeepMind AlphaProof + AlphaGeometry 2. Silver-medal score at IMO 2024 (28/42).
- September 2024 — OpenAI o1-preview. OpenAI's first model trained with reinforcement learning to reason through a chain of thought before answering; full release in December.
- October 2024 — Anthropic Computer Use. Claude reads screenshots and synthesises mouse and keyboard input, making it the canonical demonstration of a model operating a desktop.
- December 2024 — o3 announcement, Project Mariner preview.
- January 2025 — DeepSeek-R1 and Operator. Open-weights reasoning trained with GRPO (Group Relative Policy Optimization) and RLVR (reinforcement learning with verifiable rewards); OpenAI launches Operator as the first commercial general-purpose agent.
- April 2025 — o3 GA. OpenAI promotes o3 and o4-mini to general availability alongside reasoning-aware tool use.
- July 2025 — Gemini 2.5 Deep Think + ChatGPT agent. Deep Think wins IMO 2025 gold (35/42) using parallel-thinking RL; ChatGPT agent supersedes Operator.
- February 2026 — Gemini 3.1 Pro. 2× reasoning boost; #1 on twelve of eighteen benchmarks.
- Q1 2026 — ARC-AGI-2 surge. Claude Opus rises from 8.6% (May 2025) to 68.8% (Feb 2026); OpenAI's line rises from 6.5% with o3 (Apr 2025) to 73.3% with GPT-5.4 (Mar 2026). See the live ARC Prize leaderboard.
- May 2026 — Project Mariner sunset. Google retires the standalone Mariner brand and folds the capability into Gemini Agent and AI Mode.
- May 2026 — GPT-5.5 leads ARC-AGI-2 at 85% — humans average 66%. First benchmark where AI majority-solves visual abstraction.
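The GRPO + RLVR recipe named in the January 2025 entry can be illustrated with a small sketch. It assumes the commonly described form of GRPO: several answers are sampled for one prompt, each is scored by a checkable rule, and each answer's advantage is its reward standardised against the group. The function names, the exact-match reward and the use of the population standard deviation are illustrative assumptions, not details taken from any lab's training code.

```python
from statistics import mean, pstdev


def verifiable_reward(answer: str, reference: str) -> float:
    """Hypothetical rule-based reward: 1.0 for an exact match, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Standardise each reward against the group sampled for the same prompt.

    Assumption: population standard deviation, with eps to avoid division
    by zero when every sample in the group earns the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt whose checkable answer is "24".
samples = ["24", "23", "twenty-four", "24"]
rewards = [verifiable_reward(s, "24") for s in samples]
advantages = group_relative_advantages(rewards)
```

Correct samples receive a positive advantage and incorrect ones a negative advantage, so the policy update pushes probability toward reasoning traces that ended in a verified answer. When every sample in a group scores the same, all advantages are zero and that prompt contributes no learning signal.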
The conceptual leap was articulated in the Tree of Thoughts paper more than a year before o1 shipped:
"ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. […] In Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%." — Yao et al., 2023 (arXiv:2305.10601)
Every commercial reasoning model since 2024 — o-series, Claude Extended Thinking, Deep Think, Gemini 3.1 Pro — is a production-engineered descendant of that idea, with reinforcement learning replacing prompt engineering. The International AI Safety Report 2026 tracks the safety implications; the Stanford AI Index 2026 tracks the capability gains.
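The search loop in the Tree of Thoughts quotation (propose several next steps, self-evaluate them, keep the promising ones, drop the rest) can be sketched on Game of 24 itself. This is a minimal illustration, not the authors' implementation: the paper uses a language model both to propose and to evaluate thoughts, whereas here `propose` enumerates arithmetic moves exhaustively and `evaluate` is a crude hypothetical heuristic standing in for the model's judgement. All function names and the default beam width are assumptions.

```python
from fractions import Fraction
from itertools import combinations

TARGET = Fraction(24)


def propose(state):
    """Generate successor states by combining any two numbers with one operation.

    A state is a tuple of (value, expression) pairs.
    """
    successors = []
    for i, j in combinations(range(len(state)), 2):
        (a, ea), (b, eb) = state[i], state[j]
        rest = [state[k] for k in range(len(state)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))
        for value, expr in candidates:
            successors.append(tuple(rest + [(value, expr)]))
    return successors


def evaluate(state):
    """Hypothetical stand-in for LM self-evaluation; higher is more promising."""
    values = [v for v, _ in state]
    if len(values) == 1:
        return 0.0 if values[0] == TARGET else float("-inf")
    total = sum(values)
    product = Fraction(1)
    for v in values:
        product *= v
    # Crude guess: how close a plain sum or product of what is left gets to 24.
    return -float(min(abs(total - TARGET), abs(product - TARGET)))


def tree_of_thoughts(numbers, beam_width=5):
    """Breadth-first search that keeps only the best-scored states per level."""
    frontier = [tuple((Fraction(n), str(n)) for n in numbers)]
    for _ in range(len(numbers) - 1):
        candidates = [s for state in frontier for s in propose(state)]
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]  # pruning: abandon weak branches
    for state in frontier:
        value, expr = state[0]
        if value == TARGET:
            return expr
    return None
```

Calling `tree_of_thoughts([4, 9, 10, 13])` returns an expression string if a solution survives pruning and `None` otherwise. With a narrow beam and an evaluator this crude, solvable puzzles can be pruned away; a very large `beam_width` turns the search exhaustive. The quality of the evaluator is the part that matters, and it is the part that reinforcement learning now trains into the model rather than leaving to a prompt.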
What is still missing
The bar for "reasoning" has moved fast, but the gap to a working prefrontal cortex remains wide.
- Calibration. Reasoning models score high on benchmarks while being overconfident on items they get wrong. They do not know what they do not know.
- Novel insight. Performance on Humanity's Last Exam and Epoch AI's FrontierMath remains 40–50%, far below expert humans. Models excel at solved-class problems and stall on genuinely new ones.
- Long-horizon autonomy. Agentic tasks lasting more than an hour still fail silently, loop, or quietly drop their goal. The prefrontal cortex does not.
- Cost. The cost of frontier reasoning scales with the number of tokens generated; a deep-think run costs an order of magnitude more than fast inference.
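The calibration gap in the first item above is usually quantified with expected calibration error (ECE): predictions are grouped by stated confidence, and the gap between average confidence and actual accuracy is averaged across groups, weighted by group size. The sketch below assumes equal-width confidence bins and a confidence in [0, 1] for every answer; the function name and the bin count are illustrative choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the top bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece


# A model that claims 90% confidence on four answers but gets only two right.
gap = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, False, False])
```

Here the model states 90% confidence and is right half the time, so the gap is about 0.4. A well-calibrated model would score near zero however hard the questions are, which is why calibration is tracked separately from accuracy.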
How we read the verdict
We rate the AI counterpart Developing. The rate of improvement is the highest of any region in this atlas, but the absolute capability still trails educated humans on the open-ended problems that matter. Whether prefrontal-equivalent AI can be built is plausibly the most important question of the decade.