82/100 · Mature · Inferior frontal & posterior superior temporal gyrus



Broca & Wernicke Areas

From the brain to the algorithm. Two cortical regions — one for production, one for comprehension — implement the linguistic interface of human cognition. The Transformer is their closest engineered analogue, and on most measurable benchmarks it now matches educated humans.

What the biology does

Broca's area, in the inferior frontal gyrus of the left hemisphere (Brodmann areas 44 and 45), is the production engine: it sequences syntactically structured speech and motor commands for articulation. Damage produces telegraphic, agrammatic output. Wernicke's area, in the posterior superior temporal gyrus (Brodmann area 22), is the comprehension counterpart: fluent but meaningless output emerges when it is lesioned. The two are wired together by the arcuate fasciculus and to motor and auditory cortex by short white-matter loops.

Together they implement bidirectional symbol-grounding: input speech becomes meaning; meaning becomes output speech. The whole system runs at ~10 W and learns continuously.

What we have built

Modern large language models stand on a single architectural pillar, the Transformer. Twelve milestones, nearly nine years, one architecture.

  • June 2017 — Attention Is All You Need. Vaswani et al. introduce the Transformer, replacing recurrence with self-attention.
  • May 2020 — GPT-3. OpenAI demonstrates emergent few-shot prompting at 175B parameters — the moment "scale" became a research strategy.
  • January 2022 — InstructGPT. RLHF brings instruction-following to production.
  • November 2022 — ChatGPT. Crosses 100 million users in two months; RLHF goes mainstream.
  • February 2023 — Meta Llama 1. Launches the open-weight era for frontier-class models.
  • March 2023 — GPT-4. Expert-level performance on the bar exam, USMLE and parts of the IMO.
  • 2024 — Open weights catch up. Llama 3, DeepSeek-V3 and Qwen2.5 close most open-vs-closed gaps on reasoning and code.
  • August 2025 — OpenAI GPT-5. A single GPT-5 router auto-selects between fast and deliberative modes per query; ~45% less likely to hallucinate than GPT-4o.
  • November 2025 — Google Gemini 3. Preview rolls out across all Google products.
  • February 2026 — Gemini 3.1 Pro. 2× the reasoning of Gemini 3, 1M-token context, #1 on twelve of eighteen benchmarks.
  • April 2026 — Claude Opus 4.7. Anthropic ships Claude Opus 4.7 — 1M-token context, 128K output, 87.6% on SWE-bench Verified.
  • May 2026 — Claude Opus 4.8. Six weeks later, a Fast Mode and meaningful coding-agent gains land at the same price point.

The original paper made an unfussy promise that became the entire field's foundation:

"We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train." — Vaswani et al., 2017 (arXiv:1706.03762)

What every model in the list above has in common: the same self-attention block, stacked deeper, fed more tokens, post-trained harder. The LMArena leaderboard — millions of head-to-head human preference votes — and the Hugging Face Open LLM Leaderboard record the consequence: a benchmark surface so crowded that monthly model launches barely move the needle anymore.
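To make "the same self-attention block" concrete, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. It is an illustration, not any production model's implementation: the dimensions, the random weights and the names `self_attention`, `w_q`, `w_k` and `w_v` are assumptions chosen for the example, and real models add multiple heads, masking, residual connections and normalization.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) token representations.
    w_q, w_k, w_v: (d_model, d_k) projection matrices (illustrative names).
    Returns the attended output and the attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    # Every token scores every other token; scaling keeps the softmax well-behaved.
    scores = q @ k.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy input: 4 tokens, 8-dimensional embeddings, random weights (assumed values).
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape)      # one output vector per token
print(weights.shape)  # one row of attention weights per token
```

Because every token attends to every other token in a single matrix product, the whole sequence is processed in parallel, which is the property the 2017 abstract contrasts with recurrence.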

What is still missing

Language is the function where AI comes closest to humans, which makes the gaps that remain stand out all the more sharply.

  1. Hallucination. Even retrieval-grounded answers carry a non-zero confabulation rate. Calibration is the open problem of 2026.
  2. Long-horizon coherence. Conversations spanning days or months still drift without an external memory store. The model does not remember last Tuesday.
  3. Pragmatic and social inference. Theory of mind, ironic intent, conversational repair under genuine misunderstanding: current LLMs still fail these reliably on adversarial probes.
  4. Physical grounding. LLMs describe the world without ever having moved in it; embodied benchmarks remain a separate field.
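The calibration gap in item 1 can be made measurable. One common way to quantify it is expected calibration error (ECE): group answers by the confidence the model stated, then compare each group's average confidence with its actual accuracy. The sketch below assumes you already have a confidence score and a correct/incorrect label per answer; the function name, the ten-bin default and the toy data are illustrative choices, not a standard taken from this article.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: stated probabilities in [0, 1], one per answer.
    correct: 1 if the answer was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each answer to a bin using the interior edges only.
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the share of answers in the bin
    return ece


# Toy data (assumed): the model claims 90% confidence but is right half the time.
confidences = [0.9] * 10
correct = [1] * 5 + [0] * 5
ece = expected_calibration_error(confidences, correct)
print(ece)  # about 0.4: stated 90% confidence, observed 50% accuracy
```

A perfectly calibrated model scores 0; the further stated confidence drifts from observed accuracy, the larger the value.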

How we read the verdict

We rate the AI counterpart Mature. Linguistic production is, in 2026, broadly solved at the surface level. The remaining gaps — grounding, calibration, memory, social inference — are no longer about producing fluent text; they are about meaning what you say.

Concrete examples

  • Claude Opus 4.7 — 1M context. Reads a 750K-line codebase in one shot and self-verifies its edits; SWE-bench Verified 87.6%.
  • GPT-5 router. Auto-selects between fast and thinking modes per query; ~45% less likely to hallucinate than GPT-4o.
  • LMArena leaderboard. Public head-to-head ranking — millions of human preference votes track who's actually winning.
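To show how head-to-head preference votes can become a ranking, here is a minimal Elo-style sketch. It is an assumption-laden illustration, not a description of LMArena's actual pipeline, which may use a different statistical model; the model names, the starting rating of 1000 and the K-factor of 32 are hypothetical.

```python
def expected_score(r_a, r_b):
    """Probability that a player rated r_a beats one rated r_b under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0, start=1000.0):
    """Apply one preference vote: `winner` was preferred over `loser`."""
    r_w = ratings.get(winner, start)
    r_l = ratings.get(loser, start)
    surprise = 1.0 - expected_score(r_w, r_l)  # larger when an underdog wins
    ratings[winner] = r_w + k * surprise
    ratings[loser] = r_l - k * surprise
    return ratings


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_c", "model_a"),
]

ratings = {}
for preferred, other in votes:
    update_elo(ratings, preferred, other)

for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Each vote moves rating points from the loser to the winner, so the total stays constant; with millions of votes the ordering settles even though any single vote is noisy.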

Milestones

  • Jun 2017: Vaswani et al. — Attention Is All You Need
  • May 2020: GPT-3 — emergent few-shot prompting at 175B parameters
  • Jan 2022: InstructGPT — RLHF brings instruction-following to production
  • Nov 2022: ChatGPT crosses 100M users in two months; RLHF goes mainstream
  • Feb 2023: Meta Llama 1 launches the open-weight era for frontier-class models
  • Mar 2023: GPT-4 reaches expert-level on bar exam, USMLE and IMO problem subsets
  • 2024: Llama 3 / DeepSeek-V3 / Qwen close most open-vs-closed gaps
  • Aug 2025: OpenAI ships GPT-5 — unified routing between fast and deliberative modes
  • Nov 2025: Google Gemini 3 — preview rolls out across Google products
  • Feb 2026: Gemini 3.1 Pro — 2× reasoning over Gemini 3, 1M context, top of 12/18 benchmarks
  • Apr 2026: Anthropic ships Claude Opus 4.7 — 1M token context, 87.6% on SWE-bench Verified
  • May 2026: Claude Opus 4.8 follows six weeks later — "Fast mode" and coding-agent gains
