Broca & Wernicke Areas
From the brain to the algorithm. Two cortical regions, one for production and one for comprehension, implement the linguistic interface of human cognition. The Transformer is their closest engineered analogue, and on many standard language benchmarks it now matches educated human baselines.
What the biology does
Broca's area, in the inferior frontal gyrus of the left hemisphere (Brodmann areas 44 and 45), is the production engine: it sequences syntactically structured speech and the motor commands for articulation. Damage produces telegraphic, agrammatic output. Wernicke's area, in the posterior superior temporal gyrus (Brodmann area 22), is the comprehension counterpart: lesions there leave speech fluent but meaningless and impair understanding. The two are wired together by the arcuate fasciculus, and to motor and auditory cortex by short white-matter loops.
Together they implement bidirectional symbol-grounding: input speech becomes meaning; meaning becomes output speech. The whole brain that hosts them runs on roughly 20 W and learns continuously.
What we have built
Modern large language models stand on a single architectural pillar, the Transformer. Twelve milestones, nearly nine years, one architecture.
- June 2017 — Attention Is All You Need. Vaswani et al. introduce the Transformer, replacing recurrence with self-attention.
- May 2020 — GPT-3. OpenAI demonstrates emergent few-shot prompting at 175B parameters — the moment "scale" became a research strategy.
- January 2022 — InstructGPT. RLHF brings instruction-following to production.
- November 2022 — ChatGPT. Crosses 100 million users in two months; RLHF goes mainstream.
- February 2023 — Meta Llama 1. Launches the open-weight era for frontier-class models.
- March 2023 — GPT-4. Expert-level performance on professional exams such as the bar exam and the USMLE.
- 2024 — Open weights catch up. Llama 3, DeepSeek-V3 and Qwen2.5 close most open-vs-closed gaps on reasoning and code.
- August 2025 — OpenAI GPT-5. A single router auto-selects between fast and deliberative modes per query; OpenAI reports responses about 45% less likely to contain a factual error than GPT-4o's.
- November 2025 — Google Gemini 3. Preview rolls out across all Google products.
- February 2026 — Gemini 3.1 Pro. 2× the reasoning of Gemini 3, 1M-token context, #1 on twelve of eighteen benchmarks.
- April 2026 — Claude Opus 4.7. Anthropic ships a 1M-token context window, 128K-token output and 87.6% on SWE-bench Verified.
- May 2026 — Claude Opus 4.8. Six weeks later, a Fast Mode and meaningful coding-agent gains land at the same price point.
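The operation behind the first milestone in this list can be written down in a few lines. What follows is a minimal sketch of single-head scaled dot-product self-attention in plain Python, not the code of any model named above. The projection matrices `w_q`, `w_k`, `w_v` and the toy token vectors are hypothetical values chosen for illustration; real models learn the projections and add multiple heads, masking and positional information.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x:             token vectors, shape (n_tokens, d_model)
    w_q, w_k, w_v: hypothetical projection matrices, shape (d_model, d_k)
    Returns the attended outputs and the attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_transposed = [list(col) for col in zip(*k)]
    # Every token scores every other token; scaling keeps the softmax well-behaved.
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_transposed)]
    weights = [softmax(row) for row in scores]
    # Each output is a weighted average of the value vectors.
    return matmul(weights, v), weights


# Toy input: three 2-dimensional "tokens" and identity projections (illustrative only).
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = self_attention(tokens, identity, identity, identity)
```

Each row of `weights` sums to 1 and says how much one token draws on every other token. Because all rows are computed independently, the whole sequence can be processed in parallel, which is the property that recurrence lacked.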
The original paper made an unfussy promise that became the entire field's foundation:
"We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train." — Vaswani et al., 2017 (arXiv:1706.03762)
What every model in the list above has in common: the same self-attention block, stacked deeper, fed more tokens, post-trained harder. The LMArena leaderboard — millions of head-to-head human preference votes — and the Hugging Face Open LLM Leaderboard record the consequence: a benchmark surface so crowded that monthly model launches barely move the needle anymore.
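To see why a crowded leaderboard stops moving, it helps to look at how head-to-head votes turn into a ranking. The sketch below uses a simplified Elo-style update. That choice is an assumption made for illustration: this article does not specify the rating method LMArena uses, and the model names, starting rating and `k` factor here are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(ratings, votes, k=32.0, start=1000.0):
    """Apply (winner, loser) votes in order and return the new ratings.

    k and start are illustrative defaults, not values taken from any real leaderboard.
    """
    ratings = dict(ratings)
    for winner, loser in votes:
        r_w = ratings.setdefault(winner, start)
        r_l = ratings.setdefault(loser, start)
        # An upset (low expected score for the winner) moves ratings more.
        surprise = 1.0 - expected_score(r_w, r_l)
        ratings[winner] = r_w + k * surprise
        ratings[loser] = r_l - k * surprise
    return ratings


# Hypothetical votes between three placeholder models.
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
ratings = update_ratings({}, votes)
ranking = sorted(ratings, key=ratings.get, reverse=True)
```

Under this model a 10-point rating gap corresponds to an expected win rate of only about 51%, so two models that close together are nearly indistinguishable to a human voter.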
What is still missing
Language is the function where AI is closest to humans, so the remaining gaps stand out all the more sharply.
- Hallucination. Even retrieval-grounded answers carry a non-zero confabulation rate. Calibration is the open problem of 2026.
- Long-horizon coherence. Conversations spanning days or months still drift without an external memory store. The model does not remember last Tuesday.
- Pragmatic and social inference. Theory of mind, ironic intent, conversational repair under genuine misunderstanding: current LLMs reliably fail all three on adversarial probes.
- Physical grounding. LLMs describe the world without ever having moved in it; embodied benchmarks remain a separate field.
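The calibration problem named in the hallucination item can be made concrete. One common way to quantify it is expected calibration error (ECE): group answers by the confidence the model stated, then compare each group's average confidence with how often it was actually right. The sketch below is a minimal version under stated assumptions: the function name, the equal-width binning and the toy data are illustrative, not a measurement of any model in this article.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Size-weighted average of |accuracy - confidence| over equal-width bins.

    confidences: stated probabilities in [0, 1], one per answer
    correct:     booleans, True where the answer was right
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")

    bins = [[] for _ in range(n_bins)]
    for confidence, is_correct in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, is_correct))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_confidence)
    return ece


# Hypothetical data: four answers given at 95% confidence, only two of them right.
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [True, True, False, False, True, False]
ece = expected_calibration_error(confidences, correct)
```

A perfectly calibrated model scores 0. In the toy data the overconfident 95% group dominates the error, which is the pattern a fluent but confabulating answer produces.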
How we read the verdict
We rate the AI counterpart Mature. Linguistic production is, in 2026, broadly solved at the surface level. The remaining gaps — grounding, calibration, memory, social inference — are no longer about producing fluent text; they are about meaning what you say.