Summary of "Yann LeCun on What Comes After LLMs"

Overview

Yann LeCun argues that the current AI boom, centered on large language models (LLMs), is a major commercial success but the wrong “endgame” for building human- or animal-like intelligence.

Why LLMs Aren’t the Route to Intelligence

LeCun says LLMs are useful and “great” for language manipulation and many products, but they do not model the world well enough to achieve intelligence or generally capable agent behavior.

He highlights a fundamental limitation:

LLMs generate output token by token instead of:
- Predicting consequences of actions
- Planning via search/optimization

What Comes After LLMs: World Models + Planning

LeCun proposes that progress should shift toward:

World models: systems that can anticipate what happens if an agent acts (i.e., predict the consequences of its own actions)
Planning / agentic systems: using search/optimization to plan sequences of actions toward a goal

He emphasizes two key capabilities:

Predicting consequences of actions
Planning by search/optimization to achieve a goal

He contrasts this approach with primarily reactive, autoregressive behavior.
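
To make the two capabilities concrete, here is a minimal sketch of planning with a world model. It is an illustration only, not LeCun's architecture: the one-dimensional dynamics in `world_model`, the squared-distance `cost`, and the random-shooting search in `plan` are all assumptions chosen for brevity. The planner imagines the consequences of candidate action sequences and keeps the cheapest one, instead of emitting actions reactively one at a time.

```python
import random

def world_model(state, action):
    """Hypothetical toy dynamics: a 1-D position that moves by `action`."""
    return state + action

def cost(state, goal):
    """Squared distance to the goal; lower is better."""
    return (state - goal) ** 2

def rollout(state, actions):
    """Predict the final state reached by applying `actions` in order."""
    for a in actions:
        state = world_model(state, a)
    return state

def plan(state, goal, horizon=5, candidates=500, seed=0):
    """Random-shooting search: sample action sequences, keep the cheapest."""
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(candidates):
        actions = [rng.choice([-1.0, 0.0, 1.0]) for _ in range(horizon)]
        c = cost(rollout(state, actions), goal)
        if c < best_cost:
            best_actions, best_cost = actions, c
    return best_actions, best_cost

actions, final_cost = plan(state=0.0, goal=3.0)
print(actions, final_cost)
```

In a real system the hand-written `world_model` would be a learned predictor, and the random search would likely be replaced by a stronger optimizer.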

Critique of Vision-Language-Action Models (VLAs)

LeCun criticizes “vision-language-action” models as a dead end for now, arguing that:

they are unreliable
they require too much data
they tend to be brittle and task-specific

Why His Company Exists (AMI Labs)

LeCun says he launched a startup focused on advanced machine intelligence (AMI), with the explicit goal of scaling world-model learning using the JEPA (Joint Embedding Predictive Architecture) methods he pioneered at Meta.

He also claims Meta was no longer the right environment for this direction.

World-Model Learning Should Be Non-Generative Representation Learning (JEPA vs Pixel Generation)

A central point in his argument is that effective representation learning for images/videos often relies on non-generative/self-supervised objectives, whereas pixel-level generative prediction tends to fail or be inefficient.

What JEPA Does (Conceptually)

Learn an encoder that maps one observation to a representation
Use that representation to predict the representation of a different, related observation
Avoid predicting raw pixels directly
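
The three steps above can be sketched in a few lines. This is a conceptual toy under stated assumptions, not the JEPA implementation: the linear `encode` and `predict` functions, the shared encoder for both observations, and the example matrices `W` and `P` are all hypothetical. The point it illustrates is that the loss is measured between representations, so detail the encoder discards (the third input coordinate here) never has to be predicted.

```python
def encode(x, W):
    """Linear encoder: maps an observation vector to a representation."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def predict(z, P):
    """Linear predictor that operates purely in representation space."""
    return [sum(p * zi for p, zi in zip(row, z)) for row in P]

def jepa_loss(x_context, x_target, W, P):
    """Squared error between predicted and actual target representations."""
    z_context = encode(x_context, W)
    z_target = encode(x_target, W)  # assumption: one shared encoder
    z_pred = predict(z_context, P)
    return sum((a - b) ** 2 for a, b in zip(z_pred, z_target))

# Hypothetical encoder that keeps the first two coordinates and drops the third.
W = [[1, 0, 0],
     [0, 1, 0]]
# Hypothetical identity predictor.
P = [[1, 0],
     [0, 1]]

x_context = [1, 2, 9]   # third coordinate: unpredictable detail
x_target = [1, 2, -4]   # same content, different detail

loss = jepa_loss(x_context, x_target, W, P)
pixel_error = sum((a - b) ** 2 for a, b in zip(x_context, x_target))
print(loss, pixel_error)
```

A pixel-level objective would be charged for the mismatch in the third coordinate, while the representation-level loss is not.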

Bottleneck: Representation Collapse

LeCun identifies representation collapse (learning trivial constant representations) as a key research bottleneck.

He connects anti-collapse approaches to:

Contrastive learning
Mutual-information ideas
Regularization-based methods, including more recent approaches such as “SIGReg” (isotropic Gaussian regularization)

He cites promising early results from applying these ideas to world-model training.
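
To see why collapse is a problem and how a regularizer can counter it, consider the sketch below. It is explicitly not SIGReg: `moment_penalty` is a simplified stand-in, assumed here for illustration, that pushes a batch of embeddings toward zero mean and identity covariance (the first two moments of an isotropic Gaussian). A collapsed encoder that outputs the same vector for every input would make any representation-prediction loss trivially zero, but it incurs a clear penalty under this term.

```python
def moment_penalty(embeddings):
    """Penalize deviation from zero mean and identity covariance.

    A crude stand-in for isotropic-Gaussian regularization; NOT SIGReg itself.
    """
    n = len(embeddings)
    d = len(embeddings[0])
    mean = [sum(z[i] for z in embeddings) / n for i in range(d)]
    penalty = sum(m ** 2 for m in mean)
    for i in range(d):
        for j in range(d):
            cov = sum((z[i] - mean[i]) * (z[j] - mean[j]) for z in embeddings) / n
            target = 1.0 if i == j else 0.0
            penalty += (cov - target) ** 2
    return penalty

# Collapsed batch: every input maps to the same constant vector.
collapsed = [[0.0, 0.0] for _ in range(4)]
# Well-spread batch: zero mean, unit variance per dimension, no correlation.
spread = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]

print(moment_penalty(collapsed), moment_penalty(spread))
```

Adding such a term to the training loss makes the constant solution costly, which is the general idea behind regularization-based anti-collapse methods.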

Robotics / Automation Implications

LeCun acknowledges impressive robotics demos from generative approaches, but argues they are often:

trained with large datasets
dependent on imitation learning
sometimes trained with reinforcement learning in simulation

He claims this makes them expensive and brittle, and argues world models could bring:

better generalization
zero-shot or few-shot capability
reduced dependence on task-specific training

Data-Efficiency vs Scaling Dynamics

LeCun argues industry competition increasingly rewards digging the “same trench” by scaling current methods with more compute and data instead of investing in more data-efficient approaches.

He concludes that robotics and generalization likely require a paradigm shift, not just more scaling.

Safety: Why He Is Particularly Bearish About LLM Reliability and Controllability

LeCun argues LLMs are intrinsically unsafe, mainly because:

they can hallucinate and cannot reliably predict consequences
even if made more “agentic,” their training does not guarantee they avoid dangerous actions

“Objective-Driven AI” Alternative

He proposes objective-driven AI:

define a goal/cost function
pair it with a world model
integrate safety constraints into planning by construction

He notes failures can still occur if:

the world model is wrong, or
the cost/safety objectives are mis-specified

But he argues this is more controllable than relying on LLM prompting.
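
A minimal sketch of the objective-driven idea follows, under loud assumptions: the grid dynamics in `world_model`, the Manhattan-distance `task_cost`, the `forbidden` set, and the exhaustive search in `plan_safely` are all hypothetical choices for illustration. The safety constraint is enforced inside the planner: any plan whose predicted trajectory enters a forbidden state is discarded before its cost is considered.

```python
from itertools import product

def world_model(state, action):
    """Hypothetical toy dynamics on a grid: state is (x, y), action is (dx, dy)."""
    return (state[0] + action[0], state[1] + action[1])

def task_cost(state, goal):
    """Manhattan distance from the final state to the goal."""
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])

def is_safe(state, forbidden):
    """Safety constraint: the state must not be in the forbidden set."""
    return state not in forbidden

def plan_safely(start, goal, forbidden, horizon=4):
    """Exhaustive search over action sequences, keeping only safe plans."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    best = None
    for actions in product(moves, repeat=horizon):
        state, trajectory = start, []
        for a in actions:
            state = world_model(state, a)
            trajectory.append(state)
        if not all(is_safe(s, forbidden) for s in trajectory):
            continue  # hard constraint: discard the whole plan
        c = task_cost(state, goal)
        if best is None or c < best[0]:
            best = (c, list(actions), trajectory)
    return best

best_cost, best_actions, best_trajectory = plan_safely(
    start=(0, 0), goal=(2, 0), forbidden={(1, 0)}
)
print(best_cost, best_trajectory)
```

The two failure modes named above map directly onto this sketch: if `world_model` mispredicts, a plan judged safe may not be, and if `forbidden` or `task_cost` is specified wrongly, the planner optimizes the wrong thing.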

Healthcare Example

LeCun suggests LLMs may be limited to knowledge regurgitation, while major clinical breakthroughs require modeling dynamics—for example:

designing interventions for complex biological systems
using action-conditioned prediction to guide stem cells into specific cell types

He positions world models as central to enabling this kind of approach.

Background at Meta/FAIR and Departure

LeCun describes himself as a long-time leader at FAIR, helping build it and emphasizing FAIR’s openness and its research-to-practice pipeline.

He claims that in the last year or two, FAIR’s direction shifted away from the world-model research he believed was necessary.

He also clarifies misconceptions about his role relative to Alex and internal LLM strategy, stating:

he had no technical contribution to LLaMA
his role was mostly strategic/organizational rather than hands-on model building

Why His Views Diverged Around 2023

LeCun says he didn’t change his mind—others did.

He points to:

GPT-4’s impact
an influential argument (attributed to Geoffrey Hinton) suggesting GPT-4 might be near human-level and possibly linked to cortex/neuron scaling

LeCun rejects that claim and maintains that a different blueprint is required.

Timeline Claims (Presented Humorously, but with Conviction)

LeCun frames a forecast partly as a joke:

the shift to world-model-style AI will be obvious by early 2027
“world domination” will follow in about 5 years

He also acknowledges that this doesn’t guarantee fully ready solutions by then.

Presenters / Contributors

Yann LeCun (guest; speaker)
Jacob Efron (host; presenter)
