Summary of "Yann LeCun on What Comes After LLMs"
Overview
Yann LeCun argues that the current AI boom, centered on large language models (LLMs), is a major commercial success but not the “endgame” for building human- or animal-like intelligence.
Why LLMs Aren’t the Route to Intelligence
LeCun says LLMs are useful and “great” for language manipulation and many products, but they do not model the world well enough to achieve intelligence or generally capable agent behavior.
He highlights a fundamental limitation: LLMs generate output token by token instead of:
- predicting the consequences of actions
- planning via search/optimization
What Comes After LLMs: World Models + Planning
LeCun proposes that progress should shift toward:
- World models: systems that can anticipate what happens if an agent acts (i.e., predict the consequences of its own actions)
- Planning / agentic systems: using search/optimization to plan sequences of actions toward a goal
He treats these two capabilities, predicting consequences and planning toward a goal, as the core of the approach, and contrasts them with primarily reactive, autoregressive behavior.
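The two capabilities can be illustrated with a toy sketch. It is not LeCun's method: the 1-D world, the `world_model` and `cost` functions, and the exhaustive search are all assumptions chosen to keep the example small. The planner imagines each candidate action sequence with the world model and keeps the cheapest one, rather than reacting one step at a time.

```python
from itertools import product

# Hypothetical toy world: an agent on a line of cells 0..9.
ACTIONS = (-1, 0, 1)  # step left, stay, step right

def world_model(state, action):
    """Predict the next cell if `action` is taken in `state` (assumed exact here)."""
    return max(0, min(9, state + action))

def cost(state, goal):
    """Distance to the goal; lower is better."""
    return abs(goal - state)

def plan(state, goal, horizon=4):
    """Exhaustively search action sequences and keep the cheapest imagined rollout."""
    best_seq, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon):
        s, total = state, 0
        for a in seq:
            s = world_model(s, a)   # imagine the consequence; nothing is executed
            total += cost(s, goal)
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq

print(plan(state=2, goal=5))  # (1, 1, 1, 0)
```

Exhaustive search only works for tiny action spaces; a real system would need a learned model and a more scalable optimizer.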
Critique of Vision-Language-Action Models (VLAs)
LeCun criticizes “vision-language-action” models as a dead end for now, arguing that:
- they are unreliable
- they require too much data
- they tend to be brittle and task-specific
Why His Company Exists (AMI Labs)
LeCun says he launched a startup focused on advanced machine intelligence (AMI) with the explicit goal of scaling world-model learning using the JEPA (joint embedding predictive architecture) methods he pioneered at Meta.
He also claims Meta was no longer the right environment for this direction.
World-Model Learning Should Be Non-Generative Representation Learning (JEPA vs Pixel Generation)
A central point in his argument is that effective representation learning for images/videos often relies on non-generative/self-supervised objectives, whereas pixel-level generative prediction tends to fail or be inefficient.
What JEPA Does (Conceptually)
- Learn an encoder that maps an observation to a representation
- Use that representation to predict the representation of a different, related observation
- Avoid predicting raw pixels directly
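A minimal sketch of the idea, assuming a linear encoder and a linear predictor with random weights (`encoder_w`, `predictor_w`, and the two "frames" are hypothetical). The point is only where the loss is measured: in representation space, not pixel space. Training details, such as how the target branch is handled, are omitted.

```python
import random

random.seed(0)
DIM_IN, DIM_REP = 4, 2  # hypothetical observation and representation sizes

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

encoder_w = rand_matrix(DIM_REP, DIM_IN)     # stand-in for a learned encoder
predictor_w = rand_matrix(DIM_REP, DIM_REP)  # stand-in for a learned predictor

def encode(obs):
    """Map an observation to its representation."""
    return matvec(encoder_w, obs)

def jepa_loss(context_obs, target_obs):
    """Squared error between predicted and actual target representations."""
    s_context = encode(context_obs)
    s_target = encode(target_obs)
    s_pred = matvec(predictor_w, s_context)
    return sum((p - t) ** 2 for p, t in zip(s_pred, s_target))

frame_t = [0.1, 0.4, -0.2, 0.9]    # hypothetical observation at time t
frame_t1 = [0.2, 0.5, -0.1, 0.8]   # hypothetical observation at time t+1
loss = jepa_loss(frame_t, frame_t1)
print(loss)
```

Nothing in this loss asks the model to reconstruct `frame_t1` itself, which is the contrast with pixel-level generation drawn above.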
Bottleneck: Representation Collapse
LeCun identifies representation collapse (learning trivial constant representations) as a key research bottleneck.
He connects anti-collapse approaches to:
- Contrastive learning
- Mutual-information ideas
- Regularization-based methods, including more recent approaches such as “SIGReg” (sketched isotropic Gaussian regularization)
He cites promising early results from applying these ideas to world-model training.
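To see why collapse is a problem, consider the sketch below. A degenerate encoder that outputs the same vector for every input drives a representation-space prediction loss to zero while carrying no information. The variance hinge shown here is a simple illustrative regularizer, not SIGReg; the function names and data are assumptions.

```python
from statistics import pstdev

def collapsed_encoder(obs):
    """Degenerate encoder: returns the same vector for every input."""
    return [0.0, 0.0]

def prediction_loss(s_pred, s_target):
    """Squared error between predicted and target representations."""
    return sum((p - t) ** 2 for p, t in zip(s_pred, s_target))

def variance_penalty(batch_reps, target_std=1.0):
    """Hinge penalty on each dimension's standard deviation across the batch."""
    dims = list(zip(*batch_reps))
    return sum(max(0.0, target_std - pstdev(d)) for d in dims) / len(dims)

# Hypothetical (context, target) observation pairs.
pairs = [([0.1, 0.9], [0.2, 0.8]), ([0.7, -0.3], [0.6, -0.2]), ([-0.5, 0.2], [-0.4, 0.1])]

# With an identity predictor, the collapsed encoder "predicts" every target perfectly.
pred_term = sum(prediction_loss(collapsed_encoder(x), collapsed_encoder(y)) for x, y in pairs)
collapsed_penalty = variance_penalty([collapsed_encoder(x) for x, _ in pairs])
spread_penalty = variance_penalty([[-2.0, 0.0], [0.0, 2.0], [2.0, -2.0]])

print(pred_term, collapsed_penalty, spread_penalty)  # 0.0 1.0 0.0
```

The prediction term alone cannot tell the collapsed solution from a good one; the added penalty is nonzero only when the representations fail to vary across the batch.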
Robotics / Automation Implications
LeCun acknowledges impressive robotics demos from generative approaches, but argues they are often:
- trained on large datasets
- dependent on imitation learning
- sometimes reliant on reinforcement learning in simulation
He claims this makes them expensive and brittle, and argues world models could offer:
- better generalization
- zero-shot or few-shot capability
- less dependence on task-specific training
Data-Efficiency vs Scaling Dynamics
LeCun argues industry competition increasingly rewards digging the “same trench” by scaling current methods with more compute and data instead of investing in more data-efficient approaches.
He concludes that robotics and generalization likely require a paradigm shift, not just more scaling.
Safety: Why He Is Particularly Bearish About LLM Reliability and Controllability
LeCun argues LLMs are intrinsically unsafe, mainly because:
- they can hallucinate and cannot reliably predict consequences
- even if made more “agentic,” their training does not guarantee they avoid dangerous actions
“Objective-Driven AI” Alternative
He proposes objective-driven AI:
- define a goal/cost function
- pair it with a world model
- integrate safety constraints into planning by construction
He notes failures can still occur if:
- the world model is wrong, or
- the cost/safety objectives are mis-specified
But he argues this is more controllable than relying on LLM prompting.
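A toy sketch of the recipe above, under stated assumptions: the 1-D world, the `hazards` set, and the exhaustive planner are hypothetical and stand in for a learned world model and a real optimizer. The safety constraint is applied to predicted states during planning, so unsafe plans are rejected outright rather than discouraged through prompting. In this toy, only landing cells are checked, so a two-cell step can pass over a hazard.

```python
from itertools import product

ACTIONS = (-1, 0, 1, 2)  # hypothetical moves on a line of cells 0..9

def world_model(state, action):
    """Predicted next cell; assumed accurate for this toy example."""
    return max(0, min(9, state + action))

def plan(state, goal, hazards, horizon=3):
    """Return the cheapest action sequence whose predicted states avoid hazards."""
    best_seq, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon):
        s, total, safe = state, 0, True
        for a in seq:
            s = world_model(s, a)
            if s in hazards:        # safety constraint: reject, never trade off
                safe = False
                break
            total += abs(goal - s)  # task cost: distance to the goal
        if safe and total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq  # None if every candidate plan is predicted to be unsafe

print(plan(2, 6, hazards=set()))  # (2, 2, 0)
print(plan(2, 6, hazards={4}))    # (1, 2, 1)
```

Both failure modes noted above still apply: if `world_model` mispredicts where an action lands, or if `hazards` omits a dangerous cell, the planner will return an unsafe plan with full confidence.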
Healthcare Example
LeCun suggests LLMs may be limited to knowledge regurgitation, while major clinical breakthroughs require modeling dynamics—for example:
- designing interventions for complex biological systems
- using action-conditioned prediction to guide stem cells into specific cell types
He positions world models as central to enabling this kind of approach.
Background at Meta/FAIR and Departure
LeCun describes himself as a long-time leader at FAIR, helping build it and emphasizing FAIR’s openness and its research-to-practice pipeline.
He claims that in the last year or two, FAIR’s direction shifted away from the world-model research he believed was necessary.
He also clarifies misconceptions about his role relative to Alex and internal LLM strategy, stating:
- he had no technical contribution to LLaMA
- his role was mostly strategic/organizational rather than hands-on model building
Why His Views Diverged Around 2023
LeCun says he didn’t change his mind—others did.
He points to:
- GPT-4’s impact
- an influential argument (attributed to Geoffrey Hinton) suggesting GPT-4 might be near human-level and possibly linked to cortex/neuron scaling
LeCun rejects that claim and maintains that a different blueprint is required.
Timeline Claims (Presented Humorously, but with Conviction)
LeCun frames a forecast partly as a joke:
- the shift to world-model-style AI will be obvious by early 2027
- “world domination” will follow in about five years
He also acknowledges that this doesn’t guarantee fully ready solutions by then.
Presenters / Contributors
- Yann LeCun (guest; speaker)
- Jacob Efron (host; presenter)
Category
News and Commentary