Meta Researchers Reveal Five Schools of World Models: What AI Are Yann LeCun and Fei-Fei Li Betting On?

ChainNews / ABMedia

After Turing Award winner and former Meta AI Chief Scientist Yann LeCun founded the startup Advanced Machine Intelligence (AMI), which recently closed a $1.03 billion mega seed round, "world model" has once again become a hot keyword in AI. Yet even though the AI community discusses world models constantly, different researchers often mean very different things by the term.

(Deep Dive: Are LLMs Flawed? Why LeCun’s AMI is Betting on the World Model Approach)

Meta AI researcher Zhuokai Zhao recently posted a lengthy article stating that the so-called world models in AI can be divided into at least five different technical approaches. He believes these methods are not directly competing but are addressing different aspects of the problem.

JEPA: Compressing Physical Understanding

Spatial Intelligence: Reconstructing 3D Worlds

Learned Simulation: Training AI in Simulated Environments

NVIDIA Cosmos: Providing Infrastructure

Active Inference: Proposing a New Theory of Intelligence

He expects the boundaries between these approaches will soon become blurred.

Approach 1: LeCun’s JEPA, Understanding the World in Abstract Space

Zhao believes the first type of world model is Joint Embedding Predictive Architecture (JEPA), with Yann LeCun as a key figure.

The core idea of JEPA is: AI should not try to predict every pixel, but instead predict the future within an abstract representational space.

In the real world, many details are inherently unpredictable, such as lighting changes, the exact position of leaves, or surface textures. If the model must generate every pixel, it is forced to spend capacity on a vast amount of meaningless detail.

JEPA’s approach is to first encode images or videos into an abstract representation, then predict the occluded parts within this space. This allows the model to learn concepts like “a ball falling off a table” without generating every frame.
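The idea can be sketched in a few lines. This is a toy illustration, not V-JEPA's actual architecture: the encoders and predictor are stand-in random linear maps rather than trained networks, and the point is only that the prediction error is measured in embedding space, never in pixel space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two encoders and the predictor; in a real JEPA
# these are trained neural networks, here they are fixed random linear maps.
D_PIX, D_EMB = 64, 8
W_ctx = rng.normal(size=(D_EMB, D_PIX))   # context encoder (visible patch)
W_tgt = rng.normal(size=(D_EMB, D_PIX))   # target encoder (masked patch)
W_pred = rng.normal(size=(D_EMB, D_EMB))  # predictor, operating in embedding space

def jepa_loss(context_pixels, target_pixels):
    """Prediction error measured in embedding space, not pixel space."""
    z_ctx = W_ctx @ context_pixels   # embed the visible (context) region
    z_tgt = W_tgt @ target_pixels    # embed the masked (target) region
    z_hat = W_pred @ z_ctx           # predict the target's embedding
    return float(np.mean((z_hat - z_tgt) ** 2))

frame = rng.normal(size=2 * D_PIX)
loss = jepa_loss(frame[:D_PIX], frame[D_PIX:])
print(round(loss, 3))
```

Because the loss lives in the abstract space, unpredictable pixel-level detail (leaf positions, textures) simply never enters the objective.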

Meta’s V-JEPA 2 is one of the most representative experimental results. The model was trained with 1 million hours of video data in a self-supervised manner, and then only 62 hours of robot data were enough to produce a world model supporting zero-shot planning. The robot generates candidate action sequences, inputs them into the world model, and selects the sequence whose predicted outcome best matches the target image. This method is effective even for objects and environments never seen during training.
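The planning loop described above is essentially random-shooting model-predictive control. The sketch below illustrates it under toy assumptions: a hypothetical linear world model stands in for V-JEPA 2's learned dynamics, and distance in a small state vector stands in for matching a goal image.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, HORIZON, N_CANDIDATES = 4, 2, 5, 64

# Hypothetical learned dynamics: the "world model" here is a fixed linear
# map predicting the next state from the current state and an action.
A = 0.1 * rng.normal(size=(STATE_DIM, STATE_DIM)) + np.eye(STATE_DIM)
B = rng.normal(size=(STATE_DIM, ACTION_DIM))

def rollout(state, actions):
    """Roll the world model forward through a sequence of actions."""
    for a in actions:
        state = A @ state + B @ a
    return state

def plan(state, goal):
    """Random-shooting planner: sample candidate action sequences, simulate
    each with the world model, keep the one whose predicted outcome is
    closest to the goal (cf. V-JEPA 2 matching the goal image)."""
    candidates = rng.normal(size=(N_CANDIDATES, HORIZON, ACTION_DIM))
    costs = [np.linalg.norm(rollout(state, seq) - goal) for seq in candidates]
    return candidates[int(np.argmin(costs))]

state = np.zeros(STATE_DIM)
goal = np.ones(STATE_DIM)
best = plan(state, goal)
print(best.shape)  # (HORIZON, ACTION_DIM)
```

Nothing in the loop is task-specific: as long as the world model generalizes, the same planner works on objects it has never seen, which is what "zero-shot" means here.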

This high data efficiency is a key reason why AMI is betting on the JEPA architecture. If your representations are good enough, you don’t need to brute-force every task from scratch. AMI Labs is LeCun’s effort to push this technology from research into practical applications. They initially target healthcare and robotics. But this is a long-term investment; their CEO has publicly stated that commercial products may still take several years to emerge.

Approach 2: Fei-Fei Li’s “Spatial Intelligence”

Another well-known approach comes from Fei-Fei Li’s startup, World Labs.

(Who is AI pioneer Fei-Fei Li? The new unicorn startup World Labs secures funding from NVIDIA, AMD)

Unlike JEPA’s “predict the future,” Li’s core question is: “What does the world look like in 3D space?” Her concept, called Spatial Intelligence, emphasizes that true understanding requires explicit spatial structures: geometry, depth, persistence, and the ability to re-observe scenes from new angles — not just temporal prediction. This differs from JEPA’s philosophy: instead of learning abstract dynamics, it learns a structured 3D representation of the environment that can be directly manipulated.

World Labs’ product Marble can generate persistent 3D worlds from images, text, or videos. Unlike traditional video generation models, Marble produces real 3D scenes. Users can freely move the viewpoint, modify objects, and output 3D models. This makes it closer to a 3D creation engine rather than a simple generative model.

Approach 3: DeepMind’s “Learned Simulation World”

The third type of world model is learned simulation.

Representative research includes:

DeepMind Genie 3

Dreamer series

Runway GWM-1

These models aim to build interactive simulated worlds where AI can learn through interaction.
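The common pattern, heavily simplified, is that the agent improves itself inside rollouts imagined by a learned simulator rather than in the real world. The sketch below is a toy stand-in, not Genie or Dreamer: a hand-written one-dimensional "simulator" replaces the learned model, and a crude search over a one-parameter policy replaces policy-gradient training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "learned simulator": a 1-D world where the action shifts the state
# and reward is highest near the origin.
def imagine_step(state, action):
    next_state = state + action + 0.01 * rng.normal()  # imagined dynamics
    reward = -abs(next_state)                          # imagined reward
    return next_state, reward

def evaluate(gain, episodes=20, horizon=10):
    """Average imagined return of the policy a = -gain * state."""
    total = 0.0
    for _ in range(episodes):
        s = rng.normal()
        for _ in range(horizon):
            s, r = imagine_step(s, -gain * s)
            total += r
    return total / episodes

# Policy improvement entirely inside the imagined world: try a few
# candidate gains, keep the one with the best imagined return.
gains = [0.0, 0.5, 1.0]
best_gain = max(gains, key=evaluate)
print(best_gain)
```

The appeal is sample efficiency: once the simulator is learned, the agent can practice in imagination at far lower cost than acting in the real environment.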

Approach 4: NVIDIA’s Physical AI Infrastructure

The fourth approach isn’t about building models directly but creating an entire ecosystem platform. The leading company is NVIDIA, with its Cosmos platform providing comprehensive infrastructure:

Video data processing

Visual tokenizer

Model training

Deployment services

NVIDIA's world foundation models (WFMs) were trained on 20 million hours of real-world video, totaling roughly 9 trillion tokens.

(NVIDIA’s Alpamayo ecosystem enables AI to have reasoning capabilities in self-driving cars and explain decision-making processes)

NVIDIA’s strategy is clear: they don’t necessarily need to develop world models themselves but provide all the tools for others to build them.

Approach 5: Active Inference (Neuroscience School)

The final approach is rooted in neuroscience. Its central figure is neuroscientist Karl Friston, who proposed the famous Free Energy Principle. Unlike traditional reinforcement learning, Active Inference treats an agent as a biological entity constantly trying to understand its world: it takes actions that make its predictions about the environment more accurate, reducing "surprise", the mismatch between expectation and observation.

VERSES AI's AXIOM system uses an object-centric model in which each object is an independent entity. The system updates its beliefs with Bayesian inference rather than gradient-based training of deep neural networks, an architecture that offers interpretability, composability, and very high data efficiency. VERSES released the commercial product Genius in April 2025. On standard control benchmarks, AXIOM is competitive with reinforcement learning baselines while using orders of magnitude less data.
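Gradient-free Bayesian belief updating can be shown in miniature. The example below is far simpler than AXIOM's object-centric models: a textbook Beta-Bernoulli conjugate update, where each observation revises the belief in closed form by counting evidence, with no backpropagation anywhere.

```python
# A minimal sketch of gradient-free Bayesian belief updating, in the
# spirit of (but much simpler than) AXIOM: a Beta-Bernoulli conjugate
# update, where each binary observation revises the belief in closed form.
def update_belief(alpha, beta, observation):
    """Conjugate update: no gradients, just counting evidence."""
    return alpha + observation, beta + (1 - observation)

alpha, beta = 1.0, 1.0          # uniform prior over an unknown probability
observations = [1, 1, 0, 1, 1]  # e.g. "did the object reappear after occlusion?"
for obs in observations:
    alpha, beta = update_belief(alpha, beta, obs)

posterior_mean = alpha / (alpha + beta)
print(posterior_mean)  # 5/7 ≈ 0.714
```

Because each update is exact and cheap, the belief is calibrated after a handful of observations, which is the intuition behind the data-efficiency claims for this school.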

The Next Battlefield for AI: Understanding the World

Zhao concludes that these five world model approaches are not mutually exclusive but instead address different problems:

JEPA: Compressing physical understanding

Spatial Intelligence: Reconstructing 3D worlds

Learned Simulation: Training AI in simulated environments

NVIDIA Cosmos: Providing infrastructure

Active Inference: Proposing a new theory of intelligence

As AI increasingly moves toward robotics, autonomous driving, and physical AI, these technologies are likely to rapidly converge in the future.

This article, “Meta Researchers Reveal the Five Major Schools of World Models: What AI Do LeCun and Li Fei-Fei Bet On?” originally appeared on ABMedia.
