Disruptive Technology | Comparing the Brain of Figure AI and World Labs' Spatial Intelligence Model

The "Cambrian Explosion" of evolutionary history, which one prominent hypothesis attributes to the emergence of sight, marked the point at which organisms began to perceive space, accelerating the evolution of intelligence. Today, AI is at a similar juncture. While Large Language Models (LLMs) have mastered "speech," they remain "textual artisans working in the dark." The next frontier is moving AI from virtual text into the physical 3D world. Two companies stand out as benchmarks in this race toward Artificial General Intelligence (AGI): World Labs and Figure AI.
01 World Labs: Building the "3D Simulator" for AI
While LLMs understand human language, World Labs aims to teach AI how the physical world operates through Spatial Intelligence.
The Three Pillars: Co-founded by Dr. Fei-Fei Li, World Labs defines spatial intelligence through Imagination (creating coherent 3D environments), Agility (navigation), and Rigor (deducing physical relationships).
Technical Edge: Using 3D Gaussian Splatting (3DGS) and the Spark 2.0 rendering engine, World Labs enables AI to "visualize" and interact with high-precision 3D scenes in real time across a wide range of devices, generating worlds that aim to follow the laws of physics rather than merely produce plausible pixels.
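To give a rough sense of what 3D Gaussian Splatting computes per pixel, the sketch below blends a few depth-sorted splats along one camera ray using the standard front-to-back alpha compositing rule. It is only an illustration: the `Splat` class, its fields, and the sample values are assumptions made up for this example, and none of it is World Labs' or Spark's actual code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Splat:
    """One Gaussian's contribution at a single pixel (hypothetical structure)."""
    depth: float                       # distance from the camera along the ray
    alpha: float                       # opacity at this pixel, in [0, 1]
    color: Tuple[float, float, float]  # RGB, each in [0, 1]


def composite(splats: List[Splat]) -> Tuple[Tuple[float, float, float], float]:
    """Front-to-back blending: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for s in sorted(splats, key=lambda s: s.depth):  # nearest splat first
        weight = s.alpha * transmittance
        for k in range(3):
            pixel[k] += weight * s.color[k]
        transmittance *= 1.0 - s.alpha
        if transmittance < 1e-4:  # nothing behind this point is visible
            break
    return (pixel[0], pixel[1], pixel[2]), transmittance


# A half-transparent red splat in front of a half-transparent blue one.
color, remaining = composite([
    Splat(depth=2.0, alpha=0.5, color=(0.0, 0.0, 1.0)),
    Splat(depth=1.0, alpha=0.5, color=(1.0, 0.0, 0.0)),
])
```

Because each splat is an explicit primitive with a position and opacity, a scene built this way can be rendered by sorting and blending rather than by querying a neural network for every pixel, which is part of why the technique suits real-time use.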
02 Figure AI: The Real-Time Brain for "Physical Bodies"
If World Labs is building the "Matrix," Figure AI is crafting the "Agents" that inhabit it.
Vertical Integration: In early 2025, Figure AI shifted away from OpenAI’s cloud-based models to develop its own on-device brain. This was driven by the need for low-latency processing and tactile sensing that generic models cannot provide.
The Helix 02 System: This unified neural network merges perception, planning, and control. It processes visual and tactile feedback simultaneously, allowing robots to perform complex, fluid tasks, such as tidying a room while walking, without abrupt handoffs between separate modules.
03 Deep Comparison: "World" vs. "Individual" Logic
The fundamental difference lies in their focus: World Labs asks "how is the world composed?" while Figure AI asks "how do I act within it?"
Data Essence: World Labs relies on geometric data and physical parameters to replicate reality; Figure AI relies on "Trajectory Data" to master motor sequences and force application.
Relationship with LLMs: World Labs uses LLMs as an interface to generate 3D worlds via text; Figure AI "embodies" the LLM, teaching it to understand torque and centers of gravity.
Predictive Nature: World Labs predicts future scene states (e.g., where a ball will land); Figure AI predicts the next action (e.g., where to place a hand to catch it).
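The two kinds of prediction can be contrasted with a toy example. In the sketch below, `predict_landing` plays the role of a world model (where will the ball be?) and `choose_hand_target` plays the role of an action model (where should the hand go?). Both functions and all parameters are hypothetical; the physics is simple drag-free projectile motion in one horizontal dimension, not anything either company has published.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def predict_landing(x0, y0, vx, vy, catch_height=0.0):
    """Scene-state prediction: return (time, x) when the ball falls to catch_height."""
    # Solve y0 + vy*t - 0.5*G*t^2 = catch_height and keep the later root.
    disc = vy * vy + 2.0 * G * (y0 - catch_height)
    if disc < 0:
        raise ValueError("ball never reaches catch_height")
    t = (vy + math.sqrt(disc)) / G
    return t, x0 + vx * t


def choose_hand_target(landing_x, hand_x, max_speed, time_available):
    """Action prediction: move toward the landing point, limited by hand speed."""
    max_step = max_speed * time_available
    delta = max(-max_step, min(max_step, landing_x - hand_x))
    return hand_x + delta


# Ball thrown from the origin; hand starts at x = 0 and can move at 2 m/s.
t_land, x_land = predict_landing(x0=0.0, y0=0.0, vx=3.0, vy=4.905)
hand_goal = choose_hand_target(x_land, hand_x=0.0, max_speed=2.0, time_available=t_land)
```

Note that the action model depends on the body's limits: with these numbers the hand cannot reach the landing point in time, so the best available action differs from the predicted scene state.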
04 Strategic Outlook: The Convergence of Pathways
The ultimate evolution of Embodied AI will likely be the merger of these two paths: Real-to-Sim-to-Real.
Perception (Real -> Sim): A robot enters a room; its "Spatial Brain" (World Labs) creates a high-fidelity 3DGS map from minimal visual samples.
Training (Sim -> Sim): The robot performs thousands of virtual reinforcement learning cycles within that simulated environment.
Execution (Sim -> Real): The optimized action model is deployed back to the physical "Body" (Figure AI) to complete the task.
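The three stages above can be sketched as a deliberately tiny loop. Everything here is an assumption for illustration: the "world" is a single goal position on a line, the "policy" is one controller gain, and random search stands in for reinforcement learning. A real pipeline would replace each function with a 3D reconstruction system, a physics simulator, and a learned policy.

```python
import random


def build_sim(observations):
    """Real -> Sim: estimate the goal position from a few noisy samples."""
    return sum(observations) / len(observations)


def rollout(gain, goal, friction, steps=20):
    """Move a 1D point toward the goal; return the final distance to it."""
    pos = 0.0
    for _ in range(steps):
        pos += gain * (goal - pos) * (1.0 - friction)
    return abs(goal - pos)


def train_in_sim(sim_goal, episodes=1000, seed=0):
    """Sim -> Sim: random search over the gain (a stand-in for RL)."""
    rng = random.Random(seed)
    best_gain, best_cost = 0.0, rollout(0.0, sim_goal, friction=0.0)
    for _ in range(episodes):
        gain = rng.uniform(0.0, 1.0)
        cost = rollout(gain, sim_goal, friction=0.0)
        if cost < best_cost:
            best_gain, best_cost = gain, cost
    return best_gain


def deploy(gain, real_goal, real_friction=0.1):
    """Sim -> Real: run the tuned gain on a 'real' system with unmodeled friction."""
    return rollout(gain, real_goal, real_friction)


sim_goal = build_sim([1.9, 2.1, 2.0])   # minimal visual samples
tuned_gain = train_in_sim(sim_goal)     # thousands of cheap virtual trials
real_error = deploy(tuned_gain, real_goal=2.0)
```

The friction term that exists only in `deploy` is a small stand-in for the sim-to-real gap: the policy is tuned in an idealized world and must still work when the physical system behaves slightly differently.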
Conclusion: A Modern Cambrian Explosion
Human intelligence evolved through physical interaction with the 3D world. World Labs is giving AI "eyes" to perceive depth, while Figure AI is providing the "limbs" to build muscle memory. The leap from "textual artisans" to "masters of the world" may prove to be the most significant AI story of the next decade: the successful encoding of physical laws into the weights of neural networks.




