13 Comments
Penelope Lawrence

The Man U thought experiment is a great framing, but I think it reveals something subtle about what "computing" means here. When we imagine a stadium full of fans, we're not actually simulating thousands of people - we're generating a vaguely plausible impression with almost zero detail. The old couple, the kid jumping, the flag - those are narrative patches, not simulation. So the real question becomes whether World Models need to actually compute the physics, or just learn to generate approximations that are "good enough" the way human imagination does. Because if it's the latter, the breakthrough isn't computing the uncomputable - it's learning which parts you can safely skip.
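A toy way to see the "safe to skip" idea in code (a minimal sketch of my own; the fan count and function names are invented for illustration, nothing from the post): simulate only the few fans you're attending to, and collapse the rest into one cheap aggregate impression.

```python
import random

def simulate_fan(seed: int) -> str:
    # Expensive path: a per-fan "simulation" (stubbed here with a
    # seeded random behavior so each attended fan stays consistent).
    random.seed(seed)
    return random.choice(["cheering", "sitting", "waving a flag"])

def crowd_impression(n: int) -> str:
    # Cheap path: one aggregate stand-in for n unattended fans.
    return f"~{n:,} fans, a roar without faces"

def imagine_stadium(fans: int, attended_to: int) -> list[str]:
    # Only the handful of fans we "look at" get the expensive path;
    # everyone else collapses into a single narrative patch.
    detailed = [simulate_fan(i) for i in range(attended_to)]
    return detailed + [crowd_impression(fans - attended_to)]

print(imagine_stadium(fans=75_000, attended_to=3))
```

The point isn't the code; it's that the compute budget goes wherever attention goes.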

Jeroen Moons

You articulated very well a point that was floating vaguely in the back of my mind while reading that section but that I couldn't quite put my finger on. Thanks!

Dorian

This feels like the transition from reasoning about the world to simulating the world.

LLMs compress knowledge.

World models compress reality.

Once AI can model action → consequence loops, the real unlock isn’t better chat.

It’s better decision infrastructure.

That’s when AI becomes less of an interface layer and more of a control layer.

Barry Winata

well written, deeply comprehensive and insightful.

more folks need to know about World Models.

Iggy Fanlo

Expanding the mind every time

:-)

Elia De Leo

The framing here that resonates most with me: World Models aren't trying to simulate reality — they're trying to compress it well enough to make useful predictions.

That distinction matters enormously for how we evaluate progress. A model that can predict "if I push this object off the edge, it falls" doesn't need to compute fluid dynamics from first principles. It needs a good enough abstraction that generalizes.

What I find underexplored in most discussions: the difference between a world model for *prediction* vs. one for *planning*. Prediction only needs to get the likely outcome right. Planning needs to get the counterfactual right — "what happens if I do X instead of Y?" That's a much harder compression problem, and arguably where current AI still falls short.
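A minimal sketch of that distinction (my own toy example with an invented linear transition rule, not anything from the post): the same learned model, used once to answer "what happens next?" and once to compare counterfactual actions.

```python
def transition(state: float, action: float) -> float:
    # Stand-in for a learned world model: next_state = f(state, action).
    # A trivial linear rule here, purely for illustration.
    return state + action

def predict(state: float, action: float, horizon: int) -> float:
    # Prediction: roll the model forward under one fixed action.
    for _ in range(horizon):
        state = transition(state, action)
    return state

def plan(state: float, candidates: list[float], horizon: int) -> float:
    # Planning: roll out each counterfactual and pick the best action.
    # This is where compression errors bite: every rollout has to be
    # right, not just the likely one.
    return max(candidates, key=lambda a: predict(state, a, horizon))

print(predict(0.0, 1.0, horizon=5))            # likely outcome: 5.0
print(plan(0.0, [-1.0, 0.0, 1.0], horizon=5))  # best action: 1.0
```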

The economic analogy is interesting here: general equilibrium models are "wrong" in every detail but right enough to be useful. Maybe that's the bar for world models too.

Jason

Excellent read all around. A couple of thoughts I had while reading it:

1) There's a big gap between being a researcher and being a product visionary. Fei-Fei, LeCun, and Mara are all excellent in their disciplines, but whether or not they know how to commercialize it remains to be seen.

2) Meta's struggles in this space likely deserve their own article. How did the company that seemingly recognized where the world was heading end up basically only being able to create a 3D virtual Zoom?

3) I still view the most interesting world model concept right now as the collaboration between Hanwha Aerospace and Krafton, which is trying to use PUBG's underlying tech to build a world model to speed up the development of aerospace tech.

James Borden

Wow, Fei-Fei Li is in on this. I think I saw something similar in a Mitko Vasilev post where the model is given only a short context window as part of the prompt and then writes code to analyze the rest of the prompt.

dremnik

I am fascinated by the idea of world models, and I think in time they will become foundational to robotics and embodied intelligence.

I do think that LLMs and other forms of symbol/concept models will take us very far, however, and I suspect the LLM pessimism is a bit overstated. If you consider the most important aspects of our reality, much of what we do is already modeled in our heads in language and concepts. We live in an increasingly abstract world. We trade money - a concept; we talk about ideas in language - i.e. this very essay itself; we plan our lives in language, we converse with each other at the coffee shop, we learn from our mentors, and so on.

"The Control Revolution" discusses the rise of Information Society as the central development of the past century or two. You can see the transition from an agricultural society where 90+% of people worked as farmers, to one increasingly dominated by jobs that deal with information. If you consider, for example, the tasks that are required in running a business as a CEO, very little of those tasks actually require the physical knowledge which that CEO has acquired about the world. The knowledge of how to walk to a coffee shop is not very important relative to the conceptual understanding he has of his industry, his organization, etc. The real work is abstract, the superficial work is physical.

So my prediction is that LLMs / symbol models will likely take us to some meaningful form of AGI, but world models will have a central place in robotics and unlocking the physical world to extend our conceptual power to mechanical work.

Don Draper

So good. This is the best thing I have read in weeks. It's both inspired and inspiring. Shared it with my 16-year-old as something we should both read and discuss. Reached out to you both (Packy and Pim) via LinkedIn to connect regarding TEDAI here in Vienna. Let's talk.

Colin Brown

I can't get the audio version to work. Please help!

Cane Phoenix

There are people who can't visualise at all; it's called aphantasia. And it doesn't actually affect that much for most of them. Research is still rather early, since it has only been studied thoroughly for about a decade.

There was a fairly recent test done on two groups, one neurotypical and one with aphantasia, and it revealed that Dual Coding Theory probably isn't correct (at least not in the way it was thought of before).

Each group was given three types of representation of the same thing: pictures, symbols, and text.

As expected, the neurotypical group handled pictures and symbols equally well, and text worse.

When it came to the people with aphantasia, it was assumed that they would handle text best, since they can't visualise, but it turned out that they handled symbols best, then pictures, and lastly text.

We don't know exactly why this is, but my guess is that they are using the original symbolic representation (keep in mind that there was a time when humans didn't have language and were still able to make enough sense of the world to survive).

What does this mean for AI? Basically that you need neither text nor video, but you might need symbolic representation, since it has existed for as long as we have had the ability to explore and understand the world.

Jonas Braadbaart

Nice! For the lowdown, I just published a simplified version of the same post :) https://metacircuits.substack.com/p/from-chatbots-to-world-models