#World Model

Research

PAN: Towards General World Model with Natural Language Actions and Video States

Benhao Huang,

#Image to Video

A step towards a General World Model (GWM) that can simulate complex video scenarios with natural language actions.

paper (in progress)

Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation

Qiyue Gao, Xinyu Pi, Kevin Liu, Junrong Chen, Ruolan Yang, Xinqi Huang, Xinyu Fang, Lu Sun, Gautham Kishore, Bo Ai, Stone Tao, Mengyang Liu, Jiaxi Yang, Chao-Jung Lai, Chuanyang Jin, Jiannan Xiang, Benhao Huang, Zeming Chen, David Danks, Hao Su

ICLR 2025 Workshop World Models / ACL 2025 Findings

Internal world models (WMs) enable agents to understand the world's state and predict transitions, serving as the basis for advanced deliberative reasoning. Recent large Vision-Language Models (VLMs), ...