Research

DCA-Bench: A Benchmark for Dataset Curation Agents
Benhao Huang, Yingzhuo Yu, Jin Huang, Xingjian Zhang, Jiaqi W. Ma
KDD-2025 DB Track (Oral)
#LLM Agent
#Benchmark
#2025

A benchmark that evaluates how well LLM agents detect issues in datasets hosted on popular platforms.

paper (to be updated soon)
code

Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation
Qiyue Gao, Xinyu Pi, Kevin Liu, Junrong Chen, Ruolan Yang, Xinqi Huang, Xinyu Fang, Lu Sun, Gautham Kishore, Bo Ai, Stone Tao, Mengyang Liu, Jiaxi Yang, Chao-Jung Lai, Chuanyang Jin, Jiannan Xiang, Benhao Huang, Zeming Chen, David Danks, Hao Su
ICLR 2025 Workshop World Models / ACL 2025 Findings
#World Model
#Benchmark
#2025

Internal world models (WMs) enable agents to understand the world's state and predict transitions, serving as the basis for advanced deliberative reasoning. Recent large Vision-Language Models (VLMs), ...

paper