A benchmark for evaluating how well LLM agents detect issues in datasets hosted on popular platforms.