Paper Reading | Latent Thought Models
LTM turns extra inference-time compute into explicit per-example posterior optimization over latent thought vectors, rather than more token-space decoding.
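The core move can be illustrated with a toy model (this is an illustrative sketch, not the paper's implementation): a fixed softmax decoder with weights `W`, where inference-time compute is spent running gradient ascent on the per-example log-joint `log p(x|z) + log p(z)` with respect to that example's latent thought vector `z`, instead of generating more tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 20, 8                                # toy vocab size and latent dimension
W = rng.normal(size=(V, D)) / np.sqrt(D)    # fixed decoder weights (stand-in for a trained decoder)
x = rng.integers(0, V, size=16)             # observed token ids for one example

def log_joint(z):
    # log p(x|z) under softmax(W z), plus a standard-normal log-prior (up to a constant)
    logits = W @ z
    logp = logits - logits.max() - np.log(np.exp(logits - logits.max()).sum())
    return logp[x].sum() - 0.5 * z @ z

def grad(z):
    # analytic gradient of the log-joint above
    logits = W @ z
    p = np.exp(logits - logits.max()); p /= p.sum()
    counts = np.bincount(x, minlength=V)
    return W.T @ (counts - len(x) * p) - z

# per-example "fast" inner loop: posterior optimization over the latent z
z = np.zeros(D)
before = log_joint(z)
for _ in range(200):
    z += 0.05 * grad(z)
after = log_joint(z)
```

More inner-loop steps here play the role of more inference-time compute: the optimized `z` explains the observed tokens better than the prior mean does.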
We know that deeper models generally have greater capacity, but what mechanism underlies this? Given that the independent-teacher regime better matches the empirical signatures of real LLMs, ensemble averaging becomes the most plausible explanation for how real LLMs use depth.
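The variance-reduction effect behind the ensemble-averaging hypothesis is easy to demonstrate in isolation (a toy numerical illustration, not a claim about any specific model): averaging K independent, unbiased but noisy estimates of the same target cuts the mean squared error by roughly a factor of K.

```python
import numpy as np

rng = np.random.default_rng(1)
target = 1.0
K = 16                                       # number of "independent teachers"
# K unbiased estimates of the same target, each corrupted by independent noise
estimates = target + rng.normal(scale=0.5, size=(K, 10_000))

single_mse = ((estimates[0] - target) ** 2).mean()        # error of one teacher
ensemble_mse = ((estimates.mean(axis=0) - target) ** 2).mean()  # error of the average
```

If depth implements something like this averaging across (approximately) independent sub-predictors, deeper models get a 1/K-style error reduction for free, which is consistent with the empirical signatures mentioned above.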
Why popular LLM leaderboards can be gamed by structured outputs, how the cheating strategy works, and what this says about the reliability of automatic evaluation.