Paper Reading | Latent Thought Models
LTM turns extra inference-time compute into explicit per-example posterior optimization over latent thought vectors, rather than more token-space decoding.
We know that deeper models generally have greater capacity, but what mechanism underlies this? Because the independent-teacher regime better matches the empirical signatures of real LLMs, ensemble averaging emerges as the most plausible explanation for how real LLMs use depth.
An adapted version of Walther Bothe’s 1958 advice to young physicists, rewritten for modern AI research and archived from my original X article.
A candid retrospective on my 2025 AI application cycle, including offers, mistakes, mentors, and the lessons I learned about research, planning, and building real connections.
A practical note on drafting stronger recommendation letters: be concrete, comparative, credible, and make it easier for busy professors to advocate for you.
Why popular LLM leaderboards can be gamed by structured outputs, how the cheating strategy works, and what this says about the reliability of automatic evaluation.