Loop-Model FLOPs and Memory in an Ablation Chain
Loop models are seeing renewed interest in reasoning and language modeling, with recent examples such as HRM, TRM, recurrent-depth latent reasoning, and Parcae. This post asks a simple question: what does looping actually cost? I analyze it through an ablation chain over the major loop-model variants, paying particular attention to how the optimizer interval, gradient path, and storage policy change the FLOPs, the number of function evaluations (NFE), and the memory accounting.
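Before walking through the ablation chain, here is a minimal sketch of the kind of accounting the post performs. It is a hypothetical helper (`loop_cost` and its arguments are my own illustrative names, not from any of the cited models), using the common rules of thumb that a forward pass costs roughly 2 FLOPs per parameter per token and a backward pass roughly twice a forward. The gradient path controls how many loop iterations are backpropagated through, and the storage policy controls how many iterations' activations must be kept live:

```python
def loop_cost(params, tokens, loops, grad_path="full", store="all"):
    """Estimate per-training-step cost of a model with `loops` inner iterations.

    grad_path: "full" -> backprop through every loop iteration (full BPTT)
               "last" -> backprop through only the final iteration
                         (the detached / 1-step gradient used by some loop models)
    store:     "all"  -> keep activations for every iteration
               "last" -> keep only the final iteration's activations
    Returns a dict with total FLOPs, forward NFE, and stored iteration count.
    """
    fwd_flops_per_iter = 2 * params * tokens      # ~2 FLOPs / param / token
    nfe = loops                                   # one function eval per loop
    fwd = fwd_flops_per_iter * loops
    bwd_iters = loops if grad_path == "full" else 1
    bwd = 2 * fwd_flops_per_iter * bwd_iters      # backward ~2x a forward
    stored_iters = loops if store == "all" else 1
    return {"flops": fwd + bwd, "nfe": nfe, "stored_iters": stored_iters}
```

Under these assumptions, truncating the gradient path to the last iteration cuts backward FLOPs and activation storage by a factor of `loops` while leaving the forward NFE unchanged, which is exactly the kind of asymmetry the ablation chain below tracks.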