<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Recursive Models on Husky&#39;Log</title>
    <link>https://huskydoge.github.io/husky-blog/tags/recursive-models/</link>
    <description>Recent content in Recursive Models on Husky&#39;Log</description>
    <image>
      <title>Husky&#39;Log</title>
      <url>https://huskydoge.github.io/avatar.png</url>
      <link>https://huskydoge.github.io/avatar.png</link>
    </image>
    <generator>Hugo -- 0.144.2</generator>
    <language>en</language>
    <lastBuildDate>Sun, 19 Apr 2026 12:00:00 -0400</lastBuildDate>
    <atom:link href="https://huskydoge.github.io/husky-blog/tags/recursive-models/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Loop-Model FLOPs and Memory in an Ablation Chain</title>
      <link>https://huskydoge.github.io/husky-blog/posts/recursive_models/loop-cost/</link>
      <pubDate>Sun, 19 Apr 2026 12:00:00 -0400</pubDate>
      <guid>https://huskydoge.github.io/husky-blog/posts/recursive_models/loop-cost/</guid>
      <description>&lt;figure class=&#34;align-center&#34;&gt;&lt;img loading=&#34;lazy&#34; src=&#34;https://huskydoge.github.io/husky-blog/posts/recursive_models/loop-cost/teaser.png#center&#34;
             alt=&#34;Loop-model FLOPs and memory teaser schematic&#34;/&gt;
&lt;/figure&gt;

&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Loop models have been gaining popularity lately, with exciting results &lt;span class=&#34;hugo-cite-intext&#34; itemprop=&#34;citation&#34;&gt;
  &lt;span class=&#34;hugo-cite-group&#34; style=&#34;display: inline-block;&#34;&gt;
    &lt;a href=&#34;#&#34; style=&#34;text-decoration: none; border-bottom: 1px dotted #ccc;&#34;&gt;
      [1,2,3,4,5]
    &lt;/a&gt;
    &lt;span class=&#34;hugo-cite-citation&#34;&gt;&lt;span class=&#34;hugo-cite-citation-entry&#34;&gt;&lt;strong&gt;Less is More: Recursive Reasoning with Tiny Networks&lt;/strong&gt;&lt;br&gt;A. Jolicoeur-Martineau, (2025)&lt;br&gt;&lt;a href=&#34;https://arxiv.org/abs/2510.04871&#34; target=&#34;_blank&#34;&gt;Link&lt;/a&gt;&lt;/span&gt;&lt;span class=&#34;hugo-cite-citation-entry&#34;&gt;&lt;strong&gt;Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach&lt;/strong&gt;&lt;br&gt;J. Geiping, S. McLeish, N. Jain, J. Kirchenbauer, S. Singh, B. Bartoldson, B. Kailkhura, A. Bhatele, T. Goldstein, (2025)&lt;br&gt;&lt;a href=&#34;https://arxiv.org/abs/2502.05171&#34; target=&#34;_blank&#34;&gt;Link&lt;/a&gt;&lt;/span&gt;&lt;span class=&#34;hugo-cite-citation-entry&#34;&gt;&lt;strong&gt;Parcae: Scaling Laws For Stable Looped Language Models&lt;/strong&gt;&lt;br&gt;H. Prairie, Z. Novack, T. Berg-Kirkpatrick, D. Fu, (2026)&lt;br&gt;&lt;a href=&#34;https://arxiv.org/abs/2604.12946&#34; target=&#34;_blank&#34;&gt;Link&lt;/a&gt;&lt;/span&gt;&lt;span class=&#34;hugo-cite-citation-entry&#34;&gt;&lt;strong&gt;Scaling Latent Reasoning via Looped Language Models&lt;/strong&gt;&lt;br&gt;R. Zhu, Z. Wang, K. Hua, T. Zhang, Z. Li, H. Que, B. Wei, Z. Wen, F. Yin, H. Xing, et al., (2025)&lt;br&gt;&lt;/span&gt;&lt;span class=&#34;hugo-cite-citation-entry&#34;&gt;&lt;strong&gt;Hierarchical Reasoning Model&lt;/strong&gt;&lt;br&gt;G. Wang, J. Li, Y. Sun, X. Chen, C. Liu, Y. Wu, M. Lu, Y. Yadkori, (2025)&lt;br&gt;&lt;a href=&#34;https://arxiv.org/abs/2506.21734&#34; target=&#34;_blank&#34;&gt;Link&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;
  &lt;/span&gt;
&lt;/span&gt;. Once we decide to reuse the same block across multiple layers, however, one practical question becomes unavoidable: &lt;strong&gt;what does the loop cost during training?&lt;/strong&gt;&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
