Mask Tokens as Prophet: Fine-Grained Cache Eviction for Efficient dLLM Inference
Posted: Oct 10, 2025
MaskKV exploits mask-token attention signals to evict low-utility KV pairs in diffusion LLMs (dLLMs), shrinking the cache budget while preserving long-context accuracy and increasing throughput.
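The post does not include code, but the core idea — score each cached position by the attention it receives from mask tokens, then keep only the top positions within a fixed budget — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, the mean-over-mask-tokens scoring rule, and all tensor shapes are assumptions.

```python
import numpy as np

def evict_kv_by_mask_attention(keys, values, mask_attn, budget):
    """Retain the `budget` KV pairs most attended by mask tokens.

    keys, values: (seq_len, d) cached key/value tensors
    mask_attn:    (num_mask_tokens, seq_len) attention weights from
                  mask tokens to the cached positions
    budget:       number of KV pairs to keep
    """
    # One plausible utility score: mean attention mass a cached
    # position receives across all mask tokens (an assumption here,
    # not necessarily the paper's exact scoring rule).
    utility = mask_attn.mean(axis=0)                 # (seq_len,)
    # Indices of the top-`budget` positions, restored to sequence order.
    keep = np.sort(np.argsort(utility)[-budget:])
    return keys[keep], values[keep], keep

# Toy usage: a 16-token cache pruned to a budget of 5.
rng = np.random.default_rng(0)
keys = rng.normal(size=(16, 8))
values = rng.normal(size=(16, 8))
mask_attn = rng.random(size=(4, 16))
kept_k, kept_v, kept_idx = evict_kv_by_mask_attention(keys, values, mask_attn, 5)
```

After eviction, `kept_k` and `kept_v` have shape `(5, 8)` and `kept_idx` records which sequence positions survived, so subsequent decoding steps attend only to the retained cache.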