World Models & Video Reasoning
- Pandora: Towards General World Model with Natural Language Actions and Video States, 2024 | Code
- Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
- Insightful and comprehensive. The concept of a spatial-temporal scene graph (STSG) is something new to me.
- Robotic Control via Embodied Chain-of-Thought Reasoning
Video Position Encoding & Temporal/Spatial Attention
- Rotary Position Embedding for Vision Transformer, 2024
- Applies RoPE to video transformers; a different approach from spatial/temporal (S/T) attention.
- Video Transformers: A Survey, 2022
- Space or time for video classification transformers, 2023
- The concept of Space Attention and Temporal Attention is interesting.
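The divided space-time attention idea can be sketched minimally. This is my own NumPy illustration of the concept, not code from any of the papers above; `divided_space_time_attention` and the single-head `attention` helper are assumed names:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x):
    # Single-head scaled dot-product self-attention over the second-to-last axis,
    # batched over all leading axes.
    d = x.shape[-1]
    scores = x @ x.swapaxes(-1, -2) / np.sqrt(d)
    return softmax(scores) @ x

def divided_space_time_attention(x):
    """x: (T, S, D) video tokens: T frames, S patches per frame, D dims."""
    # Spatial attention: patches within each frame attend to each other.
    x = attention(x)                      # batched over T
    # Temporal attention: each patch position attends across frames.
    x = attention(x.transpose(1, 0, 2))   # (S, T, D), batched over S
    return x.transpose(1, 0, 2)           # back to (T, S, D)

out = divided_space_time_attention(np.random.randn(4, 16, 8))
```

Factorizing this way reduces per-layer cost from O((T·S)^2) for joint space-time attention to roughly O(T·S^2 + S·T^2).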
Datasets
- (hdvila) Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions | Code, 2021
- How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs | Code, 2024
- ShareGPT4Video: Improving Video Understanding and Generation with Better Captions, 2024
- The differential data annotation method is interesting; we want fine-grained annotations for video data.
Video Generation
Benchmark
- VBench: Comprehensive Benchmark Suite for Video Generative Models | Code, 2023 (CVPR 2024)
- Comprehensive benchmark for video generation models.
Long Context LLM
- RoFormer: Enhanced Transformer with Rotary Position Embedding, 2020
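The core RoPE operation can be sketched in a few lines (my own NumPy illustration; `apply_rope` is an assumed name, not from the paper): each consecutive pair of dimensions is rotated by an angle proportional to the token's position, so attention dot products depend only on relative position.

```python
import numpy as np

def apply_rope(x, pos, base=10000.0):
    """Rotate a query/key vector x (shape (d,)) by its position pos.
    Pair i of dims is rotated by angle pos * base**(-2i/d)."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

# Key property: q·k after rotation depends only on relative position.
q, k = np.random.randn(8), np.random.randn(8)
a = apply_rope(q, 5) @ apply_rope(k, 3)    # relative distance -2
b = apply_rope(q, 12) @ apply_rope(k, 10)  # relative distance -2
assert np.allclose(a, b)
```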
- Base of RoPE Bounds Context Length, 2024
- Explores how the base of RoPE influences the model's context length.
- 3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding, 2024
- A creative idea: introduces a new dimension into RoPE, called a chunk, giving a kind of attention over attention.
- LongEmbed: Extending Embedding Models for Long Context Retrieval, 2024 | Code
- A comprehensive experimental study of methods for extending the context window (e.g., Parallel Context Window, NTK, Self-Extend, Grouped Position & Recurrent Position). It also introduces a benchmark called LongEmbed.
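The grouped-position idea behind Self-Extend can be sketched as a position remapping (my own simplified illustration; the function name and default parameters are assumptions, and the paper's exact formula may differ): nearby tokens keep exact relative positions, while distant tokens are floor-divided into groups so they never exceed the pretrained position range.

```python
def self_extend_rel_pos(rel, neighbor_window=512, group_size=4):
    # Relative positions within the neighbor window stay exact;
    # farther positions are compressed by floor division into groups.
    if rel < neighbor_window:
        return rel
    return neighbor_window + (rel - neighbor_window) // group_size

# Nearby tokens keep precise positions; distant ones are compressed.
assert self_extend_rel_pos(100) == 100   # inside the window: unchanged
assert self_extend_rel_pos(515) == 512   # grouped with position 512
assert self_extend_rel_pos(516) == 513   # next group
```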
- CAPE: Context-Adaptive Positional Encoding for Length Extrapolation, 2024
- Explores using neural networks to further enhance additive PE methods.
Diffusion Models
MLSys
- vLLM (PagedAttention):