Video-R1 significantly outperforms previous models across most benchmarks. Notably, on VSI-Bench, which focuses on spatial reasoning in videos, Video-R1 achieves a new state-of-the-art accuracy of %, surpassing the proprietary GPT-4o while using only 32 frames and 7B parameters. This highlights the necessity of explicit reasoning capability in solving video tasks, and confirms the.