We introduce FSVideo, a fast transformer-based image-to-video (I2V) diffusion framework. It is built on three key components: (1) a new video autoencoder with a highly compressed latent space (64 × 64 × 4 spatial-temporal downsampling ratio) that achieves competitive reconstruction quality; (2) a diffusion transformer (DiT) architecture with a new layer memory design that enhances inter-layer information flow and context reuse within the DiT; and (3) a multi-resolution generation strategy that uses a few-step DiT upsampler to increase video fidelity. Our final model, which combines a 14B DiT base model with a 14B DiT upsampler, achieves competitive performance against other popular open-source models while being an order of magnitude faster. We discuss our model design and training strategies in this report.
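To give a rough sense of what a 64 × 64 × 4 downsampling ratio implies, the sketch below computes the size of the latent grid for a given input video. Only the ratio comes from the description above. The function name `latent_grid`, the round-up (padding) behavior, and the 120-frame example are illustrative assumptions, not details of the actual autoencoder.

```python
import math

# Downsampling ratio stated in the abstract: 64 x 64 spatial, 4 temporal.
SPATIAL_RATIO = 64
TEMPORAL_RATIO = 4

# Number of input pixel positions covered by one latent position.
PIXELS_PER_LATENT_POSITION = SPATIAL_RATIO * SPATIAL_RATIO * TEMPORAL_RATIO


def latent_grid(frames: int, height: int, width: int) -> tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width).

    Assumption: sizes that are not multiples of the ratio are rounded up,
    as if the input were padded. The real model may handle this differently.
    """
    return (
        math.ceil(frames / TEMPORAL_RATIO),
        math.ceil(height / SPATIAL_RATIO),
        math.ceil(width / SPATIAL_RATIO),
    )


if __name__ == "__main__":
    # Hypothetical input: 120 frames at 720 x 1280 (the frame count is assumed).
    t, h, w = latent_grid(120, 720, 1280)
    print(f"latent grid: {t} x {h} x {w} = {t * h * w} positions")
    print(f"pixel positions per latent position: {PIXELS_PER_LATENT_POSITION}")
```

Under these assumptions, the number of positions the DiT has to process shrinks by roughly four orders of magnitude relative to the pixel grid, which is where much of the speedup in a highly compressed latent space would come from.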
FSVideo generates a 5 s, 720×1280 video with 18.8 s of DiT inference on 2 H100 GPUs. The base DiT runs 60 NFE (number of function evaluations) at low resolution in 14.2 s, and the SR DiT (the upsampler) runs 8 NFE at high resolution in 4.6 s. These timings cover the DiT component only.
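The reported numbers can be broken down per function evaluation with a few lines of arithmetic. The sketch below uses only the figures quoted above. The variable names are illustrative, and the per-step values are simple averages that assume every step within a stage costs the same.

```python
# Figures quoted above (DiT component only, 2 H100 GPUs).
BASE_NFE = 60        # function evaluations at low resolution
SR_NFE = 8           # function evaluations at high resolution
BASE_SECONDS = 14.2  # base DiT time
SR_SECONDS = 4.6     # SR DiT time

total_seconds = BASE_SECONDS + SR_SECONDS

# Average cost per function evaluation in each stage.
base_seconds_per_nfe = BASE_SECONDS / BASE_NFE
sr_seconds_per_nfe = SR_SECONDS / SR_NFE

# Fraction of DiT time spent in the high-resolution stage.
sr_share = SR_SECONDS / total_seconds

if __name__ == "__main__":
    print(f"total DiT time: {total_seconds:.1f} s")
    print(f"base DiT: {base_seconds_per_nfe:.3f} s per NFE")
    print(f"SR DiT:   {sr_seconds_per_nfe:.3f} s per NFE")
    print(f"SR share of DiT time: {sr_share:.1%}")
```

Each high-resolution step costs more than twice as much as a low-resolution step on average, which is consistent with keeping the upsampler to only a few steps.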
The reference images and videos used on this webpage are sourced from the public domain or generated by AI, and are intended solely to demonstrate the capabilities of this research. If you have any concerns, please contact us and we will remove them promptly.
Core contributors: Xinwei Huang, Minxuan Lin, Yaojie Shen, Xiao Yang*‡, Yuxin Zhang
Contributors: Qingyu Chen*, Zhiyuan Fang, Haibin Huang*, Tong Jin, Bo Liu, Celong Liu*, Chongyang Ma, Xing Mei*, Xiaohui Shen, Fuwen Tan, Angtian Wang, Yiding Yang, Jiamin Yuan, Lingxi Zhang
Contributor names are listed alphabetically by last name and then first name. An asterisk (*) marks people who have left the company. A double dagger (‡) marks project leads.
If you find FSVideo useful in your research, please kindly cite our paper:
@misc{fsvideoteam2026fsvideofastspeedvideo,
title={FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space},
author={FSVideo Team and Qingyu Chen and Zhiyuan Fang and Haibin Huang and Xinwei Huang and Tong Jin and Minxuan Lin and Bo Liu and Celong Liu and Chongyang Ma and Xing Mei and Xiaohui Shen and Yaojie Shen and Fuwen Tan and Angtian Wang and Xiao Yang and Yiding Yang and Jiamin Yuan and Lingxi Zhang and Yuxin Zhang},
year={2026},
eprint={2602.02092},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.02092},
}