I am currently a Senior Researcher at the Visual Generation Group (a.k.a. the Kling Team), Kuaishou Technology. I obtained my Ph.D. from MMLab at the Chinese University of Hong Kong, advised by Prof. Hongsheng Li. Prior to that, I obtained Bachelor of Engineering and Bachelor of Science degrees from Zhejiang University and Simon Fraser University, respectively. I am interested in topics related to video generation. We are actively looking for research interns to work on cutting-edge research topics. Feel free to email me if you are interested.
Our VideoFlow, FlowFormer++, and FlowFormer occupy the top three places on the Sintel optical flow benchmark among published methods.
Two papers accepted to NeurIPS 2023.
One paper accepted to ICCV 2023.
One paper accepted to IROS 2023.
Two papers accepted to CVPR 2023.
One paper accepted to ECCV 2022.
Publications
* denotes first author. † denotes corresponding author or project leader. Representative papers are highlighted.
CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation
Qinghe Wang*,
Yawen Luo*,
Xiaoyu Shi†,
Xu Jia†,
Huchuan Lu,
Tianfan Xue†,
Xintao Wang,
Pengfei Wan,
Di Zhang,
Kun Gai
SIGGRAPH, 2025
Paper
/
Project page
A 3D-aware and controllable text-to-video generation framework that allows users to jointly manipulate objects and the camera in 3D space for high-quality cinematic video creation.
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
Xiaoyu Shi,
Zhaoyang Huang,
Fu-Yun Wang,
Weikang Bian,
Dasong Li,
Yi Zhang,
Manyuan Zhang,
Kachun Cheung,
Simon See,
Hongwei Qin,
Jifeng Dai,
Hongsheng Li
SIGGRAPH, 2024
Paper
/
Project page
/
Code
We introduce Motion-I2V, a novel framework for consistent and controllable image-to-video generation (I2V). In contrast to previous methods that directly learn the complicated image-to-video mapping, Motion-I2V factorizes I2V into two stages with explicit motion modeling.
Context-TAP: Tracking Any Point Demands Spatial Context Features
Weikang Bian*,
Zhaoyang Huang*,
Xiaoyu Shi,
Yitong Dong,
Yijin Li,
Hongsheng Li
NeurIPS, 2023
Project page
/
Paper
We set a new state of the art on the task of Tracking Any Point (TAP) by introducing rich spatial context features.
A Unified Conditional Framework for Diffusion-based Image Restoration
Yi Zhang,
Xiaoyu Shi,
Dasong Li,
Xiaogang Wang,
Hongsheng Li
NeurIPS, 2023
Project page
/
Paper
/
Code
A unified conditional framework based on diffusion models for image restoration.
VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
Xiaoyu Shi,
Zhaoyang Huang,
Weikang Bian,
Dasong Li,
Manyuan Zhang,
Kachun Cheung,
Simon See,
Hongwei Qin,
Jifeng Dai,
Hongsheng Li
ICCV, 2023
Paper
/
Code
The first method to achieve sub-pixel accuracy on the Sintel benchmark, with a 19.2% error reduction from the best published result on the KITTI-2015 benchmark.
BlinkFlow: A Dataset to Push the Limits of Event-based Optical Flow Estimation
Yijin Li,
Zhaoyang Huang,
Shuo Chen,
Xiaoyu Shi,
Hongsheng Li,
Hujun Bao,
Zhaopeng Cui,
Guofeng Zhang
IROS, 2023
Paper
We build BlinkFlow, a benchmark for training and evaluating event-based optical flow estimation methods.
KBNet: Kernel Basis Network for Image Restoration
Yi Zhang,
Dasong Li,
Xiaoyu Shi,
Dailan He,
Kangning Song,
Xiaogang Wang,
Hongwei Qin,
Hongsheng Li
arXiv, 2023
Paper
/
Code
A general-purpose backbone for image restoration tasks.
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
Xiaoyu Shi*,
Zhaoyang Huang*,
Dasong Li,
Manyuan Zhang,
Kachun Cheung,
Simon See,
Hongwei Qin,
Jifeng Dai,
Hongsheng Li
CVPR, 2023
Paper
/
Code
Ranked 1st on the Sintel optical flow benchmark on Mar. 1st, 2023.
A Simple Baseline for Video Restoration with Spatial-temporal Shift
Dasong Li,
Xiaoyu Shi,
Yi Zhang,
Kachun Cheung,
Simon See,
Xiaogang Wang,
Hongwei Qin,
Hongsheng Li
CVPR, 2023
Project Page
/
Paper
/
Code
Our approach is based on grouped spatial-temporal shift, which is a lightweight technique that can implicitly capture inter-frame correspondences for multi-frame aggregation.
FlowFormer: A Transformer Architecture for Optical Flow
Zhaoyang Huang*,
Xiaoyu Shi*,
Chao Zhang,
Qiang Wang,
Kachun Cheung,
Hongwei Qin,
Jifeng Dai,
Hongsheng Li
ECCV, 2022
Project Page
/
Paper
/
Code
Ranked 1st on the Sintel optical flow benchmark on Mar. 17th, 2022.
Decoupled Spatial-Temporal Transformer for Video Inpainting
Rui Liu,
Hanming Deng,
Yangyi Huang,
Xiaoyu Shi,
Lewei Lu,
Wenxiu Sun,
Xiaogang Wang,
Jifeng Dai,
Hongsheng Li
arXiv, 2021
Paper
/
Code
We propose a decoupled spatial-temporal Transformer (DSTT) framework that improves video inpainting quality with higher running efficiency.
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
Rui Liu,
Hanming Deng,
Yangyi Huang,
Xiaoyu Shi,
Lewei Lu,
Wenxiu Sun,
Xiaogang Wang,
Jifeng Dai,
Hongsheng Li
ICCV, 2021
Paper
/
Code
A Transformer model designed for video inpainting via fine-grained feature fusion based on novel Soft Split and Soft Composition operations.