I am a 4th-year Ph.D. student at MMLab, the Chinese University of Hong Kong, advised by Prof. Hongsheng Li. Prior to this, I obtained Bachelor of Engineering and Bachelor of Science degrees from Zhejiang University and Simon Fraser University, respectively. I'm interested in computer vision and machine learning, with a special focus on video generation and correspondence learning.
Our VideoFlow, FlowFormer++, and FlowFormer hold the top three places on the Sintel optical flow benchmark among published methods.
Two papers accepted to NeurIPS 2023.
One paper accepted to ICCV 2023.
One paper accepted to IROS 2023.
Two papers accepted to CVPR 2023.
One paper accepted to ECCV 2022.
Publications
Representative papers are highlighted.
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
Xiaoyu Shi,
Zhaoyang Huang,
Fu-Yun Wang,
Weikang Bian,
Dasong Li,
Yi Zhang,
Manyuan Zhang,
Kachun Cheung,
Simon See,
Hongwei Qin,
Jifeng Dai,
Hongsheng Li
SIGGRAPH, 2024
Paper / Project page
We introduce Motion-I2V, a novel framework for consistent and controllable image-to-video generation (I2V). In contrast to previous methods that directly learn the complicated image-to-video mapping, Motion-I2V factorizes I2V into two stages with explicit motion modeling.
Context-TAP: Tracking Any Point Demands Spatial Context Features
Weikang Bian*,
Zhaoyang Huang*,
Xiaoyu Shi,
Yitong Dong,
Yijin Li,
Hongsheng Li
NeurIPS, 2023
Project page / Paper
We set a new state of the art on Tracking Any Point (TAP) by introducing rich spatial context features.
A Unified Conditional Framework for Diffusion-based Image Restoration
Yi Zhang,
Xiaoyu Shi,
Dasong Li,
Xiaogang Wang,
Hongsheng Li
NeurIPS, 2023
Project page / Paper / Code
A unified conditional framework based on diffusion models for image restoration.
VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
Xiaoyu Shi,
Zhaoyang Huang,
Weikang Bian,
Dasong Li,
Manyuan Zhang,
Kachun Cheung,
Simon See,
Hongwei Qin,
Jifeng Dai,
Hongsheng Li
ICCV, 2023
Paper / Code
The first method to achieve sub-pixel accuracy on the Sintel benchmark, with a 19.2% error reduction over the best published result on the KITTI-2015 benchmark.
BlinkFlow: A Dataset to Push the Limits of Event-based Optical Flow Estimation
Yijin Li,
Zhaoyang Huang,
Shuo Chen,
Xiaoyu Shi,
Hongsheng Li,
Hujun Bao,
Zhaopeng Cui,
Guofeng Zhang
IROS, 2023
Paper
We build BlinkFlow, a benchmark for training and evaluating event-based optical flow estimation methods.
KBNet: Kernel Basis Network for Image Restoration
Yi Zhang,
Dasong Li,
Xiaoyu Shi,
Dailan He,
Kangning Song,
Xiaogang Wang,
Hongwei Qin,
Hongsheng Li
arXiv, 2023
Paper / Code
A general-purpose backbone for image restoration tasks.
FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
Xiaoyu Shi*,
Zhaoyang Huang*,
Dasong Li,
Manyuan Zhang,
Kachun Cheung,
Simon See,
Hongwei Qin,
Jifeng Dai,
Hongsheng Li
CVPR, 2023
Paper / Code
Ranked 1st on the Sintel optical flow benchmark as of Mar. 1st, 2023.
A Simple Baseline for Video Restoration with Spatial-temporal Shift
Dasong Li,
Xiaoyu Shi,
Yi Zhang,
Kachun Cheung,
Simon See,
Xiaogang Wang,
Hongwei Qin,
Hongsheng Li
CVPR, 2023
Project Page / Paper / Code
Our approach is based on grouped spatial-temporal shift, a lightweight technique that implicitly captures inter-frame correspondences for multi-frame aggregation.
FlowFormer: A Transformer Architecture for Optical Flow
Zhaoyang Huang*,
Xiaoyu Shi*,
Chao Zhang,
Qiang Wang,
Kachun Cheung,
Hongwei Qin,
Jifeng Dai,
Hongsheng Li
ECCV, 2022
Project Page / Paper / Code
Ranked 1st on the Sintel optical flow benchmark as of Mar. 17th, 2022.
Decoupled Spatial-Temporal Transformer for Video Inpainting
Rui Liu,
Hanming Deng,
Yangyi Huang,
Xiaoyu Shi,
Lewei Lu,
Wenxiu Sun,
Xiaogang Wang,
Jifeng Dai,
Hongsheng Li
arXiv, 2021
Paper / Code
We propose a novel decoupled spatial-temporal Transformer (DSTT) framework for video inpainting that improves inpainting quality with higher running efficiency.
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
Rui Liu,
Hanming Deng,
Yangyi Huang,
Xiaoyu Shi,
Lewei Lu,
Wenxiu Sun,
Xiaogang Wang,
Jifeng Dai,
Hongsheng Li
ICCV, 2021
Paper / Code
A Transformer model designed for video inpainting via fine-grained feature fusion, based on novel Soft Split and Soft Composition operations.