Xiaoyu Shi 石晓宇

I am a fourth-year Ph.D. student at MMLab of the Chinese University of Hong Kong, advised by Prof. Hongsheng Li. Prior to this, I obtained Bachelor of Engineering and Bachelor of Science degrees from Zhejiang University and Simon Fraser University, respectively. I'm interested in computer vision and machine learning, with a special focus on video generation and correspondence learning.

Email  /  Google Scholar  /  GitHub

News
  • Three papers accepted to ECCV 2024.
  • One paper accepted to SIGGRAPH 2024.
  • Our VideoFlow, FlowFormer++, and FlowFormer occupy the top three places on the Sintel Optical Flow benchmark among published methods.
  • Two papers accepted to NeurIPS 2023.
  • One paper accepted to ICCV 2023.
  • One paper accepted to IROS 2023.
  • Two papers accepted to CVPR 2023.
  • One paper accepted to ECCV 2022.
Publications

Representative papers are highlighted.

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Yi Zhang, Manyuan Zhang, Kachun Cheung, Simon See,
Hongwei Qin, Jifeng Dai, Hongsheng Li
SIGGRAPH, 2024
Paper / Project page

We introduce Motion-I2V, a novel framework for consistent and controllable image-to-video generation (I2V). In contrast to previous methods that directly learn the complicated image-to-video mapping, Motion-I2V factorizes I2V into two stages with explicit motion modeling.

Context-TAP: Tracking Any Point Demands Spatial Context Features
Weikang Bian*, Zhaoyang Huang*, Xiaoyu Shi, Yitong Dong,
Yijin Li, Hongsheng Li
NeurIPS, 2023
Project page / Paper

We set a new state of the art on the Tracking Any Point (TAP) task by introducing rich spatial context features.

A Unified Conditional Framework for Diffusion-based Image Restoration
Yi Zhang, Xiaoyu Shi, Dasong Li, Xiaogang Wang, Hongsheng Li
NeurIPS, 2023
Project page / Paper / Code

A unified conditional framework based on diffusion models for image restoration.

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation
Xiaoyu Shi, Zhaoyang Huang, Weikang Bian, Dasong Li,
Manyuan Zhang, Kachun Cheung, Simon See, Hongwei Qin,
Jifeng Dai, Hongsheng Li
ICCV, 2023
Paper / Code

The first method to achieve sub-pixel accuracy on the Sintel benchmark, with a 19.2% error reduction from the best published result on the KITTI-2015 benchmark.

BlinkFlow: A Dataset to Push the Limits of Event-based Optical Flow Estimation
Yijin Li, Zhaoyang Huang, Shuo Chen, Xiaoyu Shi, Hongsheng Li,
Hujun Bao, Zhaopeng Cui, Guofeng Zhang
IROS, 2023
Paper

We build BlinkFlow, a benchmark for training and evaluating event-based optical flow estimation methods.

KBNet: Kernel Basis Network for Image Restoration
Yi Zhang, Dasong Li, Xiaoyu Shi, Dailan He, Kangning Song, Xiaogang Wang, Hongwei Qin, Hongsheng Li
arXiv, 2023
Paper / Code

A general-purpose backbone for image restoration tasks.

FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation
Xiaoyu Shi*, Zhaoyang Huang*, Dasong Li, Manyuan Zhang,
Kachun Cheung, Simon See, Hongwei Qin,
Jifeng Dai, Hongsheng Li
CVPR, 2023
Paper / Code

Ranked 1st on the Sintel Optical Flow benchmark as of Mar. 1st, 2023.

A Simple Baseline for Video Restoration with Spatial-temporal Shift
Dasong Li, Xiaoyu Shi, Yi Zhang, Kachun Cheung, Simon See, Xiaogang Wang, Hongwei Qin, Hongsheng Li
CVPR, 2023
Project Page / Paper / Code

Our approach is based on grouped spatial-temporal shift, a lightweight technique that implicitly captures inter-frame correspondences for multi-frame aggregation.

FlowFormer: A Transformer Architecture for Optical Flow
Zhaoyang Huang*, Xiaoyu Shi*, Chao Zhang, Qiang Wang,
Kachun Cheung, Hongwei Qin, Jifeng Dai, Hongsheng Li
ECCV, 2022
Project Page / Paper / Code

Ranked 1st on the Sintel Optical Flow benchmark as of Mar. 17th, 2022.

Decoupled Spatial-Temporal Transformer for Video Inpainting
Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li
arXiv, 2021
Paper / Code

We propose a decoupled spatial-temporal Transformer (DSTT) framework for video inpainting that improves inpainting quality with higher running efficiency.

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li
ICCV, 2021
Paper / Code

A Transformer model designed for video inpainting via fine-grained feature fusion based on novel Soft Split and Soft Composition operations.