Shengqu Cai 「蔡盛曲」

I'm a CS PhD student at Stanford University, advised by Prof. Gordon Wetzstein and Prof. Leonidas Guibas, and affiliated with the Computational Imaging Lab and the Geometric Computing Lab. I am partly supported by a Stanford School of Engineering Fellowship.

Before Stanford, I was a CS master's student at ETH Zürich supervised by Prof. Luc Van Gool. I obtained my Bachelor's degree in Computer Science with first-class honours from King's College London in the United Kingdom, where I spent some time working on information theory.

In 2022, I spent a few wonderful months working on diffusion models with Eric Chan and Songyou Peng. I started my research career back in 2021 working on NeRFs and GANs with Anton Obukhov. I consider them my mentors coming into research, and people I still try to learn from.

I am interested in solving graphics and inverse graphics tasks that are fundamentally ill-posed for traditional methods: slaying the unslayable. I have been working primarily on neural rendering and generative models, including but not limited to diffusion models, inverse rendering, unsupervised learning methods, and scene representations. I like making cool theories, videos, demos and applications.

Email  /  CV  /  Google Scholar  /  Semantic Scholar  /  GitHub  /  Twitter  /  LinkedIn

profile photo
* This is me pre-COVID. Since then I have gained >40 pounds and lost my cool ;(
News
  • 2025-03: We received an NVIDIA grant for our research!
  • 2025-02: DSD and X-Dyna are accepted to CVPR 2025, see you in Nashville!
  • 2024-09: CVD is accepted to NeurIPS 2024, see you in Vancouver!
  • 2024-02: Generative Rendering is accepted to CVPR 2024, see you in Seattle!
  • 2023-09: I joined Stanford University for a PhD in Computer Science!
  • 2023-07: DiffDreamer is accepted to ICCV 2023, looking forward to Paris!
  • 2023-05: I graduated from ETH Zürich!
  • 2023-01: I will be working as a research intern at Adobe this summer!
  • 2022-03: Pix2NeRF is accepted to CVPR 2022. First submission, first accept!
Publications

* indicates equal contribution

Mixture of Contexts for Long Video Generation
Shengqu Cai, Ceyuan Yang, Lvmin Zhang, Yuwei Guo, Junfei Xiao, Ziyan Yang, Yinghao Xu, Zhenheng Yang, Alan Yuille, Leonidas Guibas, Maneesh Agrawala, Lu Jiang, Gordon Wetzstein
In arXiv 2025
[Project Page][Paper]

Learnable sparse attention routing enables minute-long video generation at short-video cost, pruning most token pairs while preserving long-context coherence.

FramePack: Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models
Lvmin Zhang, Shengqu Cai, Muyang Li, Gordon Wetzstein, Maneesh Agrawala
In arXiv 2025
[Project Page][Paper][Code]

Frame context packing for next-frame prediction enables longer contexts within fixed sequence budgets, with drift prevention to reduce error accumulation.

Diffusion Self-Distillation for Zero-Shot Customized Image Generation
Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein
In CVPR 2025
[Project Page][Paper][Code][Demo]

Zero-shot customized image generation that scales to any instance and any context, without per-instance tuning.

Captain Cinema: Towards Short Movie Generation
Junfei Xiao, Ceyuan Yang, Lvmin Zhang, Shengqu Cai, Yang Zhao, Yuwei Guo, Gordon Wetzstein, Maneesh Agrawala, Alan Yuille, Lu Jiang
In arXiv 2025
[Project Page][Paper]

Top-down keyframe planning with bottom-up long-context video synthesis, using interleaved training to adapt MM-DiT for stable and efficient multi-scene generation.

CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization
Jan Ackermann, Jonas Kulhanek, Shengqu Cai, Haofei Xu, Marc Pollefeys, Gordon Wetzstein, Leonidas Guibas, Songyou Peng
In ICCV 2025
[Project Page][Paper][Code]

Efficiently updates Gaussian splatting-based 3D scene reconstructions from incremental images via change detection and local optimization.

ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions
Di Chang*, Mingdeng Cao*, Yichun Shi, Bo Liu, Shengqu Cai, Shijie Zhou, Weilin Huang, Gordon Wetzstein, Mohammad Soleymani, Peng Wang
In arXiv 2025
[Project Page][Paper][Code]

Large-scale benchmark and baseline for instruction-guided image editing with complex non-rigid motions such as viewpoint changes, articulations, and deformations.

ReStyle3D: Scene-level Appearance Transfer with Semantic Correspondences
Liyuan Zhu, Shengqu Cai*, Shengyu Huang*, Gordon Wetzstein, Naji Khosravan, Iro Armeni
In SIGGRAPH 2025
[Project Page][Paper][Code]

Scene-level appearance transfer from a single style image to multi-view real-world scenes with semantic correspondences.

X-Dyna: Expressive Dynamic Human Image Animation
Di Chang, Hongyi Xu*, You Xie*, Yipeng Gao*, Zhengfei Kuang*, Shengqu Cai*, Chenxu Zhang*, Guoxian Song, Chao Wang, Yichun Shi, Zeyuan Chen, Shijie Zhou, Linjie Luo, Gordon Wetzstein, Mohammad Soleymani
In CVPR 2025 (Highlight)
[Project Page][Paper][Code]

Human image animation using facial expressions and body movements derived from a driving video.

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control
Zhengfei Kuang*, Shengqu Cai*, Hao He, Yinghao Xu, Hongsheng Li, Leonidas Guibas, Gordon Wetzstein
In NeurIPS 2024
[Project Page][Paper][Code]

Multi-view/multi-trajectory generation of videos sharing the same underlying content and dynamics.

Robust Symmetry Detection via Riemannian Langevin Dynamics
Jihyeon Je*, Jiayi Liu*, Guandao Yang*, Boyang Deng*, Shengqu Cai, Gordon Wetzstein, Or Litany, Leonidas Guibas
In SIGGRAPH Asia 2024
[Project Page][Paper]

Robust symmetry detection that pairs classical detection techniques with Langevin dynamics on a Riemannian symmetry space, improving resilience to noise.

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
Shengqu Cai, Duygu Ceylan*, Matheus Gadelha*, Chun-Hao Paul Huang, Tuanfeng Y. Wang, Gordon Wetzstein
In CVPR 2024
[Project Page][Paper]

Renders low-fidelity animated meshes directly into animations using pre-trained 2D diffusion models, without any further training or distillation.

DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models
Shengqu Cai, Eric Ryan Chan, Songyou Peng, Mohamad Shahbazi, Anton Obukhov, Luc Van Gool, Gordon Wetzstein
In ICCV 2023
[Project Page][Paper][Code]

A diffusion-based unsupervised framework that synthesizes novel views depicting a long camera trajectory flying into an input image.

Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
Shengqu Cai, Anton Obukhov, Dengxin Dai, Luc Van Gool
In CVPR 2022
[Paper][Code]

Unsupervised single-view NeRF-based novel view synthesis without 3D supervision, via conditional NeRF-GAN training and inversion.

Misc

  • Conference Review: CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, Eurographics, SIGGRAPH
  • Journal Review: IJCV, Computing Surveys

  • © Shengqu Cai | Last updated: March 25th, 2025 | Website Template