Perpetual view generation, the task of generating long-range novel views by flying into a given image, is a novel yet challenging problem. We introduce DiffDreamer, an unsupervised framework capable of synthesizing novel views depicting a long camera trajectory while training solely on internet-collected images of nature scenes. We demonstrate that image-conditioned diffusion models can effectively perform long-range scene extrapolation while preserving both local and global consistency significantly better than prior GAN-based methods.
Overview of our pipeline. We train an image-conditional diffusion model to perform image-to-image refinement and inpainting, given a corrupted image and its missing-region mask. At inference, we apply stochastic conditioning over three conditioning signals: naive forward warping from the previous frame (black arrow), anchored conditioning, which warps a frame from further away (blue arrow), and lookahead conditioning, which warps a virtual future frame (red arrow). We repeat this render-refine-repeat procedure to generate sequences that extrapolate the given image.
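To make the inference procedure concrete, below is a minimal Python sketch of the render-refine-repeat loop with the three conditioning signals. The helpers warp and refine, along with the anchor_gap and lookahead_gap parameters, are hypothetical placeholders rather than the released implementation, and sampling a single condition per refinement pass is a simplification of the per-denoising-step stochastic conditioning used in practice.

import random

def perpetual_view_generation(
    first_frame,        # the given input image
    cameras,            # list of camera poses along the trajectory
    warp,               # placeholder: warp(image, src_cam, dst_cam) -> (warped_image, missing_mask)
    refine,             # placeholder: conditional diffusion sampler, refine(corrupted, mask) -> image
    anchor_gap=5,       # assumed distance back to the anchor frame
    lookahead_gap=5,    # assumed distance ahead to the virtual future camera
):
    frames = [first_frame]
    for t in range(1, len(cameras)):
        # 1) Render: naive forward warping of the previous frame into the new view (black arrow).
        cond_prev = warp(frames[t - 1], cameras[t - 1], cameras[t])
        conditions = [cond_prev]

        # 2) Anchored conditioning: warp a frame from further away for global consistency (blue arrow).
        anchor_idx = max(0, t - anchor_gap)
        if anchor_idx < t - 1:
            conditions.append(warp(frames[anchor_idx], cameras[anchor_idx], cameras[t]))

        # 3) Lookahead conditioning: build a virtual future frame and warp it back (red arrow).
        #    Here the virtual frame is approximated by warping the previous frame forward,
        #    which is a simplification for illustration only.
        if t + lookahead_gap < len(cameras):
            ahead_img, _ = warp(frames[t - 1], cameras[t - 1], cameras[t + lookahead_gap])
            conditions.append(warp(ahead_img, cameras[t + lookahead_gap], cameras[t]))

        # 4) Refine: stochastic conditioning picks among the signals; we sample one
        #    condition and run the image-to-image refinement/inpainting step.
        corrupted, mask = random.choice(conditions)
        frames.append(refine(corrupted, mask))
    return frames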
Consistent long camera trajectory synthesis.
@inproceedings{cai2022diffdreamer,
  title     = {DiffDreamer: Consistent Single-view Perpetual View Generation with Conditional Diffusion Models},
  author    = {Cai, Shengqu and Chan, Eric Ryan and Peng, Songyou and Shahbazi, Mohamad and Obukhov, Anton and Van Gool, Luc and Wetzstein, Gordon},
  booktitle = {arXiv},
  year      = {2022}
}