DiffDreamer: Consistent Single-view Perpetual View Generation with Conditional Diffusion Models

1Stanford University   2ETH Zürich   3MPI for Intelligent Systems, Tübingen   4KU Leuven
* Work done as a visiting researcher at Stanford


Perpetual view generation, the task of generating long-range novel views by flying into a given image, is a novel yet promising task. We introduce DiffDreamer, an unsupervised framework capable of synthesizing novel views depicting a long camera trajectory while training solely on internet-collected images of nature scenes. We demonstrate that image-conditioned diffusion models can effectively perform long-range scene extrapolation while preserving both local and global consistency significantly better than prior GAN-based methods.



Overview of our pipeline. We train an image-conditional diffusion model to perform image-to-image refinement and inpainting given a corrupted image and its missing-region mask. At inference, we apply stochastic conditioning over three conditioning signals: naive forward warping from the previous frame (black arrow), anchored conditioning by warping a more distant frame (blue arrow), and lookahead conditioning by warping a virtual future frame (red arrow). We repeat this render-refine-repeat pipeline to generate sequences that extrapolate from a given image.
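The render-refine-repeat loop described in the caption can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the DiffDreamer implementation: images are 1-D lists of floats, and `forward_warp`, `denoise_step`, `refine`, and `generate` are hypothetical names. The "denoiser" is a simple blend standing in for a trained conditional diffusion model, the anchor is assumed to be an earlier generated frame, and the way the virtual future frame is produced is likewise an assumption. The only part taken from the caption is the structure: at each denoising step, one of the available conditionings is sampled at random.

```python
import random

def forward_warp(frame, shift):
    """Toy stand-in for depth-based warping: shift a 1-D 'image'.
    Returns (warped, mask), where mask[i] is True for missing pixels."""
    n = len(frame)
    warped, mask = [0.0] * n, [True] * n
    for i, v in enumerate(frame):
        j = i - shift
        if 0 <= j < n:
            warped[j], mask[j] = v, False
    return warped, mask

def denoise_step(x, cond, mask, weight=0.5):
    """Hypothetical denoiser: pull x toward the conditioning image,
    filling masked pixels with the mean of the known ones."""
    known = [c for c, m in zip(cond, mask) if not m]
    fill = sum(known) / len(known) if known else 0.0
    target = [fill if m else c for c, m in zip(cond, mask)]
    return [xi + weight * (ti - xi) for xi, ti in zip(x, target)]

def refine(conditionings, num_steps, rng):
    """Stochastic conditioning: sample one conditioning per denoising step."""
    n = len(conditionings[0][0])
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # start from noise
    for _ in range(num_steps):
        cond, mask = rng.choice(conditionings)
        x = denoise_step(x, cond, mask)
    return x

def generate(first_frame, num_frames, shift=1, anchor_gap=3,
             lookahead=2, num_steps=8, seed=0):
    """Render-refine-repeat over a toy camera trajectory."""
    rng = random.Random(seed)
    frames = [list(first_frame)]
    while len(frames) < num_frames:
        prev = frames[-1]
        # 1) Naive conditioning: warp the previous frame to the new view.
        conds = [forward_warp(prev, shift)]
        # 2) Anchored conditioning (assumed: an earlier generated frame).
        if len(frames) > anchor_gap:
            anchor = frames[-1 - anchor_gap]
            conds.append(forward_warp(anchor, shift * (anchor_gap + 1)))
        # 3) Lookahead conditioning (assumed): synthesize a virtual future
        #    frame, then warp it back to the current view.
        future = refine([forward_warp(prev, shift * (lookahead + 1))],
                        num_steps, rng)
        conds.append(forward_warp(future, -shift * lookahead))
        frames.append(refine(conds, num_steps, rng))
    return frames
```

Calling `generate([0.1 * i for i in range(8)], num_frames=5)` returns five toy frames of length 8. In a real system the warps would use estimated depth and camera poses, and `denoise_step` would be one reverse step of the trained diffusion model.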


Consistent long camera trajectory synthesis.


