K-Planes:

Explicit Radiance Fields in Space, Time, and Appearance


Sara Fridovich-Keil*, Giacomo Meanti*, Frederik Rahbæk Warburg, Benjamin Recht, Angjoo Kanazawa

* Denotes Equal Contribution



3D
Static Scene

4D Mono-"Teleporting" Dynamic Scene

4D Multiview
Dynamic Scene

3D Varying
Appearance Scene

Planar factorization of d-dimensional spaces. We propose a simple planar factorization for radiance fields that naturally extends to arbitrary-dimensional spaces, and that scales gracefully with dimension in both optimization time and model size. We show the advantages of our approach on 3D static volumes, 3D photo collections with varying appearances, and 4D dynamic videos.

Abstract


Diagram describing the K-Planes model.

Abstract: We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions. Our model uses d-choose-2 planes to represent a d-dimensional scene, providing a seamless way to go from static (d=3) to dynamic (d=4) scenes. This planar factorization makes adding dimension-specific priors easy, e.g. temporal smoothness and multi-resolution spatial structure, and induces a natural decomposition of static and dynamic components of a scene. We use a linear feature decoder with a learned color basis that yields similar performance as a nonlinear black-box MLP decoder. Across a range of synthetic and real, static and dynamic, fixed and varying appearance scenes, k-planes yields competitive and often state-of-the-art reconstruction fidelity with low memory usage, achieving 1000x compression over a full 4D grid, and fast optimization with a pure PyTorch implementation.

K-Plane Factorization of 4D Dynamic Volumes


Diagram describing the pipeline for K-Planes.

Method overview. (a) The k-plane representation decomposes 4D spatiotemporal volumes into six planes, three for space and three for spatiotemporal variations. To obtain the value of a 4D point \(\textbf{q}=(x,y,z,t)\), we first project the point into each plane (b) in which we do multiscale bilinear interpolation. (c) The interpolated values are multiplied and concatenated over \(S\) scales. (d) These features are decoded either with a small MLP or our explicit linear decoder. (e) We follow the standard volumetric rendering formula to predict color and density. The model is optimized by (f) minimizing the reconstruction loss with simple regularization in space and time.

4D Dynamic Multiview Scenes


(a) Hybrid model

(b) Explicit model

Novel view synthesis for 4D dynamic volumes. Both our explicit and hybrid models yield state-of-the-art reconstructions on real multi-view dynamic scenes.

Space-time Decomposition


Full

Space-only

Time-only

Space-time decomposition. K-planes naturally decomposes a 4D video into static and dynamic components. We render the static part by setting the time planes to the identity, and the remainder is the dynamic part. This makes adding dimension-specific priors easy.

4D Dynamic Monocular "Teleporting-Camera" videos


(a) Hybrid model

(b) Explicit model

Novel view synthesis for 4D dynamic volumes. Reconstruction and novel view generation on a monocular teleporting-camera dataset of dynamic synthetic objects (the D-NeRF dataset). D-NeRF training data includes a single view per timestep, but the camera can “teleport” between adjacent timesteps. See "Monocular Dynamic View Synthesis: A Reality Check".

Space-time Decomposition


Full

Space-only

Time-only

Space-time decomposition. K-planes naturally decomposes a 4D video into static and dynamic components. We render the static part by setting the time planes to the identity, and the remainder is the dynamic part. This makes adding dimension-specific priors easy.

In the wild capture: Varying appearance


(a) Hybrid model

(b) Explicit model

Varying Appearance Phototourism Scenes. Like NeRF-W, we can interpolate our appearance code in either our explicit or hybrid model to alter the visual appearance of landmarks, such as changing time of day. Appearance codes affect color but not geometry; note that objects in the foreground appear and disappear as the camera moves past them.

3D Static Forward Facing Scenes


Static forward facing scenes. Reconstruction and novel view generation on a dataset of forward facing captures (the LLFF dataset).

3D Synthetic Static Scenes


Static 360 scenes. Reconstruction and novel view generation on a dataset of synthetic objects (the NeRF dataset).

Paper


thumbnail image of the paper

K-Planes: Explicit Radiance Fields in Space, Time, and Appearance

Sara Fridovich-Keil*, Giacomo Meanti*, Frederik Rahbæk Warburg, Benjamin Recht, and Angjoo Kanazawa

description arXiv
insert_comment BibTeX
integration_instructions Code

The day before our arxiv release, a similar paper Hexplanes was released on arxiv. Please consider checking out that paper as well: https://caoang327.github.io/HexPlane/

Citation



@misc{sfk_kplanes_2023,
    title={K-Planes: Explicit Radiance Fields in Space, Time, and Appearance},
    author={Sara Fridovich-Keil and Giacomo Meanti and Frederik Rahbæk Warburg and Benjamin Recht and Angjoo Kanazawa},
    year={2023},
    eprint={2301.10241},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgments


Many thanks to Matthew Tancik and Ruilong Li, and other members of KAIR for helpful discussion and pointers to resources. We also thank the DyNeRF authors for their response to our questions about their method. We also thank Joey Litalien for providing us with the framework for this website.

Detailed Drum Set ©bryanajones (CC BY 2.0)
Lego 856 Bulldozer ©Håvard Dalen (CC BY-NC 2.0)
D-NeRF dataset ©Alberto Pumarola
DyNeRF dataset (CC BY-NC 4.0)
Phototourism dataset (CC BY 2.0)