Research

The following is a brief summary of selected past research projects, in reverse chronological order. In most cases, code is available on GitHub, and links are provided. For a more complete list with citation metrics, please refer to my Google Scholar.

ThermalNeRF: Thermal Radiance Fields

Thermal imaging has a variety of applications, from agricultural monitoring to building inspection to imaging under poor visibility, such as in low light, fog, and rain. However, reconstructing thermal scenes in 3D presents several challenges due to the comparatively lower resolution and limited features present in long-wave infrared (LWIR) images. To overcome these challenges, we propose a unified framework for scene reconstruction from a set of LWIR and RGB images, using a multispectral radiance field to represent a scene viewed by both visible and infrared cameras, thus leveraging information across both spectra. As a preprocessing step, we calibrate the RGB and infrared cameras with respect to each other using a simple calibration target. We demonstrate our method on real-world sets of RGB and LWIR photographs captured from a handheld thermal camera, showing its effectiveness at representing scenes across the visible and infrared spectra. We show that our method is capable of thermal super-resolution, as well as visually removing obstacles to reveal objects that are occluded in either the RGB or thermal channels.
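
As a rough illustration of the shared multispectral representation, the sketch below alpha-composites per-sample densities along one ray into both an RGB and an LWIR pixel value, so a single density field explains both spectra. The function name, shapes, and sampling here are illustrative assumptions, not the released ThermalNeRF implementation.

```python
# Illustrative sketch only -- not the released ThermalNeRF code.
import numpy as np

def composite_multispectral(sigmas, rgbs, thermals, deltas):
    """Alpha-composite per-sample values along one ray into RGB and thermal pixels.

    sigmas:   (N,)   volume densities shared by both spectra
    rgbs:     (N, 3) per-sample visible-spectrum colors
    thermals: (N,)   per-sample LWIR intensities
    deltas:   (N,)   distances between adjacent samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]   # transmittance to each sample
    weights = trans * alphas                                         # compositing weights
    return (weights[:, None] * rgbs).sum(axis=0), (weights * thermals).sum()

# Example with 64 random samples along a ray
n = 64
rgb, thermal = composite_multispectral(
    np.random.rand(n), np.random.rand(n, 3), np.random.rand(n), np.full(n, 0.02))
```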

This work is published at ICCP 2024, and has a project page with links to the paper, code, additional results, and our thermal dataset release. This is joint work with Yvette Lin, Xin-Yi Pan, and Gordon Wetzstein.

Gradient Descent Provably Solves Nonlinear Tomographic Reconstruction

In computed tomography (CT), the forward model consists of a linear Radon transform followed by an exponential nonlinearity based on the attenuation of light according to the Beer-Lambert law. Conventional reconstruction often involves inverting this nonlinearity as a preprocessing step and then solving a convex inverse problem. However, this nonlinear measurement preprocessing required to use the Radon transform is poorly conditioned in the vicinity of high-density materials, such as metal. This preprocessing makes CT reconstruction methods numerically sensitive and susceptible to artifacts near high-density regions. In this paper, we study a technique where the signal is directly reconstructed from raw measurements through the nonlinear forward model. Though this optimization is nonconvex, we show that gradient descent provably converges to the global optimum at a geometric rate, perfectly reconstructing the underlying signal with a near-minimal number of random measurements. We also prove similar results in the under-determined setting where the number of measurements is significantly smaller than the dimension of the signal. This is achieved by enforcing prior structural information about the signal through constraints on the optimization variables. We illustrate the benefits of direct nonlinear CT reconstruction with cone-beam CT experiments on synthetic and real 3D volumes. We show that this approach reduces metal artifacts compared to a commercial reconstruction of a human skull with metal dental crowns.
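
To make the idea of reconstructing directly through the nonlinear forward model concrete, here is a toy sketch that fits a signal to raw Beer-Lambert measurements y = exp(-Ax) by running plain gradient descent on the raw-measurement least-squares loss, instead of taking logs of y first. The random measurement matrix, dimensions, and step size are stand-ins chosen for this toy problem, not the paper's cone-beam setup.

```python
# Toy sketch: direct nonlinear reconstruction with gradient descent (assumed toy setup).
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 400                               # signal dimension, number of measurements
A = rng.normal(size=(m, n)) / np.sqrt(n)      # stand-in random measurement operator
x_true = 0.5 * rng.random(n)                  # nonnegative attenuation coefficients
y = np.exp(-A @ x_true)                       # raw nonlinear (Beer-Lambert) measurements

x = np.zeros(n)
step = 0.005
for _ in range(20000):
    forward = np.exp(-A @ x)
    grad = -A.T @ ((forward - y) * forward)   # gradient of 0.5 * ||exp(-Ax) - y||^2
    x -= step * grad

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```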

This project is available as a preprint, and is joint work with Fabrizio Valdivia, Gordon Wetzstein, Ben Recht, and Mahdi Soltanolkotabi.

Neural Microfacet Fields for Inverse Rendering

We propose Neural Microfacet Fields, a method for recovering materials, geometry, and environment illumination from images of a scene. Our method uses a microfacet reflectance model within a volumetric setting by treating each sample along the ray as a (potentially non-opaque) surface. Using surface-based Monte Carlo rendering in a volumetric setting enables our method to perform inverse rendering efficiently by combining decades of research in surface-based light transport with recent advances in volume rendering for view synthesis. Our approach outperforms prior work in inverse rendering, capturing high fidelity geometry and high frequency illumination details; its novel view synthesis results are on par with state-of-the-art methods that do not recover illumination or materials.
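
For intuition, the sketch below evaluates a simple diffuse-plus-GGX microfacet BRDF at a single ray sample and scales the result by that sample's volume-rendering weight, treating the sample as a potentially non-opaque surface. The specific parameterization (roughness convention, Fresnel term, shadowing function) is an assumption for illustration, not the paper's exact shading model, which also integrates over incoming light directions with Monte Carlo sampling.

```python
# Assumed parameterization for illustration -- not the paper's exact model.
import numpy as np

def ggx_brdf(n, v, l, albedo, roughness, f0=0.04):
    """Diffuse + GGX specular BRDF for unit normal n, view v, and light l directions."""
    h = (v + l) / np.linalg.norm(v + l)                         # half vector
    nv, nl = max(n @ v, 1e-6), max(n @ l, 1e-6)
    nh, hv = max(n @ h, 1e-6), max(h @ v, 1e-6)
    a2 = roughness ** 4                                          # alpha = roughness^2 convention
    d = a2 / (np.pi * (nh * nh * (a2 - 1.0) + 1.0) ** 2)         # GGX normal distribution
    f = f0 + (1.0 - f0) * (1.0 - hv) ** 5                        # Schlick Fresnel

    def g1(x):                                                   # Smith shadowing-masking
        return 2.0 * x / (x + np.sqrt(a2 + (1.0 - a2) * x * x))

    spec = d * f * g1(nv) * g1(nl) / (4.0 * nv * nl)
    return albedo / np.pi + spec

# Contribution of one ray sample: BRDF * cosine * light, scaled by its compositing weight.
normal = np.array([0.0, 0.0, 1.0])
view = np.array([0.0, 0.0, 1.0])
light = np.array([0.5, 0.0, 0.8]) / np.linalg.norm([0.5, 0.0, 0.8])
weight = 0.3                                                     # from alpha compositing along the ray
sample_radiance = weight * ggx_brdf(normal, view, light, np.array([0.7, 0.4, 0.3]), 0.3) * (normal @ light)
```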

This work is published at ICCV 2023, and has a project page with links to the paper and code, as well as additional results. This is joint work with Alex Mai, Dor Verbin, and Falko Kuester.

K-Planes: Explicit Radiance Fields in Space, Time, and Appearance

We introduce k-planes, a white-box model for radiance fields in arbitrary dimensions. Our model uses d-choose-2 planes to represent a d-dimensional scene, providing a seamless way to go from static (d=3) to dynamic (d=4) scenes and beyond. This planar factorization makes it easy to add dimension-specific priors, e.g. temporal smoothness and multi-resolution spatial structure, and induces a natural decomposition of static and dynamic components of a scene. We use a linear feature decoder with a learned color basis that yields similar performance to a nonlinear black-box MLP decoder. Across a range of synthetic and real, static and dynamic, fixed and varying appearance scenes, k-planes yields competitive and often state-of-the-art reconstruction fidelity with low memory usage, achieving 1000x compression over a full 4D grid, and fast optimization with a pure PyTorch implementation.
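
A minimal sketch of the planar factorization for d=4 (x, y, z, t): a point is projected onto each of the 6 axis-aligned coordinate planes, a feature is bilinearly interpolated from each plane, the 6 features are multiplied together, and a linear layer decodes the product. The resolutions, feature size, and initialization below are illustrative assumptions, not the paper's settings.

```python
# Illustrative k-planes-style feature lookup (assumed sizes, not the released code).
import itertools
import numpy as np

res, feat_dim, out_dim = 64, 16, 4
pairs = list(itertools.combinations(range(4), 2))            # the 6 coordinate pairs
planes = [np.random.randn(res, res, feat_dim) * 0.1 + 1.0 for _ in pairs]   # init near 1 for products
decoder = np.random.randn(feat_dim, out_dim) * 0.1           # linear feature decoder

def bilinear(plane, u, v):
    """Bilinearly interpolate plane (res x res x C) at continuous coords u, v in [0, res-1]."""
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, plane.shape[0] - 1), min(v0 + 1, plane.shape[1] - 1)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * plane[u0, v0] + du * (1 - dv) * plane[u1, v0]
            + (1 - du) * dv * plane[u0, v1] + du * dv * plane[u1, v1])

def kplanes_features(point):
    """point: 4 coordinates in [0, 1]; returns the decoded feature vector."""
    coords = np.asarray(point) * (res - 1)
    feat = np.ones(feat_dim)
    for plane, (i, j) in zip(planes, pairs):
        feat *= bilinear(plane, coords[i], coords[j])         # Hadamard product across planes
    return feat @ decoder

print(kplanes_features([0.2, 0.5, 0.7, 0.1]))
```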

This work is published at CVPR 2023, and has a project page with links to the paper and code, as well as additional results. This is joint work with Giacomo Meanti, Frederik Warburg, Ben Recht, and Angjoo Kanazawa.

Plenoxels: Radiance Fields without Neural Networks

We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis. Plenoxels represent a bounded or unbounded scene as a sparse 3D grid with spherical harmonics. This representation can be optimized from calibrated images via gradient methods and regularization without any neural components. On standard benchmark tasks, Plenoxels are optimized two orders of magnitude faster than Neural Radiance Fields with no loss in visual quality.
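
To illustrate the per-voxel representation, the sketch below computes a view-dependent color from spherical harmonic coefficients stored at a voxel, truncated to degree 1 for brevity. The released system uses a sparse grid, trilinear interpolation, higher-degree harmonics, and the same alpha compositing as NeRF, all omitted here.

```python
# Minimal sketch of view-dependent color from per-voxel SH coefficients (degree 1 only).
import numpy as np

def sh_basis_deg1(d):
    """Real spherical harmonic basis up to degree 1 for a unit direction d = (x, y, z)."""
    x, y, z = d
    return np.array([0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x])

def voxel_color(sh_coeffs, view_dir):
    """sh_coeffs: (3, 4) coefficients per RGB channel; returns RGB squashed to [0, 1]."""
    rgb = sh_coeffs @ sh_basis_deg1(view_dir)
    return 1.0 / (1.0 + np.exp(-rgb))                      # sigmoid to valid colors

d = np.array([0.0, 0.6, 0.8])                              # unit viewing direction
coeffs = np.random.randn(3, 4) * 0.3
print(voxel_color(coeffs, d))
```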

This work is published at CVPR 2022 (oral), and has a project page with links to the paper and code, as well as additional results. This is joint work with Alex Yu, Matthew Tancik, Qinhong Chen, Ben Recht, and Angjoo Kanazawa.

Models Out of Line: A Fourier Lens on Distribution Shift Robustness

Improving the accuracy of deep neural networks (DNNs) on out-of-distribution (OOD) data is critical to the acceptance of deep learning (DL) in real-world applications. It has been observed that accuracies on in-distribution (ID) versus OOD data follow a linear trend, and models that outperform this baseline are exceptionally rare (and referred to as “effectively robust”). Recently, some promising approaches have been developed to improve OOD robustness: model pruning, data augmentation, and ensembling or zero-shot evaluating large pretrained models. However, there is still no clear understanding of the conditions on OOD data and model properties that are required to observe effective robustness. We approach this issue by conducting a comprehensive empirical study of diverse approaches that are known to impact OOD robustness on a broad range of natural and synthetic distribution shifts of CIFAR-10 and ImageNet. In particular, we view the “effective robustness puzzle” through a Fourier lens and ask how spectral properties of both models and OOD data influence the corresponding effective robustness. We find this Fourier lens offers some insight into why certain robust models, particularly those from the CLIP family, achieve OOD robustness. However, our analysis also makes clear that no known metric is consistently the best explanation (or even a strong explanation) of OOD robustness. Thus, to aid future research into the OOD puzzle, we address the gap in publicly available models with effective robustness by introducing RobustNets, a set of pretrained models with varying levels of OOD robustness.
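
For readers unfamiliar with the term, here is a minimal sketch of how effective robustness is typically scored: fit the linear ID-versus-OOD accuracy trend across a pool of baseline models, then measure how far a candidate model sits above that line. The fit is done on raw accuracies here for simplicity; the literature often fits on logit- or probit-transformed axes, and the numbers below are made up for illustration.

```python
# Illustrative effective-robustness computation (made-up numbers, simplified raw-accuracy fit).
import numpy as np

def effective_robustness(id_accs, ood_accs, model_id_acc, model_ood_acc):
    """Candidate OOD accuracy minus the accuracy predicted by the baseline ID->OOD trend."""
    slope, intercept = np.polyfit(id_accs, ood_accs, deg=1)   # linear trend across baselines
    predicted_ood = slope * model_id_acc + intercept
    return model_ood_acc - predicted_ood

baseline_id = np.array([0.90, 0.92, 0.94, 0.95, 0.96])
baseline_ood = np.array([0.55, 0.60, 0.66, 0.69, 0.72])
print(effective_robustness(baseline_id, baseline_ood, 0.93, 0.75))   # above the line -> positive
```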

This work is published at NeurIPS 2022, and is available here. The code and associated RobustNets model set is available here. This is joint work with Brian Bartoldson, James Diffenderfer, Bhavya Kailkhura, and Peer-Timo Bremer.

Spectral Bias in Practice: The Role of Function Frequency in Generalization

Despite their ability to represent highly expressive functions, deep learning models seem to find simple solutions that generalize surprisingly well. Spectral bias – the tendency of neural networks to prioritize learning low frequency functions – is one possible explanation for this phenomenon, but so far spectral bias has primarily been observed in theoretical models and simplified experiments. In this work, we propose methodologies for measuring spectral bias in modern image classification networks on CIFAR-10 and ImageNet. We find that these networks indeed exhibit spectral bias, and that interventions that improve test accuracy on CIFAR-10 tend to produce learned functions that have higher frequencies overall but lower frequencies in the vicinity of examples from each class. This trend holds across variation in training time, model architecture, number of training examples, data augmentation, and self-distillation. We also explore the connections between function frequency and image frequency and find that spectral bias is sensitive to the low frequencies prevalent in natural images. On ImageNet, we find that learned function frequency also varies with internal class diversity, with higher frequencies on more diverse classes. Our work enables measuring and ultimately influencing the spectral behavior of neural networks used for image classification, and is a step towards understanding why deep models generalize well.
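
As one illustrative way to probe function frequency (not the exact measurement protocol of the paper), the sketch below samples a classifier's output along a straight line between two inputs and inspects the Fourier spectrum of that 1-D slice; a strongly spectrally biased model concentrates its energy in the low-frequency bins. The tiny untrained CNN is just a stand-in for a real image classifier.

```python
# Illustrative frequency probe along a 1-D path in input space (assumed protocol).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))

x0, x1 = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)   # two endpoint images
ts = torch.linspace(0, 1, 256)
with torch.no_grad():
    outputs = torch.stack([model(x0 + t * (x1 - x0))[0, 0] for t in ts])  # logit of class 0

spectrum = torch.fft.rfft(outputs - outputs.mean()).abs()
low = float(spectrum[:8].sum() / spectrum.sum())           # fraction of energy in low frequencies
print(f"low-frequency energy fraction along this path: {low:.3f}")
```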

This work is published at NeurIPS 2022, and is available here. The code is available here. This is joint work with Raphael Gontijo-Lopes and Rebecca Roelofs.

When Does Dough Become a Bagel? Analyzing the Remaining Mistakes on ImageNet

Image classification accuracy on the ImageNet dataset has been a barometer for progress in computer vision over the last decade. Several recent papers have questioned the degree to which the benchmark remains useful to the community, yet innovations continue to contribute gains to performance, with today’s largest models achieving 90%+ top-1 accuracy. To help contextualize progress on ImageNet and provide a more meaningful evaluation for today’s state-of-the-art models, we manually review and categorize every remaining mistake that a few top models make in order to provide insight into the long-tail of errors on one of the most benchmarked datasets in computer vision. We focus on the multi-label subset evaluation of ImageNet, where today’s best models achieve upwards of 97% top-1 accuracy. Our analysis reveals that nearly half of the supposed mistakes are not mistakes at all, and we uncover new valid multi-labels, demonstrating that, without careful review, we are significantly underestimating the performance of these models. On the other hand, we also find that today’s best models still make a significant number of mistakes (40%) that are obviously wrong to human reviewers. To calibrate future progress on ImageNet, we provide an updated multi-label evaluation set, and we curate ImageNet-Major: a 68-example “major error” slice of the obvious mistakes made by today’s top models – a slice where models should achieve near perfection, but today are far from doing so.
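
A minimal sketch of the multi-label evaluation protocol referenced above: a model's top-1 prediction counts as correct if it falls within the set of labels judged valid for that image, rather than requiring an exact match to a single ground-truth class. The function and example values are illustrative.

```python
# Illustrative multi-label top-1 accuracy (toy example values).
def multilabel_top1_accuracy(predictions, valid_label_sets):
    """predictions: predicted class ids; valid_label_sets: sets of valid ids per image."""
    correct = sum(pred in labels for pred, labels in zip(predictions, valid_label_sets))
    return correct / len(predictions)

preds = [12, 7, 301]
labels = [{12, 15}, {3}, {301, 302, 305}]
print(multilabel_top1_accuracy(preds, labels))   # 2 of 3 correct -> 0.666...
```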

This work is published at NeurIPS 2022, and is available here. The analysis code and associated ImageNet-M dataset of “major mistakes” is available here. This is joint work with Vijay Vasudevan, Ben Caine, Raphael Gontijo-Lopes, and Rebecca Roelofs.

DAB-quant: An open-source digital system for quantifying immunohistochemical staining with 3,3′-diaminobenzidine (DAB)

Here, we describe DAB-quant, a novel, open-source program designed to facilitate objective quantitation of immunohistochemical (IHC) signal in large numbers of tissue slides stained with 3,3′-diaminobenzidine (DAB). Scanned slides are arranged into separate folders for negative controls and test slides, respectively. Otsu’s method is applied to the negative control slides to define a threshold distinguishing tissue from empty space, and all pixels deemed tissue are scored for normalized red minus blue (NRMB) color intensity. Next, a user-defined tolerance for error is applied to the negative control slides to set a NRMB threshold distinguishing stained from unstained tissue and this threshold is applied to calculate the fraction of stained tissue pixels on each test slide. Results are recorded in a spreadsheet and pseudocolor images are presented to document how each pixel was categorized. Slides can be analyzed in full, or sampled using small boxes scattered randomly and automatically across the tissue area. Quantitation of sampling boxes enables faster processing, reveals the degree of heterogeneity of signal, and enables exclusion of problem areas on a slide, if needed. This system should prove useful for a broad range of applications.
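
The sketch below walks through the scoring pipeline described above: Otsu's threshold separates tissue from empty space on the negative controls, a user-set error tolerance on negative-control NRMB values fixes the stain threshold, and each test slide is scored as the fraction of tissue pixels above that threshold. The exact NRMB normalization and tissue mask here are assumptions for illustration; see the released code for the real implementation.

```python
# Simplified DAB-quant-style scoring (assumed NRMB formula and tissue mask).
import numpy as np
from skimage.filters import threshold_otsu

def nrmb(img):
    """Normalized red-minus-blue per pixel for a float RGB image in [0, 1]."""
    r, b = img[..., 0], img[..., 2]
    return (r - b) / (r + b + 1e-8)

def tissue_mask(img):
    """Tissue vs. empty space via Otsu's threshold on brightness (tissue is darker)."""
    gray = img.mean(axis=-1)
    return gray < threshold_otsu(gray)

def stain_threshold(negative_controls, tolerance=0.01):
    """NRMB value exceeded by only `tolerance` of tissue pixels in the negative controls."""
    values = np.concatenate([nrmb(img)[tissue_mask(img)] for img in negative_controls])
    return np.quantile(values, 1.0 - tolerance)

def stained_fraction(test_img, threshold):
    mask = tissue_mask(test_img)
    return float((nrmb(test_img)[mask] > threshold).mean())

# Example with random stand-in images
rng = np.random.default_rng(0)
controls = [rng.random((64, 64, 3)) for _ in range(3)]
thresh = stain_threshold(controls, tolerance=0.01)
print(stained_fraction(rng.random((64, 64, 3)), thresh))
```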

This work is published in PLOS One, 2022, and is available here. The code, usage instructions, and sample data are available here and at protocols.io. This is joint work with my mom, Judy Fridovich-Keil, and some wonderful folks in her lab, Sneh Patel and Shauna Rasmussen.

Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains

We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron (MLP) to learn high-frequency functions in low-dimensional problem domains. These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results by using MLPs to represent complex 3D objects and scenes. Using tools from the neural tangent kernel (NTK) literature, we show that a standard MLP fails to learn high frequencies both in theory and in practice. To overcome this spectral bias, we use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth. We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities.
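
A minimal sketch of a Gaussian Fourier feature mapping: input coordinates v are mapped to [cos(2πBv), sin(2πBv)], where the rows of B are drawn from a Gaussian whose scale tunes the effective kernel bandwidth, before being fed to a standard MLP. The sizes and scale below are illustrative, not tuned values from the paper.

```python
# Minimal Gaussian Fourier feature mapping (illustrative sizes and scale).
import numpy as np

def make_fourier_features(num_features, in_dim, scale=10.0, seed=0):
    B = np.random.default_rng(seed).normal(scale=scale, size=(num_features, in_dim))
    def mapping(v):
        proj = 2.0 * np.pi * v @ B.T                       # (batch, num_features)
        return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)
    return mapping

# 2D image-coordinate inputs mapped to 512-dimensional Fourier features
gamma = make_fourier_features(num_features=256, in_dim=2, scale=10.0)
coords = np.random.rand(5, 2)                              # pixel coordinates in [0, 1]^2
print(gamma(coords).shape)                                 # (5, 512)
```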

This work is published at NeurIPS 2020 (oral), and has a project page with links to the paper and code, as well as additional results. This is joint work with Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Utkarsh Singhal, Nithin Raghavan, Ravi Ramamoorthi, Jon Barron, and Ren Ng.

Neural Kernels without Tangents

We investigate the connections between neural networks and simple building blocks in kernel space. In particular, using well-established feature space tools such as direct sum, averaging, and moment lifting, we present an algebra for creating “compositional” kernels from bags of features. We show that these operations correspond to many of the building blocks of “neural tangent kernels (NTK)”. Experimentally, we show that there is a correlation in test error between neural network architectures and the associated kernels. We construct a simple neural network architecture, using only 3x3 convolutions, 2x2 average pooling, and ReLU, trained with SGD and an MSE loss, that achieves 96% accuracy on CIFAR-10, and whose corresponding compositional kernel achieves 90% accuracy. We also use our constructions to investigate the relative performance of neural networks, NTKs, and compositional kernels in the small dataset regime. In particular, we find that compositional kernels outperform NTKs and neural networks outperform both kernel methods.
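
To give a sense of the kind of network involved, here is a sketch built from only the ingredients named above: 3x3 convolutions, 2x2 average pooling, and ReLU. The channel widths and depth are placeholders, not the exact architecture from the paper.

```python
# Sketch of a conv/avg-pool/ReLU-only network (placeholder widths and depth).
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                         nn.ReLU(),
                         nn.AvgPool2d(kernel_size=2))

model = nn.Sequential(
    conv_block(3, 64),     # 32x32 -> 16x16
    conv_block(64, 128),   # 16x16 -> 8x8
    conv_block(128, 256),  # 8x8   -> 4x4
    conv_block(256, 256),  # 4x4   -> 2x2
    nn.Flatten(),
    nn.Linear(256 * 2 * 2, 10),
)
# Per the abstract, such networks are trained with SGD on an MSE loss against one-hot labels.
```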

This work is published at ICML 2020, and is available here. Code is available here. This is joint work with Vaishaal Shankar, Alex Fang, Wenshuo Guo, Ludwig Schmidt, Jonathan Ragan-Kelley, and Ben Recht.

A Meta-Analysis of Overfitting in Machine Learning

We conduct the first large meta-analysis of overfitting due to test set reuse in the machine learning community. Our analysis is based on over one hundred machine learning competitions hosted on the Kaggle platform over the course of several years. In each competition, numerous practitioners repeatedly evaluated their progress against a holdout set that forms the basis of a public ranking available throughout the competition. Performance on a separate test set used only once determined the final ranking. By systematically comparing the public ranking with the final ranking, we assess how much participants adapted to the holdout set over the course of a competition. Our study shows, somewhat surprisingly, little evidence of substantial overfitting. These findings speak to the robustness of the holdout method across different data domains, loss functions, model classes, and human analysts.
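
For a single competition, the core comparison can be illustrated as below: if participants had substantially overfit the public holdout, their public scores would systematically exceed their private scores and the two rankings would diverge. This is only an illustration with made-up scores, not the paper's exact statistical methodology.

```python
# Illustrative public-vs-private leaderboard comparison (made-up scores).
import numpy as np
from scipy.stats import spearmanr

public = np.array([0.912, 0.905, 0.899, 0.897, 0.884])    # public (holdout) leaderboard scores
private = np.array([0.908, 0.904, 0.901, 0.893, 0.886])   # final (private) test scores

rank_corr, _ = spearmanr(public, private)                  # do the two rankings agree?
mean_gap = (public - private).mean()                       # systematic public-private gap
print(f"rank correlation: {rank_corr:.3f}, mean public-private gap: {mean_gap:+.4f}")
```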

This work is published at NeurIPS 2019, and is available here. This is joint work with Rebecca Roelofs, John Miller, Vaishaal Shankar, Moritz Hardt, and Ben Recht.

Contact Surface Area: A Novel Signal for Heart Rate Estimation in Smartphone Videos

We consider the problem of smartphone video-based heart rate estimation, which typically relies on measuring the green color intensity of the user’s skin. We describe a novel signal in fingertip videos used for smartphone-based heart rate estimation: fingertip contact surface area. We propose a model relating contact surface area to pressure, and validate it on a dataset of 786 videos from 62 participants by demonstrating a statistical correlation between contact surface area and green color intensity. We estimate heart rate on our dataset with two algorithms, a baseline using the green signal only and a novel algorithm based on both color and area. We demonstrate lower rates of substantial errors (> 10 beats per minute) using the novel algorithm (4.1%), compared both to the baseline algorithm (6.4%) and to published results using commercial color-based applications (≥ 6%).
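
As a minimal sketch of the color-only baseline, the code below averages the green channel per frame, removes the DC component, and reads the heart rate off the dominant frequency within a plausible band. Extracting the contact-surface-area signal additionally requires segmenting the fingertip in each frame, which is omitted here; the synthetic clip and band limits are illustrative assumptions.

```python
# Illustrative green-signal heart rate estimate (synthetic clip, assumed band limits).
import numpy as np

def estimate_bpm(frames, fps, band=(0.7, 3.5)):
    """frames: (T, H, W, 3) fingertip video; returns the dominant frequency in bpm."""
    green = frames[..., 1].mean(axis=(1, 2))               # mean green intensity per frame
    green = green - green.mean()                           # remove the DC component
    freqs = np.fft.rfftfreq(len(green), d=1.0 / fps)
    power = np.abs(np.fft.rfft(green)) ** 2
    in_band = (freqs >= band[0]) & (freqs <= band[1])      # ~42-210 bpm
    return 60.0 * freqs[in_band][np.argmax(power[in_band])]

# Synthetic 30 fps clip with a 1.2 Hz (72 bpm) pulsation in intensity
t = np.arange(300) / 30.0
frames = 0.5 + 0.01 * np.sin(2 * np.pi * 1.2 * t)[:, None, None, None] * np.ones((1, 8, 8, 3))
print(estimate_bpm(frames, fps=30))                        # ~72
```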

This work is based on my undergraduate senior thesis, and is published at GlobalSIP 2018. The paper is available here and code is available here. This is joint work with Peter Ramadge.