▲ Approach overview. We combine volumetric rendering with a new patch warping technique. Both approaches aggregate color from points sampled along the camera ray: radiance predicted by the radiance network for volumetric rendering and patch extracted from source views for our patch warping.
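Although this paper builds on the surface-rendering method IDR [7], I include it here because it proposes several important regularizations.
The first uses MVS to obtain additional point-cloud data and estimates a normal for each point with PCA, which gives geometric constraints from the points and their normals. This kind of constraint is very effective and widely used, and it reappears in several of the papers discussed later.
The second regularization constrains the second derivatives of the SDF. In surface reconstruction, for the implicit function to be a signed distance function, its gradient must have unit norm everywhere, which is usually enforced with the Eikonal loss introduced in IGR. Constraining only the first derivative is not enough, however: the constraint can only be imposed at discrete sample points, and a zigzag profile such as /\/\/\/\ also satisfies it, even though we do not want the implicit function to change abruptly. It is therefore necessary to bring in second derivatives, i.e. the Hessian, which makes the surface smoother. Constraining the full Hessian matrix is actually unnecessary: DiGS [8] shows that it suffices to constrain the divergence of the gradient field, that is, the sum of the diagonal elements of the Hessian.
The third regularization is a minimal surface loss. This term is very common in active contour and active surface methods such as the Chan-Vese, Mumford-Shah, and Stereoscopic Segmentation models, where it removes noise and smooths the contour; readers familiar with variational methods will know it well. The volume inside the object can be expressed as the integral over space of the Heaviside function applied to the SDF, where the Heaviside function here equals 1 where the SDF is below 0 and 0 where the SDF is above 0. Taking the derivative yields the surface area of the object, where the Dirac function is the derivative of the Heaviside function. This term is rarely used in volume-rendering-based methods, however, because their results tend to be over-smooth and we want to recover more detail.
Used together, the second and third regularizations give the implicit function a well-behaved shape, as shown in the figure below: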
▲ Effects of the regularization. The Hessian loss tends to preserve normals, and the minimal surface constraint closes the surface as planes. We can achieve a natural interpolation by the combination of these two regularizations.
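The regularizers described above can be sketched numerically. This is a minimal illustration, not the paper's implementation. It assumes the usual forms: an Eikonal term (‖∇f‖ − 1)², a divergence term |Δf| in the spirit of DiGS, and a minimal surface term δ_ε(f)‖∇f‖ with a smoothed Dirac function. The function names, the finite-difference step `h`, and the smoothing width `eps` are all assumptions chosen for the example, and the derivatives are taken by finite differences rather than autodiff.

```python
import numpy as np

def sdf_gradient(f, x, h=1e-3):
    """Central-difference gradient of the scalar field f at points x of shape (N, 3)."""
    grad = np.zeros_like(x)
    for axis in range(3):
        step = np.zeros(3)
        step[axis] = h
        grad[:, axis] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

def sdf_laplacian(f, x, h=1e-3):
    """Sum of the diagonal Hessian entries, i.e. the divergence of the gradient."""
    center = f(x)
    lap = np.zeros(x.shape[0])
    for axis in range(3):
        step = np.zeros(3)
        step[axis] = h
        lap += (f(x + step) - 2.0 * center + f(x - step)) / h**2
    return lap

def smoothed_dirac(s, eps=0.05):
    """Magnitude of the derivative of the smoothed Heaviside
    H(s) = 0.5 - arctan(s / eps) / pi, which tends to 1 for s < 0 and 0 for s > 0."""
    return (eps / np.pi) / (eps**2 + s**2)

def regularizers(f, x, h=1e-3, eps=0.05):
    """Return the three (assumed) regularization terms averaged over the samples x."""
    grad_norm = np.linalg.norm(sdf_gradient(f, x, h), axis=1)
    return {
        # Eikonal term: gradient norm should be 1 for a signed distance function.
        "eikonal": np.mean((grad_norm - 1.0) ** 2),
        # Divergence term: penalizes the trace of the Hessian (Laplacian).
        "divergence": np.mean(np.abs(sdf_laplacian(f, x, h))),
        # Minimal surface term: proportional to the area of the zero level set.
        "min_surface": np.mean(smoothed_dirac(f(x), eps) * grad_norm),
    }

def sphere_sdf(x, radius=0.5):
    """Exact SDF of a sphere centered at the origin, used only as a toy field."""
    return np.linalg.norm(x, axis=1) - radius

rng = np.random.default_rng(0)
points = rng.uniform(0.5, 1.0, size=(64, 3))  # samples away from the origin
print(regularizers(sphere_sdf, points))
```

For an exact SDF such as the sphere, the Eikonal term is close to zero while the divergence term is not, which is one way to see that the two terms constrain different properties of the field.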
▲ Comparison between sphere-based sampling and our proposed sampling strategy. Sphere-based sampling (a), used in NeuS, generates samples scattered throughout the unit sphere and spanning the whole scene, with the result that most samples lie in empty regions and are hence unnecessary. We propose voxel-guided sampling (b) to avoid unnecessary samples by sampling only within a sparse voxel volume around surfaces estimated from SfM point clouds (only a subset of voxels are shown for clarity). To further increase the sampling density around surfaces, we additionally propose a surface-guided sampling strategy (c), where we store SDF values from previous training iterations in the sparse voxels, and generate samples within a smaller range centered around the estimated surface positions. Note that each successive region of the volume considered by each sampling strategy from (a) to (b) to (c) is progressively smaller as suggested by the 2D blue and red bounding boxes.
▲ Using numerical gradients for higher-order derivatives distributes the back-propagation updates beyond the local hash grid cell, thus becoming a smoothed version of analytical gradients.
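The effect described in this caption can be reproduced on a toy problem. The sketch below is an assumption-laden stand-in: it replaces the 3D hash grid with a 1D piecewise-linear field stored on integer vertices, and the helper names (`interp_field`, `analytic_grad`, `numerical_grad`, `dependency_weights`) are hypothetical. It compares which vertex values influence the analytical gradient with which ones influence a central-difference gradient whose step exceeds the cell size.

```python
import numpy as np

def interp_field(grid, x, cell=1.0):
    """Piecewise-linear 1D field on grid vertices (a stand-in for a feature grid)."""
    nodes = np.arange(len(grid)) * cell
    return np.interp(x, nodes, grid)

def analytic_grad(grid, x, cell=1.0):
    """Exact derivative inside a cell: depends only on the two cell corners."""
    i = int(np.floor(x / cell))
    return (grid[i + 1] - grid[i]) / cell

def numerical_grad(grid, x, eps, cell=1.0):
    """Central difference with step eps; eps > cell reaches neighboring cells."""
    return (interp_field(grid, x + eps, cell) - interp_field(grid, x - eps, cell)) / (2.0 * eps)

def dependency_weights(grad_fn, n_vertices):
    """Weight of every vertex value in a gradient estimate that is linear in the grid.
    A nonzero weight means back-propagation would update that vertex."""
    basis = np.eye(n_vertices)
    return np.array([grad_fn(basis[j]) for j in range(n_vertices)])

n = 10
x0 = 4.3
w_analytic = dependency_weights(lambda g: analytic_grad(g, x0), n)
w_numeric = dependency_weights(lambda g: numerical_grad(g, x0, eps=1.5), n)

print("vertices touched by the analytical gradient:", np.nonzero(w_analytic)[0])
print("vertices touched by the numerical gradient: ", np.nonzero(np.abs(w_numeric) > 1e-12)[0])
```

In this toy setting the analytical gradient only involves the two corners of the cell containing `x0`, while the numerical gradient with a step of 1.5 cells involves vertices from the neighboring cells, so its update is spread over a wider neighborhood.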
▲ We elaborate on the main differences between the hierarchical deformable anchors representation and some baseline variants. From left to right: (1) Methods such as NeuS, VolSDF, and UNISURF sample points along a single ray; (2, 3) Standard voxel grid approaches store a learnable embedding (or feature) at each vertex. Spatial context could be simply handled through the feature aggregation operation. The multi-resolution (or hierarchical) voxel grid representation can further explore different receptive fields; (4) Our method maintains a 3D position (or anchor point) instead of a feature vector at each vertex. We optimize the anchor points such that different geometry structures can be adaptively represented.
▲ Deformation Process of Anchor Points. The anchor points (e.g. orange points) are uniformly distributed in the 3D box at the beginning and move to object surfaces as training converges. Zoom in for better view.
PermutoSDF, CVPR 2023 [19]
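This paper also uses a grid to accelerate training, but whereas most papers use a cubic grid, it proposes a tetrahedral one (a permutohedral lattice). The engineering effort is enormous: the authors implement a new kind of grid from scratch, which requires a whole new set of CUDA code. They also use many tricks to speed up training, which offer plenty of points worth thinking about. The pipeline is shown in the figure below.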
▲ Overview of our PermutoSDF pipeline. (1) For a batch of pixels from the posed images, we sample rays inside the volume of interest. (2) For each sample, we slice features from a multi-resolution permutohedral lattice. (3) The features from all lattice levels are concatenated. For the color network, we also concatenate additional features regarding normal n of the SDF, view direction v, and learnable features x from the SDF network. (4) Small MLPs decode the SDF and a view-dependent RGB color. (5) The output is rendered volumetrically and supervised only with RGB images. We visualize surface color and a 2D slice of the SDF.
Sphere-Guided Training of Neural Implicit Surfaces, CVPR 2023 [20]
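The core idea of this paper is also efficient sampling, this time with a set of spheres whose positions are learnable. Many earlier papers also use efficient sampling, such as Neural 3D Reconstruction in the Wild discussed above, but that method needs sparse points from SfM to initialize its voxels, whereas the method proposed here can be trained from scratch. In addition, earlier methods only improve ray marching and still sample rays at random from the image, while this method improves both ray marching and ray sampling. The new sampling pipeline is as follows: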
▲ Our method works by filtering the samples along the ray that lie outside of the surface region, approximated by a trainable sphere cloud. Such filtering improves the sample efficiency in the optimization process and allows the implicit function to converge to a better optimum.
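The filtering step in this caption can be illustrated with a ray-sphere intersection test. This is a minimal sketch under assumptions, not the authors' code: all spheres share one radius, and the function names `ray_sphere_intervals` and `filter_samples` are hypothetical. It computes the interval of each sphere hit by a ray and keeps only the candidate samples that fall inside at least one interval.

```python
import numpy as np

def ray_sphere_intervals(origin, direction, centers, radius):
    """Return (t_near, t_far) for every sphere hit by the ray origin + t * direction.

    centers has shape (M, 3); all spheres share the same radius (an assumption).
    """
    direction = direction / np.linalg.norm(direction)
    oc = origin - centers                        # (M, 3) vectors from centers to origin
    b = oc @ direction                           # (M,) projection onto the ray
    c = np.sum(oc * oc, axis=1) - radius**2      # (M,)
    disc = b * b - c                             # discriminant of the quadratic in t
    hit = disc > 0.0
    root = np.sqrt(disc[hit])
    t_near = np.maximum(-b[hit] - root, 0.0)     # clip the part behind the origin
    t_far = -b[hit] + root
    in_front = t_far > 0.0                       # drop spheres entirely behind the ray
    return t_near[in_front], t_far[in_front]

def filter_samples(t_samples, t_near, t_far):
    """Keep the ray samples that lie inside at least one sphere interval."""
    inside = (t_samples[:, None] >= t_near[None, :]) & (t_samples[:, None] <= t_far[None, :])
    return t_samples[inside.any(axis=1)]

origin = np.array([0.0, 0.0, -2.0])
direction = np.array([0.0, 0.0, 1.0])
centers = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])  # second sphere is missed

t_near, t_far = ray_sphere_intervals(origin, direction, centers, radius=0.5)
kept = filter_samples(np.linspace(0.0, 4.0, 17), t_near, t_far)
print(t_near, t_far, kept)
```

A real implementation would more likely draw new samples directly inside the intervals instead of discarding a fixed set, but the intersection test is the same.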
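For ray marching, samples are taken only inside the spheres, which greatly increases the fraction of useful samples. For ray sampling, to avoid picking rays that do not intersect the object surface, the authors choose the ray end points uniformly from the space enclosed by the spheres (I have not read the code, so I am not sure how this is implemented). A good sphere cloud should cover the object surface evenly, and the spheres are initialized uniformly in space. The optimization of the spheres follows two principles. First, the sphere centers should lie as close to the object surface as possible. Second, the spheres should be spread out as evenly as possible, so sphere centers that lie too close to one another are penalized. The sphere radius also decreases exponentially during training. The trajectories of the spheres are shown in the figure below: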
▲ Visualization of the training process. Initially, we assign a large radius to all spheres in the cloud (a) and gradually reduce it during the optimization down to a minimum value (c). Our proposed repulsion loss prevents the clumping of the spheres and encourages exploration, which results in an improved reconstruction of the thin surfaces (d).
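The sphere-cloud objectives described above can be sketched as follows. The exact loss formulas are not given in this post, so everything here is an assumed, plausible form: a surface term that averages |SDF| at the sphere centers, a repulsion term that sums inverse distances between centers closer than a threshold `d_max`, and a radius that decays exponentially down to a floor. All function and parameter names are hypothetical.

```python
import numpy as np

def surface_loss(sdf, centers):
    """Pull sphere centers toward the surface: mean absolute SDF at the centers."""
    return np.mean(np.abs(sdf(centers)))

def repulsion_loss(centers, d_max):
    """Penalize pairs of centers closer than d_max with an inverse-distance term."""
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                  # (M, M) pairwise distances
    off_diagonal = ~np.eye(len(centers), dtype=bool)      # ignore self-distances
    near = off_diagonal & (dist < d_max)
    return np.sum(1.0 / (dist[near] + 1e-8)) / len(centers)

def radius_schedule(step, r_max, r_min, decay):
    """Exponentially shrink the shared radius, never going below r_min."""
    return max(r_max * float(np.exp(-decay * step)), r_min)

def unit_sphere_sdf(x):
    """Toy SDF used only for the demonstration."""
    return np.linalg.norm(x, axis=1) - 1.0

rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(128, 3))           # uniform initialization
print("surface loss:  ", surface_loss(unit_sphere_sdf, centers))
print("repulsion loss:", repulsion_loss(centers, d_max=0.2))
print("radius at steps 0, 1000, 10000:",
      [radius_schedule(s, r_max=1.0, r_min=0.05, decay=1e-3) for s in (0, 1000, 10000)])
```

In training, the two loss terms would be minimized with respect to the center positions alongside the implicit function, which requires an autodiff framework rather than plain NumPy.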