Learning Joint 2D-3D Representations for Depth Completion
University of Toronto, Uber ATG
In this paper, we tackle the problem of depth completion from RGBD data. Towards this goal, we design a simple yet effective neural network block that learns to extract joint 2D and 3D features. Specifically, the block consists of two domain-specific sub-networks that apply 2D convolution on image pixels and continuous convolution on 3D points, with their output features fused in image space. We build the depth completion network simply by stacking the proposed block, which has the advantage of learning hierarchical representations that are fully fused between 2D and 3D spaces at multiple levels. We demonstrate the effectiveness of our approach on the challenging KITTI depth completion benchmark and show that our approach outperforms the state-of-the-art.
1st place on KITTI Depth Completion
The proposed 2D-3D Fuse Block can fully exploit both 2D appearance and 3D geometric features. It contains 3 components:
FuseNet is built by stacking the proposed blocks, with a few 2D convolution layers at the input and output stages. It is trained from scratch, without any additional data or pretrained weights.
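To make the data flow concrete, here is a minimal NumPy sketch of one fuse block and of a network that stacks several of them. It is an illustration under stated assumptions, not the authors' implementation. The names (`conv2d_3x3`, `continuous_conv`, `fuse_block`, `fuse_net`, `make_params`) are hypothetical, as are the fusion by summation, the k-nearest-neighbour search, the per-channel kernel MLP, and the layer sizes. Only the overall structure comes from the description above: a 2D convolution branch on pixels, a continuous convolution branch on 3D points, fusion in image space, and 2D convolutions at the input and output stages.

```python
import numpy as np

def conv2d_3x3(feat, weight):
    """Same-padded 3x3 convolution. feat: (H, W, C_in); weight: (3, 3, C_in, C_out)."""
    h, w, _ = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, weight.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w] @ weight[dy, dx]
    return out

def continuous_conv(points, point_feat, mlp_w1, mlp_w2, k=3):
    """Simplified continuous convolution (assumption: per-channel kernel weights).

    points: (N, 3) 3D coordinates; point_feat: (N, C) features at those points.
    A small MLP maps each neighbour offset to a kernel value per channel.
    """
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    nn_idx = np.argsort(dist, axis=1)[:, :k]        # (N, k), includes the point itself
    offsets = points[nn_idx] - points[:, None, :]   # (N, k, 3) relative positions
    hidden = np.maximum(offsets @ mlp_w1, 0.0)      # (N, k, hidden)
    kernel = hidden @ mlp_w2                        # (N, k, C)
    return (kernel * point_feat[nn_idx]).sum(axis=1)  # (N, C)

def fuse_block(feat, points, pixel_idx, params):
    """One 2D-3D block: two branches whose outputs are fused in image space.

    pixel_idx: (N, 2) integer (row, col) of each 3D point's image projection.
    """
    rows, cols = pixel_idx[:, 0], pixel_idx[:, 1]
    feat_2d = conv2d_3x3(feat, params["conv_w"])              # 2D appearance branch
    feat_pts = continuous_conv(points, feat[rows, cols],      # 3D geometry branch
                               params["mlp_w1"], params["mlp_w2"])
    feat_3d = np.zeros_like(feat_2d)
    feat_3d[rows, cols] = feat_pts                            # scatter back to pixels
    return np.maximum(feat_2d + feat_3d, 0.0)                 # assumed fusion: sum + ReLU

def make_params(rng, channels, hidden=8):
    """Random weights for one block (hypothetical sizes, for illustration only)."""
    return {
        "conv_w": rng.normal(scale=0.1, size=(3, 3, channels, channels)),
        "mlp_w1": rng.normal(scale=0.1, size=(3, hidden)),
        "mlp_w2": rng.normal(scale=0.1, size=(hidden, channels)),
    }

def fuse_net(rgbd, points, pixel_idx, stem_w, blocks, head_w):
    """2D conv at the input, stacked fuse blocks, 2D conv at the output."""
    feat = np.maximum(conv2d_3x3(rgbd, stem_w), 0.0)
    for params in blocks:
        feat = fuse_block(feat, points, pixel_idx, params)
    return conv2d_3x3(feat, head_w)  # (H, W, 1) dense depth prediction
```

Because every block reads 3D point features from the current image feature map and writes them back to the same pixels, the two branches exchange information at every level of the stack rather than only once at the end.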