Kernel-based feature aggregation framework in point cloud networks
journal contribution
posted on 2024-11-17, 16:56, authored by Jianjia Zhang, Zhenxi Zhang, Lei Wang, Luping Zhou, Xiaocai Zhang, Mengting Liu, Weiwen Wu
Various effective deep networks have been developed for the analysis of 3D point clouds. One key step in these networks is to aggregate the features of orderless points into a compact representation of the whole cloud. As a typical order-invariant aggregation method, max-pooling has been widely applied. However, while simple and highly efficient, max-pooling does not fully exploit the feature information: it ignores the non-maximum elements in each feature dimension and also neglects the interactions between different dimensions. These drawbacks motivate us to explore more advanced feature aggregation methods for 3D point cloud analysis. Such a method should model richer information from the point features than max-pooling and, at the same time, be able to replace max-pooling directly without modifying other parts of the existing network architecture. To this end, this paper proposes, for the first time, a novel kernel-based feature aggregation framework for 3D point cloud analysis. The proposed method considers all the elements in each dimension and models the nonlinear interactions between feature dimensions as information complementary to max-pooling. In addition, it is a plug-in module that can be integrated into many common networks as a replacement for max-pooling. Comprehensive experiments demonstrate the consistently superior performance and high generality of the proposed method compared with max-pooling. Specifically, the proposed framework consistently improves over max-pooling with three representative backbones, PointNet, DGCNN and PCT, across four 3D point cloud analysis tasks: supervised 3D object classification, 3D part segmentation, indoor semantic segmentation, and an additional unsupervised place retrieval task. In particular, it shows a remarkable improvement over max-pooling in the unsupervised retrieval task, demonstrating its advantage in forming 3D point cloud representations.
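The abstract does not state which kernel is used or how its output is combined with max-pooling, so the following is only a hypothetical sketch of the general idea, not the authors' implementation. It assumes an RBF (Gaussian) kernel computed between feature dimensions, where each dimension is represented by its values over all points, and assumes the upper triangle of the resulting d x d kernel matrix is simply concatenated with the max-pooled vector. The names `max_pool`, `rbf_dimension_kernel`, `aggregate` and the bandwidth `sigma` are illustrative inventions. The sketch shows why such a kernel uses every element (not just the maximum), captures nonlinear interactions between dimensions, and stays order-invariant.

```python
import math
from typing import List

Matrix = List[List[float]]  # n points x d feature dimensions


def max_pool(features: Matrix) -> List[float]:
    """Order-invariant max-pooling: keep only the largest value per dimension."""
    d = len(features[0])
    return [max(row[j] for row in features) for j in range(d)]


def rbf_dimension_kernel(features: Matrix, sigma: float = 1.0) -> Matrix:
    """Hypothetical d x d RBF kernel between feature dimensions.

    Dimension j is treated as the vector of its values over all n points, so
    every element contributes. Entry (i, j) is
    exp(-mean_p (f_i[p] - f_j[p])^2 / (2 * sigma^2)), a nonlinear function of
    how dimensions i and j relate across points. Reordering the points permutes
    both vectors identically, so the kernel value does not change.
    """
    n, d = len(features), len(features[0])
    cols = [[features[p][j] for p in range(n)] for j in range(d)]
    kernel = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            # Mean squared difference keeps the scale independent of n.
            msd = sum((a - b) ** 2 for a, b in zip(cols[i], cols[j])) / n
            kernel[i][j] = math.exp(-msd / (2.0 * sigma ** 2))
    return kernel


def aggregate(features: Matrix, sigma: float = 1.0) -> List[float]:
    """Assumed combination: max-pooled vector + upper triangle of the kernel."""
    k = rbf_dimension_kernel(features, sigma)
    d = len(k)
    upper = [k[i][j] for i in range(d) for j in range(i, d)]
    return max_pool(features) + upper


if __name__ == "__main__":
    # Two clouds that differ only in a non-maximum element.
    a = [[1.0, 0.0], [0.0, 1.0]]
    b = [[1.0, 0.0], [0.5, 1.0]]
    print(max_pool(a) == max_pool(b))  # True: max-pooling cannot tell them apart
    print(aggregate(a) == aggregate(b))  # False: the kernel terms differ
```

For a cloud with d feature dimensions this sketch yields d + d(d + 1)/2 values. Because the kernel matrix is symmetric, only its upper triangle (including the diagonal) is kept; the diagonal is always 1 for an RBF kernel, so a practical variant might drop it.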