Publication Details

Comor, I., Zhao, Y., Gao, Z., Zhou, L. & Wang, L. (2016). Image Descriptors from ConvNets: Comparing Global Pooling Methods for Image Retrieval. 2016 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2016 (pp. 1-8). IEEE Xplore: IEEE.


2016 IEEE.A major component of a generic image retrieval pipeline is producing concise and effective descriptors for each image. Previous works have shown impressive results in image retrieval when using descriptors from the black-box output of the fully-connected stage of pretrained Convolutional Neural Networks (ConvNets). However, previous work on descriptors pooled from the deep feature maps from late convolutional layers can produce more discriminative descriptors for generic image retrieval, while being relatively concise. When planning to globally pool such feature maps from a ConvNet, some options to consider are (1) the depth of the network, (2) choice of layer to pool, and (3) the level of dimension reduction. The previous work on global pooling methods uses differing techniques without a clear consensus on which method is best. This motivates us to establish a baseline pipeline from which to compare these options and their effect on retrieval results. Our contribution is a systematic and comprehensive experimental study of different pooling strategies of deep features for image retrieval, and the various options. Our results show that the nature of the dataset (object- heavy or scene-heavy) warrants a different pooling strategy. Significantly, we visualise the level of image discrimination brought by the different pooling methods on the datasets, and show that pooling need not have a priori spatial weights to effectively find objects within the image. The results underline the need to consider the context of the image dataset when developing image retrieval pipelines using ConvNets.



Link to publisher version (DOI)