Semantic and spatial content fusion for scene recognition
In the field of scene recognition, it is usually insufficient to use only one visual feature regardless of how discriminative the feature is. Therefore, the spatial location and semantic relationships of local features need to be captured together with the scene contextual information. In this paper we proposed a novel framework to project image contextual feature space with semantic space of local features into a map function. This embedding is performed based on a subset of training images denoted as an exemplar-set. This exemplar-set is composed of images that describe better the scene category's attributes than the other images. The proposed framework learns a weighted combination of local semantic topics as well as global and spatial information, where the weights represent the features' contributions in each scene category. An empirical study was performed on two of the most challenging scene datasets 15-Scene categories and 67-Indoor Scenes and the promising results of 89.47 and 45.0 were achieved respectively.