Part-Based Feature Aggregation Method for Dynamic Scene Recognition
RIS ID
141462
Abstract
© 2019 IEEE. Existing methods for dynamic scene recognition mostly use global features extracted from the entire video frame or a video segment. In this paper, a part-based method is proposed for aggregating local features from multiple video frames. A pre-trained Fast R-CNN model is used to extract local convolutional layer features from the regions of interest (ROIs) of training images. These features are then clustered to locate representative parts. A set cover problem is formulated to select the discriminative parts, which are further refined by fine-tuning the Fast R-CNN model. Local convolutional layer features and fully-connected layer features are extracted using the fine-tuned Fast R-CNN model and then aggregated separately over a video segment to form two feature representations. The two representations are concatenated into a single global feature representation. Experimental results show that the proposed method outperforms several state-of-the-art features on two dynamic scene datasets.
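The two algorithmic steps named in the abstract, discriminative part selection via set cover and per-segment feature aggregation, can be illustrated with a short sketch. The Python code below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each candidate part is described by the set of training images it covers (a hypothetical `coverage` mapping) and that the per-frame features are aggregated by average pooling, which the abstract does not specify.

```python
import numpy as np


def select_parts_greedy(coverage):
    """Greedy approximation to a set cover formulation: repeatedly pick the
    candidate part that covers the most still-uncovered training images.
    `coverage` maps part id -> set of image ids (hypothetical encoding)."""
    universe = set().union(*coverage.values())
    covered, selected = set(), []
    while covered != universe:
        best = max(coverage, key=lambda p: len(coverage[p] - covered))
        if not (coverage[best] - covered):
            break  # remaining parts add no new images
        selected.append(best)
        covered |= coverage[best]
    return selected


def aggregate_segment(conv_feats, fc_feats):
    """Aggregate per-frame local (conv) and fully-connected features over a
    video segment and concatenate them into one global descriptor.
    conv_feats: array of shape (num_frames, D_conv)
    fc_feats:   array of shape (num_frames, D_fc)
    Average pooling is an assumption; the paper may pool differently."""
    conv_repr = np.mean(conv_feats, axis=0)
    fc_repr = np.mean(fc_feats, axis=0)
    return np.concatenate([conv_repr, fc_repr])
```

For example, `aggregate_segment(np.random.rand(16, 512), np.random.rand(16, 4096))` would return a single 4608-dimensional descriptor for a 16-frame segment; the resulting global representation can then be fed to any off-the-shelf classifier.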
Publication Details
X. Peng and A. Bouzerdoum, "Part-Based Feature Aggregation Method for Dynamic Scene Recognition," in 2019 Digital Image Computing: Techniques and Applications (DICTA), 2019.