University of Wollongong

File(s) not publicly available

A part-based spatial and temporal aggregation method for dynamic scene recognition

journal contribution
posted on 2024-11-17, 12:56 authored by Xiaoming Peng, Abdesselam Bouzerdoum, Son Lam Phung
Existing methods for dynamic scene recognition mostly use global features extracted from the entire video frame or a video segment. In this paper, a part-based method is proposed to aggregate local features from video frames. A pre-trained Fast R-CNN model is used to extract local convolutional features from the regions of interest of training images. These features are clustered to locate representative parts. A set cover problem is then formulated to select the discriminative parts, which are further refined by fine-tuning the Fast R-CNN model. Local features from a video segment are extracted at different layers of the fine-tuned Fast R-CNN model and aggregated both spatially and temporally. Extensive experimental results show that the proposed method is very competitive with state-of-the-art approaches.
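The abstract says that part selection is posed as a set cover problem, but it does not say how the problem is solved or what a part "covers". Below is a minimal sketch of the standard greedy set cover approximation. It assumes that each candidate part covers the set of training-image indices on which it responds strongly. That coverage definition, and the function and part names, are hypothetical illustrations and not the authors' formulation.

```python
from typing import Dict, List, Optional, Set


def greedy_set_cover(universe: Set[int], candidates: Dict[str, Set[int]]) -> List[str]:
    """Greedy set cover approximation (hypothetical helper, not from the paper).

    universe:   elements to cover (assumed here to be training-image indices).
    candidates: candidate part name -> set of elements that part covers
                (assumed here to be the images where the part fires strongly).
    Returns the selected part names in the order they were chosen.
    """
    uncovered = set(universe)
    selected: List[str] = []
    while uncovered:
        # Pick the part that covers the most still-uncovered elements.
        # Iterating over sorted names makes ties resolve alphabetically.
        best: Optional[str] = max(
            sorted(candidates),
            key=lambda name: len(candidates[name] & uncovered),
            default=None,
        )
        if best is None:
            break  # no candidates at all
        gain = candidates[best] & uncovered
        if not gain:
            break  # the remaining elements cannot be covered by any part
        selected.append(best)
        uncovered -= gain
    return selected


if __name__ == "__main__":
    parts = {
        "part_a": {0, 1, 2},
        "part_b": {2, 3},
        "part_c": {3, 4, 5},
        "part_d": {0, 5},
    }
    print(greedy_set_cover({0, 1, 2, 3, 4, 5}, parts))  # ['part_a', 'part_c']
```

The greedy rule is a common choice because exact set cover is NP-hard and the greedy solution is within a logarithmic factor of the optimum. The paper may use a different solver or coverage criterion.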

History

Journal title: Neural Computing and Applications
Volume: 33
Issue: 13
Pagination: 7353-7370
Language: English
