University of Wollongong

Triple critical feature capture network: A triple critical feature capture network for weakly supervised object detection

journal contribution
posted on 2024-11-17, 14:15 authored by Zhoufeng Liu, Kaihua Wang, Chunlei Li, Shunmin Ding, Jiangtao Xi
Weakly supervised object detection (WSOD) is becoming increasingly important for computer vision tasks, as it alleviates the burden of manual annotation. Most WSOD techniques rely on multiple instance learning (MIL), which tends to localise the discriminative parts of salient objects instead of the whole object. In addition, network training is often supervised using simple image-level annotations, without object quantities or location information, which can lead to ambiguous differentiation of object instances in terms of both location and semantics. To address these issues, an end-to-end triple critical feature capture network (TCFCNet) for WSOD is proposed. Specifically, a multi-task branch, which performs fully supervised classification and regression tasks, is integrated with a PCL in an end-to-end network to refine object locations online. A cyclic parametric dropblock module (CPDM) is then designed to help the detector focus on contextual information: it uses cyclic masking to maximise the removal of the discriminative components of an object instance, alleviating the part domination problem. Finally, a feature decoupling module (FDM), which contains a feature enhancement module and task-specific polarisation functions, is proposed to further reduce the ambiguous distinction of object instances by adaptively constructing robust critical features suited to the classification and regression tasks of the multi-task branch. Comprehensive experiments are carried out on the challenging Pascal VOC 2007 and VOC 2012 datasets. The proposed method achieves 54.6% mAP and 44.3% mAP on the Pascal VOC 2007 and VOC 2012 datasets, respectively, showing that it outperforms existing mainstream techniques by a considerable margin.
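The idea of cyclically masking the most discriminative region can be illustrated with a small sketch. This is not the CPDM described in the abstract, whose design is not given here. It is a simplified, assumption-laden illustration in plain Python: the function names `mask_top_block` and `cyclic_mask`, the fixed square block, and the toy activation map are all hypothetical. Each round zeroes the window with the largest activation sum, so later rounds are forced to look at less dominant regions.

```python
def mask_top_block(feature_map, block_size):
    """Zero the block_size x block_size window with the largest activation sum.

    Hypothetical helper for illustration; returns a new map and the
    top-left corner (row, col) of the masked window.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    best_sum, best_pos = float("-inf"), (0, 0)
    for r in range(rows - block_size + 1):
        for c in range(cols - block_size + 1):
            window_sum = sum(
                feature_map[r + dr][c + dc]
                for dr in range(block_size)
                for dc in range(block_size)
            )
            if window_sum > best_sum:
                best_sum, best_pos = window_sum, (r, c)
    # Copy the rows so the caller's map is left untouched.
    masked = [row[:] for row in feature_map]
    r0, c0 = best_pos
    for dr in range(block_size):
        for dc in range(block_size):
            masked[r0 + dr][c0 + dc] = 0.0
    return masked, best_pos


def cyclic_mask(feature_map, block_size=2, rounds=2):
    """Repeat the masking step, each round hiding the next strongest block."""
    masked, positions = feature_map, []
    for _ in range(rounds):
        masked, pos = mask_top_block(masked, block_size)
        positions.append(pos)
    return masked, positions


# Toy 4x4 activation map with a strong response in the centre.
demo = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.9, 0.3],
    [0.0, 0.1, 0.3, 0.2],
]
masked_demo, masked_positions = cyclic_mask(demo, block_size=2, rounds=2)
```

In this toy case the first round removes the central block and the second round moves to the lower-right corner, which is the kind of shift away from the dominant part that the abstract attributes to cyclic masking.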

Funding

Utah Science Technology and Research (21IRTSTHN013)

History

Journal title

IET Computer Vision

Language

English
