University of Wollongong
Browse

File(s) not publicly available

Cost-effective Data Labelling for Graph Neural Networks

journal contribution
posted on 2024-11-17, 15:02 authored by Shixun Huang, Ge Lee, Zhifeng Bao, Shirui Pan
Active learning (AL), that aims to label limited data samples to effectively train the model, stands as a very cost-effective data labelling strategy in machine learning. Given the state-of-the-art performance GNNs have achieved in graph-based tasks, it is critical to design proper AL methods for graph neural networks (GNNs). However, existing GNN-based AL methods require considerable supervised information to guide the AL process, such as the GNN model to use, and initially labelled nodes and labels of newly selected nodes. Such dependency on supervised information limits both flexibility and scalabilty. In this paper, we propose an unsupervised, scalable and flexible AL method - it incurs low memory footprints and time cost, is flexible to the choice of underlying GNNs, and operates without requiring GNN-model-specific knowledge or labels of selected nodes. Specifically, we leverage the commonality of existing GNNs to reformulate the unsupervised AL problem as the Aggregation Involvement Maximization (AIM) problem. The objective of AIM is to maximize the involvement or participation of all nodes during the feature aggregation process of GNNs for nodes to be labelled. In this way, the aggregated features of labelled nodes can be diversified to a large extent, thereby benefiting the training of feature transformation matrices which are major trainable components in GNNs. We prove that the AIM problem is NP-hard and propose an efficient solution with theoretical guarantees. Extensive experiments on public datasets demonstrate the effectiveness, scalability and flexibility of our method. Our study is highly relevant to the track "Graph Algorithms and Modeling for the Web"since we focus one of the major listed topics "Graph Embedding and GNNs for the Web"and AL for GNNs, as an important research problem, is faced by aforementioned challenges to be tackled in this paper.

Funding

Australian Research Council (DP220101434)

History

Journal title

WWW 2024 - Proceedings of the ACM Web Conference

Pagination

353-364

Language

English

Usage metrics

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC