Cost-effective Data Labelling for Graph Neural Networks

Publication Name

WWW 2024 - Proceedings of the ACM Web Conference

Abstract

Active learning (AL), which aims to label a limited number of data samples to train a model effectively, is a very cost-effective data labelling strategy in machine learning. Given the state-of-the-art performance that graph neural networks (GNNs) have achieved in graph-based tasks, it is critical to design proper AL methods for them. However, existing GNN-based AL methods require considerable supervised information to guide the AL process, such as the GNN model to be used, the initially labelled nodes, and the labels of newly selected nodes. Such dependency on supervised information limits both flexibility and scalability. In this paper, we propose an unsupervised, scalable and flexible AL method: it incurs a low memory footprint and time cost, is flexible with respect to the choice of underlying GNN, and operates without requiring GNN-model-specific knowledge or the labels of selected nodes. Specifically, we leverage the commonality of existing GNNs to reformulate the unsupervised AL problem as the Aggregation Involvement Maximization (AIM) problem. The objective of AIM is to maximize the involvement, or participation, of all nodes in the GNN feature aggregation process for the nodes to be labelled. In this way, the aggregated features of labelled nodes can be diversified to a large extent, thereby benefiting the training of the feature transformation matrices, which are the major trainable components in GNNs. We prove that the AIM problem is NP-hard and propose an efficient solution with theoretical guarantees. Extensive experiments on public datasets demonstrate the effectiveness, scalability and flexibility of our method. Our study is highly relevant to the track "Graph Algorithms and Modeling for the Web", since we focus on one of its major listed topics, "Graph Embedding and GNNs for the Web", and AL for GNNs is an important research problem that faces the challenges tackled in this paper.
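To make the AIM objective more concrete, here is a rough sketch that models a node's "involvement" very loosely as membership in the k-hop neighbourhood of at least one selected node. That neighbourhood is roughly the set of nodes whose features enter a k-layer message-passing GNN's aggregation for the selected node. Under this simplifying assumption, AIM resembles maximum coverage, and the standard greedy rule applies: repeatedly pick the node whose neighbourhood adds the most not-yet-covered nodes. For plain coverage objectives, this greedy rule is known to give a (1 - 1/e) approximation. This is an illustrative stand-in, not the authors' formulation or algorithm, and the abstract does not say what form their guarantee takes. The function names `k_hop_neighbourhood` and `greedy_select`, the adjacency-dict input and the hop count `k` are all hypothetical.

```python
from collections import deque


def k_hop_neighbourhood(adj, source, k):
    """Return the set of nodes within k hops of `source` (including it), via BFS."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:  # do not expand beyond k hops
            continue
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


def greedy_select(adj, budget, k=2):
    """Hypothetical coverage proxy for AIM: greedily choose up to `budget` nodes
    whose k-hop neighbourhoods jointly cover the most distinct nodes."""
    hoods = {v: k_hop_neighbourhood(adj, v, k) for v in adj}
    covered, selected = set(), []
    for _ in range(budget):
        best, best_gain = None, 0
        for v, hood in hoods.items():
            if v in selected:
                continue
            gain = len(hood - covered)  # newly "involved" nodes if v is labelled
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # no remaining node adds new coverage
            break
        selected.append(best)
        covered |= hoods[best]
    return selected, covered


if __name__ == "__main__":
    # Toy path graph 0-1-2-3-4-5-6; with k=1 each node "involves" itself and its neighbours.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
    selected, covered = greedy_select(path, budget=2, k=1)
    print(selected, sorted(covered))  # [1, 4] [0, 1, 2, 3, 4, 5]
```

On this toy graph, two labelled nodes already involve six of the seven nodes. This illustrates the abstract's point that spreading selections out diversifies the aggregated features the model is trained on. No label or trained GNN is consulted, which mirrors the unsupervised setting described above.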

Open Access Status

This publication is not available as open access

First Page

353

Last Page

364

Funding Number

DP220101434

Funding Sponsor

Australian Research Council

Link to publisher version (DOI)

http://dx.doi.org/10.1145/3589334.3645339