This research introduces a new approach to few-shot sound classification of classroom recordings that integrates Label Set Operation (LaSO) features with Prototypical Networks. Traditional audio classification methods often require extensive labeled datasets, which are impractical to obtain in many real-world scenarios. This is particularly true of the target application: automatically annotating long recordings of classroom audio to understand student learning. The paper proposes an enhanced few-shot learning approach that incorporates LaSO features to augment the feature space of a Prototypical Network. The methodology focuses on detecting and classifying teacher and student voices to support later analysis of classroom interactions. Experimental results indicate that incorporating LaSO features significantly improves the classification accuracy of a Prototypical Network used for few-shot learning. This work paves the way for more advanced and automated solutions in educational environments, facilitating better monitoring and understanding of classroom dynamics.
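As a rough illustration of the classification step in a Prototypical Network, the sketch below averages the support embeddings of each class into a prototype and assigns a query to the nearest prototype by squared Euclidean distance. It is a minimal sketch under stated assumptions, not the method evaluated in this work: the 2-D embeddings are toy values standing in for the output of an audio encoder, the function names `class_prototypes` and `classify` are hypothetical, and the `augmented` vectors are invented placeholders for LaSO-derived features. Pooling those vectors with the support set is only one possible reading of "augmenting the feature space"; the abstract does not specify how the features are combined.

```python
from collections import defaultdict
from math import fsum


def class_prototypes(embeddings, labels):
    """Average the embeddings of each class into a single prototype vector."""
    grouped = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        grouped[label].append(vector)
    return {
        label: [fsum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(query, prototypes):
    """Return the label whose prototype is nearest in squared Euclidean distance."""
    def sq_dist(a, b):
        return fsum((x - y) ** 2 for x, y in zip(a, b))

    return min(prototypes, key=lambda label: sq_dist(query, prototypes[label]))


# Toy 2-D embeddings (assumption: stand-ins for audio encoder outputs).
support = [[0.9, 0.1], [1.1, -0.1], [-1.0, 0.2], [-0.8, 0.0]]
support_labels = ["teacher", "teacher", "student", "student"]

# Invented placeholder vectors standing in for LaSO-derived features
# (assumption: they are pooled with the support set before averaging).
augmented = [[1.0, 0.3], [-0.9, -0.2]]
augmented_labels = ["teacher", "student"]

protos = class_prototypes(support + augmented, support_labels + augmented_labels)
prediction = classify([0.7, 0.2], protos)
print(prediction)
```

With only a handful of labeled examples per class, each additional feature vector shifts the class mean noticeably, which is why augmenting the support features can matter in the few-shot setting.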
Funding
Australian Research Council | DP130100481 | Pedagogies for knowledge-building: investigating subject-appropriate, cumulative teaching