Scopus Harvesting Series

A Five-Step Workflow to Manually Annotate Unstructured Data into Training Dataset for Natural Language Processing

Yunshu Zhu, University of Wollongong
Ting Song, University of Wollongong
Zhenyu Zhang, University of Wollongong
Mengyang Yin, Opal Healthcare
Ping Yu, University of Wollongong

Publication Name

Studies in Health Technology and Informatics

Abstract

Natural Language Processing (NLP) is a powerful technique for extracting valuable information from unstructured electronic health records (EHRs). However, NLP requires high-quality annotated datasets. To date, there is a lack of effective methods to guide the manual annotation of unstructured datasets, which can hinder NLP performance. Therefore, this study develops a five-step workflow for manually annotating unstructured datasets: (1) annotator training and familiarisation with the text corpus, (2) vocabulary identification, (3) annotation schema development, (4) annotation execution, and (5) result validation. The workflow was then applied to annotate agitation symptoms in the unstructured EHRs of 40 Australian residential aged care facilities. The annotated corpus achieved an accuracy of 96%. This suggests that the proposed workflow can be used in manual data processing to develop annotated training corpora for NLP algorithms.
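The result validation step (5) reports an accuracy figure. The abstract does not say how that figure was computed, so the following is only a minimal sketch under one common assumption: accuracy is the share of annotated items whose label matches a reviewer's reference label. The function name, the label values, and the toy data are all hypothetical and are not taken from the study.

```python
from typing import Sequence


def annotation_accuracy(annotated: Sequence[str], reference: Sequence[str]) -> float:
    """Return the fraction of annotations that match the reference labels.

    Assumption (not from the paper): accuracy is simple label agreement
    between the annotator's output and a reviewer's reference labels.
    """
    if len(annotated) != len(reference):
        raise ValueError("annotated and reference must have the same length")
    if not annotated:
        raise ValueError("cannot compute accuracy on an empty sample")
    matches = sum(a == r for a, r in zip(annotated, reference))
    return matches / len(annotated)


# Hypothetical toy labels for five text snippets, not data from the study.
annotator_labels = ["agitation", "none", "agitation", "none", "none"]
reviewer_labels = ["agitation", "none", "none", "none", "none"]

accuracy = annotation_accuracy(annotator_labels, reviewer_labels)
print(f"accuracy = {accuracy:.0%}")
```

In practice a validation sample would usually be drawn at random from the annotated corpus, and agreement statistics such as Cohen's kappa might be reported alongside raw accuracy; whether the authors did either is not stated in the abstract.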

Open Access Status

This publication may be available as open access

Volume

310

First Page

109

Last Page

113

Link to publisher version (DOI)

http://dx.doi.org/10.3233/SHTI230937
