University of Wollongong

Evaluation of part of speech tagging on Persian text

conference contribution
posted on 2024-11-13, 12:43 authored by F Raja, H Amiri, S Tasharofi, M Sarmadi, H Hojjat, Farhad Oroumchian
One of the fundamental tasks in natural language processing is part of speech (POS) tagging. A POS tagger is a piece of software that reads text in some language and assigns a part of speech tag to each word. Our main interest in this research was to see how easily methods developed for a language such as English can be applied to a new and different language such as Persian (Farsi), and how well such approaches perform. This paper presents an evaluation of several part of speech tagging methods on Persian text: a statistical tagging method, a memory-based tagging approach and two versions of Maximum Likelihood Estimation (MLE) tagging. The two MLE versions differ in how they handle unknown words. We also demonstrate the value of simple heuristics and post-processing in improving the accuracy of these methods. The experiments were conducted on a manually POS-tagged Persian corpus of over two million tagged words. The results are encouraging and comparable with those reported for other languages such as English, German and Spanish.
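As a rough illustration of the MLE idea the abstract names, the sketch below tags each known word with the tag it carried most often in the training data, i.e. the tag maximising the estimated P(tag | word). The abstract does not say how the two MLE variants treat unknown words, so the two fallbacks shown here (the corpus-wide most frequent tag, and a hypothetical two-character suffix table) are assumptions for illustration only, not the authors' method. The function names, tag labels and transliterated toy tokens are likewise hypothetical.

```python
from collections import Counter, defaultdict


def train_mle(tagged_corpus):
    """Count (word, tag) pairs from a list of (word, tag) tuples."""
    word_tags = defaultdict(Counter)
    tag_counts = Counter()
    for word, tag in tagged_corpus:
        word_tags[word][tag] += 1
        tag_counts[tag] += 1
    # For each known word keep its most frequent tag: argmax_t P(t | w).
    lexicon = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    # Assumed fallback 1: the most frequent tag in the whole corpus.
    default_tag = tag_counts.most_common(1)[0][0]
    return lexicon, default_tag, word_tags


def suffix_table(word_tags, n=2):
    """Assumed fallback 2 (hypothetical): most frequent tag per n-char suffix."""
    table = defaultdict(Counter)
    for word, counts in word_tags.items():
        table[word[-n:]].update(counts)  # add this word's tag counts to its suffix
    return {s: c.most_common(1)[0][0] for s, c in table.items()}


def tag_mle(words, lexicon, default_tag, suffixes=None, n=2):
    """Tag known words from the lexicon; fall back for unknown words."""
    tags = []
    for w in words:
        if w in lexicon:
            tags.append(lexicon[w])
        elif suffixes is not None and w[-n:] in suffixes:
            tags.append(suffixes[w[-n:]])
        else:
            tags.append(default_tag)
    return tags


if __name__ == "__main__":
    corpus = [("ketab", "N"), ("ketab", "N"), ("xub", "ADJ"), ("ast", "V"),
              ("ketab", "ADJ"), ("raft", "V"), ("goft", "V")]
    lexicon, default, word_tags = train_mle(corpus)
    sentence = ["ketab", "xub", "kab"]  # "kab" is unseen in training
    print(tag_mle(sentence, lexicon, default))
    print(tag_mle(sentence, lexicon, default, suffix_table(word_tags)))
```

On this toy data the unseen token receives the global default tag under the first fallback and the tag associated with its suffix under the second, which is the kind of difference the two unknown-word strategies are meant to capture. A real Persian tagger would need tokenisation and suffix rules suited to Persian morphology.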

Citation

Raja, F, Amiri, H, Tasharofi, S, Sarmadi, M, Hojjat, H and Oroumchian, F, Evaluation of part of speech tagging on Persian text, Proceedings of the Second Workshop on Computational Approaches to Arabic Script-based Languages, Stanford, California, 21-22 July 2007.

Pagination

120-127

Language

English

RIS ID

23496