University of Wollongong

Hamshahri: A standard Persian Text Collection

journal contribution
posted on 2024-11-14, 17:20 authored by Abolfazl AleAhmad, Hadi Amiri, Masoud Rahgozar, Farhad Oroumchian
Persian is one of the dominant languages in the Middle East, so a significant number of Persian documents are available on the Web. Because Persian differs in important ways from languages such as English, designing information retrieval systems for Persian requires special consideration. However, there are relatively few studies on the retrieval of Persian documents in the literature, and one of the main reasons is the lack of a standard test collection. In this paper we introduce a standard Persian text collection, named Hamshahri, which is built from a large number of newspaper articles according to TREC specifications. Furthermore, statistical information about the documents, the queries and their relevance judgments is presented. We believe that this collection is the largest Persian text collection to date.

Citation

Aleahmad, A., Amiri, H., Rahgozar, M. & Oroumchian, F. 2009, Hamshahri: A standard Persian Text Collection, Knowledge-Based Systems, vol. 22, no. 5, pp. 382-387.

Journal title

Knowledge-Based Systems

Volume

22

Issue

5

Pagination

382-387

Language

English

RIS ID

30261
