"Hamshahri: a standard Persian text collection" by Abolfazl AleAhmad, Hadi Amiri et al.

ERA - University of Wollongong (restricted)

Title

Hamshahri: a standard Persian text collection

Authors

Abolfazl AleAhmad, University of Tehran
Hadi Amiri, University of Tehran
Ehsan Darrudi, University of Tehran
Masoud Rahgozar, University of Tehran
Farhad Oroumchian, University of Wollongong - Dubai Campus

Document Type

Journal Article

Abstract

Persian is one of the dominant languages of the Middle East, so a significant number of Persian documents are available on the Web. Because Persian differs in nature from languages such as English, the design of Persian information retrieval systems requires special consideration. However, there are relatively few studies of Persian document retrieval in the literature, and one of the main reasons is the lack of a standard test collection. In this paper, we introduce a standard Persian text collection, named Hamshahri, which is built from a large number of newspaper articles according to TREC specifications. We also present statistical information about the documents, the queries and their relevance judgments. We believe that this is the largest Persian text collection to date.

RIS ID

30261
