Publication Details

Aleahmad, A., Amiri, H., Rahgozar, M. & Oroumchian, F. 2009, Hamshahri: A standard Persian Text Collection, Knowledge-Based Systems, vol. 22, no. 5, pp. 382-387.


Persian is one of the dominant languages in the Middle East, so a significant number of Persian documents are available on the Web. Because Persian differs in nature from languages such as English, the design of Persian information retrieval systems requires special consideration. However, there are relatively few studies on the retrieval of Persian documents in the literature, and one of the main reasons is the lack of a standard test collection. In this paper we introduce a standard Persian text collection, named Hamshahri, which is built from a large number of newspaper articles according to TREC specifications. We also present statistical information about the documents, the queries and their relevance judgments. We believe that this is the largest Persian text collection to date.


