What corpora are available?



Publication Details

Lee, D. Y. W. 2010, 'What corpora are available?', in M. McCarthy & A. O'Keeffe (eds), The Routledge Handbook of Corpus Linguistics, Routledge, Abingdon. pp. 107


Given how rapidly new electronic corpora come into existence, a chapter such as this may, on the face of it, seem to run the risk of going out of date quite quickly. However, there is a case to be made for having a general survey of currently available corpora: it will help give newcomers to the field a quick overview of what is available (the corpus universe) as well as an understanding of the differences that can be found between one corpus and another (the different stars and constellations within the universe and their distinguishing characteristics). Many types of corpora for a huge variety of languages have sprung up all over the world (particularly in the last decade, in the case of languages other than English), and the trend looks set to continue apace, largely because research within the corpus paradigm has proven so fruitful. The bird's-eye view of currently available corpora in this chapter will provide a launching pad for those seeking out resources for research or pedagogical applications, but it is not intended to be exhaustive. This chapter is restricted to the description of 'ready-made' corpora - i.e. specially organised collections of text that are called corpora by their creators. There are many text-collection or text-library sites, such as Gutenberg or American Rhetoric, where you may download individual books, plays, speeches, interviews, and so forth, to create your own personal corpus. Such electronic text libraries are certainly worth visiting, but they are not the focus of this chapter.

