2. The creation of unannotated corpora is the initial stage in corpus development. This stage has already been reached in both Finland and Estonia, at least as far as their national languages are concerned. Finland's Kielipankki project boasts over 20 million words of written Finnish text as of February 2000 [2]. The Corpus of Estonian Literary Language (CELL) at the University of Tartu contains c. 4.8 million words of Estonian text dating from the 1890s to the 1990s [3].
Though plain-text corpora of the minor Baltic Finnic languages have yet to be created, corpus development for Finnish and Estonian is now in its second stage: the creation of annotated corpora and the development of corpus-linguistic tools with which linguists can make full use of the corpora in their research.