Language Corpora

corpus, pl. corpora; cf. corps "unit, squad", corpse "dead body"

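"A body of linguistic data consisting of written texts or transcriptions of recorded speech. It is used as the basis for describing a particular language and as a means of testing various hypotheses about that language."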

A collection of linguistic data, either written texts or a transcription of recorded speech, which can be used as a starting-point of linguistic description or as a means of verifying hypotheses about a language. (David Crystal, A Dictionary of Linguistics and Phonetics, 3rd edition, Blackwell, 1991)
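
As a minimal sketch of the second use Crystal mentions, checking a hypothesis against a corpus, the Python snippet below counts word forms in a toy corpus. The three sentences, the helper names (`tokenize`, `word_frequencies`), and the hypothesis itself are invented for illustration; they are not drawn from any real corpus or from the sources quoted here.

```python
import re
from collections import Counter

# Toy corpus: three invented sentences standing in for written texts
# or transcribed speech (an assumption for illustration only).
toy_corpus = [
    "The cat sat on the mat.",
    "A dog chased the cat.",
    "The dog slept.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(texts):
    """Count how often each word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


frequencies = word_frequencies(toy_corpus)

# Illustrative hypothesis: "the" is the most frequent word form here.
most_common_word, most_common_count = frequencies.most_common(1)[0]
hypothesis_holds = most_common_word == "the"
print(most_common_word, most_common_count, hypothesis_holds)
```

A real study would replace the toy list with texts sampled according to explicit design criteria, since a frequency count says something about a language only to the extent that the corpus represents it.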

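"A collection of naturally spoken or written language texts, used as a sample to bring out the characteristics of, for example, the state of a particular language in a given period or one of its social varieties."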

A collection of naturally occurring language text, chosen to characterize a state or variety of a language. (John Sinclair, Corpus, Concordance, Collocation, OUP, 1991)

British National Corpus

The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language, drawn from a wide range of sources and designed to represent a broad cross-section of current British English.