Linguistics
WHAT IS A LINGUISTIC CORPUS?
A linguistic corpus is a body of language data that has been collected for the purpose of analysis. A corpus is useful in many ways: it enables researchers to formulate hypotheses about the workings of language, and it provides statistics and metrics to support theories and research.
Corpus linguistics refers to a field of study that analyzes naturally-occurring language structure and use through the collection of samples of spoken or written language.
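As a rough illustration of the kind of statistics a corpus can provide, the sketch below counts word frequencies in a tiny sample. The three sentences are invented for this example and do not come from any corpus listed in this guide, and the `tokenize` helper is a hypothetical, deliberately simple tokenizer; real corpus tools use far more careful tokenization and annotation.

```python
import re
from collections import Counter

# Hypothetical toy corpus: three invented sentences standing in for
# collected samples of written language.
corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "A cat and a dog met.",
]

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

# Flatten all sentences into one list of tokens.
tokens = [tok for sentence in corpus for tok in tokenize(sentence)]

# Raw frequency of each word form.
freq = Counter(tokens)
total = len(tokens)

# Print the most frequent words with their raw and relative frequencies.
for word, count in freq.most_common(3):
    print(word, count, round(count / total, 3))
```

Relative frequency (count divided by total tokens) is what makes figures comparable across corpora of different sizes, which is presumably why large corpora usually report frequencies per million words rather than raw counts.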
RESOURCES
- Online Corpora of English, Spanish, and Portuguese: A collection of corpora created by Professor Mark Davies at Brigham Young University. The corpora offer evidence of how native speakers actually speak and write, and they support work on language variation, bibliometrics, and the design of language teaching materials and resources. The site provides direct access to the Corpus of Contemporary American English (COCA) and NGram viewers.
- British National Corpus: The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a broad cross-section of current British English.
- Brown University Corpus of American English: Compiled in the 1960s, the Brown Corpus was the first computerized corpus in existence.
- French Language Corpus Collection: A collection of French language corpora, including spoken, written, and a variety of more specialized corpora. (Website is in French.)
- International Corpus of English: The International Corpus of English (ICE) began in 1990 with the primary aim of collecting material for comparative studies of English worldwide. Twenty-four research teams around the world are preparing electronic corpora of their own national or regional variety of English. Each ICE corpus consists of one million words of spoken and written English produced after 1989. For most participating countries, the ICE project is stimulating the first systematic investigation of the national variety.
- Centre for English Corpus Linguistics: The Centre for English Corpus Linguistics (CECL) specializes in the collection and use of corpora for linguistic and pedagogical purposes.
- Corpus of Contemporary American English (COCA): This corpus of 450 million words of American English is maintained by Brigham Young University.
- Survey of English Usage: The Survey of English Usage carries out research in English-language corpus linguistics and was the first centre in Europe to undertake this type of research. Since its inception in 1959, the Survey has collected samples of naturally-occurring language for the purposes of description and analysis.
- The Speech Accent Archive: The Speech Accent Archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read the same paragraph, and their readings are carefully transcribed. The archive is used by people who wish to compare and analyze the accents of different English speakers.
- Last Updated: Jun 1, 2022 4:56 PM
- URL: https://guides.library.umass.edu/linguistics