OLAC, the Open Language Archives Community, is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources. The catalog provides access to information about thousands of languages, including details of text collections, audio recordings, dictionaries, and software, sourced from dozens of digital and traditional archives.
Website offering free, partial access to a range of corpora, including News on the Web (NOW), Global Web-Based English (GloWbE), Wikipedia Corpus, Hansard Corpus (British Parliament), Early English Books Online, Corpus of Contemporary American English (COCA), Corpus of Historical American English (COHA), Corpus of US Supreme Court Opinions, TIME Magazine Corpus, Corpus of American Soap Operas, British National Corpus (BYU-BNC), Strathy Corpus (Canada), CORE Corpus, Corpus del Español, Corpus do Português, and the Google Books corpora in American English, British English, and Spanish.
A large, structured corpus of historical English containing more than 400 million words of text from the 1810s to the 2000s. Reed has purchased and downloaded the full text of COHA; contact the Data Services Librarian to learn how to access the data.
The Linguistic Data Consortium (LDC) is an open consortium of universities, libraries, corporations, and government research laboratories that creates and distributes a wide array of language resources, including corpora.
PHOIBLE (Phonetics Information Base and Lexicon) Online is a repository of cross-linguistic phonological inventory data, which have been extracted from source documents and tertiary databases and compiled into a single searchable convenience sample.
Shared databases of recordings and coded transcripts for subfields of communication research, including aphasia, audiology, bilingualism, the Child Language Data Exchange System (CHILDES), conversation analysis, dementia, phonological and phonetic analysis, second language acquisition, and traumatic brain injury.