Text Repositories

The following resources provide text for text analysis projects.

  • Internet Archive Books (includes plain-text [“full text”] access to books, issues of magazines, etc.)

  • Early English Books Online (EEBO) (BC library resource)

  • Early Caribbean Digital Archive (ECDA)

  • Oxford Text Archive

  • Project Gutenberg (large number of texts available in a variety of forms, including plain text; texts are accessed one at a time)

  • HathiTrust (16 million volumes, mostly in English)

  • Chronicling America (12.8 million pages of American newspapers)

  • DocSouth Data (narratives & literature from the American South)

  • Perseus Digital Library (large collection of classical texts, much of it encoded in TEI/XML)

  • EEBO-TCP (ca. 50,000 early English books, many encoded in TEI/XML)

  • Old Bailey Online (197,745 London criminal trials, 1674-1913)

  • Canadian Hansard (debates & journals of the Canadian Senate & House of Commons)

  • Australian Hansard (Parliamentary debates, 1901-1980)

  • UK Hansard (UK Parliamentary debates)

  • Open Islamicate Texts Initiative (see also repositories; 10,000 premodern Islamicate texts)

  • Transkribus Corpus and READ (efforts to use computer vision to recognize handwriting)

  • ToposText (557 classical texts linked with a gazetteer of the ancient world)

  • BYU Corpora (widely used corpora of American English)

  • Wright American Fiction (American adult fiction, 1774–1900)

  • UCLA Broadcast NewsScape (170K hours of captioned news programs; see Red Hen Lab for information on access)

  • Media History Digital Library (nearly 2 million pages of media-related books and articles, 1875-1995)

  • Christian Classics Ethereal Library (classic Christian texts)

  • NYT Annotated Corpus (1.8 million NYT articles + NYT-supplied metadata)

  • Europeana Collections (many datasets from European libraries & archives, from papyri to photographs to newspapers)

  • Foreign Records of the US (nearly complete run of Foreign Relations of the United States; see these tools to obtain full text)

  • Internet Archive (a huge collection of websites, texts, audio, and other media, available for bulk download via wget)

  • Twitter Datasets (a catalog of Twitter datasets that are publicly available on the web)

  • BitCurator (an effort to develop tools to analyze features of digital texts)

  • Movie Quotes Corpus (“220,579 conversational exchanges between 10,292 pairs of movie characters”)

  • Europe PMC (repository of life sciences books, articles, and preprints)

  • Trove Australia (565 million documents collected by the National Library of Australia, including a sizeable collection of newspapers)

  • BNC-Baby (4 million-word subcorpus of the 100 million-word British National Corpus, with part-of-speech tagging in XML)

TEI-Encoded

  • Women Writers Online (BC library resource)

  • Eighteenth Century Collections Online

  • Documenting the American South

Resources from

  • Laura Nelson’s “Analyzing Complex Digitized Data”

  • Demonstration Corpora, by Alan Liu