Text Repositories


Last updated 4 years ago


The following resources provide text for text analysis projects.

  • Internet Archive Books (includes plain-text [“full text”] access to books, issues of magazines, etc.)

  • Early English Books Online (EEBO) (BC library resource)

  • Early Caribbean Digital Archive (ECDA)

  • Oxford Text Archive

  • Project Gutenberg (large number of texts available in a variety of forms, including plain text; texts are accessed one at a time)

  • HathiTrust (16 million volumes, mostly in English)

  • Chronicling America (12.8 million pages of American newspapers)

  • DocSouth Data (narratives & literature from the American South)

  • Perseus Digital Library (large collection of classical texts, much of it encoded in TEI/XML)

  • EEBO-TCP (ca. 50,000 early English books, many encoded in TEI/XML)

  • Old Bailey Online (197,745 London criminal trials, 1674-1913)

  • Canadian Hansard (debates & journals of the Canadian Senate & House of Commons)

  • Australian Hansard (Parliamentary debates, 1901-1980)

  • UK Hansard (UK Parliamentary debates)

  • Open Islamicate Texts Initiative (see also repositories; 10,000 premodern Islamicate texts)

  • Transkribus Corpus and READ (efforts to use computer vision to recognize handwriting)

  • ToposText (557 classical texts linked with a gazetteer of the ancient world)

  • BYU Corpora (widely used corpora of American English)

  • Wright American Fiction (American adult fiction, 1774–1900)

  • UCLA Broadcast NewsScape (170K hours of captioned news programs; see Red Hen Lab for information on access)

  • Media History Digital Library (nearly 2 million pages of media-related books and articles, 1875-1995)

  • Christian Classics Ethereal Library (classic Christian texts)

  • NYT Annotated Corpus (1.8 million NYT articles + NYT-supplied metadata)

  • Europeana Collections (many datasets from European libraries & archives, from papyri to photographs to newspapers)

  • Foreign Records of the US (nearly complete run of Foreign Relations of the United States; see these tools to obtain full text)

  • Internet Archive (a huge collection of websites, texts, audio, and other media, available for bulk download via wget)

  • Twitter Datasets (a catalog of Twitter datasets that are publicly available on the web)

  • BitCurator (an effort to develop tools to analyze features of digital texts)

  • Movie Quotes Corpus (“220,579 conversational exchanges between 10,292 pairs of movie characters”)

  • Europe PMC (repository of life sciences books, articles, and preprints)

  • Trove Australia (565 million documents collected by the National Library of Australia, including a sizeable collection of newspapers)

  • BNC-Baby (4 million-word subcorpus of the 100 million-word British National Corpus, with parts-of-speech tagging in XML)

TEI-Encoded

  • Women Writers Online (BC library resource)

  • Eighteenth Century Collections Online

  • Documenting the American South

Resources from Laura Nelson’s “Analyzing Complex Digitized Data”

Demonstration Corpora, by Alan Liu
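
Several of the repositories on this page distribute texts as TEI/XML rather than plain text, while tools such as Voyant work most easily with plain text. The sketch below shows one possible way to pull the running text out of a TEI file using only Python's standard library. The sample document and the function name `tei_body_text` are invented for illustration. The code assumes the file uses the TEI P5 namespace (`http://www.tei-c.org/ns/1.0`) and marks paragraphs with `<p>`. Older or project-specific encodings may differ and would need adjusting.

```python
import xml.etree.ElementTree as ET

# Assumption: the file declares the standard TEI P5 namespace.
TEI_URI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI_URI}

# A tiny invented TEI document, used only for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample Title</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>First <hi rend="italic">paragraph</hi> of text.</p>
      <p>Second paragraph.</p>
    </body>
  </text>
</TEI>"""


def tei_body_text(tei_xml: str) -> str:
    """Return the plain text of the TEI <body>, one paragraph per line."""
    root = ET.fromstring(tei_xml)
    body = root.find(".//tei:body", TEI_NS)
    if body is None:
        return ""
    paragraphs = []
    for p in body.iter(f"{{{TEI_URI}}}p"):
        # itertext() also yields text inside nested tags such as <hi>,
        # so inline markup is dropped while its text is kept.
        text = " ".join("".join(p.itertext()).split())
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


plain_text = tei_body_text(SAMPLE_TEI)
print(plain_text)
```

For a downloaded file, the same function could be called on the file's contents (for example `tei_body_text(open("some_file.xml", encoding="utf-8").read())`, where the filename is a placeholder). The header is skipped deliberately so that catalog metadata does not inflate word counts.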