Step 3: Important Considerations


Last updated 4 years ago


When conducting a text analysis, it is important to keep in mind that:

1.) Word meaning changes over time.

It may seem obvious, but it's important not to forget that word meanings shift over time. One can use a source like the Oxford English Dictionary to look up what a word meant in a particular period.

2.) The word context is key.

In many, if not most, text analysis undertakings, word context is crucial to the analysis. Exceptions can occur when, for example, one is only interested in the number of times a word appears and not in the way the word is used.
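The difference between counting a word and examining how it is used can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used by Voyant or Lexos: the sample sentence and the helper names `tokenize` and `keyword_in_context` are made up for this example, and the tokenizer is deliberately simplistic.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_in_context(tokens, keyword, window=3):
    """Return each occurrence of `keyword` with up to `window` tokens on either side."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Hypothetical sample text: "bank" appears twice with two different senses.
sample = "The bank raised interest rates. We sat on the bank of the river."
tokens = tokenize(sample)

# A raw frequency count treats both occurrences as the same thing.
counts = Counter(tokens)
bank_count = counts["bank"]

# A keyword-in-context listing shows the surrounding words for each hit.
contexts = keyword_in_context(tokens, "bank")

print(bank_count)
for left, word, right in contexts:
    print(f"{left} [{word}] {right}")
```

The count alone reports two occurrences of "bank", while the context listing makes it visible that one refers to a financial institution and the other to a riverside.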

3.) There may be issues of omission in the corpus.

It's important to keep in mind when exploring or creating a corpus that there may be issues of omission. People of color, women, and other marginalized groups have been published less throughout history and, therefore, a massive corpus--like Google Books or HathiTrust--will skew white and male. (Other areas of omission can be based on things like language, geography, time period, etc.) Moreover, it's important to consider what gets digitized. There can be (and no doubt is) bias in the decisions that drive the selection and funding of what ends up online.

4.) There can be quality issues with the corpus.

Often texts used in text analysis come from books and documents that have been OCR'd. OCR (or optical character recognition) converts images of text into digital (machine-readable) text. Due to things like poor image quality and scanning mistakes, there can be OCR quality issues and, therefore, text errors.

Below are two examples of how OCR errors can occur. The one on the left is from a first edition of the 18th-century novel, The Life and Opinions of Tristram Shandy, Gentleman. With books from this period, you get characters such as the long s ( ſ ) and, often, ink bleed-through and foxing (the small spots that come with age), all of which can impact OCR. (These kinds of issues were much more of a factor before advancements in OCR technology.) The example on the right shows a scanning mistake made when the book was moved during the scanning process. (Even with the advancements in technology, OCR issues are unavoidable in this case.) When working with a large or massive corpus, these kinds of errors might be inconsequential as long as there are few enough of them. With smaller corpora, such errors can have a greater impact and skew text analysis results.

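[Figure: two examples of OCR problems. Left: a page from a first edition of Tristram Shandy. Right: a scanning mistake made when the book was moved.]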
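To see how a period character like the long s can skew results in a small corpus, here is a minimal Python sketch. It assumes the OCR output preserved the long s as the character ſ (in practice it is often misread as "f" or something else entirely, which is harder to repair). The sample sentence and the helper names `count_words` and `normalize_long_s` are hypothetical.

```python
import re
from collections import Counter


def count_words(text):
    """Count lowercase word tokens in the text."""
    return Counter(re.findall(r"\w+", text.lower()))


def normalize_long_s(text):
    """Replace the long s (U+017F) with a modern s."""
    return text.replace("\u017f", "s")


# Hypothetical OCR output mixing the long-s and modern spellings of one word.
ocr_text = "The paſſions of the mind. His passions were strong."

# Without normalization, the two spellings are counted as different words.
raw_counts = count_words(ocr_text)

# After normalization, both occurrences are counted together.
clean_counts = count_words(normalize_long_s(ocr_text))

print(raw_counts["passions"], clean_counts["passions"])
```

Before normalization the word is split across two spellings, so its frequency is undercounted. In a short text, a handful of such splits can noticeably change which words appear most frequent.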