
Out of the Box vs Coding and Scripting


Last updated 4 years ago

Text analysis can be done using "out of the box" tools or through coding and scripting; the latter approach enables scholars to explore more nuanced research questions.

Out of the Box

Using "out of the box" tools, which don't require coding or scripting, is a good way to get started in text analysis because it helps users begin to understand the possibilities and techniques. Voyant and Lexos are examples of such tools. (Mallet, used for topic modeling, is an example of a tool that requires coding but also provides users with a lot of guidance and preexisting code.)

Here is a Voyant instance that contains all of Shakespeare's plays. A stopword list that includes words like "thou" and "sir" has been applied so that those words do not dominate the results. (The selection of stopwords is part of the scholarly decision-making that goes into text analysis.)
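To make the effect of a stopword list concrete, here is a minimal Python sketch. The sample line and the stopword set are hypothetical, chosen only for illustration; they are not taken from the Shakespeare corpus or from Voyant's built-in lists. The function reports the most frequent words with and without the stopwords filtered out.

```python
from collections import Counter

# Hypothetical sample line and stopword set, for illustration only;
# Voyant's own stopword lists are larger and can be edited by the user.
sample = "thou art my love and I love thee sir thou art my friend sir"
stopwords = {"sir", "thou", "thee", "and", "i", "my", "art"}

def top_words(text, stop=frozenset(), n=3):
    """Return the n most frequent lowercase words, skipping any in stop."""
    words = [w for w in text.lower().split() if w not in stop]
    return Counter(words).most_common(n)

print(top_words(sample))             # [('thou', 2), ('art', 2), ('my', 2)]
print(top_words(sample, stopwords))  # [('love', 2), ('friend', 1)]
```

Without a stopword list, function words crowd the top of the ranking; once they are filtered out, the content word "love" surfaces. Which words belong on the list is exactly the kind of scholarly choice described above.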

Coding and Scripting

Coding is an umbrella term for using coding (or programming) languages to do things like create applications and websites. Scripting falls under coding and involves using coding languages to do things like automate processes and make websites more dynamic. Coding and scripting are typically done using a computer's command line or platforms like Jupyter Notebooks.

To get a sense of what coding and scripting look like in text analysis, here is a basic example from the Natural Language Toolkit (NLTK), which uses the Python language. Here you can see a script being run that tags the parts of speech in the sentence "And now for something completely different." (CC = coordinating conjunction, RB = adverb, IN = preposition, NN = noun, JJ = adjective.)

```python
import nltk
from nltk import word_tokenize  # word_tokenize must be imported before it is called

# The tokenizer and tagger models must be downloaded once, e.g. nltk.download("punkt")
# and nltk.download("averaged_perceptron_tagger") (newer NLTK releases name these
# "punkt_tab" and "averaged_perceptron_tagger_eng").
text = word_tokenize("And now for something completely different")
print(nltk.pos_tag(text))  # pair each token with its part-of-speech tag
# [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
#  ('completely', 'RB'), ('different', 'JJ')]
```

In this example from Programming Historian, you see a portion of a Python script used for counting word frequencies.

```python
# Sample string: the opening of A Tale of Two Cities
wordstring = 'it was the best of times it was the worst of times '
wordstring += 'it was the age of wisdom it was the age of foolishness'

# Split the string on whitespace into a list of words
wordlist = wordstring.split()

# For each word, count how many times it occurs in the whole list
wordfreq = []
for w in wordlist:
    wordfreq.append(wordlist.count(w))

print("String\n" + wordstring + "\n")
print("List\n" + str(wordlist) + "\n")
print("Frequencies\n" + str(wordfreq) + "\n")
print("Pairs\n" + str(list(zip(wordlist, wordfreq))))
```

Running the script prints the following (long lists wrapped for readability):

```
String
it was the best of times it was the worst of times it was the age of wisdom it was the age of foolishness

List
['it', 'was', 'the', 'best', 'of', 'times', 'it', 'was',
'the', 'worst', 'of', 'times', 'it', 'was', 'the', 'age',
'of', 'wisdom', 'it', 'was', 'the', 'age', 'of',
'foolishness']

Frequencies
[4, 4, 4, 1, 4, 2, 4, 4, 4, 1, 4, 2, 4, 4, 4, 2, 4, 1, 4,
4, 4, 2, 4, 1]

Pairs
[('it', 4), ('was', 4), ('the', 4), ('best', 1), ('of', 4),
('times', 2), ('it', 4), ('was', 4), ('the', 4),
('worst', 1), ('of', 4), ('times', 2), ('it', 4),
('was', 4), ('the', 4), ('age', 2), ('of', 4),
('wisdom', 1), ('it', 4), ('was', 4), ('the', 4),
('age', 2), ('of', 4), ('foolishness', 1)]
```
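The word-frequency script above calls `wordlist.count(w)` once per word, so it rescans the entire list for every word; that is fine for one sentence but becomes slow for a whole novel. A minimal alternative sketch uses Python's standard `collections.Counter`, a swapped-in technique rather than the method used in the Programming Historian lesson, to tally every word in a single pass:

```python
from collections import Counter

wordstring = 'it was the best of times it was the worst of times '
wordstring += 'it was the age of wisdom it was the age of foolishness'

# Counter tallies each distinct word in one pass over the split string
counts = Counter(wordstring.split())

# most_common() lists words from most to least frequent;
# words with equal counts keep the order in which they first appeared
for word, freq in counts.most_common():
    print(word, freq)
```

Unlike the original, which pairs every occurrence with its count (so "it" appears four times among the pairs), `Counter` keeps one entry per distinct word, which is usually the form needed for ranking or charting frequencies.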
