Word Frequency

This notebook finds the word frequencies for a dataset.




Explore word frequency of your own extracted data

Create a bar chart for the 20 most frequently used words

import matplotlib.pyplot as plt

# The 20 most common words as (word, count) pairs
top_words = transformed_word_frequency.most_common(20)

# Split the pairs into one sequence of words and one of counts
words, counts = zip(*top_words)

plt.figure(figsize=(12, 8))  # Customize plot size
plt.barh(words, counts, color='blue', height=0.3)
plt.xlabel("Word Counts")
plt.gca().invert_yaxis()  # Put the most frequent word at the top
plt.show()
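
The data preparation behind the bar chart can be tried without the full dataset. The following is a minimal sketch, assuming a small hand-made `Counter` in place of the notebook's `transformed_word_frequency`; the words and counts are invented for illustration only.

```python
from collections import Counter

# Hypothetical stand-in for the notebook's transformed_word_frequency
transformed_word_frequency = Counter(
    {"history": 12, "archive": 7, "map": 9, "text": 3}
)

# most_common(n) returns (word, count) pairs sorted by descending count
top_words = transformed_word_frequency.most_common(3)

# zip(*pairs) transposes the pairs into a tuple of words and a tuple of counts
words, counts = zip(*top_words)

print(words)   # labels for the bars
print(counts)  # lengths of the bars
```

The two tuples line up position by position, which is why they can be passed straight to `plt.barh` as the labels and the bar lengths.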

Create a wordcloud chart for the extracted text data

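Modify step 4 of the notebook, "Find Word Frequencies", as follows. Only the two uncommented lines marked "Added" are new; the commented lines show the existing code for context: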

# 4 Find Word Frequencies
word_str = " "  # Added: will hold every unigram, separated by spaces

# from collections import Counter

# # Hold our word counts in a Counter Object
# transformed_word_frequency = Counter()

# # Apply filter list
# for document in tdm_client.dataset_reader(dataset_file):
#     if use_filtered_list is True:
#         document_id = document['id']
#         # Skip documents not in our filtered_id_list
#         if document_id not in filtered_id_list:
#             continue
#     unigrams = document.get("unigramCount", [])
#     for gram, count in unigrams.items():
#         clean_gram = gram.lower() # Lowercase the unigram
        word_str += " " + clean_gram  # Added: append each unigram to the string of all words
#         if clean_gram in stop_words: # Remove unigrams from stop words
#             continue
#         if not clean_gram.isalpha(): # Remove unigrams that are not alphabetic
#             continue
#         transformed_word_frequency[clean_gram] += count
# Install wordcloud (only needed once per environment)
%pip install wordcloud

# Import WordCloud and matplotlib to plot the word cloud
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Added: build the word cloud from the string of all words
wordcloud = WordCloud(width=800, height=800,
                      background_color='white',
                      stopwords=stop_words,
                      min_font_size=10).generate(word_str)

# Plot the WordCloud image
plt.figure(figsize=(8, 8), facecolor=None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad=0)
plt.show()
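
The filtering loop in step 4 can also be explored on its own. The sketch below is an illustration under stated assumptions, not the notebook's code: `sample_documents`, `count_words`, and the one-word stop list are hypothetical, and the documents contain only the `id` and `unigramCount` keys that the loop reads.

```python
from collections import Counter

# Hypothetical sample documents; real records come from the dataset reader
sample_documents = [
    {"id": "doc-1", "unigramCount": {"The": 4, "Archive": 2, "1850": 1}},
    {"id": "doc-2", "unigramCount": {"the": 3, "archive": 1, "maps": 5}},
]
stop_words = {"the"}  # assumed stop-word list for this example


def count_words(documents, stop_words):
    """Return a Counter of lowercased, alphabetic, non-stop-word unigrams."""
    frequency = Counter()
    for document in documents:
        for gram, count in document.get("unigramCount", {}).items():
            clean_gram = gram.lower()  # Lowercase the unigram
            # Skip stop words and anything that is not purely alphabetic
            if clean_gram in stop_words or not clean_gram.isalpha():
                continue
            frequency[clean_gram] += count
    return frequency


word_frequency = count_words(sample_documents, stop_words)
print(word_frequency.most_common())
```

Because the result maps each word to its count, it could also be passed to the wordcloud library's `WordCloud.generate_from_frequencies` method, so that word sizes follow the counts rather than how many times a word was appended to `word_str`.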
Research Notebook: Exploring Word Frequencies for Research