# Word Frequency

#### [Research Notebook:  Exploring Word Frequencies for Research](https://docs.tdm-pilot.org/exploring-word-frequencies-for-research/)

#### Explore word frequency of your own extracted data

![](/files/-MUZH10m0lq5lo5aGx6n)

#### Create a bar chart for the 20 most frequently used words

```python
import matplotlib.pyplot as plt 

a = transformed_word_frequency.most_common(20)
bar_values = list(list(zip(*a)))

x_val = list(bar_values[0])
y_val = list(bar_values[1])

plt.figure(figsize=(12,8))    #Customize plot size
plt.barh(x_val, y_val, color='blue',height=0.3)
plt.xlabel("Word Counts")
plt.gca().invert_yaxis()

```

![](/files/-MUZUbuly_WP1V4QAfVS)

#### Create a wordcloud chart for the extracted text data

#### Modify 4 Find Word Frequencies by:

```python
#4 Find Word Frequencies
word_str = " "

# from collections import Counter

# # Hold our word counts in a Counter Object
# transformed_word_frequency = Counter()

# # Apply filter list
# for document in tdm_client.dataset_reader(dataset_file):
#     if use_filtered_list is True:
#         document_id = document['id']
#         # Skip documents not in our filtered_id_list
#         if document_id not in filtered_id_list:
#             continue
#     unigrams = document.get("unigramCount", [])
#     for gram, count in unigrams.items():
#         clean_gram = gram.lower() # Lowercase the unigram
        word_str += " " + clean_gram  #Added: string of all words
#         if clean_gram in stop_words: # Remove unigrams from stop words
#             continue
#         if not clean_gram.isalpha(): # Remove unigrams that are not alphanumeric
#             continue
#         transformed_word_frequency[clean_gram] += count
```

```python
#Install wordcloud
pip install wordcloud
```

```python
#Install matplotlib for word plot cloud
from wordcloud import WordCloud, STOPWORDS 
import matplotlib.pyplot as plt 
```

```python
#Added: plot word cloud
wordcloud = WordCloud(width = 800, height = 800, 
                background_color ='white', 
                stopwords = stop_words, 
                min_font_size = 10).generate(word_str) 
  
# plot the WordCloud image                        
plt.figure(figsize = (8, 8), facecolor = None) 
plt.imshow(wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 
plt.show() 
```

![](/files/-MUZZvxSPM7buX9eh6z6)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://bcds.gitbook.io/learn/tutorials/text-analysis/text-analysis-in-jstor/work-frequency.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
