Step 5: Exercise Two

For this exercise, you will be given an imaginary research topic and questions.

The Premise

Imagine you are studying Frederick Douglass' rhetorical arguments against slavery and you notice a lot of mentions of family. This sparks your curiosity and makes you wonder: How does Douglass evoke the idea of family in his arguments? How much and when does he mention it? What rhetorical purpose might these mentions serve? Does Douglass more often talk about family in the context of slaveholders or slaves?

You decide to focus on words like wife, mother, husband, father, child, baby, infant, family, and parent with the understanding that you can expand your list later.

Creating a Corpus

To get started, you need to create your corpus. You will be acquiring the text from Project Gutenberg, which has thousands of texts covering a range of genres and topics. (There are numerous other text sources, some of which can be found on this text repositories list.)

If you want to skip this part of the process and go straight to the Working in Voyant section, you can download the prepared text files below. (You will need to unzip the file to upload the individual text files to Voyant.)

1.) Finding the Texts

Go to Project Gutenberg and search for "Frederick Douglass" (or see his works here). The results should look like the following:

From this list, Narrative of the Life of Frederick Douglass, an American Slave; My Bondage and My Freedom; Abolition Fanaticism in New York; and Collected Articles of Frederick Douglass will be used.

2.) Scraping the Text

a.) Now you will "scrape" (or extract) the text from the site. This begins with getting the text URLs (web addresses). To do this in Project Gutenberg, go to each text's landing page, select "Plain Text UTF-8," and copy the URL.

For example, here is the Narrative of the Life of Frederick Douglass landing page:

And here is the URL that you get after clicking on Plain Text UTF-8:

For a shortcut, you can get all of the URLs here:

https://www.gutenberg.org/cache/epub/23/pg23.txt
https://www.gutenberg.org/files/202/202.txt
https://www.gutenberg.org/cache/epub/34915/pg34915.txt
https://www.gutenberg.org/cache/epub/99/pg99.txt

b.) Go to Lexos. Copy and paste the URLs into the "Scrape" box on the Lexos landing page (also the "Upload" page). Click the "Scrape" button. (It should only take a few seconds since the texts aren't that big.) When it is done, you will see the texts in the "Upload List" box.
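
If you would rather script this step than use the Lexos interface, the scraping can be approximated in Python. This is only a minimal sketch and not what Lexos does internally: the `local_name` and `download_corpus` helpers and the `douglass_raw` folder name are hypothetical, and it assumes the four URLs listed above are still live.

```python
from pathlib import Path
from urllib.request import urlopen

# The four plain-text URLs from the list above.
TEXT_URLS = [
    "https://www.gutenberg.org/cache/epub/23/pg23.txt",
    "https://www.gutenberg.org/files/202/202.txt",
    "https://www.gutenberg.org/cache/epub/34915/pg34915.txt",
    "https://www.gutenberg.org/cache/epub/99/pg99.txt",
]


def local_name(url: str) -> str:
    """Use the last path segment of the URL as the local file name."""
    return url.rsplit("/", 1)[-1]


def download_corpus(urls, out_dir="douglass_raw"):
    """Fetch each URL and save it as a text file (hypothetical helper)."""
    folder = Path(out_dir)
    folder.mkdir(exist_ok=True)
    saved = []
    for url in urls:
        with urlopen(url) as response:
            # Older Gutenberg files are not always UTF-8, so undecodable
            # bytes are replaced rather than raising an error.
            text = response.read().decode("utf-8", errors="replace")
        target = folder / local_name(url)
        target.write_text(text, encoding="utf-8")
        saved.append(target)
    return saved
```

Calling `download_corpus(TEXT_URLS)` would save one file per URL. The files would still need the scrubbing and cleanup described in the rest of this section.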

3.) Preparing the Text

Now you will prepare the text. This means things like getting rid of punctuation, making all the text lowercase, using lemmas, getting rid of tags, cutting up the text if you want it divided into smaller units, and any other choices you might make.

a.) To prepare the text, click on "Prepare" in the top navigation menu (see image below) and then click on "Scrub." (A little bit below, you will find an explanation of the preparation choices being made.)

b.) On the Scrub page, you can make multiple decisions that will affect the text. It can be necessary to experiment with how you scrub your text. For now, select "Make Lowercase," "Remove Digits," "Scrub Tags," "Remove Punctuation," and "Keep Hyphens."

It should look like this:

Click "Apply."
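
To make the effect of these scrub settings concrete, here is a rough Python approximation. It is a sketch under stated assumptions, not Lexos's own code: the `scrub` function is hypothetical, it only handles ASCII punctuation (curly quotes would be left in place), and it collapses whitespace for readability, which Lexos may not do.

```python
import re
import string


def scrub(text: str) -> str:
    """Approximate the scrub options selected in this step."""
    text = text.lower()                    # "Make Lowercase"
    text = re.sub(r"<[^>]+>", " ", text)   # "Scrub Tags" (before punctuation goes)
    text = re.sub(r"\d+", "", text)        # "Remove Digits"
    # "Remove Punctuation" with "Keep Hyphens": drop every ASCII
    # punctuation mark except the hyphen.
    punctuation = string.punctuation.replace("-", "")
    text = text.translate(str.maketrans("", "", punctuation))
    # Collapse runs of whitespace (a convenience of this sketch).
    return re.sub(r"\s+", " ", text).strip()


print(scrub("<p>My Master's slave-child, in 1845!</p>"))
```

Note that the apostrophe disappears ("master's" becomes "masters") while the hyphen in "slave-child" survives, which is why the scrub step matters for the lemma list that follows.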

c.) Now you are going to apply lemmas. It is important that you scrub the text before applying them. Doing so with the settings being used will get rid of the punctuation, which is necessary for some of the lemmas to work.

Cut and paste these lemmas into the "Lemmas" box:

slave-child:children
slave-mother:mother
slave:slaves
master:masters
wife, wifes:wives
husband's:husband
husband:husbands
child, childs:children
baby, babys:babies
infant:infants
mother:mothers
father's:father
father:fathers
parent:parents
family, familys:families

It should look like this:

Click "Apply."
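
Each lemma line has the form "word, word:replacement." The following Python sketch shows one way such rules could be applied to already-scrubbed text. It is an illustration only: the `parse_lemmas` and `apply_lemmas` helpers are hypothetical, only a few rules from the list above are used, and applying the rules one after another in list order to whole words is an assumption about the behavior, not a description of Lexos's implementation.

```python
# A few rules copied from the lemma list above.
LEMMA_LINES = [
    "slave-child:children",
    "slave-mother:mother",
    "wife, wifes:wives",
    "child, childs:children",
    "mother:mothers",
]


def parse_lemmas(lines):
    """Turn 'a, b:lemma' lines into a list of (source, lemma) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue
        sources, lemma = line.rsplit(":", 1)
        for source in sources.split(","):
            source = source.strip()
            if source:
                rules.append((source, lemma.strip()))
    return rules


def apply_lemmas(text, rules):
    """Replace whole words, applying the rules in order (an assumption)."""
    tokens = text.split()  # note: this also collapses line breaks
    for source, lemma in rules:
        tokens = [lemma if token == source else token for token in tokens]
    return " ".join(tokens)


rules = parse_lemmas(LEMMA_LINES)
print(apply_lemmas("the slave-mother and her child saw his wife", rules))
```

Because the rules run in order in this sketch, "slave-mother" first becomes "mother" and is then picked up by the later "mother:mothers" rule.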

"Make lowercase" made all characters lowercase. This choice is best when using case-sensitive tools, which treat capitalized and lowercase words differently. For example, in certain tools, words capitalized at the beginning of sentences are seen as being different from the same word appearing in lowercase within the sentence.

"Remove Digits" and "Scrub Tags" got rid of unnecessary digits or distracting HTML tags that might be in the text.

"Remove Punctuation" and "Keep Hyphens" got rid of punctuation that might impact the effectiveness of the lemmas being used but kept hyphenated words intact.

d.) When Lexos has finished applying the lemmas, click the "Download" button, and a zip file should download to your computer. Find the file and open it. You should see a folder with individual text files. (They are likely in your download folder or on your desktop.)

e.) Open each file to look for text not related to Douglass's work, e.g., Project Gutenberg boilerplate information at the beginning (pictured below) and end of the text.

This is when you can also decide whether to delete paratextual information, e.g., introductions, prefaces, tables of contents, indexes, etc. Choosing whether or not to keep this kind of information is part of the intellectual decision-making that goes into text analysis.

f.) When you are done, save your files.

g.) Rename the files (by clicking on each file name) to clarify which file is which text. The recommended names are "abolition," "articles," "bondage," and "narrative." They use the first keyword from each title. Once this is done, you are ready to upload your files to Voyant.
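
The check for Project Gutenberg boilerplate described in this section can be partly automated before you review the files by hand. The sketch below is an assumption-heavy illustration: the `strip_gutenberg_boilerplate` helper is hypothetical, it assumes the files still contain line breaks, and it assumes the header and footer are marked by lines containing "start of the/this project gutenberg ebook" and "end of the/this project gutenberg ebook." Marker wording varies between files, so always confirm the result by eye.

```python
import re

# Marker lines, matched with or without the surrounding asterisks and in
# any letter case (the asterisks are gone after scrubbing).
START = re.compile(
    r"^.*start of (?:the|this) project gutenberg ebook.*$",
    re.IGNORECASE | re.MULTILINE,
)
END = re.compile(
    r"^.*end of (?:the|this) project gutenberg ebook.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Keep only the text between the start and end marker lines."""
    start = START.search(text)
    begin = start.end() if start else 0
    end = END.search(text, begin)
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()


sample = (
    "header\n"
    "start of this project gutenberg ebook narrative\n"
    "i was born in tuckahoe\n"
    "end of this project gutenberg ebook narrative\n"
    "license text"
)
print(strip_gutenberg_boilerplate(sample))
```

If no marker is found, the text is returned unchanged, so a file with an unusual header will simply need to be trimmed manually. Paratextual material such as prefaces is left alone, since keeping or deleting it is a research decision.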

Working in Voyant

In this second half of the tutorial, you will be introduced to some of Voyant's functions that will help you explore the proposed research questions. Here, again, are the premise and research questions:

Imagine you are studying Frederick Douglass' rhetorical arguments against slavery and you notice mentions in various works that evoke the idea of family. This sparks your curiosity and makes you wonder: How does Douglass evoke the idea of family in his arguments? How much and when does he mention family? What rhetorical purpose might these mentions serve? Does Douglass more often talk about the family in the context of slaveholders or slaves?

Going forward, you are encouraged to follow the various steps presented and to explore on your own.

1.) Upload the Text Files

Launch a new Voyant instance, click on the "Upload" button, and navigate to and select the edited Douglass files. When selecting the files, it should look something like this:

The results should look something like this:

2.) Explore the Interface

The particular tools and layout you initially see are called the "default skin." It displays the tools:

a.) Notice that there are other view options within each box. For example, "Cirrus" also has "Terms" and "Links." Take a moment to explore the tools and their various options.

b.) Notice that when you hover to the left of any of the question marks (even the one in the blue field at the very top right of the page) a toolbar of icons appears:

These provide access to a range of options and functionalities: The arrow icon [a] allows you to export a URL or embed code for a specific tool or the entire project. It also allows you to export images. The window icon [b] is where you go to change the tool to a different one. The switch icon [c] is where you go to define options for that specific tool. For example, it is where you go to add stopwords, create categories, and change fonts. The question mark provides information about that specific tool.

c.) Take a little time to explore this toolbar as it is key to using Voyant effectively, and we will be using it quite a bit below.

3.) Add Stopwords

Stopwords are words that you don't want to include in the results you see in certain tools, i.e., Cirrus and Summary's "most frequent words in the corpus." When you apply stopwords, you are not deleting them from the text; they are just not visible in the results.

Choosing stopwords is part of the intellectual decision-making that goes into text analysis. For example, in the context of the research question posed here, one could decide to stop the words "slavery," "masters," and "slaves" since those terms are pervasive and are understood to be there. Getting rid of words that are considered inconsequential, at least within the context of the research question (e.g., "like" and "mr"), can also be helpful. By stopping these words, other words will become more visible and might inspire new ideas and inform the analysis.

a.) To add stopwords, click on the switch or "define options" icon in the Cirrus tool (or in any other tool).

b.) Click "Edit List" next to the stopwords dropdown menu.

c.) Add the words, "slaves," "slavery," "masters," "mr," and "like," putting each one on a new line.

Save them and click "Confirm." You should see a change in the word cloud. If one of the words does not disappear, add it again. There could be a typo issue.

If you only wanted to apply stopwords in that particular tool, you would uncheck "apply globally."

Please note: When your text is first loaded in Voyant, the app automatically applies stopwords. You can turn this off by selecting "None" in the stopword dropdown menu. You can also remove stopwords from the list simply by deleting them, saving, and confirming the changes.
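
The idea that stopwords are hidden rather than deleted can be illustrated with a short Python sketch. This is not how Voyant is implemented; the `visible_frequencies` function and the sample sentence are hypothetical and only show the principle.

```python
from collections import Counter


def visible_frequencies(text, stopwords):
    """Count every word, then hide the stopwords from the result."""
    counts = Counter(text.split())
    hidden = set(stopwords)
    return Counter({word: n for word, n in counts.items() if word not in hidden})


text = "slaves and masters and children like children"
stopwords = ["slaves", "slavery", "masters", "mr", "like"]

print(visible_frequencies(text, stopwords))
# The text itself is untouched, so the stopwords are still in it.
print("slaves" in text.split())
```

Passing an empty stopword list would show every word again, which mirrors selecting "None" in the dropdown menu.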

2.) Using White Lists

A white list is essentially the opposite of a stopword list. It is a list of the only words that you want to see in the Cirrus results.

a.) To create a white list, click on the switch or "define options" icon in the Cirrus tool (or any other tool).

b.) Click "Edit List" next to the white list dropdown menu.

c.) Add the words mothers, fathers, children, babies, infants, husbands, wives, families, and parents, putting each one on a new line. Then save them and click "Confirm."

The results should look something like the below. (If one of the words does not appear, add it again. There could be a typo issue.)

You can turn off the white list by selecting "None" in the white list dropdown menu. You can also remove words by deleting them from the list and resaving and confirming the changes.

3.) Creating Categories

You can also create categories that group words. They can be applied in many but not all tools.

a.) To create a category, click the switch or "define options" icon in the Cirrus tool (or any other tool).

b.) Click "Edit" next to the Categories dropdown menu.

When you open up Categories, you will see that Voyant has two default ones, "positive" and "negative." To add a new category, click "Add Category" [b] and give it a title such as "family." To add terms to that list, search for them in the search box [c], and when they appear in the Terms box [d], drag them to the new categories list. To remove a term from a list, select the word and then click "Remove Selected Term" [a].

c.) Create a new category using "family" as the name and add the terms "mothers," "fathers," "children," "babies," "infants," "husbands," "wives," "families," and "parents." (It should look like the image above.)

After creating your category, you can apply it to a specific tool. In the tool's search box, search for the category name with an @ at the beginning, for example, @family. (Do not leave a space between the @ and the name.) The results will include only those containing terms in that category.

d.) In the Contexts tool, search @family in the search box and explore the results:
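
To get a feel for what a category search in Contexts returns, here is a minimal keyword-in-context sketch in Python. It is an illustration, not Voyant's code: the `contexts` function, the window size, and the sample sentence are all hypothetical, and the `FAMILY` set simply mirrors the terms in the "family" category.

```python
# The terms placed in the "family" category in this step.
FAMILY = {
    "mothers", "fathers", "children", "babies", "infants",
    "husbands", "wives", "families", "parents",
}


def contexts(text, category, window=5):
    """Return (left, keyword, right) rows for every word in the category."""
    tokens = text.split()
    rows = []
    for i, token in enumerate(tokens):
        if token in category:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, token, right))
    return rows


sentence = "the children were taken from their mothers at an early age"
for left, keyword, right in contexts(sentence, FAMILY, window=2):
    print(f"{left:>12} | {keyword} | {right}")
```

Reading the words on either side of each hit is what lets you judge whether a mention of family concerns slaveholders or enslaved people.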

4.) Changing Tools

The following will show you how to change out tools. (To learn about the many Voyant tools to choose from see the tools list.)

a.) In the Trends tool (or any other tool), click on the window or "choose tool" icon.

b.) Click on "Visualization" and then "MicroSearch" and the tool should appear.

As Voyant describes MicroSearch, "each document in the corpus is represented as a vertical block where the height of the block indicates the relative size of the document compared to others in the corpus. The location of occurrences of search terms is located as red blocks...Multiple search terms are collapsed together." In the tool, you search the term(s) you want to see appear in the visualization.

c.) Search "children" in the search box.
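
The positions that MicroSearch draws as red blocks can be thought of as the relative location of each hit within a document. The Python sketch below computes those locations for one term. It is a simplified, hypothetical illustration (the `relative_positions` function and the sample text are not part of Voyant) that measures position by word index.

```python
def relative_positions(text, term):
    """Return each occurrence of term as a fraction of the document length."""
    tokens = text.split()
    if not tokens:
        return []
    return [i / len(tokens) for i, token in enumerate(tokens) if token == term]


document = "children a b c children d e f g children"
print(relative_positions(document, "children"))
```

A value near 0 means the term appears at the start of the document and a value near 1 means it appears at the end, so clusters of values show where in a text Douglass concentrates his mentions of a term.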

5.) Select & Modify Documents

You can choose individual texts that you want to visualize by selecting and deselecting them. For example, you can choose to use only Douglass's Narrative and Collected Articles (the "narrative" and "articles" files).

a.) To select the texts, go to Summary (on the lower left) and click "Documents." Then select or deselect texts. Here is where you can also modify texts, meaning you can delete them and upload new ones. To make these changes, click "Modify."

6.) Exporting (Sharing & Saving)

You can save or share an entire Voyant instance or individual tools by exporting the URL. (Exporting a tool launches that tool in its own window.)

To export the URL of an entire Voyant instance or a specific tool (see this Cirrus white list example), click on the arrow or "Export URL" icon in the very top right corner of the Voyant instance. Then click "Export," and a new window will launch. Copy and keep the URL for that window. To get the embed code, select the option "an HTML snippet for embedding this view..." and then click "Export." If you make changes to your project, you need to re-export to get a URL or embed code that reflects those changes.

Exporting the entire Voyant instance options:

Exporting specific tools options (notice that you can also export images. Select "export a PNG image..."):

You have completed Exercise Two and the tutorial. If you haven't already, now is a good time to explore the Voyant user's guide. You are also encouraged to experiment with texts that are part of your own research interests.
