Text Analysis Examples

Humanities Example

In this text analysis example, Ted Underwood and David Bamman used BookNLP, a Java-based natural language processing pipeline, to explore gender in 93,708 English-language fiction volumes. They summarize one of their major findings as follows:

There is a clear decline from the nineteenth century (when women generally take up 40% or more of the “character space” in fiction) to the 1950s and 60s, when their prominence hovers around a low of 30%. A correction, beginning in the 1970s, almost restores fiction to its nineteenth-century state. (One way of thinking about this: second-wave feminism was a desperately-needed rescue operation.)

Visit their blog post to learn more about their methods and discoveries.

A visualization from Underwood and Bamman's text analysis.
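To make the "character space" measure more concrete, here is a minimal Python sketch. It assumes one plausible reading of the term: the share of characterization words attached to women characters, grouped by decade. The record layout, the function name share_by_decade, and the numbers are all hypothetical placeholders chosen to keep the arithmetic easy. They are not Underwood and Bamman's data, and this is not BookNLP's interface.

```python
from collections import defaultdict

# Hypothetical records, NOT real data:
# (publication_year, character_gender, words_of_characterization)
toy_records = [
    (1851, "f", 250),
    (1858, "m", 750),
    (1962, "f", 500),
    (1965, "m", 500),
]


def share_by_decade(records, gender="f"):
    """Return {decade: fraction of characterization words for `gender`}."""
    totals = defaultdict(int)   # all characterization words per decade
    matched = defaultdict(int)  # words attached to the chosen gender
    for year, char_gender, words in records:
        decade = (year // 10) * 10
        totals[decade] += words
        if char_gender == gender:
            matched[decade] += words
    return {decade: matched[decade] / totals[decade] for decade in sorted(totals)}


toy_shares = share_by_decade(toy_records)
for decade, share in toy_shares.items():
    print(f"{decade}s: {share:.0%} of characterization words describe women")
```

In a real study, the hard part is the step this sketch skips: identifying characters, inferring their gender, and attributing words to them, which is the kind of work a pipeline such as BookNLP is used for.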

Science Example

Here, CORD-19, a database containing thousands of scholarly articles about COVID-19 and related coronaviruses, provides a topic model and visualization of 2,437 journal articles. The approach used, latent Dirichlet allocation (LDA), is a generative statistical model from natural language processing that treats each document as a mixture of topics and each topic as a distribution over words.

Visit to interact with the visualization.

The topics identified
The visualization (color indicates the topic and the node size reflects the number of citations)
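For readers curious about how LDA arrives at topics, the following is a minimal Python sketch that fits the model with collapsed Gibbs sampling, using only the standard library. The source does not say which software or inference method the CORD-19 visualization relied on, so the sampler, the function names (lda_gibbs, top_words), the parameter values, and the six toy documents are all assumptions made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per topic in each doc
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total tokens per topic
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z = [rng.randrange(num_topics) for _ in doc]
        assignments.append(z)
        for w, k in zip(doc, z):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample: (how much doc d likes topic t) * (how much topic t likes word w).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


def top_words(topic_word, topic, n=3):
    """Return up to n words with the highest nonzero count in one topic."""
    return [w for w, c in topic_word[topic].most_common(n) if c > 0]


# Toy documents invented for this sketch, NOT drawn from CORD-19.
toy_docs = [
    "vaccine trial dose immune response".split(),
    "vaccine dose antibody immune trial".split(),
    "hospital patient ventilator care icu".split(),
    "patient icu hospital care staff".split(),
    "antibody immune vaccine response dose".split(),
    "ventilator icu patient staff hospital".split(),
]

doc_topic, topic_word = lda_gibbs(toy_docs, num_topics=2)
for k in range(2):
    print(f"Topic {k}:", top_words(topic_word, k))
```

Each pass reassigns every word to a topic in proportion to how common that topic already is in the document and how common the word already is in the topic. Real projects typically rely on an established library rather than a hand-written sampler, and they preprocess the text (lowercasing, removing stop words) before fitting.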
