# Text Analysis Examples

### Humanities Example

In this text analysis example, Ted Underwood and David Bamman used BookNLP, a Java-based natural language processing pipeline, to trace how gender is represented across 93,708 English-language fiction volumes. They describe one of their major findings as follows:

> There is a clear decline from the nineteenth century (when women generally take up 40% or more of the “character space” in fiction) to the 1950s and 60s, when their prominence hovers around a low of 30%. A correction, beginning in the 1970s, almost restores fiction to its nineteenth-century state. (One way of thinking about this: second-wave feminism was a desperately-needed rescue operation.)

[**Visit their blog post to learn more about their methods and findings.**](https://tedunderwood.com/2016/12/28/the-gender-balance-of-fiction-1800-2007/)

![A visualization from Underwood and Bamman's text analysis.](https://1449868658-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MRQTcbFY8LmyitpJ2uH%2F-MRyrDvVqJn_-b9jyFua%2F-MRz9HuINkHYPyYJ5Psv%2Ftextanalysis_victorian.jpg?alt=media\&token=c54df6ca-3406-42dd-969c-5ed832f3a6c2)
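The quoted finding rests on one aggregate measurement: the share of "character space" that goes to women characters in a given period. The following is a minimal sketch of that aggregation step only. It assumes an upstream tool such as BookNLP has already produced, for each character, a publication year, an inferred gender, and a count of words associated with that character (used here as a stand-in for character space). The function name `womens_share_by_decade`, the tuple layout, and the sample numbers are hypothetical; Underwood and Bamman's actual pipeline is described in their blog post.

```python
from collections import defaultdict

def womens_share_by_decade(rows):
    """Return {decade: share of character words attached to women characters}.

    `rows` is an iterable of (year, gender, word_count) tuples, where gender is
    "f" or "m". Rows with any other gender label are skipped (an assumption:
    characters whose gender could not be inferred are left out of the total).
    """
    totals = defaultdict(int)  # all character words per decade
    women = defaultdict(int)   # words attached to women characters per decade
    for year, gender, words in rows:
        if gender not in ("f", "m"):
            continue
        decade = (year // 10) * 10
        totals[decade] += words
        if gender == "f":
            women[decade] += words
    return {d: women[d] / totals[d] for d in sorted(totals) if totals[d] > 0}

# Toy input with invented numbers, for illustration only (not from the study).
records = [
    (1851, "f", 120), (1851, "m", 180),
    (1858, "f", 50), (1858, "m", 150),
    (1962, "f", 75), (1962, "m", 25),
    (1964, "?", 500),  # unknown gender: skipped
]

for decade, share in womens_share_by_decade(records).items():
    print(f"{decade}s: {share:.1%}")
```

Grouping by decade rather than by single year smooths out the year-to-year noise that comes from uneven numbers of volumes per year; a real analysis would also need to decide how to weight long and short books.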

### Science Example

This example builds a [topic model](https://en.wikipedia.org/wiki/Topic_model) and visualization of 2,437 journal articles drawn from [CORD-19](https://allenai.org/data/cord-19), a dataset containing thousands of scholarly articles about COVID-19 and related coronaviruses. The model used, [latent Dirichlet allocation (LDA)](https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation), is a generative statistical model widely used in natural language processing: it treats each document as a mixture of topics and each topic as a probability distribution over words.

[**Visit the app to interact with the visualization.**](https://dash-gallery.plotly.host/dash-cytoscape-lda/)

![The topics identified](https://1449868658-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MRQTcbFY8LmyitpJ2uH%2F-MRzR9yLObQ9zFcTb_5l%2F-MRzRdJAIyXWO2WhZIhG%2Ftextanalysis-topics_cord19.png?alt=media\&token=196a0003-ce6f-4b51-98b1-1f1c8b4e2d49)

![The visualization (color indicates the topic and the node size reflects the number of citations)](https://1449868658-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MRQTcbFY8LmyitpJ2uH%2F-MRzBmidO3bQV8gp8c_0%2F-MRzQDjAMNY4N76WwAXK%2Ftextanalysis_cord19.png?alt=media\&token=8fba1931-8c62-4356-834c-621e2fb4dae4)
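To make the LDA description more concrete, here is a minimal pure-Python sketch of fitting LDA with collapsed Gibbs sampling, one common inference method. It is not the code behind the linked visualization, which may use a different library and inference method (for example, variational inference). The function name `lda_gibbs`, the hyperparameter defaults `alpha` and `beta`, and the toy documents are assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic proportions (lists of
    floats) and per-topic word distributions (dicts mapping word -> prob).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Posterior-mean estimates of document-topic and topic-word distributions.
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {vocab[v]: (n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)}
        for k in range(K)
    ]
    return doc_topic, topic_word

# Toy corpus (invented); real input would be tokenized article text.
docs = [
    "virus spike protein binding receptor".split(),
    "spike protein receptor structure virus".split(),
    "hospital patients treatment outcomes icu".split(),
    "patients icu treatment mortality hospital".split(),
]
theta, phi = lda_gibbs(docs, num_topics=2, iterations=100)
for k, dist in enumerate(phi):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

The "collapsed" sampler integrates out the topic and word distributions and resamples only each token's topic label, so the whole state is a few count tables. In a visualization like the one above, each article's color could then come from its most probable topic in `theta`.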
