# Step 3: Important Considerations

When conducting a text analysis, it is important to keep in mind that:

**1.) Word meaning changes over time.**

Although it may seem obvious, it's important not to forget that word meanings shift over time. One can use a historical dictionary like the Oxford English Dictionary to look up what a word meant in a particular period.

**2.) Word context is key.**

In many, if not most, text analysis undertakings, word context is crucial to the analysis. Exceptions can occur when, for example, one is only interested in the number of times a word appears and not in the way the word is used.
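The difference between counting a word and examining how it is used can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the helper names (`tokenize`, `count_word`, `keyword_in_context`), the sample sentence, and the window size are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_word(tokens, word):
    """Return how many times `word` appears, ignoring how it is used."""
    return Counter(tokens)[word]


def keyword_in_context(tokens, word, window=3):
    """Return each occurrence of `word` with up to `window` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == word:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text in which "bank" carries two different senses.
sample = "She sat on the bank of the river. He deposited money at the bank."
tokens = tokenize(sample)

print(count_word(tokens, "bank"))
for line in keyword_in_context(tokens, "bank"):
    print(line)
```

The count alone reports two occurrences of "bank", while the context lines show that the two occurrences mean different things, which is the kind of distinction a frequency-only analysis would miss.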

**3.) There may be issues of omission in the corpus.**

When exploring or creating a corpus, it's important to keep in mind that there may be issues of omission. People of color, women, and other marginalized groups have been published less throughout history and, therefore, a massive corpus such as Google Books or HathiTrust will skew white and male. (Omissions can also fall along lines such as language, geography, and time period.) Moreover, it's important to consider what gets digitized. There can be (and no doubt is) bias in the decisions that drive the selection and funding of what ends up online.

**4.) There can be quality issues with the corpus.**

Often, the texts used in text analysis come from books and documents that have been OCR'd. [OCR](https://en.wikipedia.org/wiki/Optical_character_recognition) (optical character recognition) converts images of text into digital (machine-readable) text. Factors such as poor image quality and scanning mistakes can lower OCR quality and, therefore, introduce errors into the text.

Below are two examples of how OCR errors can occur. The one on the left is from a first edition of the 18th-century novel, *The Life and Opinions of Tristram Shandy, Gentleman*. With books from this period, you get characters such as the long s (ſ) and, often, ink bleed-through and foxing (the little dots that come with age), all of which can impact OCR. (These kinds of issues were much more of a factor before advancements in OCR technology.) The example on the right shows a scanning mistake made when the book was moved during scanning. (Even with the advancements of technology, OCR issues are unavoidable in this case.) When working with a large or massive corpus, these kinds of errors might be inconsequential as long as there are few enough of them. With smaller corpora, such errors can have a greater impact and skew text analysis results.

![](/files/-MTaxD-0wwJsCwuOEfVY)
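One rough way to get a feel for how much OCR noise a text contains is to normalize known period characters and then check what share of tokens fall outside a word list. The sketch below is only an illustration under stated assumptions: the functions `normalize_long_s` and `unknown_token_rate` are hypothetical helpers, the vocabulary is a tiny hand-made set standing in for a real dictionary, and the sample string is invented for the example.

```python
import re


def normalize_long_s(text):
    """Replace the long s (ſ, U+017F) with a modern s."""
    return text.replace("\u017f", "s")


def unknown_token_rate(text, vocabulary):
    """Return the fraction of alphabetic tokens that are not in `vocabulary`."""
    # [^\W\d_]+ matches runs of Unicode letters, so ſ stays inside its word.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    unknown = [t for t in tokens if t not in vocabulary]
    return len(unknown) / len(tokens)


# Assumption: a tiny stand-in vocabulary; a real check would use a full word list.
vocabulary = {"i", "wish", "either", "my", "father", "or", "mother"}

# Assumption: an illustrative line of OCR output that kept the long s.
ocr_text = "I wi\u017fh either my father or my mother"

rate_before = unknown_token_rate(ocr_text, vocabulary)
rate_after = unknown_token_rate(normalize_long_s(ocr_text), vocabulary)
print(rate_before, rate_after)
```

A rate computed this way is only a crude signal, since proper nouns and archaic spellings also count as unknown, but comparing it across documents can help flag the texts in a small corpus that most need manual review.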

