Text Analysis of Carver’s Travels Through the Interior Parts of North America

This project is the midterm exam for DGAH 110: Hacking the Humanities.

This project demonstrates my ability to use the technologies learned in class to conduct text analysis on a designated book, Travels Through the Interior Parts of North America, in the Years 1766, 1767, and 1768, by Jonathan Carver. The text was obtained from Project Gutenberg, a library of over 60,000 free eBooks.

To obtain the text from Project Gutenberg, I used the R package gutenbergr to download it, and then cleaned up the text data in R.

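# load libraries
library(gutenbergr)  # downloading Project Gutenberg texts
library(dplyr)       # %>%, mutate(), filter(), anti_join()
library(tidytext)    # unnest_tokens() and the stop_words data frame
library(stringr)     # str_detect()

# download the book by its Project Gutenberg ID
travels_crude <- gutenberg_download(49753)

# number the lines, then split the text into one lowercase word per row
travels_crude <- travels_crude %>% 
  mutate(linenumber = row_number()) %>% 
  unnest_tokens(word, text)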

The tidytext library has a built-in data frame of stop words, stop_words. I took the word “great” out of this data frame because its word count is so large that we may gain some insights by looking at it more closely. I then removed all the stop words and any words containing punctuation from the data frame.

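# keep "great" in the text by dropping it from the stop word list
stop_words_modified <- stop_words %>% 
  filter(word != "great")

# remove stop words
travels_nostop <- travels_crude %>% 
  anti_join(stop_words_modified, by = "word")

# remove any word that contains a punctuation character
travels_final <- travels_nostop %>% 
  filter(!str_detect(word, "[:punct:]"))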

I created two data files from the cleaned-up text data:

  1. A CSV file containing each word in the text and its word count, used to generate a word cloud in Flourish.
  2. A TXT file containing all of the words from the cleaned-up text data, used for text analysis in Voyant.
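The two exports could be produced along the following lines. This is a minimal base-R sketch, not necessarily how the files above were made: it assumes `words` is the cleaned token vector (for example `travels_final$word`), and the helper name `export_word_files` and the file paths are hypothetical.

```r
# Hypothetical helper: write the two data files from a cleaned token vector.
export_word_files <- function(words, csv_path, txt_path) {
  # 1. word counts, most frequent first (CSV for the word cloud)
  counts <- sort(table(words), decreasing = TRUE)
  freq <- data.frame(word = names(counts), n = as.integer(counts),
                     stringsAsFactors = FALSE)
  write.csv(freq, csv_path, row.names = FALSE)

  # 2. every cleaned word in reading order, one per line (TXT for Voyant)
  writeLines(words, txt_path)

  invisible(freq)
}

# Usage with a few illustrative tokens and temporary files
words <- c("lake", "river", "lake", "great", "lake", "river")
freq <- export_word_files(words, tempfile(fileext = ".csv"),
                          tempfile(fileext = ".txt"))
head(freq, 50)  # with the real data, the 50 words shown in the word cloud
```

Sorting by count before writing means the top rows of the CSV are already the most frequent words.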

Here is a summary of the text analysis from Voyant:

Text Analysis Summary Table

Total Words: 39,391
Unique Word Forms: 8,308
Vocabulary Density: 0.211
Readability Index: 23.004
Most Frequent Words in the Corpus: indians (364), great (334), lake (271), river (229), time (183)
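The vocabulary density in the table is the number of unique word forms divided by the total number of words, which the two counts above reproduce: 8,308 / 39,391 ≈ 0.211. A one-line R version, assuming a token vector `words` and a hypothetical function name:

```r
# Vocabulary density: number of unique word forms divided by total words
vocabulary_density <- function(words) {
  length(unique(words)) / length(words)
}

vocabulary_density(c("lake", "river", "lake", "great"))  # 3 unique / 4 total -> 0.75
round(8308 / 39391, 3)                                      # figures from the table -> 0.211
```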

A word cloud is a useful tool for visualizing word frequency in a source text. I created a word cloud of the 50 most frequently used words in Travels Through the Interior Parts of North America. It highlights several major themes of the journal, including the indigenous Naudowessies people, descriptions of bodies of water, the vegetation and wildlife of the territory, measurements of the landscape, and Carver’s interactions with the French.

Trend lines of relative word frequencies can also give us insights into the content of the text. The following plot displays the relative frequencies of the five most common words in the text. One can interact with the trend lines by clicking on them and by filtering or searching for specific words in the filter section. Based on the plot, Carver writes more about the landscape in the first half of the journal than in the second half: the relative frequencies of “great”, “lake”, and “river” peak around segments 2 and 4 and decrease sharply afterwards. The relative frequency of “indians” is the highest of the five between segments 5 and 7, which suggests that Carver spends more paragraphs describing the indigenous people in those segments. Toward the end of the journal, Carver seems to shift briefly away from the landscape and the people before mentioning them again. It would be interesting to examine what else Carver writes about in segments 8 and 9.
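The trend lines rest on a simple calculation: split the text into equal segments and, for each segment, divide a word’s count by the number of words in that segment. The sketch below assumes ten segments, which appears consistent with the segment numbers in the plot but is not taken from Voyant’s code; `segment_trends` is a hypothetical name, and `words` is assumed to be the cleaned tokens in reading order.

```r
# Hypothetical helper: relative frequency of each term in equal-sized segments.
# Returns a matrix with one row per segment and one column per term.
segment_trends <- function(words, terms, n_segments = 10) {
  # assign each token to a segment 1..n_segments of (nearly) equal length
  segment <- ceiling(seq_along(words) * n_segments / length(words))
  sapply(terms, function(term) {
    # share of the segment's tokens that equal `term`
    as.numeric(tapply(words == term, segment, mean))
  })
}

# e.g. segment_trends(travels_final$word,
#                     c("indians", "great", "lake", "river", "time"))
```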

Exploring the links among words can also reveal collocations, that is, words that frequently appear near each other in the text. For example, the links for the word “lake” show that “Lake Superior” is a common collocation. This suggests that Carver writes about Lake Superior specifically much more often than about other lakes in his journal.
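Collocates like these can be approximated by counting the words that occur within a few tokens of a keyword. This is a minimal sketch: the window of five tokens on each side is an arbitrary choice, not necessarily what Voyant uses, and `collocates` is a hypothetical helper.

```r
# Hypothetical helper: count the words found within `span` tokens on either
# side of each occurrence of `keyword`, most frequent first.
collocates <- function(words, keyword, span = 5) {
  hits <- which(words == keyword)
  nearby <- unlist(lapply(hits, function(i) {
    idx <- max(1, i - span):min(length(words), i + span)
    idx[idx != i]  # drop the keyword occurrence itself
  }))
  # a token inside two overlapping windows is counted once per window
  sort(table(words[nearby]), decreasing = TRUE)
}

# e.g. head(collocates(travels_final$word, "lake"), 10)
```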

I wanted to investigate the connections between words further in order to gain more insight into Carver’s travels. I found the word tree very helpful for understanding how certain key words are used in the journal. For instance, the word tree shows that the word “indians” is often used with “naudowessie”, which suggests that the Naudowessie people are an important subject in the text. I also used the word tree to learn how Carver uses the word “great” in his journal. I initially thought that Carver was using it to describe how great the landscape or life in America is, but it turns out that he often uses it to describe quantity or size, as the words associated with “great” include “abundance”, “plenty”, and “lakes”.
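A word tree branches on the words that follow a keyword, so its first level can be approximated by counting the phrases of `k` words that come right after each occurrence. This is a hypothetical sketch that assumes `words` holds tokens in reading order; running it on tokens from before stop-word removal (such as `travels_crude$word`) keeps the phrases closer to Carver’s wording.

```r
# Hypothetical helper: the k-word phrases that most often follow `keyword`.
next_phrases <- function(words, keyword, k = 2) {
  hits <- which(words == keyword)
  hits <- hits[hits + k <= length(words)]  # skip hits too close to the end
  phrases <- vapply(hits, function(i) {
    paste(words[(i + 1):(i + k)], collapse = " ")
  }, character(1))
  sort(table(phrases), decreasing = TRUE)
}

# e.g. head(next_phrases(travels_crude$word, "great", k = 1), 10)
```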

Finally, I used DreamScape to visualize Carver’s travels by examining the place names in the journal. I did some cleaning to remove all of the falsely identified locations. The map below displays how frequently each city is mentioned and how it connects with other cities. As shown, Green Bay is the most frequently mentioned city in the text, followed by Pontiac and Detroit. I am not sure whether the animation on this map accurately reflects Carver’s journey, and I could not find a way to turn it off.
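As a rough cross-check on the map, mentions of a hand-checked list of place names can be counted directly in the raw text. Matching raw lines rather than single-word tokens keeps two-word names such as “Green Bay” together. This is a sketch under assumptions: `count_places` is a hypothetical helper, `text` is assumed to be the raw line vector (for example the `text` column returned by `gutenberg_download()`, before tokenizing), and each place name is treated as a regular expression.

```r
# Hypothetical helper: case-insensitive, whole-word counts of each place name.
count_places <- function(text, places) {
  full <- paste(text, collapse = " ")  # join lines so names split across lines still match
  counts <- vapply(places, function(p) {
    m <- gregexpr(paste0("\\b", p, "\\b"), full, ignore.case = TRUE, perl = TRUE)[[1]]
    sum(m > 0)  # gregexpr returns -1 when there is no match
  }, integer(1))
  sort(counts, decreasing = TRUE)
}

# Usage with illustrative sentences
count_places(c("We reached Green Bay in September.",
               "From green bay we went to Detroit,",
               "and back to Green Bay."),
             c("Green Bay", "Detroit", "Montreal"))
```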

Doing text analysis definitely gives me an overall sense of Carver’s travels through the interior parts of North America in the late 1760s. Without actually reading the book, I now know that he mainly explored the area around Green Bay and spent a large amount of time documenting the lakes, rivers, and lives of indigenous people.

However, this kind of text analysis certainly has limitations. For example, I was intrigued that words like “war” and “enemies” are among the 50 most frequently used words in the journal. I used the links and the word tree to try to figure out what kind of rivalry Carver is referring to. Is it English vs. French, English vs. indigenous people, conflicts among indigenous tribes, or all of the above? So far, I have not found significant evidence to support any of these possibilities. Further analysis is required to draw a conclusion.

