Thinking about Size, Scope, and Samples–And Music

My project is going well, but it has not been without a few pitfalls. Per usual, the process of actually doing the project precipitated a rethinking of a number of my design decisions. The issues all have to do with either size and scope or whether or not representative samples are important.

1. Size: I have decided to venture into the world of topic modeling. I think it, more so than any other mode of analysis, is essential to getting what I want out of my analysis. The problem, though, is that I have read that topic modeling works best on an extremely large dataset. One manual-esque article suggested that one would need at least a thousand documents for topic modeling to really be effective. Needless to say, I do not have a thousand documents. Currently, I’ve cataloged only three hundred blues lyrics and three hundred old time lyrics, and because I am going to be comparing the two, each set has to be modeled on its own, so they cannot be pooled into a combined document count. I could increase my collection of lyrics, but doing so would take quite a bit of time and cause me to alter the project’s scope.

2. Scope: I initially decided that I would restrict my search for lyrics to artists born prior to 1900, which I took to be the founding generation of commercially successful Southern musical artists. The thought behind this decision was that if I could categorize artists by generation, I could then potentially track change over time if I decided to expand my project forward. I now realize, however, that this logic has quite a few flaws. For one, what about the “tweeners”? By that I mean those artists born right on the cusp of 1900. For instance, are Blind Willie McTell, born in 1898, and Son House, born in 1902, really part of two different generations just because their births fell on opposite sides of the turn of the century? In hindsight, I say no. Another problem is that I naively believed that those born after 1900 might not be a part of the same artistic community as those born prior to 1900. Of course, I now realize that age has little to do with whether or when an artist becomes popular. The popular artists of the 1920s comprised men and women of varying ages.

3. Sample: Another decision that I made, perhaps in too much haste, was to try to make my blues samples congruent with my old time samples. What I mean is that I wanted to have an equal number of blues songs and old time songs from an equal number of artists. This endeavor, though, is next to impossible. The blues have been much more accessible than old time country. And naturally, some artists were much more popular than others, creating an imbalance between what is accessible and what is not.

I am somewhat torn about what to do if I wanted to expand the project further than what is required in the class. I think that the project’s comparative nature is what is most important, but, at the same time, topic modeling both and doing it correctly would require a much larger set of data. Therefore, if I were to expand it, I am very tempted to just analyze one of the two genres. Doing so, though, would eliminate its comparative element.

Image URL: http://ccriderblues.com/wp-content/uploads/CC-Rider-Banner-2014-DRAFT-Mar5.jpg

Thinking About Space

Historyonic’s post “Place and the Politics of Past” hints at what I find to be the real value behind digital mapping and geo-referencing. Sadly, as the author admits, the technology is not yet able to capture this capability in its entirety. In fact, the very idea is almost too nebulous to pin down. I am thinking about “networks” and what they might mean for the historian.

The word itself has three definitions. As a noun, it is an arrangement of intersecting horizontal and vertical lines, or a group or system of interconnected people or things. As a verb, it means to connect as or operate with a network. For our sake, I think of a network as a group or system of interconnected people or things, which is still rather ambiguous. But the ambiguity is, perhaps, a good thing for the historian because it suggests that nothing is out of reach.

Anyway, I find the concept of a “historical network” rather interesting and pertinent to my own project. I am attempting to text mine a database of blues and old time country lyrics in an effort to compare the two. In addition, I want to be able to create a database of place names, meaning various locations like states, cities, towns, and counties, and then build a heat map from that database. The goal is to be able to see which places registered the most “hits” and then create an imagined geography for both the Blues and Old Time music based on those locations.
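To make the counting step concrete, here is a minimal sketch of how the “hits” might be tallied. Everything in it is an assumption for illustration: the short `PLACE_NAMES` list stands in for a real place-name database, the lyrics are invented placeholder lines rather than quotations, and simple whole-word matching is only one of several ways the matching could be done.

```python
import re
from collections import Counter

# Hypothetical gazetteer; a real one would hold many more states,
# cities, towns, and counties.
PLACE_NAMES = ["Memphis", "Nashville", "Atlanta", "Mississippi"]

# Invented placeholder lyrics, not quotations from real songs.
LYRICS = {
    "blues": [
        "going down to memphis, leaving mississippi behind",
        "memphis town is calling me",
    ],
    "old_time": [
        "riding the train to nashville",
        "from atlanta up to nashville I will roam",
    ],
}


def count_place_hits(songs, places):
    """Count whole-word, case-insensitive mentions of each place."""
    hits = Counter()
    for text in songs:
        for place in places:
            pattern = r"\b" + re.escape(place) + r"\b"
            hits[place] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return hits


# One tally per genre, so the two imagined geographies can be compared.
hits_by_genre = {
    genre: count_place_hits(songs, PLACE_NAMES)
    for genre, songs in LYRICS.items()
}

for genre, hits in hits_by_genre.items():
    print(genre, hits.most_common())
```

The per-genre tallies are the numbers a heat map would be drawn from. Plain string matching will stumble on ambiguous names (Mississippi the state versus the river, for example), so a real version would likely need some hand checking.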

From these place names, we can then pin people to them. For instance, we can tag each bluesman who sang about Memphis to the city, and we can do the same, to be impartial, for each old time artist who mentions Nashville or Atlanta. What we will then have is data allowing us to see how interconnected each genre’s artists were based on the places they sang about. Who knows, this interconnectivity may even cross genres, revealing that the U.S. South, at least in its musical landscape, was much more integrated than we might think.
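A rough sketch of that tagging idea, under stated assumptions: the artist names are placeholders rather than real musicians, the `MENTIONS` records are invented, and “connected” is taken to mean nothing more than “mentioned the same place,” which is my own simplification.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (artist, genre, place) tags; placeholder names, not real data.
MENTIONS = [
    ("Blues Artist A", "blues", "Memphis"),
    ("Blues Artist B", "blues", "Memphis"),
    ("Blues Artist B", "blues", "Atlanta"),
    ("Old Time Artist C", "old_time", "Atlanta"),
    ("Old Time Artist D", "old_time", "Nashville"),
]


def artists_by_place(mentions):
    """Pin each artist to every place they mention."""
    place_to_artists = defaultdict(set)
    for artist, _genre, place in mentions:
        place_to_artists[place].add(artist)
    return place_to_artists


def shared_place_links(mentions):
    """Link every pair of artists who mention the same place.

    Each link is (artist_a, artist_b, place, crosses_genres).
    """
    genre_of = {artist: genre for artist, genre, _place in mentions}
    links = []
    for place, artists in sorted(artists_by_place(mentions).items()):
        for a, b in combinations(sorted(artists), 2):
            links.append((a, b, place, genre_of[a] != genre_of[b]))
    return links


links = shared_place_links(MENTIONS)
for a, b, place, crosses in links:
    label = "cross-genre" if crosses else "same genre"
    print(f"{a} <-> {b} via {place} ({label})")
```

The `crosses_genres` flag is the part that speaks to the integration question: any link marked as cross-genre is a place that both a blues artist and an old time artist sang about.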

On a related note, “Toward Critical Spatial Thinking in the Social Sciences and the Humanities” brings up another good point about spatial projects and, in my mind at least, networks. The article suggests that rather than thinking “spatially,” we should really be thinking in “spatio-temporal” terms. That is, we should be thinking about and trying to capture not a static representation of space but a dynamic one, where change over time can be easily visualized and understood. Thinking in this way only enhances our understanding of historical networks. If we can project a supposed network over time and space, can we not then see at what points in time the network changes? For instance, drawing again from my own example, if we can situate “Memphis’s blues network” over a timeline, can we not start to see changes in that network over time or across certain time periods, say decades or years? Then, we can start drawing broader historical conclusions about what caused, or even what altered, human interaction with space.
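One way the timeline idea might be sketched is to slice a place’s network into decades and compare neighboring slices. The records below are invented placeholders, and dating each mention by a recording year is my own assumption about what the timeline would be built on.

```python
from collections import defaultdict

# Hypothetical (artist, place, recording year) records; placeholder values.
RECORDS = [
    ("Artist A", "Memphis", 1927),
    ("Artist B", "Memphis", 1929),
    ("Artist B", "Memphis", 1934),
    ("Artist C", "Memphis", 1936),
    ("Artist C", "Atlanta", 1931),
]


def network_by_decade(records, place):
    """Group the artists who mention a place into decade snapshots."""
    snapshots = defaultdict(set)
    for artist, rec_place, year in records:
        if rec_place == place:
            decade = (year // 10) * 10
            snapshots[decade].add(artist)
    return dict(sorted(snapshots.items()))


def changes_between(earlier, later):
    """Return (joined, left): artists who appear in or drop out of the later snapshot."""
    return later - earlier, earlier - later


memphis = network_by_decade(RECORDS, "Memphis")
joined, left = changes_between(memphis[1920], memphis[1930])

print(memphis)
print("joined:", joined, "left:", left)
```

Decades are an arbitrary bucket size here. The same grouping would work with years or any other period, and which one is meaningful is a historical question more than a technical one.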

I am sure there are holes in what I am trying to do, and I am sure this type of project may be more than I can do in a semester. But thinking about both articles has helped me formulate exactly what I want to do with my project and where my limits might be.

Image URL: http://stanford-history.github.io/Farman-week-5-discussion/