The Difficulties of Text Mining: The Southern Sounds Evaluation

My digital project, Southern Sounds: A Comparative Study of Early Old Time and Blues Music, did not turn out as I had hoped. The source of the problem was an issue I wrote about at length a few posts back: my project was simply not big enough. The software I used, Voyant Tools and AntConc, was excellent, and I could see myself using both in future projects. But these tools are designed to handle large collections of texts, upwards of 1,000 documents! My project consisted of only 600 songs, split into two categories of 300 songs apiece. Needless to say, I fell well short of the expected document count, and as a result my results were skewed. When I tried to cluster words in context, the tools produced a number of interesting clusters, but each one drew on a very narrow range of texts. In layman's terms, a cluster was often generated from only one or two documents, which meant that individual songs were throwing off my results.
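To make the "one or two documents" problem concrete, here is a minimal Python sketch of the kind of check that exposes it. This is not how Voyant Tools or AntConc work internally; the function names, thresholds, and toy lyrics are hypothetical, chosen only for illustration. For each two-word cluster, it counts both how often the cluster occurs overall and how many separate songs it occurs in, then flags clusters that are frequent but concentrated in very few songs.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters and apostrophes as word tokens.
    return re.findall(r"[a-z']+", text.lower())


def cluster_spread(docs, n=2):
    """Map each n-word cluster to (total occurrences, number of documents containing it)."""
    totals = Counter()      # how many times each cluster appears overall
    doc_counts = Counter()  # how many distinct documents each cluster appears in
    for text in docs:
        tokens = tokenize(text)
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        totals.update(grams)
        doc_counts.update(set(grams))  # count each cluster at most once per document
    return {g: (totals[g], doc_counts[g]) for g in totals}


def concentrated_clusters(docs, n=2, min_count=3, max_docs=2):
    """Hypothetical filter: clusters seen at least min_count times but in at most max_docs documents."""
    spread = cluster_spread(docs, n)
    flagged = [(g, c, d) for g, (c, d) in spread.items() if c >= min_count and d <= max_docs]
    return sorted(flagged, key=lambda item: -item[1])  # most frequent first


# Toy "lyrics", invented for this example.
songs = [
    "going down the road feeling bad, going down the road feeling bad",
    "going down the road, lord, going down the road",
    "my baby left me this morning",
]
for gram, count, ndocs in concentrated_clusters(songs, min_count=2, max_docs=1):
    print(" ".join(gram), count, ndocs)
```

In this toy corpus, "feeling bad" appears twice but only in the first song, so it looks like a pattern while really reflecting a single repeated chorus. With only 300 songs per category, many clusters can behave this way, which matches the skew described above.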

I did, though, make a really cool map of all the artists whose lyrics I used. You can view the map by clicking here. The site for the entire project can be accessed here.


Image source: http://www.nunwood.com/engineering-techniques-identifying-root-causes-poor-customer-experience-text-analysis/
