
chrisgilmerproj

import chris; chris.blog()


Words and Numbers

11 Jun 2009

At work, my friend Rob and I have spent some time over lunch talking about analyzing words numerically. Our thought was that you could take words, connect them to other words via a thesaurus, and show their relationships visually. Why do this? Entirely because it's interesting.

This all began with a discussion about how you could analyze books to discover whether they would become best sellers. By connecting words to other words, it might be possible to gain insight into themes, moods, and even context. These attributes of a book might show patterns that could be statistically linked to how well the book sold.

But the best things always start simple. The first step would be to find a reliable way to connect words. We would do this by linking words through a thesaurus, like the bonds in a molecule, and treating those bonds like springs in a physical system. The next step would be to apply a kind of thermodynamic modeling, which is a genius idea of Rob's. Basically, the words could be visualized as a fluid that flows and moves when heated. Then, as the heat is slowly taken away, groups of connected words would clump together and eventually crystallize. Our hope is that the crystallized words would reveal patterns and themes.

What I would find interesting is coloring the word crystals by region, by date of origin, or by some other pattern. I think it would be very illuminating to view words this way. It might also be interesting to compare different languages using the same qualitative method.

Thus far, neither Rob nor I have done much with this idea. We have really enjoyed talking about it and might code up some simple programs using Python, Dot, and some open source packages. There are also a number of great research institutes doing work like this, and I really want to look into what they are currently doing. I do hope we follow up on this idea because it seems like a lot of fun.
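To make the springs-and-heat idea a little more concrete, here is a minimal Python sketch of one way it could work: a force-directed layout in which bonded words are joined by springs (Hooke's law), every pair of words repels weakly, and random jitter plays the role of heat, shrinking a little each step as the system "cools." Everything here is an assumption for illustration: the tiny `THESAURUS` dict is a hypothetical stand-in for a real thesaurus, and `build_bonds`, `anneal`, and all the constants (spring stiffness, rest length, repulsion, cooling rate) are made-up names and values, not something Rob and I have actually built.

```python
import math
import random

# Hypothetical toy thesaurus: each word lists a few synonyms.
# A real experiment would load these links from an actual thesaurus.
THESAURUS = {
    "happy": ["glad", "cheerful", "joyful"],
    "glad": ["happy", "pleased"],
    "sad": ["unhappy", "gloomy"],
    "gloomy": ["sad", "dark"],
    "dark": ["gloomy", "dim"],
}


def build_bonds(thesaurus):
    """Turn synonym lists into a sorted list of undirected (word, word) bonds."""
    bonds = set()
    for word, synonyms in thesaurus.items():
        for synonym in synonyms:
            bonds.add(tuple(sorted((word, synonym))))
    return sorted(bonds)


def anneal(bonds, steps=2000, start_temp=1.0, cooling=0.997,
           spring_k=0.05, rest_length=1.0, repulsion=0.05, seed=0):
    """Lay words out in 2D: springs along bonds, weak repulsion, cooling jitter."""
    rng = random.Random(seed)
    words = sorted({w for bond in bonds for w in bond})
    pos = {w: [rng.uniform(-5, 5), rng.uniform(-5, 5)] for w in words}
    temp = start_temp

    for _ in range(steps):
        force = {w: [0.0, 0.0] for w in words}

        # Springs (Hooke's law): bonded words are pulled toward rest_length apart.
        for a, b in bonds:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            f = spring_k * (dist - rest_length)
            force[a][0] += f * dx / dist
            force[a][1] += f * dy / dist
            force[b][0] -= f * dx / dist
            force[b][1] -= f * dy / dist

        # Weak inverse-square repulsion between every pair keeps words from piling up.
        for i, a in enumerate(words):
            for b in words[i + 1:]:
                dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
                dist2 = max(dx * dx + dy * dy, 0.01)  # cap the force at short range
                dist = math.sqrt(dist2)
                f = repulsion / dist2
                force[a][0] -= f * dx / dist
                force[a][1] -= f * dy / dist
                force[b][0] += f * dx / dist
                force[b][1] += f * dy / dist

        # Move each word along its net force, plus random "thermal" jitter.
        for w in words:
            pos[w][0] += force[w][0] + rng.gauss(0.0, temp)
            pos[w][1] += force[w][1] + rng.gauss(0.0, temp)

        temp *= cooling  # take the heat away slowly

    return {w: tuple(p) for w, p in pos.items()}


if __name__ == "__main__":
    layout = anneal(build_bonds(THESAURUS))
    for word, (x, y) in sorted(layout.items()):
        print(f"{word:>10} {x:7.2f} {y:7.2f}")
```

Calling `anneal(build_bonds(THESAURUS))` returns an (x, y) position for each word. In this toy data, nothing but weak repulsion connects the "happy" family to the "sad" family, so each family settles into its own clump. A `cooling` factor close to 1 gives the layout time to settle before the jitter dies out, which is the same intuition as cooling a melt slowly so it crystallizes. Coloring those clumps would be the natural next step.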