The Sound of Science - 'The Size of Data'
Newt: You're listening to The Sound of Science on WNIJ. I'm Newt with NIU STEAM.
Natalia: And I'm Natalia. I've been doing research on Twitter with Chynar Amanova, a postdoctoral fellow at NIU STEAM. How people tweet, what and why they tweet, and the hashtags they use all provide invaluable information about the culture we live in.
Newt: How much data do Twitter users create? How do you even go about measuring such a thing?
Natalia: Twitter manages nearly 400 million users worldwide. Those users generate approximately 84 terabytes of data per week. To give you a sense of scale, early floppy disks could hold 160 kilobytes. To hold even one terabyte, we would need over 6 million floppy disks. Thankfully, data storage has gotten a lot more refined since the 80s.
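The floppy disk comparison can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 kilobyte = 1,000 bytes, 1 terabyte = 1,000^4 bytes); the function name `floppies_needed` is made up for illustration.

```python
# Assumption: decimal units, so 1 KB = 1,000 bytes and 1 TB = 1,000**4 bytes.
KB = 1000
TB = 1000 ** 4

# Capacity of an early floppy disk, as stated in the episode.
FLOPPY_BYTES = 160 * KB


def floppies_needed(total_bytes, disk_bytes=FLOPPY_BYTES):
    """Return how many whole disks are needed to hold total_bytes."""
    # Ceiling division: a partly filled disk still counts as one disk.
    return -(-total_bytes // disk_bytes)


one_terabyte = floppies_needed(TB)       # disks for a single terabyte
one_week = floppies_needed(84 * TB)      # disks for a week of Twitter data

print(f"{one_terabyte:,} floppy disks per terabyte")
print(f"{one_week:,} floppy disks per week of tweets")
```

Under these assumptions, one terabyte works out to 6.25 million disks, which matches the "over 6 million" figure, and a week of Twitter data would need hundreds of millions of them.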
Newt: All data has weight, however minuscule. This weight comes from electrons, which don't weigh very much on their own. As you might imagine, the reason we bring up floppy disks, and how many of them you need to store even a smidgen of Twitter's data, is that the weight of data depends on where that data is stored.
Natalia: 6 million floppy disks weigh a lot more than one modern hard drive. Recent estimates put the weight of the entire internet at about five ounces. Using another method of calculation, other estimates put the weight at 50 grams.
Newt: Either way, the weight of the entire internet is pretty negligible, and Twitter's admittedly large share of that data weighs even less.
Natalia: To sort through all this data, we built a program that pulls tweets from Twitter's database based on keywords and timeframes we identify. Then we narrow the parameters again to get a more concise selection of tweets.
Newt: They ended up with over 6,000 tweets to analyze.
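The two-step selection Natalia describes, a broad pull by keyword and timeframe followed by a narrower pass, might look roughly like the sketch below. It is not the team's actual program and does not call Twitter's API: the `Tweet` class, the `select_tweets` function, and the sample tweets are all hypothetical, and the tweets are assumed to be already loaded into memory.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    """Hypothetical stand-in for a stored tweet."""
    text: str
    created_at: datetime


def select_tweets(tweets, keywords, start, end):
    """Keep tweets posted in [start, end) that mention any keyword."""
    lowered = [k.lower() for k in keywords]
    return [
        t for t in tweets
        if start <= t.created_at < end
        and any(k in t.text.lower() for k in lowered)
    ]


# Made-up sample data for illustration only.
sample = [
    Tweet("Loving the #science fair today", datetime(2022, 3, 5)),
    Tweet("New #science podcast episode on data", datetime(2022, 6, 1)),
    Tweet("Lunch was great", datetime(2022, 3, 6)),
]

# First pass: broad keyword over a whole year.
broad = select_tweets(sample, ["#science"],
                      datetime(2022, 1, 1), datetime(2023, 1, 1))

# Second pass: narrow the parameters to a tighter keyword and timeframe.
narrow = select_tweets(broad, ["podcast"],
                       datetime(2022, 5, 1), datetime(2022, 7, 1))

print(len(broad), len(narrow))
```

Running the filter a second time on its own output is one simple way to narrow a selection without pulling the data again.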
Natalia: This has been The Sound of Science on WNIJ, where you learn something new every day.