Library of Congress Has Archived 170 Billion Tweets

The ever-growing dataset isn't yet available to outside researchers.

The U.S. Library of Congress has announced that it has collected 170 billion tweets for its Twitter archive. However, the organization has yet to overcome the Big Data challenges that stand in the way of offering outside researchers a searchable archive.

CNN's Doug Gross reported, "An effort by the Library of Congress to archive Twitter posts has amassed more than 170 billion tweets, which the library is now seeking to make available to researchers and other interested parties."

In a blog post, the Library's Gayle Osterberg explained, "The Library’s first objectives were to acquire and preserve the 2006-10 archive; to establish a secure, sustainable process for receiving and preserving a daily, ongoing stream of tweets through the present day; and to create a structure for organizing the entire archive by date. This month, all those objectives will be completed. We now have an archive of approximately 170 billion tweets and growing. The volume of tweets the Library receives each day has grown from 140 million beginning in February 2011 to nearly half a billion tweets each day as of October 2012."

PCMag's David Murphy noted, "With those goals achieved, the Library now plans to tackle the equally large elephant in the room: How to process and display this volume of Twitter posts so they can be accessed by researchers, 'in a comprehensive, useful way.' Interest in the Library's Twitter archives – ranging from research about citizen journalism and elected officials tweets to stock market predictions – has generated approximately 400 inquiries from researchers thus far, and that's even before the Library has been able to grant any kind of access to its 170 billion-large tweet archive."

Graeme McMillan from Digital Trends elaborated, "In a five-page report updating progress on the project, the Library notes that it has already received more than 400 requests for access to the archive, but it hasn’t as yet approved any. The reason is that right now, even just searching the fixed 2006-2010 archive Twitter shared before offering 'live' updates to the ongoing record can take up to one day – something that the Library describes as 'an inadequate situation in which to begin offering access to researchers.'"

Tags: Twitter, tweet, archive, library, big data
