Wednesday, May 12, 2021

Library of Congress Has Archived 170 Billion Tweets

The U.S. Library of Congress has announced that it has collected 170 billion tweets for its Twitter archive. However, the organization still hasn’t solved the Big Data-related technical challenges that would allow it to make a searchable archive available to outside researchers.

CNN’s Doug Gross reported, “An effort by the Library of Congress to archive Twitter posts has amassed more than 170 billion tweets, which the library is now seeking to make available to researchers and other interested parties.”

In a blog post, the Library’s Gayle Osterberg explained, “The Library’s first objectives were to acquire and preserve the 2006-10 archive; to establish a secure, sustainable process for receiving and preserving a daily, ongoing stream of tweets through the present day; and to create a structure for organizing the entire archive by date. This month, all those objectives will be completed. We now have an archive of approximately 170 billion tweets and growing. The volume of tweets the Library receives each day has grown from 140 million beginning in February 2011 to nearly half a billion tweets each day as of October 2012.”

PCMag’s David Murphy noted, “With those goals achieved, the Library now plans to tackle the equally large elephant in the room: How to process and display this volume of Twitter posts so they can be accessed by researchers, ‘in a comprehensive, useful way.’ Interest in the Library’s Twitter archives – ranging from research about citizen journalism and elected officials’ tweets to stock market predictions – has generated approximately 400 inquiries from researchers thus far, and that’s even before the Library has been able to grant any kind of access to its 170 billion-large tweet archive.”

Graeme McMillan from Digital Trends elaborated, “In a five-page report updating progress on the project, the Library notes that it has already received more than 400 requests for access to the archive, but it hasn’t as yet approved any. The reason is that right now, even just searching the fixed 2006-2010 archive Twitter shared before offering ‘live’ updates to the ongoing record can take up to one day – something that the Library describes as ‘an inadequate situation in which to begin offering access to researchers.’”
