Sunday, September 8, 2024

The Open Source Technology Behind Every Tweet


SAN DIEGO. Twitter has become one of the most pervasive forms of real-time social media interaction in recent years, and it’s largely powered by open source technology. That’s the message that Chris Aniszcyzyk, the open source manager at Twitter, delivered today at the LinuxCon conference.

Twitter’s infrastructure runs on open source technology using the JVM (Java Virtual Machine) and the Scala programming language. Aniszcyzyk noted that Twitter was first built with the open source Ruby on Rails framework, but ended up moving away from Rails for performance reasons.

Twitter uses Git as its source code version control system and the Jenkins continuous integration project for builds. On the storage side, Twitter uses MySQL for long-term data storage and Hadoop for Big Data analytics.

Aniszcyzyk stressed that Twitter isn’t just a consumer of open source technologies; it contributes, too.

At the top of Aniszcyzyk’s list of Twitter’s open source contributions is the Bootstrap HTML framework. Bootstrap is routinely one of the most popular projects on the GitHub open source code repository.

Overall, Aniszcyzyk says that Twitter now has more than 80 projects on GitHub. Rather than list them all alphabetically, he walked through how a tweet is sent and delivered using open source technology.

Twitter averages about 5,000 tweets per second, according to Aniszcyzyk, which is a number that can spike upward at any time.

Tweet Flow

The first step is that a user sends a tweet, which hits Twitter’s API as a status update. Each tweet has its own unique ID, generated by Snowflake, a technology that Twitter created and open sourced.
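To make the ID step concrete, here is a minimal sketch of a Snowflake-style generator. It assumes the 64-bit layout described in the project's public documentation (a millisecond timestamp in the high bits, then a 10-bit worker ID, then a 12-bit per-millisecond sequence). The class name SnowflakeSketch and its parameters are hypothetical, and this is not Twitter's production code.

```python
import threading
import time

# Assumed layout, following the publicly documented Snowflake design:
# timestamp (ms since a custom epoch) | 10-bit worker ID | 12-bit sequence.
EPOCH_MS = 1288834974657  # custom epoch used by the open source project
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


class SnowflakeSketch:
    """Hypothetical ID generator for illustration only."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock  # injectable so the generator can be tested
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                # Same millisecond: bump the sequence counter.
                self.sequence = (self.sequence + 1) & SEQUENCE_MASK
                if self.sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            timestamp = (now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)
            return timestamp | (self.worker_id << SEQUENCE_BITS) | self.sequence
```

Because the timestamp occupies the high bits, IDs produced this way sort roughly by creation time, and each worker can hand out up to 4,096 IDs per millisecond without coordinating with other workers.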

Geographical information is then added by a project known as Rockdove, which is not yet open source, though Aniszcyzyk said that it eventually will be. All of the tweet data is then stored in a MySQL-based data store that Twitter created, known as Gizzard. Aniszcyzyk explained that Gizzard provides a replication layer on top of MySQL. Gizzard has also been open sourced by Twitter and is available on GitHub.
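The idea of a replication layer sitting on top of a SQL database can be sketched in a few lines. The example below is an illustration only: it uses in-memory SQLite databases as stand-ins for MySQL replicas, and the ReplicatingStore class, its table schema, and its write-to-all policy are assumptions made for the sketch, not Gizzard's actual API or behavior.

```python
import sqlite3


class ReplicatingStore:
    """Hypothetical write-to-all, read-from-first replication layer."""

    def __init__(self, replica_count=2):
        # Each in-memory SQLite database stands in for one MySQL replica.
        self.replicas = [sqlite3.connect(":memory:") for _ in range(replica_count)]
        for db in self.replicas:
            db.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, body TEXT)")

    def write(self, tweet_id, body):
        # INSERT OR REPLACE keeps the write idempotent, so a retried write
        # leaves every replica in the same state.
        for db in self.replicas:
            db.execute(
                "INSERT OR REPLACE INTO tweets (id, body) VALUES (?, ?)",
                (tweet_id, body),
            )
            db.commit()

    def read(self, tweet_id):
        # Try replicas in order and return the first copy found.
        for db in self.replicas:
            row = db.execute(
                "SELECT body FROM tweets WHERE id = ?", (tweet_id,)
            ).fetchone()
            if row is not None:
                return row[0]
        return None
```

The application talks only to the layer, which decides where each write lands. Making writes idempotent is a design choice in this sketch that keeps retries safe when a replica is slow or temporarily unavailable.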

Once the tweet is processed by Twitter, the message is sent out to the user’s followers with the help of a Twitter open source project known as FlockDB. Aniszcyzyk explained that FlockDB provides additional scale to Gizzard.
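Delivery to followers is essentially a lookup in a follower graph followed by a write to each follower's timeline. The sketch below shows that fan-out pattern with a plain in-memory graph. FollowerGraph and fan_out are hypothetical names, and none of this reflects FlockDB's real interface or how Twitter stores timelines.

```python
from collections import defaultdict


class FollowerGraph:
    """Hypothetical in-memory stand-in for a follower graph store."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> set of follower names

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def followers_of(self, author):
        # .get avoids creating empty entries for authors with no followers.
        return sorted(self.followers.get(author, ()))


def fan_out(graph, timelines, author, tweet_id):
    """Append tweet_id to each follower's timeline; return how many were reached."""
    recipients = graph.followers_of(author)
    for user in recipients:
        timelines[user].append(tweet_id)
    return len(recipients)


# Usage: two followers of "alice" each receive tweet 42.
graph = FollowerGraph()
graph.follow("bob", "alice")
graph.follow("carol", "alice")
timelines = defaultdict(list)
delivered = fan_out(graph, timelines, "alice", 42)
```

The cost of each delivery grows with the author's follower count, which is why the follower lookup needs a store built to scale.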

Bare Metal

When it comes to the hardware that Twitter is running to deliver the tweet, open source is playing a key role there, too. Twitter is using the Apache Mesos project, a clustering framework, to help keep the service reliable.

Overall, Twitter operates tens of thousands of machines, most of them running Linux 2.6.39 kernels. Aniszcyzyk stressed that while the service might seem simple on the surface, the scale at which Twitter now operates makes it a non-trivial undertaking to keep running.

Twitter is now also officially becoming a member of the Linux Foundation, and the plan is to accelerate its collaboration with the Linux community as well.

“We’re mostly a consumer of Linux,” Aniszcyzyk said. “In the future, we want to push our changes upstream to Linux, too, we want to collaborate with the community and that’s why we joined the Linux Foundation.”



Sean Michael Kerner is a senior editor at InternetNews.com, the news service of the IT Business Edge Network, the network for technology professionals. Follow him on Twitter @TechJournalist.
