Wednesday, April 17, 2024

Open Source Artificial Intelligence: 50 Top Projects


Since the earliest days of computers, creating machines that could “think” like humans has been a key goal for researchers. In the past few years, computer scientists have made huge leaps forward in artificial intelligence (AI), to the point where the technology is becoming commonplace.

In fact, Gartner predicts, “By 2020, AI technologies will be virtually pervasive in almost every new software product and service.” And IDC forecasts that companies will spend $12.5 billion on AI technology in 2017, 59.3 percent more than in 2016. That tremendous growth is likely to continue through 2020, when revenues could top $46 billion.

Open source software development has played a huge role in the rise of artificial intelligence, and many of the top machine learning, deep learning, neural network and other AI tools are available under open source licenses.

For this list, we selected 50 of the most well-known of these open source artificial intelligence projects. They are organized into categories and then alphabetized within those categories. The lines between some of the categories can be fuzzy, so we used the project owners’ descriptions of their applications to determine where to place the various tools.

As always, if you know of additional open source AI tools that you believe should be on this list, feel free to note them in the comments section below.

Cognitive Architecture

1. ACT-R

Developed at Carnegie Mellon University, ACT-R is the name of both a theory of human cognition and software based on that theory. The software is based on Lisp, and extensive documentation is available. Operating System: Windows, Linux, macOS.

Deep Learning

2. Caffe

Originally created by a UC Berkeley PhD student, Caffe has become a very popular deep learning framework. Its claims to fame include expressive architecture, extensible code, and speed. Operating System: Windows, Linux, macOS.

3. CaffeOnSpark

First developed at Yahoo, this effort brings the Caffe deep learning framework to Hadoop and Spark clusters. It’s been used for image search and content classification, among other use cases. Operating System: Windows, Linux, macOS.

4. ConvNetJS

This JavaScript library allows users to train deep learning models from a browser. It promises “no software requirements, no compilers, no installations, no GPUs, no sweat.” Operating System: Linux.

5. DeepDetect

Used by organizations like Airbus and Microsoft, DeepDetect is an open source deep learning server based on Caffe, TensorFlow and XGBoost. It offers an easy-to-use API for image classification, object detection, and text and numerical data analysis. Operating System: Windows, Linux, macOS.

6. Deeplearning4j

Deeplearning4j claims to be “the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala.” Commercial support is available through Skymind. Operating System: Windows, Linux, macOS.


7. DSSTNE

Short for “Deep Scalable Sparse Tensor Network Engine,” DSSTNE (pronounced “destiny”) is the software library Amazon uses to train and deploy its recommendation engine. Key features include multi-GPU scale, large layers and operation with sparse datasets. Operating System: Windows, Linux, macOS.

8. H2O

With more than 100,000 users, H2O claims to be “the world’s leading open source deep learning platform.” In addition to the Open Source version, the company also offers a Premium edition with paid support. Operating System: Windows, Linux, macOS.

9. Microsoft Cognitive Toolkit

Formerly known as CNTK, the Microsoft Cognitive Toolkit promises to train deep-learning algorithms to think like the human brain. It boasts speed, scalability, commercial-grade quality and compatibility with C++ and Python. Microsoft uses it to power the AI features in Skype, Cortana and Bing. Operating System: Windows, Linux.

10. Theano

Useful for deep learning, Theano describes itself as “a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.” Key features include GPU support, integration with NumPy, efficient symbolic differentiation, dynamic C code generation and more. Operating System: Windows, Linux, macOS.
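To give a feel for what “symbolic differentiation” means, here is a minimal sketch in plain Python. It does not use Theano or mirror its API; the `Var`, `Const`, `Add`, `Mul`, `diff` and `evaluate` names are hypothetical and exist only for this illustration. The idea is that an expression is stored as a tree, and the derivative is built as a new tree by applying calculus rules to each node.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Const, Add, Mul]


def diff(expr, wrt):
    """Return a new expression tree: the derivative of expr with respect to wrt."""
    if isinstance(expr, Const):
        return Const(0.0)
    if isinstance(expr, Var):
        return Const(1.0 if expr.name == wrt else 0.0)
    if isinstance(expr, Add):
        # Sum rule: (u + v)' = u' + v'
        return Add(diff(expr.left, wrt), diff(expr.right, wrt))
    if isinstance(expr, Mul):
        # Product rule: (u * v)' = u' * v + u * v'
        return Add(Mul(diff(expr.left, wrt), expr.right),
                   Mul(expr.left, diff(expr.right, wrt)))
    raise TypeError(f"unsupported expression: {expr!r}")


def evaluate(expr, env):
    """Compute the numeric value of expr given variable values in env."""
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Add):
        return evaluate(expr.left, env) + evaluate(expr.right, env)
    if isinstance(expr, Mul):
        return evaluate(expr.left, env) * evaluate(expr.right, env)
    raise TypeError(f"unsupported expression: {expr!r}")


# f(x) = x*x + 3*x, so f'(x) = 2*x + 3
x = Var("x")
f = Add(Mul(x, x), Mul(Const(3.0), x))
df = diff(f, "x")

value_at_2 = evaluate(f, {"x": 2.0})
slope_at_2 = evaluate(df, {"x": 2.0})
```

A real library would also simplify the resulting tree and compile it to fast native code, which this sketch leaves out.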


Games

11. DeepMind Lab

Intended for use in AI research, DeepMind Lab is a 3D game environment. It was created by the DeepMind group at Google and is said to be especially good for deep reinforcement learning research. Operating System: Linux.

12. Project Malmo

Project Malmo is a Microsoft-led effort to use Minecraft as an AI research platform. According to the website, “Minecraft is ideal for artificial intelligence research for the same reason it is addictively appealing to the millions of fans who enter its virtual world every day. Unlike other computer games, Minecraft offers its users endless possibilities, ranging from simple tasks, like walking around looking for treasure, to complex ones, like building a structure with a group of teammates.” Operating System: Windows, Linux, macOS.

13. StarCraft II API Library

Google’s DeepMind and Blizzard Entertainment are collaborating on a project that makes it possible to use the StarCraft II video game as an AI research platform. It’s a cross-platform C++ library for building scripted bots. Operating System: Windows, Linux, macOS, Android, iOS.

14. Stockfish

This open source chess engine is one of the best in the world and can beat most human grandmasters. Note that it is also available as a mobile app. Operating System: Windows, Linux, macOS.

Gradient Boosting

15. XGBoost

XGBoost supports gradient boosted trees, an ensemble of decision trees that is easy to train and offers an alternative to neural networks. It supports regression, classification, ranking and other types of tasks. Operating System: Windows, Linux, macOS.
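The core idea behind gradient boosting can be sketched in a few lines of plain Python. This is not XGBoost's implementation or API; it is a simplified illustration that assumes a single numeric feature, squared-error loss and one-split trees (“stumps”), and every function name here is hypothetical. Each round fits a small tree to the errors left by the previous rounds and adds a scaled-down copy of it to the model.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # All feature values are identical: fall back to a constant prediction.
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        # For squared-error loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Made-up toy data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]

model = fit_boosted(xs, ys)
baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
train_mse = mean_squared_error(ys, [predict(model, x) for x in xs])
```

On this toy data the boosted model's training error ends up below that of simply predicting the mean. Production libraries add regularization, deeper trees, multiple features and many performance optimizations.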

Machine Intelligence

16. Numenta

The Numenta organization offers numerous open source projects related to hierarchical temporal memory. Essentially, these projects attempt to create machine intelligence based on current biological understandings of the human neocortex. Operating System: Windows, Linux, macOS.

17. Open Cog

Rather than focus on a narrow aspect of AI such as deep learning or neural networks, Open Cog aims to create beneficial artificial general intelligence (AGI). The project is working toward creating systems and robots with the capacity for human-like intelligence. Operating System: Linux.

Machine Learning

18. Accord.NET Framework

Accord.NET promises machine learning “made in a minute.” Based on Microsoft technologies, it includes sample applications and extensive documentation to help developers create production-grade computer vision, computer audition, signal processing and statistics applications quickly. Operating System: Windows.

19. AForge.NET Framework

Designed for computer vision and artificial intelligence applications, AForge.NET is a C# framework for image processing, neural networks, genetic algorithms, fuzzy logic, machine learning, robotics and more. It includes several libraries and sample applications. Operating System: Windows.

20. Aerosolve

This “machine learning package built for humans” was created by Airbnb to help with dynamic pricing recommendations for hosts. It’s based on Java and is particularly good for projects with geography-related variables. Operating System: Windows, Linux, macOS.

21. Distributed Machine Learning Toolkit

This Microsoft machine learning project includes the DMTK Framework, the Light LDA topic model algorithm, the Distributed (Multisense) Word Embedding algorithm and the LightGBM gradient boosting tree framework. The company plans to add more algorithms and components to the toolkit as research progresses. Operating System: Windows, Linux.

22. Dlib

Dlib offers a set of C++ machine learning libraries that are quick to execute. It includes algorithms for binary classification, multiclass classification, regression, structured prediction, deep learning, clustering, unsupervised learning, semi-supervised/metric learning, reinforcement learning and feature selection. Operating System: Windows, Linux, macOS.

23. Encog

Under active development since 2008, Encog is a machine learning framework created by data scientist Jeff Heaton. It supports neural networks, support vector machines, Bayesian networks, hidden Markov models, genetic programming and genetic algorithms. Operating System: Windows, Linux, macOS.

24. GoLearn

GoLearn describes itself as a “batteries included” machine learning library for the Go programming language. It aims for simplicity and customizability. Operating System: Linux, macOS.

25. Mahout

One of many machine learning projects sponsored by the Apache Software Foundation, Mahout offers a programming environment and framework for building scalable machine-learning applications. It also includes premade algorithms and a vector math experimentation environment called Samsara. Operating System: Windows, Linux, macOS.

26. MLlib

Part of the Apache Spark project, MLlib is a machine learning library that promises performance 100 times faster than MapReduce. It includes numerous algorithms for classification, regression, decision trees, recommendation, clustering, topic modeling, pattern mining and more. Operating System: Windows, Linux, macOS.

27. Pattern

Python-based Pattern offers tools for data mining, natural language processing, machine learning, network analysis and visualization. It is especially useful for web mining applications. Operating System: Windows, Linux, macOS.

28. Prophet

Developed and used by Facebook, Prophet forecasts time series data. It’s implemented in both R and Python and is fully automatic, accurate, fast and tunable. Operating System: Windows, Linux.

29. Oryx 2

Created by Cloudera, Oryx 2 implements lambda architecture for machine learning. It is based on Apache Spark and Kafka. Operating System: Windows, Linux, macOS.
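For readers unfamiliar with the term, a lambda architecture is commonly described as three cooperating layers: a batch layer that periodically recomputes results from all stored data, a speed layer that handles events arriving in between, and a serving layer that combines the two to answer queries. The sketch below illustrates that pattern in plain Python with a simple event counter. It is not Oryx code, and the class names are hypothetical.

```python
from collections import Counter


class BatchLayer:
    """Periodically recomputes a complete view from the full event log."""

    def __init__(self):
        self.view = Counter()

    def rebuild(self, event_log):
        self.view = Counter(event_log)


class SpeedLayer:
    """Tracks only the events that arrived since the last batch rebuild."""

    def __init__(self):
        self.view = Counter()

    def update(self, event):
        self.view[event] += 1

    def reset(self):
        self.view.clear()


class ServingLayer:
    """Answers queries by merging the batch view with the recent speed view."""

    def __init__(self, batch, speed):
        self.batch = batch
        self.speed = speed

    def query(self, item):
        return self.batch.view[item] + self.speed.view[item]


event_log = ["item_a", "item_b", "item_a"]
batch, speed = BatchLayer(), SpeedLayer()
serving = ServingLayer(batch, speed)
batch.rebuild(event_log)

# A new event arrives between batch runs: it is logged and sent to the speed layer.
event_log.append("item_a")
speed.update("item_a")
count_before_rebuild = serving.query("item_a")

# The next batch run absorbs the new event, so the speed layer starts over.
batch.rebuild(event_log)
speed.reset()
count_after_rebuild = serving.query("item_a")
```

Queries return the same answer before and after the batch rebuild, which is the point of the design: fresh results without waiting for the slow, complete recomputation.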

30. PredictionIO

Now an Apache incubating project, PredictionIO is a machine-learning server with customizable templates, real-time query response, the ability to ingest data from multiple platforms and more. It integrates with other open source tools like Spark, MLlib, HBase, Spray and Elasticsearch. Operating System: Windows, Linux, macOS.


31. SAMOA

An Apache incubating project, SAMOA stands for “Scalable Advanced Massive Online Analysis.” It’s a machine learning framework for distributed streaming applications. Operating System: Linux.

32. Scikit-learn

Based on NumPy, SciPy and matplotlib, scikit-learn offers Python tools for machine learning. It handles data mining and data analysis with algorithms for classification, regression, clustering, dimensionality reduction and more. Operating System: Windows, Linux.

33. Shark

Shark describes itself as a “fast, modular, feature-rich open-source C++ machine learning library.” It offers algorithms for supervised learning, unsupervised learning, evolutionary algorithms and basic linear algebra and optimization. Operating System: Windows, Linux, macOS.

34. Shogun

Under development since 1999, Shogun is a mature set of machine learning tools with support for Python, Octave, R, Java/Scala, Lua, C#, Ruby and other languages. It also has a free cloud service where users can try out the software. Operating System: Windows, Linux, macOS.

35. Smile

Short for “Statistical Machine Intelligence and Learning Engine,” Smile boasts extremely fast machine learning for Java, Scala and other JVM languages. It claims that it “outperforms R, Python, Spark, H2O, xgboost significantly.” Operating System: Windows, Linux, macOS.

36. SystemML

Originally an IBM Research project, SystemML is now a top-level Apache project. It describes itself as “an optimal workplace for machine learning using big data,” and it integrates with Spark. Operating System: Windows, Linux, macOS.

37. TensorFlow

Developed by the Google Brain team for internal use at Google, TensorFlow is now one of the most well-known open source machine learning platforms. Google is also making a cloud-based version of TensorFlow available for free to researchers. Operating System: Windows, Linux, macOS, Android.

38. Torch

Based on LuaJIT, Torch is a “scientific computing framework with wide support for machine learning algorithms.” Key features include a powerful N-dimensional array, GPU support, linear algebra routines, neural network models and more. Operating System: Linux, macOS.

39. WEKA

Java-based WEKA offers a wide variety of machine learning algorithms that are useful for data mining. It was developed at the University of Waikato in New Zealand and is named for a New Zealand bird known for its inquisitiveness. Operating System: Windows, Linux, macOS.

Natural Language Processing

40. Stanford CoreNLP

This Java-based natural language processing software can identify the base forms of words, their parts of speech and whether they are names of companies, people, etc. It can also normalize dates and times, mark up the structure of sentences in terms of phrases and syntactic dependencies, indicate which noun phrases refer to the same entities, identify sentiment, extract particular or open-class relations between entity mentions and identify quotes. It was designed for English but also supports Arabic, Chinese, French, German, and Spanish. Operating System: Windows, Linux, macOS.


41. MALLET

Short for “Machine Learning LanguagE Toolkit,” MALLET includes Java-based tools for statistical natural language processing, document classification, clustering, topic modeling, information extraction and more. It was first created in 2002 by faculty and graduate students at the University of Massachusetts Amherst and the University of Pennsylvania. Operating System: Windows, Linux.

42. OpenNLP

An Apache project, OpenNLP performs natural language processing tasks like tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, language detection and coreference resolution. It powers the Air New Zealand chatbot named Oscar. Operating System: Windows, Linux, macOS.
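Two of the tasks named above, sentence segmentation and tokenization, are easy to picture with a small example. The sketch below uses naive regular expressions in plain Python. It is only an illustration of what the tasks produce, not OpenNLP's API or its method. The function names are hypothetical, and the simple rules will mishandle cases such as abbreviations (“Dr. Smith”).

```python
import re


def split_sentences(text):
    """Naive segmentation: split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Naive tokenization: words (keeping internal apostrophes) or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)


text = "OpenNLP splits text into sentences. Then it breaks each sentence into tokens!"
sentences = split_sentences(text)
tokens = [tokenize(sentence) for sentence in sentences]
```

Here `sentences` holds two strings, and `tokens` holds one list of words and punctuation marks per sentence.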

Neural Networks

43. Darknet

Written in C and CUDA, Darknet supports neural networks with CPU or GPU computation. It offers excellent capabilities for image classification. Operating System: Linux.

44. DyNet

Formerly known as cnn, DyNet is a neural network library for C++ and Python that was developed primarily at Carnegie Mellon University. It is useful for creating applications for syntactic parsing, machine translation, morphological inflection and more. Operating System: Windows, Linux, macOS.

45. Neuroph

Initially created as a graduate thesis project, Neuroph is a Java-based lightweight neural network framework. It aims to be easy enough to use that beginners can get started quickly, while also providing the flexibility and tools that more advanced users need. Operating System: Windows, Linux.

46. OpenNN

OpenNN, short for “Open Neural Networks,” is a C++ library for implementing neural networks. It boasts high performance and deep architecture. Commercial support is available. Operating System: Windows, Linux, macOS.

47. Sonnet

Created by Google’s DeepMind team, Sonnet is a neural network library that runs on top of TensorFlow. According to its developers, it offers greater flexibility than other TensorFlow frameworks. Operating System: Linux, macOS.

Virtual Assistant

48. Mycroft

Mycroft boasts that it is “the world’s first open source assistant.” It answers questions, plays audio and video, controls IoT-connected appliances and more. It has minimal system requirements, and it can even run on a Raspberry Pi. Operating System: Windows, Linux, macOS.

49. Open Assistant

Still under heavy development, Open Assistant aims to offer an open source alternative to Siri, Cortana and Google Now. Its goal is to create a completely customizable AI that can engage in conversation. Operating System: Linux.

50. SNePS

Developed at the University at Buffalo, SNePS is a knowledge representation, reasoning and acting system. The group behind the project has used the research to create a virtual agent called Cassie. Operating System: Windows, Linux.
