Although Big Data Startups reviews startups that deal with big data and sell licensed products, it is important to note that the landscape of open source tools is growing rapidly as well. These tools have proven to be efficient and cost-effective at getting big data stored, analysed and visualised. Open source data and open source products are no longer as risky as they used to be, so more and more companies are adopting and implementing open source tools.
The best-known big data open source tool is Hadoop. Although it is widely accepted as the open source tool for big data, it is not suitable for every big data question. Hadoop, for example, does not support real-time business needs, but luckily other open source tools, such as Kafka, fill this gap. Another well-regarded and increasingly popular open source tool is the statistical programming language R. It is incredibly powerful and more than 2 million analysts use it. There are many other tools for all kinds of aspects of big data. The landscape is big and complex and it looks something like this:
In the coming years, we will definitely see increasing adoption of the above open source tools, for the following reasons:
- Open source tools do not require a huge investment to get going; just download the tool and get started;
- The community around open source tools is usually big and active, meaning that the product gets developed and improved faster than closed tools, which tend to have a longer time-to-market. It also helps when encountering problems, as someone in the community might have had the same problem and solved it already. This prevents companies from reinventing the wheel;
- Open source tools have a flexible, scalable architecture that is cost-effective for managing huge quantities of data. This is especially desirable for SMEs;
- Open source tools help democratize the big data arena, so that more and more people can start working with big data;
- Open source tools are developed in such a way that they operate on commodity hardware, making it unnecessary to invest in expensive hardware.
That open source is gaining importance is also shown by the fact that more and more vendors that traditionally relied on proprietary models are embracing open source. For example, last year VMware launched a new open source project called Serengeti, which is designed to let Hadoop run on top of VMware vSphere Cloud. EMC Greenplum also made its new Chorus social framework open source last year.
Open source will continue to grow and innovate in the coming years. The moment Hadoop allows real-time analysis, or another company offers as open source what Google’s Dremel already does, open source will be hard for organizations to resist.