Hadoop is the primary big data platform in use today. What started out as part of an open-source search indexing project, inspired by papers published by Google and later developed largely at Yahoo, has become the leading engine for storing and processing very large data sets.
Hadoop is a Java project governed by the Apache Software Foundation. What is interesting is that Microsoft supports the project in several ways:
- Through its partnership with Hortonworks, Microsoft is one of the biggest contributors to the Hadoop project, having contributed more than 16,000 lines of code to the open-source code base.
- Hadoop can be run in the Azure Cloud through HDInsight. One of the key competitive advantages of running Hadoop in the cloud is the simplicity of spinning up additional HDFS nodes as needed.
- Excel 2013 can now access Hadoop data using Power Query.
- Microsoft has a technology called PolyBase, built into its Parallel Data Warehouse, that lets you query non-relational data in Hadoop together with data in traditional relational databases.
It wasn’t always the case that Microsoft supported Hadoop; for a while the company actively competed against it. However, Microsoft has changed direction: it has made significant code contributions, added platform support, and started to build a competitive offering that makes Hadoop-based data easy to manage and accessible to power users through Excel, Office 365 and Azure. Microsoft seems to be committed to the Hadoop platform for the long term.