Hadoop - how to learn

Enterprise data analysis and processing requirements keep changing day by day, and it has become clear that sound judgments depend on digging into as much data as possible. Companies are adopting big data technologies to analyze the large volumes of data that come into their ecosystems. A few years back, market analysts thought this trend was purely due to peer pressure, but the pioneers who implemented Hadoop (an open source big data framework) later started reaping the benefits. Many companies are now adopting the Hadoop framework for their data problems.

Several technologies can handle large volumes of data, such as massively parallel processing (MPP) systems like Greenplum, Microsoft SQL Server MPP systems, Teradata, etc. The problem with these databases is the cost of implementation and maintenance. Hadoop is completely open source and can be implemented on inexpensive commodity servers. Many players in the market offer their own version of Hadoop and provide implementation support, such as Apache Hadoop, Cloudera’s Distribution including Apache Hadoop, IBM's distribution of Apache Hadoop, DataStax Brisk, Amazon Elastic MapReduce, and Hortonworks.

I chose to learn the Hortonworks Hadoop distribution because it is the only company that provides support for Windows. Hortonworks and Microsoft have a tie-up to build BI technologies that work on the Hadoop framework.

Anyone interested in learning Hadoop can download the free sandbox from http://hortonworks.com/products/hortonworks-sandbox/. The sandbox is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials provided by Hortonworks.
To install the Hortonworks sandbox, you need VirtualBox on your machine. I installed Oracle VM VirtualBox Manager for this, and you can get it from http://www.oracle.com/us/technologies/virtualization/virtualbox/overview/index.html

Open VirtualBox Manager as shown in Fig 1 and import the downloaded sandbox appliance, which will start installing Hortonworks Hadoop!

Fig 1
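
If you prefer the command line to the import dialog, the same step can be sketched with VirtualBox's VBoxManage tool. This is only a rough sketch: the file name Hortonworks_Sandbox.ova and the VM name hortonworks-sandbox are placeholders I made up, so substitute the name of the file you actually downloaded. The helper only builds and prints the command; nothing is imported until you run that command yourself.

```python
from pathlib import Path


def build_import_command(appliance_path, vm_name=None):
    """Return the VBoxManage argument list that imports an appliance file."""
    cmd = ["VBoxManage", "import", str(Path(appliance_path))]
    if vm_name:
        # --vsys 0 targets the first virtual system described in the appliance
        cmd += ["--vsys", "0", "--vmname", vm_name]
    return cmd


if __name__ == "__main__":
    # Hypothetical file and VM names; replace them with your own.
    cmd = build_import_command("Hortonworks_Sandbox.ova", vm_name="hortonworks-sandbox")
    # Print the command so it can be reviewed, then pasted into a terminal.
    print(" ".join(cmd))
```

Once you are happy with the printed command, paste it into a terminal, or pass the list to subprocess.run(cmd, check=True). The VBoxManage executable must be on your PATH for this to work.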

Enjoy learning…

