Getting started with a distributed system like Hadoop can be daunting for developers. From installing and configuring Hadoop to learning the basics of MapReduce and the other add-on tools, the learning curve is steep.
Hortonworks recently released the Hortonworks Sandbox for anyone interested in learning and evaluating enterprise Hadoop.
The Hortonworks Sandbox provides:
- A virtual machine with Hadoop preconfigured.
- A set of hands-on tutorials to get you started with Hadoop.
- An environment to help you explore related projects in the Hadoop ecosystem like Apache Pig, Apache Hive, Apache HCatalog and Apache HBase.
You can download the Sandbox from the Hortonworks website. It is available for both VirtualBox and VMware Fusion/Player environments; just follow the instructions to import the Sandbox into your environment.
The download is an OVA (Open Virtual Appliance) file, which is really a TAR archive. Untar it and you'll find an OVF (Open Virtualization Format) descriptor file, a manifest file, and a disk image in VMDK format.
Rackspace Cloud doesn’t let you upload your own images, but if you have an OpenStack-based cloud, you can boot a virtual machine with the image provided.
First, convert the VMDK image to a more familiar format like qcow2.
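A minimal sketch of the conversion using `qemu-img` (the input and output filenames here are assumptions; use the name of the VMDK extracted from your OVA):

```shell
# Convert the VMDK disk image to qcow2.
# Filenames are placeholders -- substitute the ones from your extracted OVA.
qemu-img convert -O qcow2 Hortonworks_Sandbox.vmdk hortonworks-sandbox.qcow2
```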
Now, let’s upload the image to Glance.
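With the glance CLI, the upload might look like this (the image name is a placeholder, and it assumes your OpenStack credentials are already set in the environment, e.g. via an openrc file):

```shell
# Upload the converted qcow2 image to Glance.
# The image name "hortonworks-sandbox" is an example.
glance image-create \
  --name "hortonworks-sandbox" \
  --disk-format qcow2 \
  --container-format bare \
  --file hortonworks-sandbox.qcow2
```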
Now let’s create a virtual server off of the new image – give at least 4GB of RAM.
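For example, with the nova CLI (flavor and instance name are placeholders; pick any flavor in your cloud with at least 4GB of RAM):

```shell
# Boot an instance from the uploaded image.
# "m1.medium" and "sandbox" are examples -- choose a flavor with >= 4GB RAM.
nova boot --image hortonworks-sandbox --flavor m1.medium sandbox
```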
Once the instance reaches ACTIVE status and responds to ping, you can ssh into it using:
- Username: root
- Password: hadoop
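For example (replace `<instance_ip>` with the address assigned to your instance):

```shell
# SSH in as root; enter the password "hadoop" when prompted.
ssh root@<instance_ip>
```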
Watch /var/log/boot.log as the services come up; it will let you know when the installation is complete. This can take about 10 minutes.
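One way to follow the progress, run on the instance itself:

```shell
# Follow the boot log until the Sandbox reports that installation is complete.
tail -f /var/log/boot.log
```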
At the end, you should have the Hadoop java processes running.
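To verify, `jps` (shipped with the JDK) lists the running JVMs; on the instance you'd run something like:

```shell
# List running Java processes; expect the Hadoop daemons
# (e.g. NameNode, DataNode, JobTracker, TaskTracker) among them.
jps
```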
Point your browser at http://instance_ip and your single-node Hadoop cluster should be up. Just follow along in the UI; it has demos, videos, and step-by-step hands-on tutorials on Hadoop, Pig, Hive, and HCatalog.