For anyone getting started with Hadoop, as with any other technology, there are several steps involved, from downloading, installing, and configuring Linux all the way to setting up Hadoop and the related frameworks. These steps can be avoided by using a cloud service such as Amazon EMR.
Hortonworks recently announced a VM that makes it easy to get started with Hadoop and related frameworks. MapR, Cloudera, and others have offered similar VMs for some time. These VMs make it simple to get started with Big Data, but they come with some challenges. For example, not all frameworks are installed in Cloudera’s CDH VM (not sure why), and there is no option to install Cassandra or Pig automatically, so they have to be installed manually. Also, all the services are started at boot, which makes the whole VM slow on some machines.
To overcome some of these limitations, a Big Data VM has been created for those who are new to Linux and the Big Data frameworks. Check the screencast below to see how easy it is to use the Big Data VM (play it in VLC player). The steps to get started are to download and install VirtualBox and then configure the Big Data VM in it. VirtualBox is available on Windows, Mac, Linux, and Solaris, so the Big Data VM can run on any of these operating systems. All that is needed is 12 GB of free hard disk space, 3-4 GB of RAM, and a laptop or desktop with a decent processor, and we are ready to jump into the Big Data world.
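Before downloading the VM, it can help to confirm that the host has the free disk space mentioned above. The following is only a minimal sketch: the function name `has_enough_disk` and the constant `REQUIRED_DISK_GB` are made up for illustration, and the 12 GB figure is simply the one quoted in this post. It uses Python's standard `shutil.disk_usage` and does not check RAM or the processor.

```python
import shutil

# Free disk space quoted in the post; adjust if the VM's needs differ.
REQUIRED_DISK_GB = 12


def has_enough_disk(path=".", required_gb=REQUIRED_DISK_GB):
    """Return True if the filesystem holding `path` has at least
    `required_gb` gigabytes free (hypothetical helper, not part of the VM)."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    if has_enough_disk():
        print("Enough free disk space for the Big Data VM.")
    else:
        print("Less than %d GB free; clear some space first." % REQUIRED_DISK_GB)
```

Run it from the folder where the VM will be stored, or pass that folder as `path`, since the check applies to whichever disk holds that location.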
Read More: Hadoop Tips – VM for learning Hadoop