Getting started with Apache Spark

With so much noise around Apache Spark, let’s look at how to get started with Spark in local mode and run a simple Scala program. Many more complex setups are possible, but here we will stick to the minimum steps required to get Spark running.

Most Big Data software is developed with Linux as the target platform, and porting to Windows has been an afterthought. It will be interesting to see how Big Data on Windows evolves in the future. Spark runs on both Windows and Linux, but here we will use Linux (Ubuntu 14.04 64-bit Desktop).

So, here are the steps:

1) Download and install Oracle VirtualBox.

2) Download and install Ubuntu.
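Once Ubuntu is installed in the VM, a quick sanity check is to confirm that the guest really is 64-bit Linux, since that is the setup this post assumes. Here is a minimal sketch that relies only on the standard `uname` command; `is_64bit_linux` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Sketch: confirm the guest is 64-bit Linux (the setup assumed in this post).
# is_64bit_linux is a hypothetical helper name.
# Optional arguments override the detected values, which makes it easy to test.
is_64bit_linux() {
  os="${1:-$(uname -s)}"    # kernel name, e.g. "Linux"
  arch="${2:-$(uname -m)}"  # machine hardware name, e.g. "x86_64"
  [ "$os" = "Linux" ] && [ "$arch" = "x86_64" ]
}
```

For example, `is_64bit_linux && echo "64-bit Linux"`, and `lsb_release -ds` will show which Ubuntu release you are on.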

3) From a terminal, install the latest updates on Ubuntu and then reboot.

sudo apt-get update && sudo apt-get dist-upgrade
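If you would rather not reboot every time, Ubuntu flags when an installed update (a new kernel, for example) needs a restart by creating the file `/var/run/reboot-required`. Below is a minimal sketch of a variation that applies the updates and reboots only when that file is present. The function names `update_system` and `needs_reboot` are hypothetical, and `-y` simply answers yes to apt-get's confirmation prompt.

```shell
#!/bin/sh
# Sketch: apply pending updates, then reboot only if Ubuntu asks for it.
# update_system and needs_reboot are hypothetical helper names.

update_system() {
  # Refresh the package lists, then install all available upgrades.
  sudo apt-get update && sudo apt-get -y dist-upgrade
}

needs_reboot() {
  # Ubuntu creates this marker file when an installed update needs a restart.
  # An optional argument overrides the path (handy for testing).
  [ -f "${1:-/var/run/reboot-required}" ]
}
```

You would then run `update_system && needs_reboot && sudo reboot`; if no reboot is needed, the chain simply stops after the upgrade.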

4) Oracle Java doesn’t ship with Linux distributions, so it has to be installed manually on top of Ubuntu as mentioned here.
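After installing Java, it’s worth confirming that the `java` binary is actually on the PATH before moving on. A minimal sketch, where `have_cmd` and `check_java` are hypothetical helper names; note that `java -version` writes its output to stderr, hence the redirect.

```shell
#!/bin/sh
# Sketch: confirm Java is visible before moving on to Scala and Spark.
# have_cmd and check_java are hypothetical helper names.

# Succeeds if the named command is found on the PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

check_java() {
  if have_cmd java; then
    # java -version prints to stderr, so send it to stdout and keep the first line.
    java -version 2>&1 | head -n 1
  else
    echo "java not found on PATH; install Java first" >&2
    return 1
  fi
}
```

Running `check_java` should print a single version line for the Java you just installed.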

5) Spark has been developed in Scala, so we need to install Scala.

sudo apt-get install scala
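With Java and Scala both installed, one way to check the prerequisites in one go is to list whichever ones are missing, and only then have Scala run a one-liner. A minimal sketch; `missing_cmds` is a hypothetical helper name, and the one-liner assumes the Scala 2 runner’s `-e` option, which evaluates the given expression.

```shell
#!/bin/sh
# Sketch: report which of the given commands are not on the PATH.
# missing_cmds is a hypothetical helper name.
missing_cmds() {
  # Prints each argument that is not an available command, one per line.
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}
```

For example: `if [ -z "$(missing_cmds java scala)" ]; then scala -e 'println("Hello from Scala")'; fi`.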
