Installing Spark on Ubuntu
This article describes, step by step, how to build and run Apache Spark 1.6.2 on Ubuntu. I used Ubuntu 16.04 on VirtualBox 5.0.24 for this blog post.
Below are the detailed setup steps.
Installation Steps:
- Install VirtualBox
- Install Ubuntu on VirtualBox
- Install Java
- Setting up Spark on Ubuntu
Step 1: Install VirtualBox on a Windows Machine
Step 2: Install Ubuntu on VirtualBox
- First, download the Ubuntu 16.04 (Xenial) image
Step 3: Install Java
To run Spark, the Ubuntu machine needs Java installed. The following commands install Java on an Ubuntu machine.
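The original post does not show the commands here; a minimal sketch, assuming the OpenJDK 8 packages in the default Ubuntu 16.04 repositories:

```shell
# Refresh the package index, then install OpenJDK 8
# (openjdk-8-jdk ships in the default Ubuntu 16.04 repositories)
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk
```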
To check that the Java installation was successful:
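One simple check is to print the installed version:

```shell
# Prints the JDK version, e.g. openjdk version "1.8.0_..."
java -version
```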
Step 4: Setting up Spark on Ubuntu
Download Spark
I) Go to the Apache Spark downloads page and choose the following options:
- Choose a Spark release: pick the latest
- Choose a package type: Source code (can be built against several Hadoop versions)
- Choose a download type: Select Direct Download
II) Unzip the Spark folder and rename it spark.
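For example, assuming the downloaded archive is named spark-1.6.2.tgz (adjust to match the release you chose):

```shell
# Extract the archive (filename is an assumption -- match it to the
# release you downloaded), then rename the directory to "spark"
tar xzf spark-1.6.2.tgz
mv spark-1.6.2 spark
```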
III) Edit your Bash profile to add Spark to your PATH and to set the SPARK_HOME environment variable. These helpers will assist you on the command line. On Ubuntu, simply edit the ~/.bash_profile or ~/.profile file and add the following, then type pyspark to run Spark.
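The profile lines themselves are not reproduced in the post; a sketch, assuming Spark was unpacked to ~/spark as in step II:

```shell
# Point SPARK_HOME at the unpacked Spark directory (~/spark is an
# assumption -- use wherever you renamed the folder in step II)
export SPARK_HOME=$HOME/spark
# Put the Spark launcher scripts (pyspark, spark-submit, ...) on PATH
export PATH=$SPARK_HOME/bin:$PATH
```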
While loading the pyspark module into ipython, the following error may come up:
No module named py4j.java_gateway
To resolve this, add the following lines so that PySpark can find py4j.java_gateway:
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
IV) After you source your profile (or simply restart your terminal), you should now be able to run a pyspark interpreter locally. Execute the pyspark command, and you should see a result as follows:
V) To check that the Spark installation was successful:
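One quick check, assuming SPARK_HOME is set as above: pipe a one-liner into the PySpark shell, which binds a SparkContext to the name sc on startup.

```shell
# Print the Spark version via the SparkContext `sc` that the pyspark
# shell creates automatically; the version appears amid the startup log
echo 'print(sc.version)' | $SPARK_HOME/bin/pyspark
```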