Even though I'm running Windows 10, I successfully followed the Windows 7 directions at http://nishutayaltech.blogspot.in/2015/04/how-to-run-apache-spark-on-windows7-in.html, with the following notes:
- I used the latest versions of Scala and Spark, and I already had Java 8 and Python 3.4 installed.
- I used a Spark prebuilt package for Hadoop; initially the instructions about environment variables confused me, but I realized they meant:
- Create a new environment variable SPARK_HOME and set it to where I unzipped Spark (in this case, C:\spark-2.0.1-bin-hadoop2.7)
- Edit the PATH environment variable to add %SPARK_HOME%\bin
- Since I have Spark prebuilt for Hadoop 2.7, I downloaded the winutils.exe for hadoop-2.7.1 at https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin
- I edited log4j.properties to show only WARN and above, per http://stackoverflow.com/questions/28189408/how-to-reduce-the-verbosity-of-sparks-runtime-output
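For reference, here is the logging change sketched out (assuming the default template that ships with Spark 2.x; your template's exact contents may differ):

```properties
# In %SPARK_HOME%\conf, copy log4j.properties.template to log4j.properties,
# then change the root logger level from INFO to WARN:
log4j.rootCategory=WARN, console
```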
And, success! I was able to run spark-shell, open the Spark UI, and run the SparkPi example.
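For anyone curious what SparkPi actually computes: it's a Monte Carlo estimate of pi (sample random points in the unit square and count how many land inside the quarter circle), with the sampling distributed across Spark tasks. Here's a plain-Python sketch of the same computation, no Spark required; the function name and parameters are my own:

```python
import random


def estimate_pi(num_samples, seed=42):
    """Monte Carlo estimate of pi, mirroring the idea behind the
    SparkPi example (which spreads this sampling across the cluster)."""
    random.seed(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = random.random(), random.random()  # random point in the unit square
        if x * x + y * y < 1.0:                  # inside the quarter circle?
            inside += 1
    # Area of quarter circle / area of square = pi/4, so scale up by 4.
    return 4.0 * inside / num_samples


print(estimate_pi(100000))
```

With 100,000 samples the estimate typically lands within a couple of hundredths of pi, which is close enough to confirm the cluster is doing real work.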
Up next: choosing and setting up a Scala IDE