Spark Python Environment Setup

The main steps are as follows:

This installation is done inside VirtualBox, so it also involves installing some VirtualBox-related components.

1. Install the system from the Debian 8 x64 network-install image.

2. Install dwm and the related Xorg components.
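
Both are available from the standard Debian repositories; a minimal sketch, assuming apt and the stock xorg and dwm packages:

apt-get install xorg dwm   # as root, or prefix with sudo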

3. Install java-package, download the JDK, run make-jpkg XXX.tar.gz, and install the resulting package.
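
A sketch of the java-package flow; the generated .deb name and the install path under /usr/lib/jvm depend on the JDK version, so adjust JAVA_HOME in step 7 to match:

apt-get install java-package     # as root, or prefix with sudo
make-jpkg XXX.tar.gz             # as a normal user; XXX.tar.gz = the downloaded Oracle JDK archive
dpkg -i oracle-java8-jdk_*.deb   # as root; installs the package produced by make-jpkg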

4. Install IPython (Python itself is already installed by default).

5. Install Spyder (a Python IDE).

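Steps 4 and 5 can both be done from the Debian repositories; a sketch assuming apt:

apt-get install ipython spyder   # as root, or prefix with sudo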

6. Download and extract a Spark release (the package pre-built for Hadoop 2.6).
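
A sketch of this step; the exact release is an assumption (the py4j-0.9 entry in the .bashrc below points at a Spark 1.6.x build), and the target directory is chosen to match the SPARK_HOME set in step 7:

wget https://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
mkdir -p ~/dev
tar -xzf spark-1.6.0-bin-hadoop2.6.tgz -C ~/dev
mv ~/dev/spark-1.6.0-bin-hadoop2.6 ~/dev/spark   # so SPARK_HOME=/home/UID/dev/spark resolves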

7. Edit .bashrc.

The content is as follows:

#export SPARK
export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-x64
export SPARK_HOME=/home/UID/dev/spark
export PATH=$SPARK_HOME/bin:$PATH
export SPARK_LOCAL_IP=127.0.0.1
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$SPARK_HOME/python/lib/pyspark.zip:$PYTHONPATH
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
export TOMCAT_HOME=/home/UID/dev/tomcat
export HADOOP26_HOME=/home/UID/dev/hadoop26
export LD_LIBRARY_PATH=$HADOOP26_HOME/lib/native
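
Reload the configuration so the new variables take effect in the current shell:

source ~/.bashrc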

8. Edit spark/sbin/spark-config.sh, adding:

export HADOOP_OPTS="-Djava.library.path=$HADOOP26_HOME/lib:$HADOOP26_HOME/lib/native"

9. Run the Python test example. Save the script below as test.py and submit it:

spark-submit test.py

test.py is as follows:


## Spark Application - execute with spark-submit

## Imports
from pyspark import SparkConf, SparkContext

## Module Constants
APP_NAME = "My Spark App"

## Closure Functions

## Main functionality
def main(sc):
    input = sc.textFile("file:///home/XXXXX/README.md")
    print input.first()
    pass

if __name__ == "__main__":
    # Configure Spark
    conf = SparkConf().setAppName(APP_NAME)
    conf = conf.setMaster("local[*]")
    sc = SparkContext(conf=conf)

    # Execute Main functionality
    main(sc)
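
The setup can also be checked interactively with the pyspark shell that ships in $SPARK_HOME/bin (it starts with a SparkContext already bound to sc); a quick sketch:

pyspark
>>> sc.textFile("file:///home/XXXXX/README.md").first()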