Spark Python Environment Setup
The main steps are as follows:
This install was done inside VirtualBox, so installing a few VirtualBox-related components is also part of the process.
1. Install the base system from the Debian 8 x64 network-install image.
2. Install dwm and the related Xorg components.
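A minimal sketch of step 2, assuming the stock Debian package names:
sudo apt-get update
sudo apt-get install dwm xorg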
3. Install java-package, download the JDK tarball, run make-jpkg XXX.tar.gz, and install the resulting package.
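A sketch of step 3, assuming an Oracle JDK 8 tarball; java-package lives in Debian's contrib section, and the exact name of the generated .deb depends on the JDK version:
sudo apt-get install java-package
make-jpkg XXX.tar.gz                  # XXX.tar.gz = the JDK tarball downloaded from Oracle; run as a normal user, not root
sudo dpkg -i oracle-java8-jdk_*.deb   # adjust to the .deb that make-jpkg actually produces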
4. Install IPython (Python itself is already installed by default).
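Step 4, assuming the Debian package name:
sudo apt-get install ipython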
5. Install Spyder (a Python IDE).
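Step 5, again assuming the Debian package name:
sudo apt-get install spyder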
6. Download and extract the Spark release (the pre-built package that includes Hadoop 2.6).
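A sketch of step 6, assuming Spark 1.6.0 (the py4j-0.9 path in the .bashrc below suggests a 1.6.x release) and the /home/UID/dev layout used in the next step; adjust the version to match what you actually download:
wget https://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
tar -xzf spark-1.6.0-bin-hadoop2.6.tgz
mv spark-1.6.0-bin-hadoop2.6 ~/dev/spark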
7. Edit ~/.bashrc, adding the following:
#export SPARK
export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-x64
export SPARK_HOME=/home/UID/dev/spark
export PATH=$SPARK_HOME/bin:$PATH
export SPARK_LOCAL_IP=127.0.0.1
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$SPARK_HOME/python/lib/pyspark.zip:$PYTHONPATH
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
export TOMCAT_HOME=/home/UID/dev/tomcat
export HADOOP26_HOME=/home/UID/dev/hadoop26
export LD_LIBRARY_PATH=$HADOOP26_HOME/lib/native
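After saving, reload the file so the new variables take effect in the current shell; a quick sanity check, assuming the paths above:
source ~/.bashrc
java -version        # should report the JDK installed in step 3
echo $SPARK_HOME     # should print /home/UID/dev/spark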
8. Edit spark/sbin/spark-config.sh, adding the line:
export HADOOP_OPTS="-Djava.library.path=$HADOOP26_HOME/lib:$HADOOP26_HOME/lib/native"
9. Run the Python test example. Save the script below as test.py and submit it with:
spark-submit test.py
The contents of test.py are as follows:
# Spark Application - execute with spark-submit

# Imports
from pyspark import SparkConf, SparkContext

# Module Constants
APP_NAME = "My Spark App"

# Closure Functions

# Main functionality
def main(sc):
    input = sc.textFile("file:///home/XXXXX/README.md")
    print(input.first())

if __name__ == "__main__":
    # Configure Spark
    conf = SparkConf().setAppName(APP_NAME)
    conf = conf.setMaster("local[*]")
    sc = SparkContext(conf=conf)

    # Execute Main functionality
    main(sc)
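If the environment is set up correctly, spark-submit test.py starts a local SparkContext and prints the first line of the referenced README.md file.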