I have a conda installation of Python 3.7:
$python3 --version
Python 3.7.6
pyspark was installed via pip3 install (conda does not have a native package for it).
$conda list | grep pyspark
pyspark 2.4.5 pypi_0 pypi
Here is what pip3 tells me:
$pip3 install pyspark
Requirement already satisfied: pyspark in ./miniconda3/lib/python3.7/site-packages (2.4.5)
Requirement already satisfied: py4j==0.10.7 in ./miniconda3/lib/python3.7/site-packages (from pyspark) (0.10.7)
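For what it's worth, here is a quick sanity check (just a sketch) that the interpreter, pip, and the pip-installed pyspark all live in the same miniconda3 environment:

import sys, shutil
import pyspark, py4j
print(sys.executable)                 # should point into miniconda3
print(shutil.which("pip3"))           # should be the same environment's pip
print(pyspark.__version__, pyspark.__file__)
print(py4j.__version__)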
JDK 11 is installed:
$java -version
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
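As far as I understand, pyspark launches the JVM through Spark's own launcher scripts, which use $JAVA_HOME/bin/java when JAVA_HOME is set and otherwise the first java on the PATH; here is a small sketch to see which Java would actually be picked up:

import os, shutil, subprocess
print(os.environ.get("JAVA_HOME"))    # Spark's launcher prefers this if it is set
print(shutil.which("java"))           # otherwise the first java on the PATH
subprocess.run(["java", "-version"])  # prints the JDK 11 banner shown above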
Importing pyspark works, but things do not go so well when I try to create a SparkSession. Here is a mini test program:
from pyspark.sql import SparkSession
import os, sys

def setupSpark():
    os.environ["PYSPARK_SUBMIT_ARGS"] = "pyspark-shell"
    spark = SparkSession.builder.appName("myapp").master("local").getOrCreate()
    return spark

sp = setupSpark()
# createDataFrame expects rows (e.g. a list of tuples plus column names), not a plain column dict
df = sp.createDataFrame([(1, 4), (2, 5), (3, 6)], ['a', 'b'])
df.show()
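(As an aside, unrelated to the failure below: since the data here is naturally column-oriented, I believe the same frame could also be built by going through pandas, roughly:)

import pandas as pd
df = sp.createDataFrame(pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}))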
Running the test program results in:
Error: Unable to initialize main class org.apache.spark.deploy.SparkSubmit Caused by: java.lang.NoClassDefFoundError: org/apache/log4j/spi/Filter
Here are the full details:
$python3 sparktest.py
Error: Unable to initialize main class org.apache.spark.deploy.SparkSubmit
Caused by: java.lang.NoClassDefFoundError: org/apache/log4j/spi/Filter
Traceback (most recent call last):
File "sparktest.py", line 9, in <module>
sp = setupSpark()
File "sparktest.py", line 6, in setupSpark
spark = SparkSession.builder.appName("myapp").master("local").getOrCreate()
File "/Users/steve/miniconda3/lib/python3.7/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/Users/steve/miniconda3/lib/python3.7/site-packages/pyspark/context.py", line 367, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/Users/steve/miniconda3/lib/python3.7/site-packages/pyspark/context.py", line 133, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/Users/steve/miniconda3/lib/python3.7/site-packages/pyspark/context.py", line 316, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/Users/steve/miniconda3/lib/python3.7/site-packages/pyspark/java_gateway.py", line 46, in launch_gateway
return _launch_gateway(conf)
File "/Users/steve/miniconda3/lib/python3.7/site-packages/pyspark/java_gateway.py", line 108, in _launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
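I think the same failure can be reproduced without any of my Python code: the pip package appears to ship a full Spark distribution under the pyspark package directory, so its bundled spark-submit can be invoked directly (sketch, paths as on my machine):

import os, subprocess, pyspark
spark_home = os.path.dirname(pyspark.__file__)              # pip-installed Spark lives under the package dir
launcher = os.path.join(spark_home, "bin", "spark-submit")
subprocess.run([launcher, "--version"])                     # exercises the same SparkSubmit startup that fails above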
Any pointers, or info on getting a working pyspark environment under conda, would be appreciated.