Sunday 30 May 2021

PYSPARK_PYTHON setup in jupyter notebooks is ignored

I've been trying to set PYSPARK_PYTHON from a Jupyter notebook (using JupyterLab) to point at a specific conda env, but I cannot find a way to make it work. I have found some examples using:

import os

# "<the path>" stands in for the python binary of the target conda env
os.environ['PYSPARK_PYTHON'] = "<the path>"
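As far as I understand, this variable is only read when the SparkContext is first launched, so the cell has to run before any Spark session exists; PYSPARK_DRIVER_PYTHON is the companion variable for the driver side (the paths below are still placeholders):

import os

# Must run before any SparkSession/SparkContext is created in the notebook;
# "<the path>" is a placeholder for the conda env's python binary.
os.environ['PYSPARK_PYTHON'] = "<the path>"
os.environ['PYSPARK_DRIVER_PYTHON'] = "<the path>"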

But it did not work, so I also tried:

import pyspark
from pyspark.sql import SQLContext

# session_name is defined earlier in the notebook
spark = pyspark.sql.SparkSession.builder \
       .master("yarn-client") \
       .appName(session_name) \
       .config("spark.yarn.appMasterEnv.PYSPARK_PYTHON", "<the path>") \
       .enableHiveSupport() \
       .getOrCreate()

sc = spark.sparkContext
sqlContext = SQLContext(sc)
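A quick way to confirm which interpreter actually runs on the executors is a diagnostic like this (just a sketch, using the sc defined above):

import sys

# Python used by the driver (the notebook kernel itself)
print("driver python  :", sys.executable)

# Python used by the executors: run a trivial job and collect sys.executable
executor_pythons = sc.parallelize(range(2), 2) \
                     .map(lambda _: sys.executable) \
                     .distinct() \
                     .collect()
print("executor python:", executor_pythons)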

But it never uses the Python version at the specified path. The question is: is it possible the config is being ignored? Does something else need to be done in the notebook?

I'm using yarn-client mode.
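There is also spark.executorEnv.<VAR>, which Spark documents as the way to pass environment variables to executors; I'm not sure it is the right knob here, but a variant of the session above using it (same placeholder path) would look like:

# Sketch: set PYSPARK_PYTHON on the executors directly via executorEnv
spark = pyspark.sql.SparkSession.builder \
       .master("yarn-client") \
       .appName(session_name) \
       .config("spark.executorEnv.PYSPARK_PYTHON", "<the path>") \
       .enableHiveSupport() \
       .getOrCreate()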


