Wednesday, 19 May 2021

spark-submit log4j configuration has no effect in spark context

After specifying a configuration file in a spark-submit as in this answer:

# /job/log4j.properties is a path inside the Docker container;
# the trailing arguments are the application's own.
spark-submit \
    --master local \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
    --py-files ./dist/src-1.0-py3-none-any.whl \
    --files "/job/log4j.properties" \
    main.py -input $1 -output $2 -mapper $3 $4
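
For reference, a minimal sketch of what a `/job/log4j.properties` that silences Spark's INFO output might contain (the exact appender and category names below are assumptions, not taken from the original post; they follow the layout of Spark's bundled log4j template):

```properties
# Root logger: only WARN and above to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Keep the application's own logger at INFO
log4j.logger.MyLogger=INFO
```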

With the dockerized application structure being:

job/
|--  entrypoint.sh
|--  log4j.properties
|--  main.py

I'm getting the following error on startup:

log4j:ERROR Ignoring configuration file [file:/log4j.properties].
log4j:ERROR Could not read configuration file from URL [file:/log4j.properties].

java.io.FileNotFoundException: /log4j.properties (No such file or directory)

It works fine if I apply the configuration at runtime through the Spark context's JVM gateway with PropertyConfigurator.configure:

# Apply the properties file at runtime via the JVM gateway
log4j = sc._jvm.org.apache.log4j
log4j.PropertyConfigurator.configure("/job/log4j.properties")
logger = log4j.Logger.getLogger("MyLogger")

However, if I just instantiate a logger as follows (the desirable behaviour):

# Expect the -Dlog4j.configuration flags from spark-submit to apply
log4jLogger = sc._jvm.org.apache.log4j
logger = log4jLogger.LogManager.getLogger("MyLogger")

it doesn't behave as it does when the configuration is set via PropertyConfigurator.configure, which I've used to silence all of Spark's INFO-level logging. Any idea how to make the logging configuration passed to spark-submit control the application's logs?


