When I spin up an EMR cluster manually in the AWS console, I run the following after SSH'ing into my cluster:
spark-submit --master yarn-cluster --deploy-mode cluster --class spark_pkg.SparkMain \
    s3://mybucket/scala-1.0.jar -s arg1 -l arg2
How do I do this when using Boto3 in Python? Here is my steps code:
steps = [
    {
        'Name': 'Running jar file',
        'ActionOnFailure': 'TERMINATE_CLUSTER',
        'HadoopJarStep': {
            'Jar': 's3://mybucket/{0}'.format(jar_file),
            'Args': ['spark-submit', '--master yarn-cluster',
                     '--deploy-mode cluster', '--class spark_pkg.SparkMain',
                     '-s', arg1, '-l', arg2]
        }
    }
]
It looks like these arguments are incorrect: 'spark-submit', '--master yarn-cluster', '--deploy-mode cluster', '--class spark_pkg.SparkMain'
And the error I am getting is below. How can I correctly define those arguments?
Error: Unknown argument 'spark-submit'
Error: Unknown option --master yarn-cluster
Error: Unknown option --deploy-mode cluster
Error: Unknown option --class spark_pkg.SparkMain
Usage: spark-zoning [options]
-l, --id1 <value>
-s --id2 <value>
Exception in thread "main" scala.MatchError: None (of class scala.None$)
at spark_pkg.SparkMain$.main(SparkMain.scala:208)
at spark_pkg.SparkMain.main(SparkMain.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
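For reference, what I'm trying to reproduce through Boto3 would presumably look something like the sketch below: command-runner.jar seems to be the usual way to invoke spark-submit as an EMR step, with every flag and its value passed as separate list elements and the application JAR plus its own arguments at the end. Bucket name, JAR name, cluster ID, region, and argument values here are placeholders, not my real ones:

import boto3

# Placeholder values mirroring the question
jar_file = 'scala-1.0.jar'
arg1 = 'value1'
arg2 = 'value2'

emr = boto3.client('emr', region_name='us-east-1')

steps = [
    {
        'Name': 'Running jar file',
        'ActionOnFailure': 'TERMINATE_CLUSTER',
        'HadoopJarStep': {
            # command-runner.jar runs spark-submit on the cluster;
            # each option and its value must be a separate list element
            'Jar': 'command-runner.jar',
            'Args': [
                'spark-submit',
                '--master', 'yarn',          # yarn-cluster is the deprecated spelling
                '--deploy-mode', 'cluster',
                '--class', 'spark_pkg.SparkMain',
                's3://mybucket/{0}'.format(jar_file),
                '-s', arg1,
                '-l', arg2,
            ],
        },
    },
]

response = emr.add_job_flow_steps(JobFlowId='j-XXXXXXXXXXXXX', Steps=steps)
print(response['StepIds'])

The key differences from my current attempt appear to be that the step JAR is command-runner.jar rather than my application JAR, and that options like --deploy-mode and their values are individual list items instead of single strings.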