I'm switching from Spark 2.4 to Spark 3. I have solved most of the bugs I encountered, but I don't know where this one is coming from.
I have a dataframe df which I'm trying to write to a table in an Azure SQL Server database. Here is the line that triggers the error, followed by the stack trace:
--> 105 df.write.jdbc(url=JDBCURL, table=table, mode=mode)
106 if verbose:
107 print("Yay it worked!")
/databricks/spark/python/pyspark/sql/readwriter.py in jdbc(self, url, table, mode, properties)
1080 for k in properties:
1081 jprop.setProperty(k, properties[k])
-> 1082 self.mode(mode)._jwrite.jdbc(url, table, jprop)
1083
1084
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
131 # Hide where the exception came from that shows a non-Pythonic
132 # JVM exception message.
--> 133 raise_from(converted)
134 else:
135 raise
/databricks/spark/python/pyspark/sql/utils.py in raise_from(e)
PythonException: An exception was thrown from a UDF: 'KeyError: None'. Full traceback below:
Traceback (most recent call last):
File "/databricks/spark/python/pyspark/worker.py", line 654, in main
process()
File "/databricks/spark/python/pyspark/worker.py", line 646, in process
serializer.dump_stream(out_iter, outfile)
File "/databricks/spark/python/pyspark/serializers.py", line 231, in dump_stream
self.serializer.dump_stream(self._batched(iterator), stream)
File "/databricks/spark/python/pyspark/serializers.py", line 145, in dump_stream
for obj in iterator:
File "/databricks/spark/python/pyspark/serializers.py", line 220, in _batched
for item in iterator:
File "/databricks/spark/python/pyspark/worker.py", line 467, in mapper
result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
File "/databricks/spark/python/pyspark/worker.py", line 467, in <genexpr>
result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
File "/databricks/spark/python/pyspark/worker.py", line 91, in <lambda>
return lambda *a: f(*a)
File "/databricks/spark/python/pyspark/util.py", line 109, in wrapper
return f(*args, **kwargs)
KeyError: None
Well, there is no UDF as far as I'm concerned!
Some of the columns in my dataframe contain null, but this is not (should not be!) a problem since my table accepts null in these columns.
My code in Spark 2.4 could send this kind of dataframe to my SQL server, but now that I've switched to Spark 3 this line fails.
I am using Databricks with the 7.3 runtime and Python 3.7.
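For reference, here is a minimal plain-Python sketch of how the exact message `KeyError: None` can arise. It rests on an assumption the post does not confirm: that one of the columns of df was built earlier by a Python UDF that indexes into a dictionary. Because Spark evaluates lazily, such a UDF would only run when the write is triggered, which could explain why the error surfaces at `df.write.jdbc`. The names `CODES`, `lookup_strict` and `lookup_null_safe` are hypothetical and exist only for this illustration.

```python
# Hypothetical lookup table, standing in for whatever mapping a UDF might use.
CODES = {"a": 1, "b": 2}


def lookup_strict(key):
    # Plain indexing raises KeyError for any missing key, including None.
    return CODES[key]


def lookup_null_safe(key):
    # dict.get returns None instead of raising when the key is missing.
    return CODES.get(key)


# Reproduce the message seen at the bottom of the traceback.
try:
    lookup_strict(None)
except KeyError as exc:
    message = f"{type(exc).__name__}: {exc}"

print(message)
print(lookup_null_safe(None))
```

If a function like `lookup_strict` were wrapped as a UDF and applied to a column containing null, each null would reach it as `None` and fail in this way, while the `dict.get` variant would pass the null through.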