Wednesday, 18 November 2020

"An exception was thrown from a UDF" but there's no UDF

I'm switching from Spark 2.4 to Spark 3. I've solved most of the bugs I encountered, but I don't know where this one is coming from.

I have a dataframe df which I'm trying to write to a table in an Azure SQL Server. Here is the line that triggers the error, followed by the stack trace:

--> 105       df.write.jdbc(url=JDBCURL, table=table, mode=mode)
    106       if verbose:
    107         print("Yay it worked!")

/databricks/spark/python/pyspark/sql/readwriter.py in jdbc(self, url, table, mode, properties)
   1080         for k in properties:
   1081             jprop.setProperty(k, properties[k])
-> 1082         self.mode(mode)._jwrite.jdbc(url, table, jprop)
   1083 
   1084 

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1303         answer = self.gateway_client.send_command(command)
   1304         return_value = get_return_value(
-> 1305             answer, self.gateway_client, self.target_id, self.name)
   1306 
   1307         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    131                 # Hide where the exception came from that shows a non-Pythonic
    132                 # JVM exception message.
--> 133                 raise_from(converted)
    134             else:
    135                 raise

/databricks/spark/python/pyspark/sql/utils.py in raise_from(e)

PythonException: An exception was thrown from a UDF: 'KeyError: None'. Full traceback below:
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 654, in main
    process()
  File "/databricks/spark/python/pyspark/worker.py", line 646, in process
    serializer.dump_stream(out_iter, outfile)
  File "/databricks/spark/python/pyspark/serializers.py", line 231, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/databricks/spark/python/pyspark/serializers.py", line 145, in dump_stream
    for obj in iterator:
  File "/databricks/spark/python/pyspark/serializers.py", line 220, in _batched
    for item in iterator:
  File "/databricks/spark/python/pyspark/worker.py", line 467, in mapper
    result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
  File "/databricks/spark/python/pyspark/worker.py", line 467, in <genexpr>
    result = tuple(f(*[a[o] for o in arg_offsets]) for (arg_offsets, f) in udfs)
  File "/databricks/spark/python/pyspark/worker.py", line 91, in <lambda>
    return lambda *a: f(*a)
  File "/databricks/spark/python/pyspark/util.py", line 109, in wrapper
    return f(*args, **kwargs)
KeyError: None

Well, there is no UDF as far as I'm concerned!
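That said, the traceback does pass through the UDF evaluation code in worker.py, and Spark only runs transformations when an action such as the write is called, so a UDF applied to df somewhere upstream would fail at this line rather than where it was defined. One way to check is to look at the physical plan for Python UDF operators. The sketch below is only an illustration: the helper name and the sample plan text are made up, and BatchEvalPython and ArrowEvalPython are the operators I'd expect for scalar Python and pandas UDFs, so the list may not be exhaustive.

```python
import contextlib
import io

# Physical-plan operators Spark uses to evaluate scalar Python / pandas UDFs
# (assumed to be the relevant ones here; other pandas operators exist).
PYTHON_UDF_NODES = ("BatchEvalPython", "ArrowEvalPython")


def find_python_udf_nodes(plan_text):
    """Return the plan lines that mention a Python UDF operator."""
    return [
        line.strip()
        for line in plan_text.splitlines()
        if any(node in line for node in PYTHON_UDF_NODES)
    ]


def capture_plan(df):
    """Capture the text that df.explain() prints (needs a real Spark dataframe)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.explain()
    return buffer.getvalue()


# Hand-written plan text for illustration only, not output from the real job.
sample_plan = """== Physical Plan ==
*(2) Project [id#0L, pythonUDF0#12 AS label#5]
+- BatchEvalPython [<lambda>(code#1)], [pythonUDF0#12]
   +- *(1) Scan ExistingRDD[id#0L,code#1]
"""

hits = find_python_udf_nodes(sample_plan)
print(hits)
```

On the real dataframe this would be find_python_udf_nodes(capture_plan(df)); an empty list would suggest the UDF is not in this dataframe's own lineage.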

Some of the columns in my dataframe contain nulls, but this is not (should not be!) a problem since my table accepts nulls in these columns.
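For what it's worth, 'KeyError: None' is exactly what plain Python raises when a dict is indexed with None, so a dict lookup applied to a null value would produce the same message. A minimal sketch of that behaviour, with a hypothetical lookup table and function names (nothing here is taken from my actual code):

```python
# Hypothetical lookup table; stands in for any dict used inside a function
# that Spark applies row by row.
LABELS = {"A": "alpha", "B": "beta"}


def label_strict(code):
    # Indexing with a key that is not in the dict raises KeyError.
    # A null column value arrives in Python as None, giving KeyError: None.
    return LABELS[code]


def label_null_safe(code):
    # dict.get returns None instead of raising for None or unknown codes.
    return LABELS.get(code)


try:
    label_strict(None)
except KeyError as exc:
    message = "KeyError: {}".format(exc)

print(message)
print(label_null_safe(None))
```

If something like label_strict were wrapped in a UDF, the null-safe variant would let null rows pass through as null instead of failing the whole write.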

My code in Spark 2.4 could send this kind of dataframe to my SQL server, but now that I've switched to Spark 3 this line fails.

I am using Databricks with the 7.3 runtime and Python 3.7.


