Saturday, 17 October 2020

PySpark DataFrame filter using logical AND over list of conditions -- Numpy All Equivalent

I'm trying to filter the rows of a PySpark DataFrame, keeping only the rows in which every column is non-zero.

I was hoping to use something like this, in the spirit of NumPy's np.all():

from pyspark.sql.functions import col

# Hoping the built-in all() would AND together the per-column conditions:
df.filter(all([(col(c) != 0) for c in df.columns]))

But this raises a ValueError:

ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
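
If I understand the traceback correctly, the built-in all() calls bool() on each element of the list, and pyspark.sql.Column deliberately refuses truthiness because it is a lazy expression rather than a concrete value. A minimal reproduction of just that step (assuming an active SparkSession):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Column.__bool__ raises rather than guessing a truth value; this is
# exactly what all() triggers on every element of the condition list.
bool(col("a") != 0)  # raises the ValueError quoted above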

Is there any way to perform a logical AND over a list of conditions? What is the PySpark equivalent of np.all?
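
One approach that should work (a sketch, not from the original post) is to fold the Column-level & operator over the list of conditions with functools.reduce and operator.and_, producing a single Column expression; this fold plays the role of np.all across a row:

from functools import reduce
from operator import and_

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
# Toy data for illustration only.
df = spark.createDataFrame([(0, 0), (1, 0), (3, 4)], ["a", "b"])

# reduce(and_, conds) builds ((c1 & c2) & ... & cn) as one Column,
# instead of asking Python to coerce each Column to bool as all() does.
conds = [(col(c) != 0) for c in df.columns]
df.filter(reduce(and_, conds)).show()  # keeps only the (3, 4) row

Here operator.and_ dispatches to Column.__and__, i.e. the same & operator the error message recommends; swapping in operator.or_ gives the np.any counterpart.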



