I'm trying to filter the rows of a PySpark DataFrame, keeping only the rows where every column value is non-zero.
I was hoping to use something like this (analogous to the NumPy function np.all()):
from pyspark.sql.functions import col

# Fails: Python's built-in all() tries to coerce each Column to a bool
df.filter(all([(col(c) != 0) for c in df.columns]))
But this raises a ValueError:
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
Is there a way to perform a logical AND over a list of conditions? What is the PySpark equivalent of np.all?
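
One approach that should work, as a minimal sketch (assuming df is the DataFrame above and its columns support the != comparison): Python's built-in all() calls bool() on each Column object, which Spark forbids, but you can fold the list of Column conditions into a single combined expression with functools.reduce and the & operator:

from functools import reduce
from operator import and_
from pyspark.sql.functions import col

# Fold the per-column conditions into one Column expression:
# (col1 != 0) & (col2 != 0) & ... & (colN != 0)
condition = reduce(and_, [(col(c) != 0) for c in df.columns])
df.filter(condition)

Here reduce plays the role of np.all: it chains the Column conditions with &, which Spark then evaluates row-wise. The same pattern with operator.or_ would give an np.any equivalent.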