Wednesday, 21 April 2021

How Spark distributes partitions to executors

I have a performance issue, and after analyzing the Spark web UI I found what seems to be data skew.

Initially I thought the partitions were not evenly sized, so I analyzed the row count per partition, but it looks normal, with no outliers (see "how to manually run pyspark's partitioning function for debugging").
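To make that check concrete, here is a minimal sketch of a per-partition row count in plain Python. It is an illustration only and does not call Spark: the helper names `rows_per_partition` and `skew_ratio` are hypothetical, the sample keys are made up, and the `hash(key) % num_partitions` rule is an assumed stand-in for a hash partitioner (Spark's own partitioners use different hash functions).

```python
from collections import Counter


def rows_per_partition(keys, num_partitions):
    """Count rows per partition under simple modulo hash partitioning.

    Assumption: partition = hash(key) % num_partitions. Integer keys are
    used so that Python's hash() is deterministic across runs.
    """
    counts = Counter(hash(k) % num_partitions for k in keys)
    # Include empty partitions so the result always has num_partitions entries.
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(counts):
    """Largest partition size divided by the mean size (1.0 means perfectly even)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


# Hypothetical data: evenly spread keys versus one dominant ("hot") key.
even_keys = list(range(1000))
hot_keys = [7] * 900 + list(range(100))

even_counts = rows_per_partition(even_keys, 4)
hot_counts = rows_per_partition(hot_keys, 4)

print(even_counts, skew_ratio(even_counts))
print(hot_counts, skew_ratio(hot_counts))
```

A ratio close to 1.0 corresponds to the "no outliers" result described above, while a single hot key pushes one partition far above the mean.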

But the problem persists, and I see that one executor is processing most of the data.

So my hypothesis now is that the partitions are not evenly distributed across executors. The question is: how does Spark distribute partitions to executors, and how can I change that to fix my skew problem?


