Thursday, 3 September 2020

Is there an alternative for zip(*iterable) when the iterable consists of millions of elements?

I have come across code like this:

from random import randint

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(randint(1, 10), randint(1, 10)) for _ in range(10)]
xs = [point.x for point in points]
ys = [point.y for point in points]

And I think this code is not Pythonic because it repeats itself. If another dimension is added to the Point class, a whole new loop needs to be written, like:

zs = [point.z for point in points]

So I tried to make it more Pythonic by writing something like this:

xs, ys = zip(*[(point.x, point.y) for point in points])

If a new dimension is added, no problem:

xs, ys, zs = zip(*[(point.x, point.y, point.z) for point in points])

But this is almost 10 times slower than the original solution when there are millions of points, even though it has only one loop. I suspect this is because the * operator has to unpack millions of arguments into the zip call, which is expensive. Is there a way to change the code above so that it is both fast and Pythonic?
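A rough way to see the gap is to time both approaches side by side. The sketch below (my own benchmark, not from the original post; the exact ratio depends on machine and Python version) compares the two-comprehension version with the zip(*...) version:

```python
import timeit
from random import randint

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(randint(1, 10), randint(1, 10)) for _ in range(50_000)]

def comprehensions():
    # Two passes over points, one plain list comprehension each.
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    return xs, ys

def zip_star():
    # One pass to build the pairs, then zip(*...) unpacks 50,000
    # tuples as positional arguments to zip.
    xs, ys = zip(*[(p.x, p.y) for p in points])
    return xs, ys

t1 = timeit.timeit(comprehensions, number=5)
t2 = timeit.timeit(zip_star, number=5)
print(f"comprehensions: {t1:.3f}s  zip(*...): {t2:.3f}s")
```

Both functions produce the same x and y sequences (zip returns tuples rather than lists), so the difference is purely in how the work is done.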

Edit: My question is not only about speed. I know numpy would be faster, but numpy will be faster anyway. I'm just trying to understand why the zip solution is slow, and how I could write something that is as fast as the original code and Pythonic at the same time, in pure Python (without importing any third-party packages).
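For comparison, one pure-Python pattern that keeps a single loop without the giant argument unpack is to build all the lists in one pass (a sketch of one possible approach, not a definitive answer; adding a dimension means adding one list and one append line rather than a whole new loop over points):

```python
from random import randint

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(randint(1, 10), randint(1, 10)) for _ in range(1000)]

# Single pass: append each coordinate to its own list as we go,
# so no intermediate list of tuples and no huge *-unpacking.
xs, ys = [], []
for point in points:
    xs.append(point.x)
    ys.append(point.y)
```

This avoids materialising the list of tuples and the call with millions of positional arguments that zip(*...) requires, at the cost of a couple of extra lines.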
