Hemant Vishwakarma: Efficiency of sorting by multiple keys in Python

Saturday, 16 October 2021

Efficiency of sorting by multiple keys in Python

I have a list of strings that I want to sort by two custom key functions in Python 3.6. Comparing the multi-sort approach (sorting by the lesser key then by the major key) to the multi-key approach (taking the key as the tuple (major_key, lesser_key)), I could see the latter being more than 2x slower than the former, which was a surprise as I thought they are equivalent. I would like to understand why this is so.

import random
from time import time

largest = 1000000
length = 10000000
start = time()
lst = [str(x) for x in random.choices(range(largest), k=length)]
t0 = time() - start

start = time()
tmp = sorted(lst, key=lambda x: x[::2])
l1 = sorted(tmp, key=lambda x: ''.join(sorted(x)))
t1 = time() - start

start = time()
l2 = sorted(lst, key=lambda x: (''.join(sorted(x)), x[::2]))
t2 = time() - start

print(f'prepare={t0} multisort={t1} multikey={t2} slowdown={t2/t1}')

assert l1 == l2

from Efficiency of sorting by multiple keys in Python

Hemant Vishwakarma

Saturday, 16 October 2021

Efficiency of sorting by multiple keys in Python

No comments:

Post a Comment