Thursday 10 December 2020

Dictionary of (named) tuples in Python and speed/RAM performance

I'm creating a dictionary d of one million items whose values are tuples, and ideally I'd like to access their fields with:

d[1634].id       # or  d[1634]['id']
d[1634].name     # or  d[1634]['name']
d[1634].isvalid  # or  d[1634]['isvalid']

rather than d[1634][0], d[1634][1], d[1634][2], which is less explicit.
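For reference, here is a small sketch of which of these access styles a collections.namedtuple (the same Tri type used in the test) actually supports. Attribute access and integer indexing both work. The d[1634]['id'] form does not, because namedtuple instances are real tuples and only accept integer indices or slices; _asdict() converts a record to a dict when that style is needed. The three-entry dictionary is only illustrative.

```python
import collections

# Same field names as in the question; the values are illustrative.
Tri = collections.namedtuple('Tri', 'id,name,isvalid')

d = {i: Tri(id=i + 1, name='hello', isvalid=True) for i in range(3)}

rec = d[1]
print(rec.id, rec.name, rec.isvalid)  # attribute access
print(rec[0], rec[1], rec[2])         # positional access still works
print(rec._asdict()['name'])          # dict-style access via a converted copy

# String subscripting is not supported: tuple indices must be integers.
try:
    rec['id']
except TypeError as exc:
    print('string key not supported:', exc)
```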

According to my test:

import os, psutil, time, collections, typing
Tri = collections.namedtuple('Tri', 'id,name,isvalid')
Tri2 = typing.NamedTuple("Tri2", [('id', int), ('name', str), ('isvalid', bool)])
t0 = time.time()
# uncomment exactly one of the next 4 lines:
d = {i: (i+1, 'hello', True) for i in range(1000000)}                                 # tuple
# d = {i: {'id': i+1, 'name': 'hello', 'isvalid': True} for i in range(1000000)}      # dict
# d = {i: Tri(id=i+1, name='hello', isvalid=True) for i in range(1000000)}            # namedtuple
# d = {i: Tri2(id=i+1, name='hello', isvalid=True) for i in range(1000000)}            # NamedTuple
print('%.3f s  %.1f MB' % (time.time()-t0, psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2))

"""
tuple:       0.257 s  193.3 MB
dict:        0.329 s  363.6 MB
namedtuple:  1.253 s  193.3 MB  (collections)
NamedTuple:  1.250 s  193.5 MB  (typing)
"""
  • using a dict nearly doubles the RAM usage compared to a tuple (363.6 MB vs 193.3 MB)
  • using a namedtuple or NamedTuple multiplies the time spent by about 5 compared to a tuple (~1.25 s vs 0.257 s)!
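Both observations can be isolated per object with a sketch like the one below (the 100,000-record count and the tuple.__new__ variant are my additions, not part of the original test). On the memory side, namedtuple and typing.NamedTuple classes subclass tuple with __slots__ = (), so an instance has the same size as a plain tuple of the same length, while a 3-key dict is considerably larger. sys.getsizeof only counts the container itself, not the int/str/bool objects it references. On the time side, in CPython a tuple display such as (i + 1, 'hello', True) is built directly by the interpreter, whereas Tri(...) goes through a Python-level __new__ generated by namedtuple; calling tuple.__new__(Tri, (...)) skips that layer and should reduce the overhead, but the actual numbers depend on the machine and Python version, so measure them yourself.

```python
import collections
import sys
import timeit
import typing

Tri = collections.namedtuple('Tri', 'id,name,isvalid')
Tri2 = typing.NamedTuple('Tri2', [('id', int), ('name', str), ('isvalid', bool)])

# Size of the container object only (referenced values are not counted).
sizes = {
    'tuple': sys.getsizeof((1, 'hello', True)),
    'dict': sys.getsizeof({'id': 1, 'name': 'hello', 'isvalid': True}),
    'namedtuple': sys.getsizeof(Tri(1, 'hello', True)),
    'NamedTuple': sys.getsizeof(Tri2(1, 'hello', True)),
}
print(sizes)

# Construction cost only: build N records, best of 5 runs each.
N = 100_000
candidates = {
    'tuple': '[(i + 1, "hello", True) for i in range(N)]',
    'namedtuple': '[Tri(i + 1, "hello", True) for i in range(N)]',
    # Bypasses the Python-level __new__ but still produces Tri instances.
    'tuple.__new__': '[tuple.__new__(Tri, (i + 1, "hello", True)) for i in range(N)]',
}
timings = {
    name: min(timeit.repeat(stmt, number=1, repeat=5, globals={'Tri': Tri, 'N': N}))
    for name, stmt in candidates.items()
}
for name, secs in timings.items():
    print(f'{name:14s} {secs * 1000:.1f} ms')
```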

Question: is there a tuple-like data structure in Python 3 that allows accessing the data with x.id, x.name, etc., and is also RAM- and CPU-efficient?


Notes:

  • in my real use case, the tuple is something like a C-struct of type (uint64, uint64, bool).

  • I've also tried with:

    • __slots__ (to avoid the per-instance __dict__, see Usage of __slots__?)

    • dataclass:

      @dataclasses.dataclass
      class Tri3:
          id: int
          ...
      
    • ctypes.Structure:

      class Tri7(ctypes.Structure):
          _fields_ = [("id", ctypes.c_int), ...]
      

    but none of them was better (all took ~1.2 s), nothing close to a plain tuple in terms of performance

  • Here are other options: C-like structures in Python
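A minimal sketch of the three alternatives from the notes, using the (uint64, uint64, bool) layout of the real use case. The second field name, value, is hypothetical, since the real field names are not given. All three provide x.id-style access, but each still runs a constructor call per record (a Python-level __init__ for the __slots__ class and the dataclass), which is consistent with the ~1.2 s reported. A plain @dataclass instance also keeps a __dict__; dataclass(slots=True) removes it but requires Python 3.10+.

```python
import ctypes
import dataclasses
import sys


class TriSlots:
    """Plain class with __slots__: no per-instance __dict__."""
    __slots__ = ('id', 'value', 'isvalid')

    def __init__(self, id, value, isvalid):
        self.id = id
        self.value = value        # hypothetical name for the second uint64
        self.isvalid = isvalid


@dataclasses.dataclass
class Tri3:
    id: int
    value: int                    # hypothetical name for the second uint64
    isvalid: bool


class Tri7(ctypes.Structure):
    # C layout matching (uint64, uint64, bool) from the real use case.
    _fields_ = [('id', ctypes.c_uint64),
                ('value', ctypes.c_uint64),
                ('isvalid', ctypes.c_bool)]


records = [TriSlots(1, 2, True), Tri3(1, 2, True), Tri7(1, 2, True)]
for r in records:
    # Same attribute-style access for all three; object sizes differ.
    print(type(r).__name__, r.id, r.value, r.isvalid, sys.getsizeof(r))

# Size of the raw C payload (includes alignment padding after the bool).
print('C payload of Tri7:', ctypes.sizeof(Tri7), 'bytes')
```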



from Dictionary of (named) tuples in Python and speed/RAM performance
