I'm creating a dictionary d of one million items, each of which is a tuple, and ideally I'd like to access them with:
d[1634].id # or d[1634]['id']
d[1634].name # or d[1634]['name']
d[1634].isvalid # or d[1634]['isvalid']
rather than d[1634][0], d[1634][1], d[1634][2], which is less explicit.
According to my test:
import os, psutil, time, collections, typing
Tri = collections.namedtuple('Tri', 'id,name,isvalid')
Tri2 = typing.NamedTuple("Tri2", [('id', int), ('name', str), ('isvalid', bool)])
t0 = time.time()
# uncomment only one of these 4 next lines:
d = {i: (i+1, 'hello', True) for i in range(1000000)} # tuple
# d = {i: {'id': i+1, 'name': 'hello', 'isvalid': True} for i in range(1000000)} # dict
# d = {i: Tri(id=i+1, name='hello', isvalid=True) for i in range(1000000)} # namedtuple
# d = {i: Tri2(id=i+1, name='hello', isvalid=True) for i in range(1000000)} # NamedTuple
print('%.3f s %.1f MB' % (time.time()-t0, psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2))
"""
tuple: 0.257 s 193.3 MB
dict: 0.329 s 363.6 MB
namedtuple: 1.253 s 193.3 MB (collections)
NamedTuple: 1.250 s 193.5 MB (typing)
"""
- using a dict doubles the RAM usage, compared to a tuple
- using a namedtuple or NamedTuple multiplies the time spent by 5, compared to a tuple!
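To check the timing gap on its own, independently of the RSS measurement, here is a minimal sketch using timeit. It assumes a smaller N than the one million items above (to keep the run short), and the helper names build_tuples and build_namedtuples are made up for this illustration. Numbers will vary by machine and Python version.

```python
import collections
import timeit

Tri = collections.namedtuple('Tri', 'id,name,isvalid')

N = 100_000  # assumption: smaller than the 1,000,000 items in the test above

def build_tuples():
    # Plain tuples, same layout as in the original test
    return {i: (i + 1, 'hello', True) for i in range(N)}

def build_namedtuples():
    # Same data, but each value is a collections.namedtuple instance
    return {i: Tri(id=i + 1, name='hello', isvalid=True) for i in range(N)}

# Take the best of a few repeats to reduce noise from other processes.
t_tuple = min(timeit.repeat(build_tuples, number=1, repeat=3))
t_named = min(timeit.repeat(build_namedtuples, number=1, repeat=3))
print('tuple:      %.3f s' % t_tuple)
print('namedtuple: %.3f s' % t_named)
```

Since both dictionaries hold the same data, any difference here should come from constructing the values, not from the dictionary itself.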
Question: is there a tuple-like data structure in Python 3 that allows accessing the data with x.id, x.name, etc. and is also RAM and CPU efficient?
Notes:
- in my real use case, the tuple is something like a C-struct of type (uint64, uint64, bool).
- I've also tried with:
  - slots (to avoid the object's internal __dict__, see Usage of __slots__?)
  - dataclass: @dataclasses.dataclass class Tri3: id: int ...
  - ctypes.Structure: class Tri7(ctypes.Structure): _fields_ = [("id", ctypes.c_int), ...]
  but it was not better (all of them ~ 1.2 sec.), nothing close to a genuine tuple in terms of performance
- Here are other options: C-like structures in Python
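For reference, the __slots__ and dataclass variants mentioned in the notes might look roughly like the sketch below. The exact classes used in the test are not shown in the post, so the class names TriSlots and TriData and their field lists are assumptions that reuse the id, name, isvalid fields from Tri.

```python
import dataclasses

class TriSlots:
    """Hypothetical __slots__ variant: no per-instance __dict__."""
    __slots__ = ('id', 'name', 'isvalid')

    def __init__(self, id, name, isvalid):
        self.id = id
        self.name = name
        self.isvalid = isvalid

@dataclasses.dataclass
class TriData:
    """Hypothetical dataclass variant: a regular class with a generated __init__."""
    id: int
    name: str
    isvalid: bool

# Small dictionary in the same shape as d in the test above
d = {i: TriSlots(i + 1, 'hello', True) for i in range(3)}
print(d[2].id, d[2].name, d[2].isvalid)

a = TriSlots(1635, 'hello', True)
b = TriData(1635, 'hello', True)
# The slotted class has no instance __dict__; the plain dataclass still does.
print(hasattr(a, '__dict__'), hasattr(b, '__dict__'))
```

Both variants give attribute access, but each instance is still created through a Python-level __init__ call, which may be part of why they were no faster than namedtuple in the timings above.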
from Dictionary of (named) tuples in Python and speed/RAM performance