I have a package laid out like this:
mypkg
|- mypkg
    |- data
        |- data.csv
        |- __init__.py  # Required for importlib.resources
    |- scripts
        |- module.py
    |- __init__.py
The module module.py requires data.csv to perform a certain task.
The first naive approach I used to access data.csv was:
# module.py - Approach 1
from pathlib import Path
data_path = Path(Path.cwd().parent, 'data', 'data.csv')
but this obviously breaks when module.py is imported via from mypkg.scripts import module or similar, because Path.cwd() depends on where the interpreter was started, not on where the package lives. I need a way to access data.csv regardless of where mypkg is imported from.
The next naive approach is to use the __file__ attribute to get the path of wherever module.py is located.
# module.py - Approach 2
from pathlib import Path
data_path = Path(Path(__file__).resolve().parents[1], 'data', 'data.csv')
However, while researching this problem I found that this approach is discouraged. See, for example, How to read a (static) file from inside a Python package?.
Though there doesn't seem to be total agreement on the best solution to this problem, importlib.resources looks like the most popular one. I believe it would look like this:
# module.py - Approach 3
from pathlib import Path
import importlib.resources
# path() returns a context manager that yields a pathlib.Path to the resource
data_path_resource = importlib.resources.path('mypkg.data', 'data.csv')
with data_path_resource as resource:
    data_path = resource
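For comparison, here is a minimal sketch of the newer importlib.resources.files() API (Python 3.9+), which can read the file without a context manager. To keep it runnable on its own it builds a throwaway package in a temporary directory; the package name demo_pkg and the CSV contents are made up for the example and are not part of mypkg.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk that mirrors the layout in the question.
# "demo_pkg" and the CSV contents are hypothetical, chosen for this example.
root = Path(tempfile.mkdtemp())
data_dir = root / "demo_pkg" / "data"
data_dir.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(data_dir / "data.csv").write_text("a,b\n1,2\n")

# Make the temporary package importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# files() takes a package name and returns a Traversable. The "/" operator
# navigates into it much like pathlib, and read_text() needs no "with" block.
resource = importlib.resources.files("demo_pkg") / "data" / "data.csv"
csv_text = resource.read_text(encoding="utf-8")
print(csv_text)
```

Note that in this sketch the data folder has no __init__.py: the resource is reached by navigating down from the demo_pkg package itself.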
Why is this final approach better than __file__? It seems like __file__ won't work if the source code is zipped. This is a case I'm not familiar with, and it also sounds a bit fringe. I don't think my code will ever be run zipped.
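To make the zipped case concrete, the sketch below packs a tiny package into a zip archive, imports it from there, and compares the two approaches. The names zipped_demo_pkg and bundle.zip are invented for the example; this only illustrates the failure mode and says nothing about how mypkg is actually distributed.

```python
import importlib
import importlib.resources
import sys
import tempfile
import zipfile
from pathlib import Path

# Pack a tiny package into a zip archive. "zipped_demo_pkg" and "bundle.zip"
# are hypothetical names used only for this demonstration.
archive = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_demo_pkg/__init__.py", "")
    zf.writestr("zipped_demo_pkg/data/data.csv", "a,b\n1,2\n")

# Python can import packages straight from a zip placed on sys.path.
sys.path.insert(0, str(archive))
importlib.invalidate_caches()
pkg = importlib.import_module("zipped_demo_pkg")

# __file__-style lookup: the derived path points *inside* the archive,
# so there is no such file on the real filesystem.
file_based = Path(pkg.__file__).parent / "data" / "data.csv"
file_based_exists = file_based.exists()

# importlib.resources goes through the import system and can read it.
resource = importlib.resources.files("zipped_demo_pkg") / "data" / "data.csv"
resource_text = resource.read_text(encoding="utf-8")

print(file_based_exists)
print(resource_text)
```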
The added overhead from importlib seems a little ridiculous. I need to add an empty __init__.py in the data folder, I need to import importlib, and I need to use a context manager just to access a relative path.
What am I missing about the benefits of the importlib strategy? Why not just use __file__?
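On the context manager specifically: as far as I can tell it exists because a resource is not guaranteed to be a real file. The sketch below uses importlib.resources.as_file() (Python 3.9+) on a resource inside a zip archive, where a temporary copy is extracted for the duration of the with block and cleaned up afterwards. The names as_file_demo_pkg and as_file_bundle.zip are hypothetical.

```python
import importlib
import importlib.resources
import sys
import tempfile
import zipfile
from pathlib import Path

# Hypothetical zipped package, built only for this demonstration.
archive = Path(tempfile.mkdtemp()) / "as_file_bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("as_file_demo_pkg/__init__.py", "")
    zf.writestr("as_file_demo_pkg/data/data.csv", "a,b\n1,2\n")
sys.path.insert(0, str(archive))
importlib.invalidate_caches()

resource = importlib.resources.files("as_file_demo_pkg") / "data" / "data.csv"

# as_file() yields a real pathlib.Path. For a zipped resource this is a
# temporary extracted copy, which is why the API is a context manager.
with importlib.resources.as_file(resource) as data_path:
    path_is_real_file = data_path.is_file()
    first_line = data_path.read_text(encoding="utf-8").splitlines()[0]

# Once the block exits, the temporary copy has been removed.
exists_after_block = data_path.exists()

print(path_is_real_file, first_line, exists_after_block)
```

When the package lives in an ordinary directory, as_file() simply yields the existing path, so the with block costs very little in the common case.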
edit: One possible justification for the importlib approach is that it has slightly improved semantics. That is, data.csv should be thought of as part of the package, so we should access it using something like from mypkg import data.csv, but of course this syntax only works for importing .py Python modules. importlib.resources is sort of porting the "import something from some package" semantics to more general file types.
By contrast, the syntax of building a relative path from __file__ is sort of saying: this module is incidentally close to the data file in the file structure, so let's take advantage of that to access it. The fact that the data file is part of the package isn't leveraged.
from Why use importlib.resources over __file__?