Monday, 22 February 2021

How to use pyclbr for nested modules?

My aim is to browse the pandas source code for its classes and modules. My code works for almost all of the modules, but one particular module throws an error that I cannot understand.

Below is the minimal reproducible example I am running, which demonstrates the error:

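import pyclbr
import sys

# Dotted module name, relative to the root of the pandas checkout
source_code_module = 'doc.sphinxext.contributors'
# Put the checkout on sys.path and also pass it to pyclbr as the search path
sys.path.insert(1, '/tmp/pandas/pandas/')
source_code_path = ['/tmp/pandas/pandas']

print('sys.path is: ')
print(sys.path)

# This call raises KeyError: 'doc' (traceback below)
source_code_data = pyclbr.readmodule_ex(
    source_code_module, path=source_code_path)
print(source_code_data)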

I get the following error:

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    source_code_module, path=source_code_path)
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pyclbr.py", line 136, in readmodule_ex
    return _readmodule(module, path or [])
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pyclbr.py", line 170, in _readmodule
    parent = _readmodule(package, path, inpackage)
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pyclbr.py", line 175, in _readmodule
    return _readmodule(submodule, parent['__path__'], package)
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/pyclbr.py", line 183, in _readmodule
    spec = importlib.util._find_spec_from_path(fullmodule, search_path)
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/util.py", line 58, in _find_spec_from_path
    return _find_spec(name, path)
  File "<frozen importlib._bootstrap>", line 906, in _find_spec
  File "<frozen importlib._bootstrap_external>", line 1289, in find_spec
  File "<frozen importlib._bootstrap_external>", line 1084, in __init__
  File "<frozen importlib._bootstrap_external>", line 1099, in _get_parent_path
KeyError: 'doc'

I read the docstrings in pyclbr.py, and as far as I understood them, if the parent directory of doc is on sys.path, the data for the module should be generated.

This code works for doc.make, so I thought that multi-level module names might be the issue. That is not it either, because the same code works for another multi-level module:

import pyclbr
import sys

source_code_module = 'asv_bench.benchmarks.io.sas'
sys.path.insert(1, '/tmp/pandas/pandas/')
source_code_path = ['/tmp/pandas/pandas']

print('sys.path is: ')
print(sys.path)

source_code_data = pyclbr.readmodule_ex(
    source_code_module, path=source_code_path)
print(source_code_data)

and I get the expected output:

sys.path is:
['/Users/aviralsrivastava/dev/gruml', '/tmp/pandas/pandas/', '/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python37.zip', '/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7', '/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload', '/Users/aviralsrivastava/Library/Python/3.7/lib/python/site-packages', '/usr/local/lib/python3.7/site-packages']
{'SAS': <pyclbr.Class object at 0x10a606a90>}
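My reading of the traceback (an assumption, not something the pyclbr docs state): the failure happens inside importlib's `_NamespacePath`, which importlib builds for a package directory that has no `__init__.py`. When such a namespace package sits below another package (`doc.sphinxext`), importlib looks up its parent (`doc`) in `sys.modules`, and because pyclbr never imports anything, that lookup raises `KeyError: 'doc'`. `doc.make` would avoid this because `make` is a plain module, and `asv_bench.benchmarks.io.sas` would avoid it if `benchmarks` and `io` both contain an `__init__.py`. The sketch below checks a dotted name for that pattern; `package_kinds` and `has_nested_namespace_package` are hypothetical helper names, and treating "no `__init__.py`" as "namespace package" is a simplification of the real import rules.

```python
import os


def package_kinds(root, dotted_name):
    """Classify the package components of dotted_name under root.

    Returns (qualified_name, kind) pairs for every component except the
    last one (the module itself). A directory without __init__.py is
    reported as 'namespace'; this simplifies the real import rules.
    """
    parts = dotted_name.split('.')
    kinds = []
    directory = root
    for depth, part in enumerate(parts[:-1]):
        directory = os.path.join(directory, part)
        has_init = os.path.isfile(os.path.join(directory, '__init__.py'))
        kinds.append(('.'.join(parts[:depth + 1]),
                      'regular' if has_init else 'namespace'))
    return kinds


def has_nested_namespace_package(root, dotted_name):
    """True if a namespace package appears below the top level.

    A top-level namespace package ('doc' in 'doc.make') is fine;
    a nested one ('doc.sphinxext') is the suspected trigger.
    """
    return any(kind == 'namespace'
               for _, kind in package_kinds(root, dotted_name)[1:])


if __name__ == '__main__':
    root = '/tmp/pandas/pandas'  # checkout location used in the question
    if os.path.isdir(root):
        for name in ('doc.make',
                     'doc.sphinxext.contributors',
                     'asv_bench.benchmarks.io.sas'):
            print(name, package_kinds(root, name),
                  has_nested_namespace_package(root, name))
```

If the hypothesis holds, only `doc.sphinxext.contributors` should report `True` on the pandas checkout.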

Update 1

If I set path to the full path of the directory that contains the module, it works:

import pyclbr
import sys

source_code_module = 'contributors'
sys.path.insert(1, '/tmp/pandas/pandas/')
source_code_path = ['/tmp/pandas/pandas/doc/sphinxext']

print('sys.path is: ')
print(sys.path)

source_code_data = pyclbr.readmodule_ex(
    source_code_module, path=source_code_path)
print(source_code_data)

and I get the expected output.
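The Update 1 approach can be wrapped in a small helper. This is a sketch under the assumption that every component of the dotted name is a plain directory under the checkout root; `readmodule_by_directory` is a hypothetical name, not part of pyclbr. It only uses the public `pyclbr.readmodule_ex(module, path)` call and hands it the module's own directory, so no package lookup is involved. The two guards reflect my reading of the 3.7 source in the traceback: top-level names are also searched on `sys.path`, bare names already in `sys.modules` resolve to that module's spec, and builtin module names are skipped.

```python
import os
import pyclbr
import sys


def readmodule_by_directory(root, dotted_name):
    """Run pyclbr on root/<pkg>/.../<module>.py without package lookups.

    Assumption: each component of dotted_name maps to a directory under
    root, as in the pandas checkout used in the question.
    """
    *packages, module = dotted_name.split('.')
    directory = os.path.join(root, *packages)
    # pyclbr also searches sys.path for top-level names, so make sure the
    # file really is in `directory` instead of silently reading a
    # same-named module from somewhere else.
    candidates = (os.path.join(directory, module + '.py'),
                  os.path.join(directory, module, '__init__.py'))
    if not any(os.path.isfile(c) for c in candidates):
        raise FileNotFoundError(f'{dotted_name} not found under {root}')
    # A bare name such as 'io' would resolve to the already-imported (or
    # builtin) module rather than to the file in `directory`.
    if module in sys.modules or module in sys.builtin_module_names:
        raise ValueError(f'{module!r} is shadowed by an already-known module')
    return pyclbr.readmodule_ex(module, path=[directory])
```

For example, `readmodule_by_directory('/tmp/pandas/pandas', 'doc.sphinxext.contributors')` should be equivalent to the Update 1 call. As far as I can tell from pyclbr.py, results are cached by the bare module name for the lifetime of the process, so two different files both called, say, `base.py` would collide, and the `module` attribute of the returned objects is the bare name rather than the dotted one.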

However, I worry that the data might not be complete, and even if I verify it in this case, this approach might fail for some other combination of inputs that I am not aware of. The reason for this concern is that pyclbr explicitly supports modules and submodules: if it can handle nested modules[1], why should I have to pass the full absolute path for each file?

[1] The source code of pyclbr that handles nested modules (modules and submodules):

    # Check for a dotted module name.
    i = module.rfind('.')
    if i >= 0:
        package = module[:i]
        submodule = module[i+1:]
        parent = _readmodule(package, path, inpackage)
        if inpackage is not None:
            package = "%s.%s" % (inpackage, package)
        if not '__path__' in parent:
            raise ImportError('No package named {}'.format(package))
        return _readmodule(submodule, parent['__path__'], package)
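To see which lookup fails, here is a pure-Python mirror of the recursion quoted above. It does not call pyclbr or touch the filesystem; `resolution_order` is a hypothetical name, and the second element of each pair is only a description of the search path pyclbr passes along (`path + sys.path` at the top level, the parent's `__path__` below it).

```python
def resolution_order(module, inpackage=None):
    """Return (full_name, search_path) pairs in the order pyclbr reads them.

    Mirrors the rfind('.') recursion from pyclbr._readmodule; the search
    path is returned as a descriptive string, not a real list.
    """
    i = module.rfind('.')
    if i >= 0:
        package, submodule = module[:i], module[i + 1:]
        # Read the parent package first (recursively) ...
        steps = resolution_order(package, inpackage)
        if inpackage is not None:
            package = f'{inpackage}.{package}'
        # ... then look the last component up inside the parent's __path__.
        return steps + resolution_order(submodule, package)
    if inpackage is None:
        return [(module, 'path + sys.path')]
    return [(f'{inpackage}.{module}', f'{inpackage}.__path__')]


for name, search_path in resolution_order('doc.sphinxext.contributors'):
    print(f'{name:<30} searched in {search_path}')
```

For `doc.sphinxext.contributors`, the second step looks up `doc.sphinxext` inside `doc.__path__`. If, as the traceback suggests, `doc/sphinxext` has no `__init__.py`, this is the step where importlib needs `sys.modules['doc']`, which pyclbr never creates. The top-level step cannot hit this, which would explain why `doc.make` works.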


from How to use pyclbr for nested modules?
