Thursday, 30 September 2021

Figuring out the required Python modules and their versions for a Python process

I have a tool that traces the system calls of a process, so I know every file and directory the process used. I run a Python script (which creates a process), and I know all the files used during the run, including the script itself and the files of the modules it loaded. The modules are installed in /tmp/vendor.

Based on the files I found inside /tmp/vendor, I'm trying to figure out each module's name and version so that I can create a requirements file and then install the modules with pip install (into some other directory). Basically, I want to know all the module dependencies of a Python process. Those modules could come from different locations, but let's focus on one (/tmp/vendor). I installed the modules into /tmp/vendor with:

pip install --requirement requirements.txt --target /tmp/vendor

Now I want to be able to rebuild this requirements.txt file based on the files in /tmp/vendor.
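As a rough illustration of that goal, here is a minimal sketch that lists every distribution whose metadata directory sits in a target directory and prints it as a pinned requirement. It assumes Python 3.8 or newer (for importlib.metadata) and that pip left the *.dist-info or *.egg-info directories in place. The function name freeze_target_dir is hypothetical.

```python
from importlib import metadata


def freeze_target_dir(target_dir):
    """Return sorted 'name==version' lines for each distribution whose
    metadata directory sits directly inside target_dir."""
    pins = set()
    # path=[...] restricts the search to this directory instead of sys.path.
    for dist in metadata.distributions(path=[target_dir]):
        pins.add(f"{dist.metadata['Name']}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # /tmp/vendor is the target directory used in this question.
    print("\n".join(freeze_target_dir("/tmp/vendor")))
```

This pins everything installed in the directory, not only what the process actually touched, so it is closer to pip freeze than to a true dependency trace.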

The solution could be dynamic or static. At first I tried to solve it statically by checking the files in /tmp/vendor. As an example, I installed requests:

pip install requests --target /tmp/vendor

As I understand it, this installs the latest version. Inside vendor I have:

ls -la vendor/
total 52
drwxr-x--- 13 user group 4096 Sep 26 17:37 .
drwxr-x---  8 user group 4096 Sep 26 17:37 ..
drwxr-x---  2 user group 4096 Sep 26 17:37 bin
drwxr-x---  3 user group 4096 Sep 26 17:37 certifi
drwxr-x---  2 user group 4096 Sep 26 17:37 certifi-2021.5.30.dist-info
drwxr-x---  5 user group 4096 Sep 26 17:37 charset_normalizer
drwxr-x---  2 user group 4096 Sep 26 17:37 charset_normalizer-2.0.6.dist-info
drwxr-x---  3 user group 4096 Sep 26 17:37 idna
drwxr-x---  2 user group 4096 Sep 26 17:37 idna-3.2.dist-info
drwxr-x---  3 user group 4096 Sep 26 17:37 requests
drwxr-x---  2 user group 4096 Sep 26 17:37 requests-2.26.0.dist-info
drwxr-x---  6 user group 4096 Sep 26 17:37 urllib3
drwxr-x---  2 user group 4096 Sep 26 17:37 urllib3-1.26.7.dist-info

Now I can see that it also installs the other modules that requests needs, such as urllib3 and idna.
So my tool finds, for example, that I was using:

/tmp/vendor/requests/utils.py

I also notice that each module comes with a directory named in the format:

$NAME-(.*).dist-info

The captured group is the version of the module. So at first I thought I could match /tmp/vendor/(.*)/.* to get the module name ($NAME) and then look for $NAME-(.*).dist-info. The problem is that some modules don't have this *.dist-info directory, so I could not figure out their versions, which made me abandon this approach.
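One possible refinement of the static approach, sketched here under assumptions: instead of guessing the name from the directory, read the file list that each distribution records (the RECORD file inside *.dist-info for pip-installed wheels) and look up each traced file in it. The sketch assumes Python 3.8 or newer. The names owners_of, used_files and target_dir are hypothetical. Files that no metadata claims are returned separately, so the modules without a *.dist-info directory still show up and can be handled by hand.

```python
import os
from importlib import metadata


def owners_of(used_files, target_dir):
    """Map each traced file to the 'name==version' whose metadata lists it.

    Returns (resolved, unresolved): a dict of path -> pin, and a list of
    paths that no distribution in target_dir claims.
    """
    index = {}
    for dist in metadata.distributions(path=[target_dir]):
        pin = f"{dist.metadata['Name']}=={dist.version}"
        # dist.files is None when no file list (e.g. RECORD) is available.
        for entry in dist.files or []:
            full = os.path.realpath(str(dist.locate_file(entry)))
            index[full] = pin

    resolved, unresolved = {}, []
    for path in used_files:
        pin = index.get(os.path.realpath(path))
        if pin is None:
            unresolved.append(path)
        else:
            resolved[path] = pin
    return resolved, unresolved
```

With the listing above, a traced path such as /tmp/vendor/requests/utils.py would be expected to resolve to the requests distribution, provided its RECORD file lists that path.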

I also tried some dynamic approaches: I know which Python version was used, so I could run python and try to load the module. But I could not figure out a way to find the version of the module.
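For the dynamic route, a minimal sketch of one way to ask the interpreter for a version. It assumes Python 3.8 or newer and that the vendor directory is already on sys.path in the interpreter that runs it. It first asks importlib.metadata, then falls back to the module's __version__ attribute, which is only a convention and is often missing. The helper module_version and its dist_name parameter are hypothetical. dist_name exists because the import name and the distribution name can differ (for example, the yaml module is shipped by PyYAML).

```python
import importlib
from importlib import metadata


def module_version(import_name, dist_name=None):
    """Best-effort version lookup; returns None when nothing is found."""
    try:
        # Preferred: version recorded in the installed metadata.
        return metadata.version(dist_name or import_name)
    except metadata.PackageNotFoundError:
        # Fallback: import the module and read the conventional attribute.
        module = importlib.import_module(import_name)
        return getattr(module, "__version__", None)
```

The fallback imports the module, which runs its top-level code, so it is only suitable for modules that are safe to import.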

To summarize: I'm looking for a robust way to figure out the modules that are required for my Python process to run. The modules should come with their versions. All of the modules were installed using pip, which should simplify the task. How can it be done?


