I have a tool that follows the system calls of a process, so I know all the files/areas that the process used. I have a Python script which, when executed, creates a process. I know all the files that were used during the run, such as the script itself, and I also know the files of the modules that were used. The modules are installed in /tmp/vendor.
Based on the files I found inside /tmp/vendor, I'm trying to figure out the module names and versions so that I can create a requirements file for pip and then install them with pip install (into some other directory). Basically, I want to know all the module dependencies of a Python process. Those modules could come from different locations, but let's focus on one (/tmp/vendor). The way I installed the modules into /tmp/vendor is simply:
pip install --requirement requirements.txt --target /tmp/vendor
Now I want to be able to rebuild this requirements.txt file based on the files in /tmp/vendor.
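Because the whole of /tmp/vendor came from a single pip install, one static option is to read back the metadata pip left in each .dist-info directory and print it as pinned requirement lines. The sketch below does that with the standard importlib.metadata module (Python 3.8+ assumed); freeze_target is a hypothetical helper name. It lists every distribution in the directory, including dependencies, not only the ones the traced process actually touched. Recent pip versions also accept pip freeze --path /tmp/vendor, which should give a similar listing.

```python
# Minimal sketch (assumes Python 3.8+ for importlib.metadata): rebuild
# requirements.txt lines from the .dist-info metadata in a --target directory.
import os
import sys
from importlib.metadata import distributions


def freeze_target(target):
    """Return sorted 'Name==version' lines for every distribution in target."""
    pins = {}
    # path=[target] makes importlib.metadata scan only this directory,
    # not the interpreter's own sys.path.
    for dist in distributions(path=[target]):
        name = dist.metadata["Name"]
        if name:  # ignore metadata directories without a Name field
            pins[name.lower()] = f"{name}=={dist.version}"
    return sorted(pins.values(), key=str.lower)


if __name__ == "__main__":
    # /tmp/vendor is the directory from the question; pass another as argv[1].
    target = sys.argv[1] if len(sys.argv) > 1 else "/tmp/vendor"
    if os.path.isdir(target):
        print("\n".join(freeze_target(target)))
```

Saved under a hypothetical name such as freeze_target.py, running python freeze_target.py /tmp/vendor > requirements.txt would write one pinned line per distribution found there.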
The solution could be dynamic or static. At first I tried to solve it statically, by checking the files in /tmp/vendor. As an example, I installed requests:
pip install requests --target /tmp/vendor
As I understand it, this installs the latest version. Inside vendor I have:
ls -la vendor/
total 52
drwxr-x--- 13 user group 4096 Sep 26 17:37 .
drwxr-x--- 8 user group 4096 Sep 26 17:37 ..
drwxr-x--- 2 user group 4096 Sep 26 17:37 bin
drwxr-x--- 3 user group 4096 Sep 26 17:37 certifi
drwxr-x--- 2 user group 4096 Sep 26 17:37 certifi-2021.5.30.dist-info
drwxr-x--- 5 user group 4096 Sep 26 17:37 charset_normalizer
drwxr-x--- 2 user group 4096 Sep 26 17:37 charset_normalizer-2.0.6.dist-info
drwxr-x--- 3 user group 4096 Sep 26 17:37 idna
drwxr-x--- 2 user group 4096 Sep 26 17:37 idna-3.2.dist-info
drwxr-x--- 3 user group 4096 Sep 26 17:37 requests
drwxr-x--- 2 user group 4096 Sep 26 17:37 requests-2.26.0.dist-info
drwxr-x--- 6 user group 4096 Sep 26 17:37 urllib3
drwxr-x--- 2 user group 4096 Sep 26 17:37 urllib3-1.26.7.dist-info
Now I can see that it also installs other modules that are needed, such as urllib3 and idna.
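Those extra packages are the dependencies that requests declares through Requires-Dist entries in its METADATA file; pip resolved and installed them as well. A small sketch, assuming Python 3.8+, that reads those declarations back for every distribution in the directory (declared_requirements is a hypothetical name). Entries can carry environment markers, for example ones that only apply to an optional extra, so not every listed entry is necessarily installed.

```python
# Minimal sketch (assumes Python 3.8+): list the dependencies each
# distribution in a --target directory declares in its METADATA.
import os
import sys
from importlib.metadata import distributions


def declared_requirements(target):
    """Map each distribution name in target to its Requires-Dist entries."""
    result = {}
    for dist in distributions(path=[target]):
        # dist.requires is None when METADATA has no Requires-Dist lines.
        result[dist.metadata["Name"]] = list(dist.requires or [])
    return result


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/tmp/vendor"
    if os.path.isdir(target):
        for name, reqs in sorted(declared_requirements(target).items()):
            print(name, "->", ", ".join(reqs) or "(none)")
```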
So my tool finds, for example, that I was using:
/tmp/vendor/requests/utils.py
I also noticed that each module has a directory in the format:
$NAME-(.*).dist-info
where the captured group is the module's version. So at first I thought I could match /tmp/vendor/(.*)/.* to get the module name ($NAME) and then look for $NAME-(.*).dist-info. The problem is that some modules don't have this *.dist-info directory, so I could not figure out their versions, which made me abandon this approach.
I also tried some dynamic approaches: I know which Python version was used, so I could run Python and try to load the module. But I could not find a way to get the module's version.
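For the dynamic route, importing a module and reading __version__ is unreliable because that attribute is only a convention. A sketch of an alternative, assuming Python 3.8+ and a top-level (undotted) module name: ask the interpreter's path-based finder where the module lives inside the target directory, without importing or running it, then find the distribution whose RECORD lists that file. distribution_of_module is a hypothetical helper.

```python
# Minimal sketch (assumes Python 3.8+): resolve a top-level module name to the
# pip distribution that installed it, without executing the module's code.
import os
from importlib.machinery import PathFinder
from importlib.metadata import distributions


def distribution_of_module(module_name, target):
    """Return (name, version) of the distribution providing module_name, or None."""
    # find_spec searches only [target]; it locates the module but does not import it.
    spec = PathFinder.find_spec(module_name, [target])
    if spec is None or not spec.origin or not os.path.isfile(spec.origin):
        return None  # not found, or a namespace package with no single origin file
    origin = os.path.realpath(spec.origin)
    for dist in distributions(path=[target]):
        # Compare against every file listed in this distribution's RECORD.
        for packaged in dist.files or []:
            if os.path.realpath(packaged.locate()) == origin:
                return dist.metadata["Name"], dist.version
    return None
```

For example, distribution_of_module("idna", "/tmp/vendor") should return the name and version recorded in idna-3.2.dist-info.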
To summarize, I'm looking for a robust way to determine which modules my Python process requires in order to run, together with their versions. All of the modules were installed using pip, which should simplify the task. How can it be done?
from Figuring the required Python modules and their versions of a Python process