I organize my code in a Python package (usually in a virtual environment such as virtualenv and/or conda) and then usually call:
python <path_to/my_project/setup.py> develop
so that I can use the most recent version of my code. Since I develop mostly statistical or machine learning algorithms, I prototype a lot and change my code daily. However, recently the recommended way to run experiments on the clusters I have access to is through Docker. I learned about Docker and I think I have a rough idea of how to make it work, but I wasn't quite sure whether my solutions were good or whether there might be better solutions out there.
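For context, a rough sketch of my current (non-Docker) workflow, using the same placeholder paths as above:

# create an isolated environment and install my package in editable/develop mode
virtualenv venv
source venv/bin/activate
cd path_to/my_project
python setup.py develop   # or, equivalently: pip install -e .
# any edit to the source is picked up immediately, no reinstall needed
python run_ML_experiment_file.py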
The first solution I thought of was to copy my project into the Docker image and pip install it there, with something like:
COPY path_to/my_project /path_to/my_project
RUN pip install /path_to/my_project
The issue with this solution is that I have to actually build a new image each time, which seems silly, and I was hoping I could have something better. To do this I was thinking of having a bash file like:
# BASH FILE TO BUILD AND REBUILD MY STUFF
# build the image with the newest version of my project code;
# the build pip installs it and its dependencies
docker build -t image_name .
docker run --rm image_name python run_ML_experiment_file.py
docker kill current_container  # not sure how to get the id of the container
docker rmi image_name
As I said, my intuition tells me this is silly, so I was hoping there was a single-command way to do this with Docker or with a single Dockerfile. Also, note that the run command should use -v ~/data/:/data
to be able to read the data, plus some other volume/mount to write to (on the host) when it finishes training.
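Concretely, the run command I have in mind would look roughly like this (the ~/results path is just a placeholder for wherever I would write output on the host):

docker run --rm \
    -v ~/data/:/data \
    -v ~/results/:/results \
    image_name \
    python run_ML_experiment_file.py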
Another solution I thought of was to have all the Python dependencies (or other dependencies) that my library needs in the Dockerfile, and hence in the image, and then somehow execute the installation of my library in the running container. Maybe with docker exec [OPTIONS] CONTAINER COMMAND, as in:
docker exec CONTAINER pip install /path_to/my_project
in the running container. After that I could run the real experiment I want to run with the same exec command:
docker exec CONTAINER python run_ML_experiment_file.py
However, I still don't know how to systematically get the container id (and I probably don't want to look it up every time I do this).
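One workaround I am considering (not sure if it is idiomatic) is to give the container a fixed name when I start it, so that docker exec can refer to that name instead of an id; image_name and ml_dev below are placeholder names:

# start a long-lived container with a known name
docker run -d --name ml_dev \
    -v ~/my_tf_proj:/my_tf_proj \
    -v ~/data:/data \
    image_name sleep infinity
# install the current version of my library inside the running container
docker exec ml_dev pip install -e /my_tf_proj
# run the experiment inside the same container
docker exec ml_dev python /my_tf_proj/run_ML_experiment_file.py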
Ideally, in my head, the best conceptual solution would be for the Dockerfile to know from the beginning which path it should mount (i.e. /path_to/my_project) and then somehow do python /path_to/my_project/setup.py develop
inside the image so that it would always be linked to the potentially changing Python package/project. That way I could run my experiments with a single docker command, as in:
docker run --rm -v ~/data/:/data image_name python run_ML_experiment_file.py
and not have to explicitly update the image myself every time (which includes not having to reinstall the parts of the image that should be static), since it would always be in sync with the real library. Also, having some other script build a new image from scratch each time is not what I am looking for, and it would be nice to avoid writing any bash too, if possible.
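In other words, the single command I am imagining would mount both the data and my project source, roughly like this (image_name is a placeholder, and the install step would still have to happen somewhere inside the container at start-up):

docker run --rm \
    -v ~/data/:/data \
    -v ~/my_tf_proj:/my_tf_proj \
    image_name \
    python /my_tf_proj/run_ML_experiment_file.py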
I think I am very close to a good solution. Instead of building a new image each time, I will simply have the CMD instruction do the python develop step, as follows:
# install my library (only when a container is spun up)
CMD python ~/my_tf_proj/setup.py develop
The advantage is that it only (pip) installs my library whenever I run a new container. This solves the development issue, because re-creating a new image takes too long. Though I just realized that if I use the CMD instruction then I can't run other commands given to my docker run, so I actually mean to use ENTRYPOINT.
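My understanding of the difference, as far as I can tell (please correct me if I am wrong), is that whatever I pass after the image name in docker run completely replaces CMD, whereas it is appended as arguments to ENTRYPOINT. A tiny illustration of what I mean:

# with CMD alone, `docker run image_name <command>` replaces the default command entirely
CMD ["python", "run_ML_experiment_file.py"]

# with ENTRYPOINT, `docker run image_name <args>` is appended to it as arguments
ENTRYPOINT ["python"]
CMD ["run_ML_experiment_file.py"]   # default argument, still overridable at run time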
Right now the only issue left to complete this is that I am having trouble with volumes, because I can't successfully link to my host project directory from within the Dockerfile (which seems to require an absolute path for some reason). I am currently doing the following (which doesn't seem to work):
VOLUME /absolute_path_to/my_tf_proj /my_tf_proj
Why can't I link to the host using the VOLUME instruction in my Dockerfile? My main intention with using VOLUME is making my library (and other files that are always needed by this image) accessible when the CMD instruction tries to install my library. Is it possible to just have my library available all the time when a container is initiated?
Ideally I just want the library to be installed automatically whenever a container is started, since the most recent version of the library is always required.
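One pattern I am considering for this (just a sketch of what I mean; entrypoint.sh and the container paths are names I made up) is a small entrypoint script that installs the mounted library in develop/editable mode and then hands control to whatever command was passed to docker run:

#!/bin/bash
# entrypoint.sh -- install the host-mounted project on every container start,
# then run whatever command was given to `docker run ... image_name <command>`
set -e
pip3 install -e /my_tf_proj
exec "$@"

with the corresponding Dockerfile lines being roughly:

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["python", "/my_tf_proj/run_ML_experiment_file.py"]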
As a reference, right now my non-working Dockerfile looks as follows:
# This means you derive your docker image from the tensorflow docker image
# FROM gcr.io/tensorflow/tensorflow:latest-devel-gpu
FROM gcr.io/tensorflow/tensorflow
#FROM python
FROM ubuntu
RUN mkdir ~/my_tf_proj/
# mounts my tensorflow lib/proj from host to the container
VOLUME /absolute_path_to/my_tf_proj
#
RUN apt-get update
#
RUN apt-get install -y vim
#
RUN apt-get install -qy python3
RUN apt-get install -qy python3-pip
RUN pip3 install --upgrade pip
#RUN apt-get install -y python python-dev python-distribute python-pip
# install the dependencies for my tensorflow library
RUN pip3 install numpy
RUN pip3 install keras
RUN pip3 install namespaces
RUN pip3 install pdb
# install my library (only when a container is spun up)
#CMD python ~/my_tf_proj/setup.py develop
ENTRYPOINT python ~/my_tf_proj/setup.py develop
As a side remark: for some reason it requires me to RUN apt-get update before I can even install pip or vim in my container. Do people know why? I wanted to install these just in case I ever want to attach to the container with a bash terminal; that would be really helpful. Does Docker just force you to apt-get update so that you always have the most recent version of software in the container?
Bounty:
What would a solution with COPY look like? And perhaps with docker build -f path/Dockerfile .? (A sketch of what I mean is below.) See: How does one build a docker image from the home user directory?
from How to use a python library that is constantly changing in a docker image or new container?
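To make the COPY variant concrete, this is roughly what I am imagining (Dockerfile.dev and the paths are placeholder names):

# Dockerfile.dev, kept next to my project and built with:
#   docker build -f path_to/my_project/Dockerfile.dev -t image_name path_to/my_project
FROM gcr.io/tensorflow/tensorflow
COPY . /my_tf_proj
RUN pip install /my_tf_proj
CMD ["python", "/my_tf_proj/run_ML_experiment_file.py"]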