Tuesday, 9 July 2019

IPC shared memory across Python scripts in separate Docker containers

The Problem

I have written a neural network classifier that takes in massive images (~1-3 GB apiece), patches them up, and passes the patches through the network individually. Training was going really slowly, so I benchmarked it and found that it was taking ~50 s to load the patches from one image into memory (using the OpenSlide library), but only ~0.5 s to pass them through the model.

However, I'm working on a supercomputer with 1.5 TB of RAM, of which only ~26 GB is being utilized. The dataset totals ~500 GB. My thinking is that if we could load the entire dataset into memory, it would speed up training tremendously. But I am working with a research team, and we run experiments across multiple Python scripts. So ideally, I would like to load the entire dataset into memory in one script and be able to access it from all of the other scripts.

More details:

  • We run our individual experiments in separate Docker containers (on the same machine), so the dataset has to be accessible across multiple containers.
  • The dataset is the Camelyon16 Dataset; images are stored in .tif format.
  • We just need to read the images, no need to write.
  • We only need to access small portions of the dataset at a time.

Possible Solutions

I have found many posts about how to share Python objects or raw data in memory across multiple Python scripts:

Sharing Python data across scripts

Server Processes with SyncManager and BaseManager in the multiprocessing module | Example 1 | Example 2 | Docs - Server Processes | Docs - SyncManagers

  • Positives: Can be shared by processes on different computers over a network (can it be shared by multiple containers?)
  • Possible issue: slower than using shared memory, according to the docs. If we share memory across multiple containers using a client/server, will that be any faster than all of the scripts reading from disk?
  • Possible issue: according to this answer, the Manager object pickles objects before sending them, which could slow things down.
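To get a feel for the pickling overhead that answer describes, here is a rough, self-contained timing sketch (the 50 MB byte string is a stand-in for a batch of patch tensors; a Manager proxy pays a cost like this on every fetch, on top of sending the bytes over the connection):

```python
import pickle
import time

# ~50 MB stand-in for one image's worth of patch data.
blob = b"\x00" * (50 * 1024 * 1024)

start = time.perf_counter()
data = pickle.dumps(blob, protocol=pickle.HIGHEST_PROTOCOL)
elapsed = time.perf_counter() - start

# The serialized form is at least as large as the payload itself.
print("pickled %d MiB in %.3f s" % (len(data) // 2**20, elapsed))
```

Whether this beats ~50 s of disk reads per image is exactly the kind of thing that needs benchmarking on the actual machine.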

mmap module | Docs

  • Possible issue: mmap maps the file into virtual memory, not physical memory; pages only reach RAM when they are actually touched.
  • Possible issue: because we only use a small portion of the dataset at a time, most of the mapped dataset stays on disk; if the working set is larger than available RAM we could run into thrashing, and the program would slow to a crawl.
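If mmap turns out to be viable, a minimal read-only mapping looks something like the sketch below (the temporary file is a stand-in; the real use case would map a .tif). Pages are faulted in lazily on first access and then served from the OS page cache, which is shared by every process on the same kernel (including processes in different containers) that maps the same file:

```python
import mmap
import os
import tempfile

# Create a small stand-in file to map (the real case would be a .tif on disk).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x01" * 8192)

# Map the file read-only: nothing is copied up front, and slicing the map
# faults in only the pages that the slice touches.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_bytes = mm[:16]  # touches only the first page
    mm.close()

os.remove(path)
print(first_bytes)
```

Since we only read small regions of each image at a time, lazy page-in may actually work in our favor rather than causing thrashing.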

Pyro4 (client-server for Python objects) | Docs

The sysv_ipc module for Python. This demo looks promising.

  • Possible issue: maybe just a lower-level exposure of functionality already available in the built-in multiprocessing module?

I also found this list of options for IPC/networking in Python.

Some discuss server-client setups; some discuss serialization/deserialization, which I'm afraid will take longer than just reading from disk. None of the answers I've found address whether these approaches would actually improve I/O performance.
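One option worth flagging in case upgrading from Python 3.5 is possible: Python 3.8 added multiprocessing.shared_memory, which backs a named block with /dev/shm, and Docker's --ipc modes share /dev/shm between containers. A minimal sketch (the block name is auto-generated here for safety; a real server would pass a fixed name= so clients in other scripts or containers can attach to it):

```python
from multiprocessing import shared_memory

# "Server" side: create a named shared block and fill it once.
shm = shared_memory.SharedMemory(create=True, size=1024)
shm.buf[:4] = b"abcd"

# "Client" side (could be another script, or another container that shares
# the server's /dev/shm): attach by name and read without copying or pickling.
shm2 = shared_memory.SharedMemory(name=shm.name)
data = bytes(shm2.buf[:4])
print(data)

shm2.close()
shm.close()
shm.unlink()  # the creator removes the block when everyone is done
```

Unlike a Manager, nothing is pickled on access; the client reads the same physical pages the server wrote.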

Sharing memory across Docker containers

Not only do we need to share Python objects/memory across scripts; we need to share them across Docker containers.

The Docker documentation explains the --ipc flag pretty well. What makes sense to me according to the documentation is running:

docker run -d --ipc=shareable data-server
docker run -d --ipc=container:data-server data-client

But when I run my client and server in separate containers with an --ipc connection set up as described above, they are unable to communicate with each other. The SO questions I've read (1, 2, 3, 4) don't address integration of shared memory between Python scripts in separate Docker containers.

My Questions:

  • 1: Would any of these provide faster access than reading from disk? Is it even reasonable to think that sharing data in memory across processes/containers would improve performance?
  • 2: Which would be the most appropriate solution for sharing data in memory across multiple Docker containers?
  • 3: How can I integrate memory-sharing solutions from Python with docker run --ipc=<mode>? (Is a shared IPC namespace even the best way to share memory across Docker containers?)
  • 4: Is there a better solution than these to fix our problem of large I/O overhead?

Minimal Working Example

This is my naive approach to memory sharing between Python scripts in separate containers. It works when the Python scripts are run in the same container, but not when they are run in separate containers.

You can download the Openslide library with

apt-get update && apt-get install -y --no-install-recommends openslide-tools python-openslide
pip3 --no-cache-dir install openslide-python

Or you can just comment out those lines of code in the server and put something simpler in patch_dict (e.g. ints).

server.py

from multiprocessing.managers import SyncManager
import multiprocessing
import torch
from torchvision import transforms
# Optional line
import openslide

patch_dict = {}

image_level = 2
image_files = ['path/to/normal_042.tif']
region_list = [(14336, 10752),
               (9408, 18368),
               (8064, 25536),
               (16128, 14336)]

def load_patch_dict():

    for i, image_file in enumerate(image_files):
        # Begin optional lines
        image_data = openslide.OpenSlide(image_file)  # does not load the image into memory, just opens a handle to it

        patches = []

        for region in region_list:

            pat = image_data.read_region((region[0], region[1]),
                                         image_level,
                                         (224, 224)).convert('RGB')

            patches.append(transforms.ToTensor()(pat))

        patches = torch.stack(patches).detach()
        # End optional lines
        # Simpler alternative
        # patches = 1
        patch_dict.update({'image_{}'.format(i): patches})

def get_patch_dict():
    return patch_dict

class MyManager(SyncManager):
    pass

if __name__ == "__main__":
    load_patch_dict()
    port_num = 4343
    MyManager.register("patch_dict", get_patch_dict)
    manager = MyManager(("127.0.0.1", port_num), authkey=b"password")
    # Set the authkey because it doesn't set properly when we initialize MyManager
    multiprocessing.current_process().authkey = b"password"
    manager.start()
    input("Press any key to kill server".center(50, "-"))
    manager.shutdown()

client.py

from multiprocessing.managers import SyncManager
import multiprocessing
import sys, time
import torch
from torchvision import models

class MyManager(SyncManager):
    pass

MyManager.register("patch_dict")

if __name__ == "__main__":
    print("Loading Model")
    torchmodel = models.resnet152(pretrained=True)
    print("Model loaded")
    port_num = 4343

    manager = MyManager(("127.0.0.1", port_num), authkey=b"password")
    multiprocessing.current_process().authkey = b"password"
    manager.connect()
    patch_dict = manager.patch_dict()

    keys = list(patch_dict.keys())
    for key in keys:
        image_patches = patch_dict.get(key)
        image_embedded = torchmodel(image_patches)
        # Do more NN stuff (irrelevant)

I've hosted one image here to save you from downloading the entire dataset.

These scripts work fine for sharing the images when the scripts are run in the same container. But when they are run in separate containers, like this:

# Run the container for the server
docker run -it --name cancer-1 --rm --cpus=10 --ipc=shareable cancer-env
# Run the container for the client
docker run -it --name cancer-2 --rm --cpus=10 --ipc=container:cancer-1 cancer-env

I get the following error:

Traceback (most recent call last):
  File "patch_client.py", line 22, in <module>
    manager.connect()
  File "/usr/lib/python3.5/multiprocessing/managers.py", line 455, in connect
    conn = Client(self._address, authkey=self._authkey)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused

