Monday, 26 August 2019

Why does writing to an inherited file handle from a Python sub-process result in not all rows being written?

I have the following Python program, which starts three processes that each write 10000 random rows to the same file using an inherited file handle:

import multiprocessing
import random
import string
import traceback

if __name__ == '__main__':
  # clear out the file first
  open('out.txt', 'w').close()
  # initialise file handle to be inherited by sub-processes
  file_handle = open('out.txt', 'a', newline='', encoding='utf-8')
  process_count = 3

# routine to be run by sub-processes
# adds n lines to the file
def write_random_rows(n):
  try:
    letters = string.ascii_lowercase
    for _ in range(n):
      s = ''.join(random.choice(letters) for _ in range(100))
      file_handle.write(s+"\n")
  except Exception:
    traceback.print_exc()

if __name__ == '__main__':
  # initialise the multiprocessing pool
  process_pool = multiprocessing.Pool(processes=process_count)

  # write the rows
  for i in range(process_count):
    process_pool.apply_async(write_random_rows, (10000,))
    # write_random_rows(10000)

  # wait for the sub-processes to finish
  process_pool.close()
  process_pool.join()

I expect the file to contain 30000 rows after running this. If I call write_random_rows(10000) directly inside the main loop (the commented-out line in the program above), 30000 rows are written to the file as expected. However, if I run the non-commented line, process_pool.apply_async(write_random_rows, (10000,)), I end up with only 15498 rows in the file.

Strangely, no matter how many times I rerun this script, I always get the same (incorrect) number of rows in the output file.

I can fix this issue by initializing the file handle from within write_random_rows(), i.e. within the sub-process execution, which suggests that the inherited file handles are somehow interfering with each other. If it were related to some kind of race condition, though, I would expect the number of rows to change each time I ran the script. Why exactly does this issue occur?
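For reference, here is a minimal sketch of that workaround, where each task opens its own handle instead of relying on the inherited one. The path parameter, the results list and the final row count are additions for illustration and are not part of the program above; the sketch also makes no attempt to coordinate the writers, so rows from different processes may still be interleaved in the file.

```python
import multiprocessing
import random
import string


def write_random_rows(path, n):
    """Append n random 100-letter rows to path using a handle opened in this process."""
    letters = string.ascii_lowercase
    # The handle is created inside the worker, so nothing is inherited from
    # the parent; the with-block flushes and closes it when the task ends.
    with open(path, 'a', newline='', encoding='utf-8') as file_handle:
        for _ in range(n):
            s = ''.join(random.choice(letters) for _ in range(100))
            file_handle.write(s + "\n")


if __name__ == '__main__':
    out_path = 'out.txt'  # same file name as in the program above
    process_count = 3

    # clear out the file first
    open(out_path, 'w').close()

    process_pool = multiprocessing.Pool(processes=process_count)
    results = [process_pool.apply_async(write_random_rows, (out_path, 10000))
               for _ in range(process_count)]

    # wait for the sub-processes to finish
    process_pool.close()
    process_pool.join()

    # re-raise any exception that occurred in a worker
    for r in results:
        r.get()

    # count the rows that actually reached the file
    with open(out_path, encoding='utf-8') as f:
        print(sum(1 for _ in f))
```

Because the handle is closed at the end of every task, any data still sitting in its buffer is flushed before the worker moves on, which is the main behavioural difference from the inherited-handle version.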



