I have a Python 2.7 script (a hard design constraint) that is dispatched to multiple servers, all of which read the same "large_reference_file.txt" from a shared network location. Sometimes this file is out of date and needs to be regenerated. I want the script to be able to update large_reference_file.txt at runtime without creating race conditions between the servers.

I have a function LRF_is_up_to_date() that returns True iff large_reference_file.txt is up to date.

I have a function update_LRF() that will update large_reference_file.txt locally and copy it to the shared location.
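
Minimal stand-ins for those two, just so the snippets below are self-contained (the real bodies compare the shared copy against the source data and copy files around; these are placeholders only):

def LRF_is_up_to_date():
    return False  # placeholder: real version checks whether the shared copy is current

def update_LRF():
    pass  # placeholder: real version rebuilds the file locally and copies it to the share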

Of course, I can't do

if not LRF_is_up_to_date():
    update_LRF()

because many processes might find the LRF to be out of date at the same time. In this circumstance, I want one process to become responsible for updating the LRF while the others wait. My current approach works okay, but it includes a ridiculous random sleep that is the only way I could think of to make a race for update responsibility less likely.

The code, with some timeout logic and other clutter removed for clarity, goes like this:

import os
import random
from os.path import exists
from time import sleep

# An empty file whose existence signals that update responsibility has been claimed.
updating_flag = r'\\path_to_a_file'

while not LRF_is_up_to_date():
    sleep(random.randrange(0, 120))  # THERE HAS TO BE A BETTER WAY
    if exists(updating_flag):
        print("Another process is updating the LRF.")
        while exists(updating_flag):
            sleep(5)
    else:
        if LRF_is_up_to_date():  # check once more in case another process finished updating during the random sleep
            break
        print("This process is responsible for updating the LRF.")
        open(updating_flag, 'w').close()
        try:
            update_LRF()
        finally:
            os.remove(updating_flag)

The main problems with this approach are that the sleeps are wasteful and, more importantly, that they don't prevent race conditions; they only make them less likely. Two processes can still both pass the exists() check before either one has created the flag file.
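
What I think I need is an atomic test-and-claim. Here is a sketch of the kind of thing I have in mind, using os.open with os.O_CREAT | os.O_EXCL so that creating the flag file either succeeds or fails in a single step. The function name try_claim_flag is mine, and I'm assuming the network share's filesystem actually honors exclusive create:

import errno
import os

def try_claim_flag(flag_path):
    # Hypothetical helper. O_CREAT | O_EXCL makes "create if absent" a single
    # atomic operation, so at most one process can succeed even if many race here.
    try:
        fd = os.open(flag_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False  # another process already holds the claim
        raise
    os.close(fd)
    return True

The idea would be: if try_claim_flag(updating_flag) returns True, this process runs update_LRF() and removes the flag in a finally block; otherwise it polls until the flag disappears, as in the loop above. Is exclusive create like this actually reliable on a Windows network share, or is there a better pattern?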
