Luigi workflow engine: command-line parameters not being passed to workers on macOS


I am having trouble passing global parameters to multiple workers. When I run a workflow with more than one worker, the command-line parameter values are not passed to the workers. I wonder whether this is the same problem observed when luigi runs on Windows (#2247), except that in my case it happens on macOS. A second issue is that the log format differs, which suggests that logging.cfg is not passed to the workers either. I opened a ticket on GitHub (#3236) but got no response.

The following toy example summarizes my problem.


import luigi
import logging

logger = logging.getLogger('luigi-interface')

class HelloConfig(luigi.Config):
    reference = luigi.Parameter(default="World")

class HelloTask(luigi.Task):
    def run(self):
        logger.info("Hello %s!", HelloConfig().reference)

    def requires(self):
        return []

I run the task with

luigi --module weekly_update.etl.load_ex HelloTask \
      --workers ? \
      --HelloConfig-reference "Mars"

When workers is 1 the log shows

2023-04-21 10:59:09,386 - luigi-interface - INFO - [MainThread] - Hello Mars!

but when workers is 2 it shows

2023-04-21 10:59:53,030 [INFO]-load_ex.run: Hello World!

Best answer (by Laggs)

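As of Python 3.8, macOS defaults to the spawn start method for multiprocessing instead of fork, so it now shows the issues that were previously seen only on Windows. With spawn, each worker starts in a fresh Python interpreter rather than inheriting the parent's memory, which is presumably why the command-line values (and the logging configuration) never reach the workers. You can change the start method using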

import multiprocessing

multiprocessing.set_start_method('fork')

I'm not sure there is a better solution; we are also struggling with command-line parameters across multiple workers on platforms other than Linux.
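A slightly more defensive variant of the set_start_method call is sketched below. The helper name use_fork_start_method is hypothetical and not part of luigi. It only switches to fork where the platform offers it, and passes force=True so that a start method chosen earlier does not raise a RuntimeError. Keep in mind that the Python documentation describes fork as unsafe on macOS, so treat this as a workaround rather than a fix.

```python
import multiprocessing


def use_fork_start_method() -> bool:
    """Switch multiprocessing to 'fork' where the platform offers it.

    Hypothetical helper, not part of luigi. Returns True if 'fork' is now
    the start method, False otherwise (for example on Windows, where
    'fork' does not exist).
    """
    if "fork" not in multiprocessing.get_all_start_methods():
        return False
    # force=True avoids a RuntimeError if a start method was already chosen.
    multiprocessing.set_start_method("fork", force=True)
    return True


if __name__ == "__main__":
    forked = use_fork_start_method()
    print("start method:", multiprocessing.get_start_method())
    print("fork enabled:", forked)
```

The call has to run before luigi creates its worker processes. One option, which is an assumption and not something confirmed in the tickets above, is to call it at the top of the module passed to --module, since the parent process imports that module before any workers start.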