I have a Python script that is to be executed as an Azure ML pipeline step. The script expects to find several file sets in a specific tree structure, e.g.:
data/
├─ project_A/
│ ├─ data.csv
│ ├─ config.toml
├─ project_B/
│ ├─ data.csv
│ ├─ config.toml
├─ project_.../
│ ├─ ...
The script receives the base path (e.g. ./data/) as a command-line argument and walks the subdirectories.
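For reference, the directory walk might look roughly like the sketch below. The names find_project_dirs and REQUIRED_FILES are hypothetical, and it assumes a subdirectory counts as a project only if it contains both data.csv and config.toml, as in the tree above.

```python
import os

# Hypothetical: files every project folder is assumed to contain (from the tree above).
REQUIRED_FILES = ("data.csv", "config.toml")


def find_project_dirs(base_dir):
    """Return the sorted paths of the immediate subdirectories of base_dir
    that contain all REQUIRED_FILES; other entries are ignored."""
    projects = []
    with os.scandir(base_dir) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue  # skip stray files directly under base_dir
            if all(os.path.isfile(os.path.join(entry.path, name)) for name in REQUIRED_FILES):
                projects.append(entry.path)
    return sorted(projects)
```

In preprocessing.py this would be called with the base path from the command line, e.g. find_project_dirs(sys.argv[1]).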
Each file set (e.g. project_A) is made available as a separate URI_FOLDER Azure ML data asset.
The Azure ML Python SDK v2 is used, and the component is defined like this:
from azure.ai.ml import Input, command
from azure.ai.ml.constants import AssetTypes, InputOutputModes

prep = command(
    name="preprocessing",
    code="./src/preprocessing.py",
    # actual paths are passed at pipeline level
    inputs=dict(
        project_A=Input(type=AssetTypes.URI_FOLDER, mode=InputOutputModes.RO_MOUNT),
        project_B=Input(type=AssetTypes.URI_FOLDER, mode=InputOutputModes.RO_MOUNT),
        project_...),
    command="python preprocessing.py <base_dir>",
)
I would like to know how to ensure a certain tree structure on the compute so that I can pass <base_dir> to the script.
Whatever you declare in the inputs parameter reaches the Python script only through the command string: each input is referenced there by name and substituted into the command-line arguments. For example, in the standard command-job sample, iris_csv, learning_rate, and boosting are declared in the inputs parameter and then passed to the Python command as ${{inputs.iris_csv}}, ${{inputs.learning_rate}}, and ${{inputs.boosting}}. It does not work the other way round: you cannot write an argument into the command and expect it to be picked up by the inputs parameter.
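To picture the direction of that substitution, here is a rough illustration. render_command is a hypothetical helper, not part of the Azure ML SDK; it only mimics what the service does at run time when it replaces each ${{inputs.<name>}} with the input's value (for a file or folder input, the local path where the data is mounted or downloaded). The flag names and the mount path are made up for the example.

```python
import re

# Matches placeholders of the form ${{inputs.<name>}}.
PLACEHOLDER = re.compile(r"\$\{\{inputs\.(\w+)\}\}")


def render_command(template, resolved_inputs):
    """Replace each ${{inputs.<name>}} with the resolved value of that input.

    Illustration only; raises KeyError if the command references an input
    that is not in resolved_inputs.
    """
    return PLACEHOLDER.sub(lambda m: str(resolved_inputs[m.group(1)]), template)


command_template = (
    "python main.py --iris-csv ${{inputs.iris_csv}} "
    "--learning-rate ${{inputs.learning_rate}} --boosting ${{inputs.boosting}}"
)
# Hypothetical resolved values; the file path stands in for a mount location.
resolved = {"iris_csv": "/mnt/inputs/iris.csv", "learning_rate": 0.9, "boosting": "gbdt"}
print(render_command(command_template, resolved))
```

The point is that the command refers to inputs by name; a value that is never declared in inputs has nothing to be substituted from.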
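In your case, if you pass only the base path, declare base_path in the inputs parameter as a uri_folder pointing at the parent folder that contains project_A, project_B, etc., and reference it in the command, for example command="python preprocessing.py --base_path ${{inputs.base_path}}". Then build the paths to project_A, project_B, etc. inside your Python file, e.g. project_A_path = os.path.join(base_path, 'project_A') and project_B_path = os.path.join(base_path, 'project_B').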
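A minimal sketch of the script side, assuming the command passes the mounted folder as --base_path and that the project names are known up front (the defaults project_A and project_B are placeholders for your own list):

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Preprocess all project folders.")
    # Assumed to be filled in by Azure ML from ${{inputs.base_path}}.
    parser.add_argument("--base_path", required=True)
    return parser.parse_args(argv)


def project_paths(base_path, names=("project_A", "project_B")):
    """Build the path of each project folder below the mounted base path."""
    return {name: os.path.join(base_path, name) for name in names}


def main(argv=None):
    args = parse_args(argv)
    for name, path in project_paths(args.base_path).items():
        # Fail early if the uri_folder does not have the expected layout.
        if not os.path.isdir(path):
            raise FileNotFoundError(f"expected project folder {path!r} is missing")
        print(name, "->", path)
```

In preprocessing.py you would finish with the usual if __name__ == "__main__": main() guard.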
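Or, if there are only a few project folders, you can also pass each project folder directly alongside the base path, for example by declaring project_A as an additional uri_folder input and appending --project_A ${{inputs.project_A}} to the command.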
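If you mix the two approaches, the script could prefer an explicitly passed project folder and fall back to the base path otherwise. This is a sketch under assumptions: the --project_A/--project_B flags and the PROJECTS tuple are hypothetical and must match whatever your command actually passes.

```python
import argparse
import os

# Hypothetical: projects that may also be passed individually as uri_folder inputs.
PROJECTS = ("project_A", "project_B")


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_path", required=True)
    # Optional per-project folders, e.g. --project_A ${{inputs.project_A}}.
    for name in PROJECTS:
        parser.add_argument(f"--{name}", default=None)
    return parser.parse_args(argv)


def resolve_project_paths(args):
    """Use an explicitly passed project folder if given, otherwise
    fall back to <base_path>/<project name>."""
    return {
        name: getattr(args, name) or os.path.join(args.base_path, name)
        for name in PROJECTS
    }
```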
I would still recommend passing only the base path in the command arguments and constructing the required project paths inside the Python file.
Refer to this notebook for more about the command job.