I am setting up a processing job in a SageMaker pipeline, and my project has the following files:
projectA
- run.py
- requirements.txt
I have some dependencies listed in requirements.txt that need to be installed before my script runs. I'm not sure how to set up the processing step so that it installs the requirements before running my script.
Any thoughts?
from sagemaker.processing import ScriptProcessor
from sagemaker.workflow.steps import ProcessingStep

script_processor = ScriptProcessor(
    instance_type='ml.t3.medium',
    instance_count=1,
    ...
    command=['python'],
)

processing_step = ProcessingStep(
    name='p_step',
    processor=script_processor,
    code='./run.py',
    inputs=[
        ...
    ],
)
Based on my knowledge of SageMaker Pipelines and SageMaker Processing Jobs, there are two ways to manage dependencies: either you build an image and specify it in the image_uri when defining the ScriptProcessor object, or you install them at job runtime. Here is how to do the second approach. The example below uses the SKLearnProcessor class for the job.

1. Define the processing job:
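A minimal sketch of such a step, assuming you bundle run.py and requirements.txt in a my_job directory; the role, input_data, framework_version and destination paths are placeholders you should adapt to your setup:

from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput
from sagemaker.workflow.steps import ProcessingStep

role = 'arn:aws:iam::<account-id>:role/<your-sagemaker-execution-role>'  # placeholder
input_data = 's3://<your-bucket>/<prefix>/'                              # placeholder S3 URI

sklearn_processor = SKLearnProcessor(
    framework_version='1.2-1',    # pick the scikit-learn container version you need
    role=role,
    instance_type='ml.t3.medium',
    instance_count=1,
)

processing_step = ProcessingStep(
    name='p_step',
    processor=sklearn_processor,
    code='./my_job/run.py',
    inputs=[
        # First input: your data
        ProcessingInput(
            source=input_data,
            destination='/opt/ml/processing/input/data',
        ),
        # Second input: the requirements file, mounted where run.py can find it
        ProcessingInput(
            source='./my_job/requirements.txt',
            destination='/opt/ml/processing/input/requirements',
        ),
    ],
)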
The second item in inputs should point to the requirements.txt file; I recommend you bundle everything together in a my_job directory.
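Inside run.py you can then install those requirements before importing anything from them. A minimal sketch, assuming the file was mounted at the /opt/ml/processing/input/requirements destination used above:

# Top of run.py: install the dependencies before importing them
import subprocess
import sys

subprocess.check_call([
    sys.executable, '-m', 'pip', 'install', '-r',
    '/opt/ml/processing/input/requirements/requirements.txt',
])

# From here on, imports of packages listed in requirements.txt resolve normally,
# e.g. (hypothetical dependency):
import pandas as pd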
4. Proceed to add the import statements for your dependencies as normal.
5. Verify your project structure is as follows:
my_job
- run.py
- requirements.txt
A note on this approach: only use it if you trust what is in the requirements.txt file and you don't want to build the image and push it to ECR.
Do let me know if this solves your issue and/or you have questions about the code.