We have a SageMaker notebook that runs a pipeline, and we intend for the pipeline to access RDS directly. Everything runs until the RDS connection, which always times out because that RDS instance does not have a public IP. (And we would prefer it stay that way.) So should we run the pipeline inside a VPC?
The idea comes from this AWS blog post: Connect SageMaker Studio Notebooks in a VPC to External Resources. We are using our default VPC. Adding the VPC endpoints allowed connections to S3 and CloudWatch (at first we didn't even see the runtime logs); security groups, IAM, etc. should all be good.
Our major issue is that the Python libraries cannot install, because there is no outbound internet access. The VPC already had an Internet Gateway, and I could see it used in the default subnet's route table. (The subnets looked auto-created for this region/AZ.) I created a NAT Gateway on some of those default subnets and referenced only those subnets when starting the pipeline, but Python still could not reach its external repository.
Do I need to do something special to make that NAT Gateway work with the pipeline so it can reach the outside internet? Will a SageMaker pipeline work on a default VPC, or do I need to create a new/custom one? How do I know whether a subnet is public or private? That blog post says to use only private subnets.
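For context on the public/private question: as far as I understand the AWS docs, a subnet counts as public when its route table sends 0.0.0.0/0 to an Internet Gateway (an `igw-...` target), and private when that default route goes to a NAT Gateway (`nat-...`) or is missing. Below is a minimal sketch of that check, assuming route-table dicts shaped like the output of boto3's `describe_route_tables`. The function name `classify_route_table` and the subnet ID in the comments are hypothetical, and this is not taken from the blog post.

```python
from typing import Any, Dict

DEFAULT_ROUTE = "0.0.0.0/0"


def classify_route_table(route_table: Dict[str, Any]) -> str:
    """Classify a route table by where its IPv4 default route points.

    `route_table` is assumed to look like one entry of the "RouteTables"
    list returned by boto3's ec2.describe_route_tables().
    """
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != DEFAULT_ROUTE:
            continue  # skip the local VPC route and other specific routes
        if route.get("GatewayId", "").startswith("igw-"):
            return "public"  # default route goes to an Internet Gateway
        if route.get("NatGatewayId", "").startswith("nat-"):
            return "private-with-nat"  # default route goes to a NAT Gateway
        return "other"  # default route to some other target
    return "isolated"  # no IPv4 default route at all


# Hypothetical usage against a real account (needs boto3 and credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   tables = ec2.describe_route_tables(
#       Filters=[{"Name": "association.subnet-id",
#                 "Values": ["subnet-0123456789abcdef0"]}]
#   )["RouteTables"]
#   print([classify_route_table(t) for t in tables])
# An empty result means the subnet has no explicit association and
# falls back to the VPC's main route table.
```

If that reading is right, every subnet I passed to the pipeline would show up as "public" here, since they all share the default route table that points at the Internet Gateway.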
Thanks.
