Ingest data from IP blocked SFTP server into Google Cloud

For a use case that I am currently working on, I need to ingest data from an SFTP server into Google Cloud, where the end destination for the raw data is BigQuery. Access to the data on the SFTP server is restricted by IP blocking. What would be a good way to access the data on the server from GCP? The server provider has agreed to whitelist a static IP address or an IP range from GCP, but I am unsure how to set that up and link it to my connection. My initial idea was to use GCP's native SFTP Integration Connector. I would be thankful for step-by-step instructions to succeed with this.

Many thanks in advance!

Kind regards, Bertan

There is 1 solution below

al-dann On

I am not sure I can answer your direct question about a specific feature (a static egress IP address) of the SFTP Integration Connector; however, I can share some ideas here, so that you can search, read, and perhaps find a better solution for your circumstances.

First of all, I think the problem can be divided into two steps:

  • (1) ingest files into GCP (i.e. cloud buckets), and
  • (2) load data from the cloud buckets into BigQuery.

I guess the second step is relatively simple, so there might be no issues there...
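As a rough illustration of the second step, here is a minimal Python sketch that loads CSV files from a bucket into BigQuery with the google-cloud-bigquery client library. The bucket name, prefix, and table ID are hypothetical, and the CSV format, header row, and schema autodetection are assumptions that may not match your data.

```python
def gcs_uri(bucket: str, prefix: str, pattern: str = "*.csv") -> str:
    """Build a wildcard URI matching the ingested files under a prefix."""
    parts = [bucket, prefix.strip("/"), pattern]
    return "gs://" + "/".join(p for p in parts if p)


def load_csv_from_gcs(uri: str, table_id: str) -> int:
    """Append the CSV files at `uri` to `table_id`; return the table's row count."""
    # Imported lazily so gcs_uri() works without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumption: each file has a header row
        autodetect=True,      # assumption: schema inference is fine for raw data
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # wait for the load job; raises if the job failed
    return client.get_table(table_id).num_rows


if __name__ == "__main__":
    # Hypothetical names; prints the URI only and does not contact GCP.
    print(gcs_uri("raw-landing-bucket", "sftp/incoming"))
```

Calling load_csv_from_gcs(gcs_uri("raw-landing-bucket", "sftp/incoming"), "my-project.raw.sftp_data") would then start the actual load, given credentials with access to both the bucket and the dataset.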

There are many options for implementing the first step (connecting to the SFTP server and fetching files). The choice depends on time, budget (CAPEX, OPEX), skills, future plans, non-functional requirements, etc.

Whitelisting static source IP addresses can be considered one of the non-functional requirements.

If a specific managed service is to be used, that service must be able to send its traffic from such whitelisted static IP addresses.

If a (bespoke) solution is to be based on conventional (general purpose) resources and services - Compute Engine, Kubernetes Engine, Cloud Run, Cloud Functions, App Engine - it might be necessary to use the Cloud NAT service. Cloud NAT allows resources without external IP addresses to create outbound connections to the internet through an external static IP address, so that external parties see the requests/packets as coming from that static IP address.
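A minimal sketch of the Cloud NAT setup described above: the Python helper below only assembles the gcloud commands (reserve a static address, create a Cloud Router, create a NAT gateway that uses the reserved address) and prints them for review. The resource names, network, and region are hypothetical placeholders, and your project may need different settings.

```python
import shlex


def cloud_nat_commands(network: str, region: str,
                       ip_name: str = "sftp-egress-ip",
                       router: str = "sftp-router",
                       nat: str = "sftp-nat") -> list[list[str]]:
    """Return gcloud commands that give a VPC network a static egress IP."""
    return [
        # 1. Reserve a regional static external IP address.
        ["gcloud", "compute", "addresses", "create", ip_name,
         f"--region={region}"],
        # 2. Create the Cloud Router that hosts the NAT gateway.
        ["gcloud", "compute", "routers", "create", router,
         f"--network={network}", f"--region={region}"],
        # 3. Create the NAT gateway pinned to the reserved address.
        ["gcloud", "compute", "routers", "nats", "create", nat,
         f"--router={router}", f"--region={region}",
         f"--nat-external-ip-pool={ip_name}",
         "--nat-all-subnet-ip-ranges"],
        # 4. Show the address to hand to the SFTP provider for whitelisting.
        ["gcloud", "compute", "addresses", "describe", ip_name,
         f"--region={region}", "--format=value(address)"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in cloud_nat_commands("default", "europe-west1"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch safe to run; each printed line can be pasted into a shell, or passed to subprocess.run(cmd, check=True) once the names have been adjusted.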

Just in case you would like to use Cloud Functions (or Cloud Run, or App Engine, as all of them can use Serverless VPC Access), a high-level description is provided in this SO answer: Assign static IP to Cloud Function, with a proper description of routing Cloud Function egress through the VPC network.
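To make the routing idea more concrete, here is a hedged sketch that assembles the two gcloud commands typically involved: creating a Serverless VPC Access connector and deploying a function that sends all of its egress through it. The function name, connector name, IP range, runtime, and HTTP trigger are assumptions for illustration only. The static IP itself still has to come from a Cloud NAT gateway on the same network and region.

```python
import shlex


def egress_commands(function_name: str, network: str, region: str,
                    connector: str = "sftp-connector",
                    ip_range: str = "10.8.0.0/28") -> list[list[str]]:
    """Return gcloud commands routing a function's egress through a VPC."""
    return [
        # Connector that links the serverless environment to the VPC network.
        # Assumption: the /28 range is unused in that network.
        ["gcloud", "compute", "networks", "vpc-access", "connectors",
         "create", connector,
         f"--region={region}", f"--network={network}", f"--range={ip_range}"],
        # Deploy the function so that ALL outbound traffic uses the connector
        # (the default would only route private ranges through it).
        ["gcloud", "functions", "deploy", function_name,
         f"--region={region}", "--runtime=python311", "--trigger-http",
         f"--vpc-connector={connector}", "--egress-settings=all"],
    ]


if __name__ == "__main__":
    # Dry run with hypothetical names: print the commands for review.
    for cmd in egress_commands("fetch-sftp-files", "default", "europe-west1"):
        print(shlex.join(cmd))
```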

If you would like to read an overview of how the solution might be built using Cloud Functions, some time ago I wrote this SO answer: How to fetch files from the SFTP server by using cloud functions based solution
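For orientation, the core of such a function could look like the sketch below. It is not the code from the linked answer; it assumes the paramiko and google-cloud-storage libraries, password authentication, a server host key that is already known to the runtime, and hypothetical host, bucket, and directory names.

```python
import posixpath
import stat


def copy_sftp_dir_to_bucket(sftp, bucket, remote_dir: str, prefix: str = "") -> list[str]:
    """Copy every regular file in `remote_dir` into `bucket`.

    `sftp` is expected to behave like paramiko.SFTPClient and `bucket` like
    google.cloud.storage.Bucket. Returns the object names that were written.
    """
    uploaded = []
    for entry in sftp.listdir_attr(remote_dir):
        # Skip sub-directories, special files, and entries without a mode.
        if entry.st_mode is None or not stat.S_ISREG(entry.st_mode):
            continue
        remote_path = posixpath.join(remote_dir, entry.filename)
        object_name = posixpath.join(prefix, entry.filename)
        with sftp.open(remote_path, "rb") as remote_file:
            bucket.blob(object_name).upload_from_file(remote_file)
        uploaded.append(object_name)
    return uploaded


def run_transfer(host: str, username: str, password: str,
                 bucket_name: str, remote_dir: str, prefix: str = "sftp") -> list[str]:
    """Connect to the SFTP server and copy `remote_dir` into the bucket."""
    # Imported lazily so the copy logic can be exercised without these libraries.
    import paramiko
    from google.cloud import storage

    ssh = paramiko.SSHClient()
    ssh.load_system_host_keys()  # assumption: the server's host key is known
    ssh.connect(host, port=22, username=username, password=password)
    try:
        sftp = ssh.open_sftp()
        bucket = storage.Client().bucket(bucket_name)
        return copy_sftp_dir_to_bucket(sftp, bucket, remote_dir, prefix)
    finally:
        ssh.close()
```

Keeping the copy loop separate from the connection setup means it can be tested with simple stand-in objects, and the same loop works whether the code runs in Cloud Functions, Cloud Run, or on a Compute Engine VM behind Cloud NAT.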