Metaflow and downloading data from outside of S3

46 Views Asked by TCKnight8 At 20 December 2023 at 19:42

TL;DR - Is there a way to use Metaflow to download data that an outside source makes available via URL, or is this not possible at this time?

Full version: I'm trying to set up a flow that downloads data using an API (USDA-NASS) and saves it into S3. For most data, this works fine. However, one subset of the data (gridded condition/progress) can't be accessed through the API. Instead, I have to fetch the file's URL ("https://www.nass.usda.gov/Research_and_Science/Crop_Progress_Gridded_Layers/datasets/{file_name}.zip") with requests and then process the data. This creates a problem for Metaflow, as it seems to try to find the file in S3 and then throws a FileNotFound error. I've asked my colleagues and looked at the Metaflow documentation, but nothing I've found explains how to go about this task. Most resources only talk about getting data from S3. Is there a way to make this work?
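For illustration, here is a minimal sketch of the pattern being asked about: fetch the zip over plain HTTP, unpack it in memory, and only then hand the bytes to Metaflow's S3 client. It rests on assumptions. The helper names (`build_url`, `download_bytes`, `unzip_in_memory`, `upload_to_s3`) are hypothetical, the download uses the standard library's `urlopen` where `requests.get(url).content` would serve the same purpose, and whether the NASS server accepts the request as-is has not been checked. The key point is that the URL is never passed to anything that expects an S3 path.

```python
import io
import zipfile
from urllib.request import urlopen

# URL pattern quoted in the question; file_name is whichever dataset name applies.
BASE_URL = (
    "https://www.nass.usda.gov/Research_and_Science/"
    "Crop_Progress_Gridded_Layers/datasets/{file_name}.zip"
)


def build_url(file_name):
    """Fill the dataset name into the URL pattern."""
    return BASE_URL.format(file_name=file_name)


def download_bytes(url, timeout=60):
    """Fetch the raw bytes at url over HTTP(S); S3 is not involved here."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def unzip_in_memory(zip_bytes):
    """Return {member_name: bytes} for every file (not directory) in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            name: archive.read(name)
            for name in archive.namelist()
            if not name.endswith("/")
        }


def upload_to_s3(s3root, members):
    """Write each member under s3root with Metaflow's S3 client.

    s3root is a placeholder such as "s3://my-bucket/some/prefix" (assumed).
    """
    # Imported lazily so the helpers above can be used without Metaflow installed.
    from metaflow import S3

    with S3(s3root=s3root) as s3:
        # put_many takes (key, object) pairs and returns (key, s3_url) pairs.
        return s3.put_many(list(members.items()))
```

Inside a flow step, these could be chained as `members = unzip_in_memory(download_bytes(build_url(name)))` followed by `upload_to_s3(s3root, members)`, where `name` and `s3root` are values you supply. Keeping the HTTP download separate from the S3 write makes it easier to see which of the two is raising the error.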


