I have a use case where I need to programmatically copy data from one or more JSON files into a Snowflake table. However, when I copy the files, I also need to upsert records. For example, say I start out with the following JSON:
[
  {
    "id": 12323,
    "firstName": "Joe",
    "lastName": "Johnson",
    "email": "[email protected]"
  }
]
I would want to copy this record into a VARIANT column of my target table.
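For context, here is roughly what I have in mind for the target table and the plain (non-upsert) load. This is just a sketch; target_tbl, my_stage, and the file name are names I made up:

```sql
-- Placeholder table: the id is extracted for matching on later loads,
-- and the full JSON record is kept in a VARIANT column.
CREATE TABLE IF NOT EXISTS target_tbl (
    id   NUMBER,
    data VARIANT
);

-- STRIP_OUTER_ARRAY turns each element of the top-level JSON array
-- into its own row.
COPY INTO target_tbl (id, data)
FROM (
    SELECT $1:id::NUMBER, $1
    FROM @my_stage/customers.json
)
FILE_FORMAT = (TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE);
```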
Then I would get another JSON file with the following data:
[
  {
    "id": 12323,
    "firstName": "Joe",
    "lastName": "Johnson",
    "email": "[email protected]"
  },
  {
    "id": 45613,
    "firstName": "Jane",
    "lastName": "Doe",
    "email": "[email protected]"
  }
]
The first record has the same id, so it's an update, whereas the second record is an insert; my target table would then have two rows. What's the best way to achieve something like this at scale (a large number of JSON files per upsert, as well as many different concurrent upserts running at the same time)? I know Snowflake supports JDBC and Spark, but I'm not sure whether there's an efficient way to handle this kind of scenario.
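Conceptually, the upsert itself seems expressible as a MERGE after staging each batch, something like the sketch below (staging_tbl is another placeholder name of mine, shaped like target_tbl and loaded with COPY INTO as above), but I don't know whether running many of these concurrently is the right pattern:

```sql
-- Sketch of the per-batch upsert from a staging table into the target.
MERGE INTO target_tbl AS t
USING (
    -- Deduplicate within the batch: Snowflake rejects a MERGE where one
    -- target row matches multiple source rows (nondeterministic merge).
    SELECT id, data
    FROM staging_tbl
    QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) = 1
) AS s
ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.data = s.data
WHEN NOT MATCHED THEN
    INSERT (id, data) VALUES (s.id, s.data);
```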