AWS Glue Insert and update into oracle table

29 Views Asked by At

we are trying to migrate few jobs from informatica to aws glue most of the logic and transformation have been completed. The final step is to insert the data back into a oracle db(not present in aws rds) which is in public. Is it possible to perform the insert and update in aws glue ?

I tried using using cx_Oracle module but it's erroring out

1

There are 1 best solutions below

3
Ananth Tirumanur On

Even though, you can use a Glue pyspark job or a Glue python-shell job for this, Pyspark is not meant to do UPDATE/INSERT operations row by row - You can append(INSERT) or you can replace the entire table.

if you were to do it using a Python-shell job (Python V3.9 or above), you need to

  • download the JDBC Oracle driver and add it under "Dependent jars path" in the console. The full S3 URI should be added eg. s3://bucket-name/jars/oracle-jdbc.jar

  • Next, add the additional package needed. In the console, it is under "Python library path". --additional-python-modules cx-Oracle==8.3.0 You can download the correct version of whl file from Pypi.org

  • If you are in a restricted network that restricts the Glue job from downloading the python package to install (you will see the below error, when adding additional packages), add the whl file for cx_Oracle into S3 and add those eg. --additional-python-modules s3://bucket-name/whls/cx-oracle--py3-none-any.whl. Find the correct whl file from https://pypi.org/project/cx-Oracle/#files - choose for python 3.9, manylinux, x86_64

Download error on https://pypi.org/project/cx-Oracle : [Errno 99] Cannot assign requested address -- Some packages may not be found! Couldn't find index page for 'cx_Oracle' (maybe misspelled?)

Test that your job can now import the modules.

import cx_Oracle
import pandas as pd