we are trying to migrate few jobs from informatica to aws glue most of the logic and transformation have been completed. The final step is to insert the data back into a oracle db(not present in aws rds) which is in public. Is it possible to perform the insert and update in aws glue ?
I tried using using cx_Oracle module but it's erroring out
Even though, you can use a Glue pyspark job or a Glue python-shell job for this, Pyspark is not meant to do UPDATE/INSERT operations row by row - You can append(INSERT) or you can replace the entire table.
if you were to do it using a Python-shell job (Python V3.9 or above), you need to
download the JDBC Oracle driver and add it under "Dependent jars path" in the console. The full S3 URI should be added eg.
s3://bucket-name/jars/oracle-jdbc.jarNext, add the additional package needed. In the console, it is under "Python library path".
--additional-python-modules cx-Oracle==8.3.0You can download the correct version of whl file from Pypi.orgIf you are in a restricted network that restricts the Glue job from downloading the python package to install (you will see the below error, when adding additional packages), add the whl file for cx_Oracle into S3 and add those eg.
--additional-python-modules s3://bucket-name/whls/cx-oracle--py3-none-any.whl. Find the correct whl file from https://pypi.org/project/cx-Oracle/#files - choose for python 3.9, manylinux, x86_64Test that your job can now import the modules.