Old aws-glue libraries in the Glue 4.0 streaming ETL job?


I am trying to convert a Spark DataFrame to a Glue DynamicFrame in a streaming Spark job, but the conversion fails with the error:

TypeError: DynamicFrame.fromDF() missing 1 required positional argument: 'name'

However, according to the documentation here, the name argument has been optional since Glue 3.0.

How can this be solved without waiting for AWS to fix it? Is there a way to enforce a specific version of the Glue libraries in the job itself? The older versions don't support, out of the box, the transformations I need. The problem occurs only with the streaming job; it does not occur with a normal glueetl job.

Here is the relevant part of the code:

def processBatch(df, batchId):
    print("processing batch", batchId)
    print(df.schema)
    ddf = DynamicFrame.fromDF(df, glueContext) # Error source
    ddf.show(2)
    ddf.printSchema()
    if df.count() > 0:
        ddf_t = ddf.map(f=apply_mapping)
        ddf_t.show(2)
        ddf_t.toDF().writeTo("glue_catalog.new_test.cdc_processed").createOrReplace()
        
glueContext.forEachBatch(
    frame = dynamic_frame,
    batch_function = processBatch,
    options = {
        "windowSize": window_size,
        "checkpointLocation": "s3://somebucketname/checkpoint",
    }
)
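To illustrate why the call fails, here is a minimal sketch that runs without the Glue libraries. StubDynamicFrame is a hypothetical stand-in, not the real awsglue class; it only assumes that the library installed in the streaming runtime declares name as a required argument of fromDF, which is what the error message suggests. The helper from_df_compat and the default name "micro_batch" are likewise made up for illustration.

```python
class StubDynamicFrame:
    """Hypothetical stand-in for a DynamicFrame whose fromDF requires a name."""

    def __init__(self, df, glue_ctx, name):
        self.df = df
        self.glue_ctx = glue_ctx
        self.name = name

    @classmethod
    def fromDF(cls, dataframe, glue_ctx, name):
        # 'name' has no default here, mirroring the older signature.
        return cls(dataframe, glue_ctx, name)


def from_df_compat(frame_cls, df, glue_ctx, name="micro_batch"):
    """Always pass a name, so the call works whether or not name is optional."""
    return frame_cls.fromDF(df, glue_ctx, name)


# Calling with two arguments reproduces the TypeError from the question.
try:
    StubDynamicFrame.fromDF("df", "ctx")
except TypeError as exc:
    error_message = str(exc)

# Supplying the name explicitly sidesteps the problem.
ddf = from_df_compat(StubDynamicFrame, "df", "ctx")

print(error_message)
print(ddf.name)
```

If the assumption holds, the equivalent change inside processBatch would be to pass a third argument, for example DynamicFrame.fromDF(df, glueContext, "micro_batch"). That form should also be accepted by library versions where name is optional.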