AWS Glue throws AWSBadRequestException when loading a DynamicFrame from S3 in the local Glue Docker image

I have been using the AWS Glue 4.0 Docker image to develop and test ETL jobs locally. This worked perfectly for DynamoDB exports, but now I am having trouble loading files from S3.

I have a set of files exported from AWS CloudWatch that I want to load into a data lake. I created the following DynamicFrame configuration:

df_raw = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={
        "paths": [
            f"s3://{audit_logs_source_bucket}/exports/aws-cloudwatch/logs/audit/access-logs/api/apigw/",
        ],
        "recurse": True,
        "exclusions": json.dumps([
            "aws-logs-write-test",
        ]),
    },
    format="grokLog",
    format_options={
        "logFormat": r"%{NOTSPACE:timestamp}%{SPACE}%{GREEDYDATA:msg}",
    },
    transformation_ctx="load_new_logs",
)

But when I execute it (I have the Administrator access permission set on a training account), I get the following exception:

Py4JJavaError: An error occurred while calling o72.getDynamicFrame.
: org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3://bucket-operators-data-export-eu-west-1-rnd/exports/aws-cloudwatch/logs/audit/access-logs/api/apigw: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request

When I run the same code in an AWS Glue Interactive Session, it works properly.
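
Since the same code works there, I don't think the grok pattern itself is the cause. For reference, the `logFormat` in the call above is meant to split each line into a leading timestamp token and the rest of the message. Here is a rough plain-Python sketch of that intent, assuming the standard grok definitions (NOTSPACE as `\S+`, SPACE as `\s*`, GREEDYDATA as `.*`). The `parse_line` helper and the sample line are made up for illustration and are not taken from the real export:

```python
import re

# Assumed regex equivalents of the standard grok patterns:
#   NOTSPACE -> \S+   SPACE -> \s*   GREEDYDATA -> .*
LOG_LINE = re.compile(r"(?P<timestamp>\S+)\s*(?P<msg>.*)")


def parse_line(line: str) -> dict:
    """Split a log line into a timestamp token and the remaining message."""
    match = LOG_LINE.match(line)
    # An empty dict signals a line that does not start with a non-space token.
    return match.groupdict() if match else {}


# Hypothetical sample line, only to illustrate the expected split.
sample = "2024-01-01T00:00:00.000Z GET /items 200"
print(parse_line(sample))
```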

What am I doing wrong?
