I have been using the AWS Glue 4.0 Docker image to develop and test ETL jobs locally. This worked perfectly for DynamoDB exports, but now I am having trouble loading files from S3.
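For completeness, the script is bootstrapped in the usual way inside the container; a minimal sketch of my setup (audit_logs_source_bucket just holds the name of the source bucket):

import json
from pyspark.context import SparkContext
from awsglue.context import GlueContext

# Standard Glue initialization; the same script runs unchanged in the container
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

audit_logs_source_bucket = "bucket-operators-data-export-eu-west-1-rnd"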
I have a set of files exported from AWS CloudWatch that I want to load into a data lake, so I created the following DynamicFrame configuration:
# Read the raw CloudWatch export files, parsing each line with a grok pattern
df_raw = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={
        "paths": [
            f"s3://{audit_logs_source_bucket}/exports/aws-cloudwatch/logs/audit/access-logs/api/apigw/",
        ],
        "recurse": True,  # walk all subfolders under the prefix
        "exclusions": json.dumps([
            "aws-logs-write-test",  # skip the file CloudWatch writes to verify permissions
        ]),
    },
    format="grokLog",
    format_options={
        # first non-space token becomes the timestamp, the rest becomes msg
        "logFormat": r"%{NOTSPACE:timestamp}%{SPACE}%{GREEDYDATA:msg}",
    },
    transformation_ctx="load_new_logs",
)
but when I execute it (I have the AdministratorAccess policy on the training account), I get the following exception:
Py4JJavaError: An error occurred while calling o72.getDynamicFrame.
: org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3://bucket-operators-data-export-eu-west-1-rnd/exports/aws-cloudwatch/logs/audit/access-logs/api/apigw: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request
When I run the same code in an AWS Glue Interactive Session, it works fine.
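Since the bucket lives in eu-west-1, I also wondered whether the local s3a client needs the bucket's region or endpoint set explicitly before the frame is created; I considered something like the sketch below, but the exact configuration keys are my guess, not something I found in the Glue documentation:

# Assumption: point the s3a client at the bucket's region explicitly
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.endpoint", "s3.eu-west-1.amazonaws.com")
hadoop_conf.set("fs.s3a.endpoint.region", "eu-west-1")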
What am I doing wrong?