I'm trying to convert a large number of Avro files to JSON. No compression or repartitioning is needed; a plain 1-to-1 conversion would work. I did this before on one batch of files and it worked fine, but on a different batch I'm getting "An error occurred while calling o100.pyWriteDynamicFrame. Invalid sync!". I'm using the standard code from the AWS docs, so I'm not sure what's causing this; I suspect it's something about reading the Avro or writing the JSON. Any help appreciated.
data_source_frame = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": [S3_inputpath]
    },
    format="avro",
)

data_destination_frame = glueContext.write_dynamic_frame.from_options(
    frame=data_source_frame,
    connection_type="s3",
    connection_options={"path": S3_outputpath},
    format="json",
)
Here's the error I'm getting:
py4j.protocol.Py4JJavaError: An error occurred while calling o100.pyWriteDynamicFrame.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 52 in stage 0.0 failed 4 times, most recent failure: Lost task 52.3 in stage 0.0 (TID 73) (172.35.94.104 executor 5): org.apache.avro.AvroRuntimeException: java.io.IOException: Invalid sync!
I read in the AWS docs that partitioning and grouping don't work with the Avro format in Glue, so I'm not using those. I've also tried changing the format to Parquet and CSV, and I checked the input directory for null and empty files and removed them, but I'm still getting the same error.
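
In case it's useful, here's a rough sketch of how I'd try to narrow it down to a specific bad file next. The bucket and prefix names are placeholders, it assumes boto3 and fastavro are available, and it isn't part of the Glue job itself; it just tries to read every Avro object under the prefix and reports the ones that fail:

import io

import boto3
import fastavro

BUCKET = "my-input-bucket"   # placeholder
PREFIX = "avro/batch-2/"     # placeholder

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

bad_keys = []
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if not key.endswith(".avro"):
            continue
        body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
        try:
            # fastavro.reader walks every block in the file, so a truncated
            # file or a corrupted sync marker should raise an exception here.
            for _ in fastavro.reader(io.BytesIO(body)):
                pass
        except Exception as exc:
            bad_keys.append((key, str(exc)))

print("files that failed to read:", bad_keys)

Is this kind of per-file check the right way to track down an "Invalid sync!" error, or is there something in the Glue read/write options I should be setting instead?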