I can use avro-tools-1.7.7.jar to take JSON data and an Avro schema and output a binary Avro file, as shown here: https://github.com/miguno/avro-cli-examples#json-to-avro. However, I want to do this programmatically using the Avro Python API: https://avro.apache.org/docs/1.7.7/gettingstartedpython.html.
Their example shows how to write one record at a time into a binary Avro file.
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
schema = avro.schema.parse(open("user.avsc").read())
writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
writer.close()
My use case is writing all of the records at once, as the avro-tools jar does from a JSON file, but in Python code. I do not want to shell out and execute the jar. This will be deployed to Google App Engine, if that matters.
This can be accomplished with fastavro. For example, take the schema file twitter.avsc and the JSON file twitter.json from the link.
You can use something like the following script to write out an avro file:
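A minimal sketch of such a script, under a few assumptions: twitter.json holds one JSON object per line (newline-delimited JSON), fastavro is installed, and the helper names read_json_lines and json_to_avro are made up for illustration. It relies on fastavro's parse_schema and writer functions, and writer takes an iterable of records, so all records are written in one call.

```python
import json


def read_json_lines(path):
    """Yield one dict per non-empty line of a newline-delimited JSON file."""
    with open(path) as fp:
        for line in fp:
            line = line.strip()
            if line:
                yield json.loads(line)


def json_to_avro(schema_path, json_path, avro_path):
    """Write every record in json_path to a binary Avro file at avro_path."""
    # fastavro is a third-party package (assumed installed, e.g. via pip).
    # It is imported here so a missing install only affects this call.
    from fastavro import parse_schema, writer

    # An .avsc file is plain JSON, so load it and let fastavro validate it.
    with open(schema_path) as fp:
        schema = parse_schema(json.load(fp))

    # Avro container files are binary, so the output must be opened in "wb".
    with open(avro_path, "wb") as out:
        writer(out, schema, read_json_lines(json_path))
```

Calling json_to_avro("twitter.avsc", "twitter.json", "twitter.avro") should then produce the Avro file without shelling out to the jar. If your JSON file is a single array instead of one object per line, load it with json.load and pass the resulting list to writer.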