Avro Dynamic schema change on Hive

I have data arriving in Avro format (schema v1) that is stored in HDFS under a partition dt=yyyymmdd.
The data now arrives in two schema versions, v1 and v2, under the same partition.
Is it feasible to maintain a single Hive table for both versions?

OneCricketeer On

Avro defines a schema evolution protocol, so a single table is feasible as long as the two versions are compatible.

If v2 has simply added a field with a default value, for example, you can update the table to use the v2 schema and it will still read all of the old data: the default value is returned wherever the field is missing.
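To make the default-value rule concrete, here is a minimal Python sketch of what a reader does when it applies a newer schema to older records. It is a simplified illustration, not the Avro library itself: the `Event` schemas, the field names, and the `resolve` helper are all hypothetical, and real Avro resolution also covers type promotion, aliases, and unions.

```python
# Hypothetical v1 and v2 record schemas; names are illustrative only.
SCHEMA_V1 = {
    "type": "record", "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "payload", "type": "string"},
    ],
}
SCHEMA_V2 = {
    "type": "record", "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "payload", "type": "string"},
        # New in v2: the default is what keeps v1 data readable.
        {"name": "source", "type": "string", "default": "unknown"},
    ],
}

def resolve(record, reader_schema):
    """Project an already-decoded record onto the reader schema.

    Fields missing from the record are filled from the reader's default;
    a missing field without a default is an incompatibility.
    """
    out = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return out

old_row = {"id": 1, "payload": "a"}                   # written with v1
new_row = {"id": 2, "payload": "b", "source": "web"}  # written with v2
rows = [resolve(r, SCHEMA_V2) for r in (old_row, new_row)]
```

Both rows come back with the same three columns, which is the behaviour you would expect from a Hive table whose Avro schema has been updated to v2. Dropping the `default` from `source` makes `resolve` raise for the v1 row, which corresponds to the broken-compatibility case.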

If you've broken compatibility, you must create a separate table for the incompatible version, then union the two tables to get a consistent result set.
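The two-table approach can be sketched as follows. This uses Python's built-in `sqlite3` module purely as a stand-in for Hive so the example is runnable; the table names `events_v1` and `events_v2`, the view `events_all`, and the columns are assumptions, and the incompatible change shown (a string `amount` becoming a double) is just one example of a change Avro cannot resolve automatically. In Hive, the same idea would be a view over two Avro-backed tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per incompatible schema version (hypothetical layout).
    CREATE TABLE events_v1 (id INTEGER, amount TEXT, dt TEXT);
    CREATE TABLE events_v2 (id INTEGER, amount REAL, currency TEXT, dt TEXT);

    INSERT INTO events_v1 VALUES (1, '9.50', '20240101');
    INSERT INTO events_v2 VALUES (2, 12.0, 'EUR', '20240101');

    -- The view casts and pads v1 rows so both sides share one column list.
    CREATE VIEW events_all AS
        SELECT id, CAST(amount AS REAL) AS amount, NULL AS currency, dt
        FROM events_v1
        UNION ALL
        SELECT id, amount, currency, dt
        FROM events_v2;
""")

rows = conn.execute(
    "SELECT id, amount, currency, dt FROM events_all ORDER BY id"
).fetchall()
```

Queries then go against the view rather than either table, so consumers see one consistent column list regardless of which version a row was written with.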