I see that Avro messages have the schema embedded, followed by the data in binary format. If multiple messages are sent and a new Avro file is created for every message, isn't embedding the schema an overhead? Does that mean the producer should always batch messages before writing, so that multiple messages written to one Avro file carry just one schema? On a different note, is there an option to eliminate the schema embedding when serializing with the GenericDatumWriter/SpecificDatumWriter?
Schema in Avro message
2.9k Views Asked by Roshan Fernando At
There are 2 best solutions below
You are correct: there is an overhead if you write a single record together with its schema. This may seem wasteful, but in some scenarios the ability to reconstruct a record from the data using this schema is more important than the size of the payload.
Also take into account that even with the schema included, the record data is encoded in a binary format, so it is usually smaller than the equivalent JSON anyway.
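To illustrate the size difference, here is a minimal sketch that hand-encodes one record the way Avro's binary encoding lays out a long and a string (zigzag variable-length integer, then a length-prefixed UTF-8 string) and compares it with the JSON text. The record and its field names (`id`, `name`) are made up for the example, and a real application would use an Avro library rather than this hand-rolled encoder.

```python
import json


def encode_long(n: int) -> bytes:
    """Encode an Avro int/long: zigzag first, then 7 bits per byte, low bits first."""
    z = (n << 1) ^ (n >> 63)  # zigzag mapping for values in the 64-bit range
    out = bytearray()
    while z & ~0x7F:  # more than 7 bits left: emit a continuation byte
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode an Avro string: byte length as a long, followed by the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


def encode_record(record: dict) -> bytes:
    """Encode a hypothetical record {id: long, name: string}: fields in schema order, no names."""
    return encode_long(record["id"]) + encode_string(record["name"])


record = {"id": 42, "name": "sensor-a"}  # illustrative data only
binary = encode_record(record)
text = json.dumps(record).encode("utf-8")
print(len(binary), "bytes as Avro binary vs", len(text), "bytes as JSON")
```

The binary form carries no field names or punctuation, which is why the reader needs the schema from somewhere else to decode it.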
And finally, frameworks like Kafka can plug into a Schema Registry, so that rather than storing the schema with each record, they store a pointer (a schema ID) to the schema.
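The pointer idea can be sketched as follows. This assumes the framing commonly used with Confluent's Schema Registry (one magic byte, a 4-byte big-endian schema ID, then the Avro binary payload). The `REGISTRY` dictionary, the schema, and the ID `7` are hypothetical stand-ins for a real registry lookup.

```python
import struct

MAGIC_BYTE = 0  # marker byte that precedes the schema ID in this framing

# Hypothetical in-memory stand-in for a schema registry: schema ID -> schema JSON.
REGISTRY = {
    7: '{"type": "record", "name": "Reading", "fields": ['
       '{"name": "id", "type": "long"}, {"name": "name", "type": "string"}]}'
}


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-encoded Avro payload with the magic byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes):
    """Split a framed message into (schema JSON looked up by ID, raw Avro payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return REGISTRY[schema_id], message[5:]


payload = b"T\x10sensor-a"  # Avro binary for {"id": 42, "name": "sensor-a"}
message = frame(7, payload)
schema_json, body = unframe(message)
print(len(message) - len(payload), "bytes of per-message overhead")
```

Each message then pays a fixed 5 bytes instead of the full schema text, at the cost of requiring the consumer to reach the registry.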
I take the following points from the Avro specification:
You are not supposed to use a data serialization system if you want to write one new file for each new message. This is opposed to the goal of serialization. In that case, you want to separate the metadata from the data.
There is no option to omit the schema when writing an Avro object container file; the specification requires the schema in the file header.
IMO, there should be a balance when batching multiple messages into a single Avro file. Avro files should ideally be split up to improve I/O efficiency. In the case of HDFS, the block size would be the ideal Avro file size.
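As a rough back-of-the-envelope sketch of that balance: the numbers below (a 600-byte file header including the schema, 10-byte records, a 128 MiB HDFS block) are assumptions chosen for illustration, and the calculation ignores per-block sync markers and compression.

```python
def per_record_overhead(header_bytes: int, record_count: int) -> float:
    """Header bytes (schema included) amortized over the records in one file."""
    return header_bytes / record_count


def records_per_file(target_file_bytes: int, header_bytes: int, avg_record_bytes: int) -> int:
    """Approximate number of records that fit in a file of the target size."""
    return max(1, (target_file_bytes - header_bytes) // avg_record_bytes)


HEADER_BYTES = 600                # assumed size of the file header with schema
AVG_RECORD_BYTES = 10             # assumed average encoded record size
HDFS_BLOCK = 128 * 1024 * 1024    # assumed HDFS block size (128 MiB)

for count in (1, 100, 10_000):
    print(count, "records/file ->", per_record_overhead(HEADER_BYTES, count), "header bytes per record")

print("records to fill one block:", records_per_file(HDFS_BLOCK, HEADER_BYTES, AVG_RECORD_BYTES))
```

With one record per file, the header dominates the payload; once a file holds thousands of records, the schema cost per record becomes negligible.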