Why does MutableAggregationBuffer in UserDefinedAggregateFunction require a bufferSchema?

232 Views Asked by Ghastone At 13 August 2019 at 16:54

I am looking into implementing a UserDefinedAggregateFunction in spark and see that a bufferSchema is needed. I understand how to create it, but my issue is why does it require a bufferSchema? Should it not only need a size (number of elements for use in aggregation), an inputSchema and a dataType? Doesn't a bufferSchema constrain it to UserDefinedTypes in the intermediate steps in sql?

Original Q&A

There are 1 best solutions below

Raphael Roth On 13 August 2019 at 18:31

This is needed because the buffer schema can differ from the input type. For example if you want to calculate the average (arithmetic mean) of doubles, the buffer needs a count and a sum in this case See e.g. the example from databricks how to calculate the geometric mean : https://docs.databricks.com/spark/latest/spark-sql/udaf-scala.html

Why does MutableAggregationBuffer in UserDefinedAggregateFunction require a bufferSchema?

There are 1 best solutions below

Related Questions in JAVA

Related Questions in SCALA

Related Questions in APACHE-SPARK

Related Questions in APACHE-SPARK-SQL

Related Questions in USER-DEFINED-AGGREGATE

Trending Questions

Popular # Hahtags

Popular Questions