Why is NULL handled differently in aggregate and scalar functions?

Question

Why is NULL handled differently in aggregate and scalar functions?

210 Views Asked by David542 At 09 June 2023 at 20:51

Let's take the following two queries:

select greatest(1, val) from unnest([null]) as val
# null

And:

select sum(val) from unnest([1, null]) as val
# 1

My question is why is null handled differently between aggregate functions and normal scalar functions? Of course, the second query is more useful when null values are not counted. But I would think the first one would be more useful as well if it were to return 1 instead of null. If null is an 'unknown value', then wouldn't both functions have a supposedly unknown answer (i.e., null), not just the latter?

If there is a historical reason for this it would be great to know as well. Hopefully someone can shine some light on why there's a difference between the two.

Original Q&A

There are 2 best solutions below

Parfait On 09 June 2023 at 22:57

Without specific citation, reason may be due to a comparison function like GREATEST versus a computing function like SUM.

In ANSI SQL, comparing any value (using operators such as <, >, =, <>) to NULL should always return unknown or NULL. Interestingly, different RDBMSs may not by default follow this rule completely so WHERE my_column <> NULL can retun non-zero rows. See discussion in SQL Server's SET ANSI_NULLS mode. Also, see Postgres' transform_null_equals mode.

On other hand, also in ANSI SQL, SUM like most aggregate functions except COUNT(*) ignore NULLs. Hence, you observe the different behaviors.

**Salman A** · Accepted Answer · 2023-06-13T11:24:38.717000

The aggregate functions are supposed to skip null values. Here is an excerpt from SQL-92 standard:

<set function specification>

[...]

Otherwise, let TX be the single-column table that is the result of applying the <value expression> to each row of T and eliminating null values. If one or more null values are eliminated, then a completion condition is raised: warning- null value eliminated in set function.

The greatest function on the other hand should return null if any argument is null¹.

My question is why is null handled differently between aggregate functions and normal scalar functions?

Generally speaking, scalar functions return null if any of their arguments is null². Here is Ask Tom's take on this exact same issue, I personally agree with everything that is said in the answer:

and Tom said...

pretty much all single row functions return NULL if any of their inputs to compare are null.
ops$tkyte%ORA10GR2> select round( 1.2, null ) from dual;

ROUND(1.2,NULL)
---------------
NULL
when you ask "what is the greatest of 1, NULL, 2 - the answer is "we don't know, because NULL is unknown"

Aggregates are defined to "skip nulls" (ANSI says so)

But functions in general that take a set of inputs, will return NULL when a deciding input is NULL.

http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions001.htm#i88893

If you call a SQL function with a null argument, then the SQL function automatically returns null

¹ Different implementations of the greatest function existed long before it was standardized. MySQL and Oracle return null if any argument is null; while PostgreSQL and SQL Server ignore null values.

² Conversely, vendor specific scalar functions exist that skip null values. One example is concat_ws (MySQL, PostgreSQL, SQL Server) function that is designed to simplify concatenation of (possibly null) strings using a separator.

Why is NULL handled differently in aggregate and scalar functions?

There are 2 best solutions below

Related Questions in SQL

Related Questions in ANSI-SQL

Related Questions in DATABASE-AGNOSTIC

Trending Questions

Popular # Hahtags

Popular Questions