Understanding least common type in databricks

Question

Understanding least common type in databricks

44 Views Asked by Dhruv At 28 March 2024 at 07:50

I was reading the data type rules and found about least common type.

I have a doubt. What is the least common type of STRING and INT? The referred link gives the following example saying the least common type is BIGINT.

-- The least common type between an INT and STRING is BIGINT
> SELECT typeof(coalesce(5, '6'));
  BIGINT

However, if I run this on Databricks 13.3, Spark 3.4.1 and Scala 2.12 configurations it gives the following output.

> SELECT typeof(coalesce(5, '6'));
  STRING

If I run explain on the query, then I get the following output.

> EXPLAIN EXTENDED SELECT typeof(coalesce(5, '6'));
== Analyzed Logical Plan ==
typeof(coalesce(5, 6)): string
Project [typeof(coalesce(cast(5 as string), 6)) AS typeof(coalesce(5, 6))#239637]
+- OneRowRelation

It shows that INT is casted to STRING so the least common type should be STRING.

However, if I run an equal to operation between STRING and INT, then the explain query gives different output.

> EXPLAIN EXTENDED SELECT '1.00' = 1;
== Analyzed Logical Plan ==
(1.00 = 1): boolean
Project [(cast(1.00 as int) = 1) AS (1.00 = 1)#239661]
+- OneRowRelation

Here STRING is casted to INT.

Both coalesce and = rely on least common type and are giving different result types. So, what is exactly the least common type of STRING and INT? Also, which is more narrower STRING or INT? Any simple data type can be converted to STRING without any data loss, like an INT can be converted to STRING without any loss. However, the reverse is not true. If a STRING containing a decimal value is converted to INT, there would be a value loss. So shouldn't STRING be the wider type?

Original Q&A

There are 1 best solutions below

**DileeprajnarayanThumula** · Answer 1 · 2024-03-30T03:14:20.677000

The least common type between STRING and INT is STRING. In Databricks and Spark, when using the coalesce() function, the least common type is determined by the order of the arguments.

Note:when performing an equal to (=) operation, Spark performs implicit type casting. In this case, the STRING '1.00' is successfully casted to an INT, resulting in the same type (INT) for both operands of the equal to operation.

%sql
SELECT
  typeof(coalesce(value_int, value_string)) AS least_common_type
FROM
  test_table;

Results:

least_common_type
string
string
string
string

Both coalesce() and = rely on the least common type, but they might produce different result types depending on the operation and the order of the arguments. In general, STRING is considered wider than INT because any simple data type can be safely converted to STRING without any loss of data.

Understanding least common type in databricks

There are 1 best solutions below

Related Questions in APACHE-SPARK-SQL

Related Questions in DATABRICKS

Related Questions in AZURE-DATABRICKS

Trending Questions

Popular # Hahtags

Popular Questions