ANSI sql for comparison with string and non-string

340 Views Asked by At

Let's say I have any one of the following expression:

SELECT
    DATE '2014-01-01' < '2014-02-01',
    DATE '2014-01-01' < '321',
    9 < '10',
    9 < 'a'

Is there a recommendation or requirement in the SQL standard of how these should be compared? I suppose the three levels or 'strictness' would be the following:

  1. Raise an error [most strict] -- all 4 expressions above would fail.
  2. Try casting the string to the non-string type, if it doesn't work raise an error -- expressions 2 and 4 above would fail.
  3. Try casting the string to the non-string type, if it fails fallback to casting the non-string to a string -- all 4 above work.
  4. Cast the non-string operand to a string -- all 4 above work.

It seems BigQuery uses the second approach, postgres uses something like 2/3 (only the last one fails), mysql uses either 3 or 4 (no fail).

Does the standard make any recommendations here?

3

There are 3 best solutions below

2
Salman A On BEST ANSWER

Does the standard make any recommendations here?

I believe you're asking if there are any official rules regarding automatic type conversion for comparison? The answer is no*.

For comparison operation, the specs mention that data types [...] shall be comparable. Comparison of different data types is not described but it does say that implicit type conversion can occur in expressions [...]. So RDBMS is allowed to convert the data type of one or both operands to compare them.

That being said, the rules for automatic type conversion are implementation-dependent. The rules vastly differ across RDBMS. Consult your RDBMS' documentation to understand them.

SQL Server for example, uses data type precedence to convert the data type of one of the operands to match the other:

  • For DATE '2014-01-01' < '2014-02-01', the varchar value will be converted to date for comparison
  • For 9 < '10', the varchar value will be converted to int for comparison

MySQL has a different set of rules:

  • For DATE '2014-01-01' < '2014-02-01', both values will be converted to timestamps (not timestamp data type) for comparison
  • For 9 < '10', both values will be converted to floating point numbers for comparison

Unfortunately implicit conversions have too many gotchas and they should be avoided. For example 2.0 = '2.01' is true in SQL Server and 2 = '2foo' is true in MySQL.

Use the CAST function, variables of correct type (DECLARE @userdate AS DATE = '20120201') or appropriate functions (STR_TO_DATE('01-02-2012','%d-%m-%Y')) on strings.


* The answer is based on SQL-92 standard — very old but still relevant to the question.

10
Hogan On

The SQL standard for conversion is using the CAST keyword

CAST ( expression AS data_type )

I would suggest using that if you want to be "standard"

0
VonC On

The ISO/IEC 9075 SQL:2016 latest new features (also discussed here or in this PDF) did not include much regarding casting.

That standard does not dictate every detail of how a specific implementation should handle all situations. Instead, it sets general guidelines and requirements that implementations should adhere to, while allowing some flexibility for individual vendors to tailor their products.

That means any solution based on CASTing will need to be checked against the specific documentation of your SQL server.

SELECT
    DATE '2014-01-01' < DATE '2014-02-01',
    DATE '2014-01-01' < CAST('321' AS DATE),
    CAST(9 AS VARCHAR) < '10',
    CAST(9 AS VARCHAR) < 'a'

or

SELECT
    DATE '2014-01-01' < CAST('2014-02-01' AS DATE),
    9 < CAST('10' AS INTEGER)

While CAST is generally more widely supported, you might need to use a different function or syntax depending on the specific SQL implementation you are working with.
For example, SQL Server uses the CONVERT function, which is similar to CAST but also allows you to specify a style argument for certain conversions.