Snowflake - FLATTEN() - How does it work internally?

141 Views Asked by At

I am learning Snowflake right now, and the way FLATTEN() works is a bit counter intuitive.

A simple query.

SELECT
    raw:first_name::STRING AS FName,
    raw:last_name::STRING AS LName,
    f.value::STRING AS Skill
FROM tbl,
    TABLE(FLATTEN(raw:Skills)) f
ORDER BY raw:id::INT
LIMIT 5;

Elegantly, it flattens the Skills array and returns this.

FNAME LNAME SKILL
Flossy Fasson PS3
Flossy Fasson Vlookup
Flossy Fasson Go
Celeste Hubert Tcl-Tk
Celeste Hubert Zines

My questions are:

  1. Does Snowflake just infer that the source of raw:Skills is table tbl, as table name is not explicitly expressed here?

  2. How does Snowflake align the array (as the right table) to the left table? It looks like a Cross Join with no join keys, if so, every array should be joined to each and every row on the left and result in incorrect alignment.

2

There are 2 best solutions below

0
Lukasz Szozda On

Does Snowflake just infer that the source of raw:Skills is table tbl, as table name is not explicitly expressed here?

SELECT
    raw:first_name::STRING AS FName,
    raw:last_name::STRING AS LName,
    f.value::STRING AS Skill
FROM tbl, 
    LATERAL TABLE(FLATTEN(input=> tbl.raw:Skills)) f

Related: Lateral Join

2
Simeon Pilgrim On

the first question is same "inference" as when you use a column name but do not use table name or the alias, thus these are all the same:

select a 
from table_name
select a 
from table_name as t
select t.a 
from table_name as t
select table_name.a 
from table_name

now when this gets tricky is if two tables have the same column name, you must say which one you want

select id 
from table_a_name
cross join table_b_name

if they both have an id this will not compile.

In some databases if you join on id between two tables:

select id 
from table_a_name
join table_b_name 
   on table_a_name.id = table_b_name.id

this is valid, but not in Snowflake.

the FLATTEN is like a LEFT JOIN every input row is match to every expanded row, and both sources and the flatten objects (and you can do many FLATTEN and it just combinations, aka cross joins all the way down).

the comma method you have show is the "old SQL style" and the new style would be a CROSS JOIN, but LATERAL is also valid as that is more akin to what is happening. Snowflake support mean dialects from other RMDBs, to make porting SQL easier, but it does not cover all cases.