I'm using row_number window function in hive CTE(with clause)
with data(
select 1 a,1 b
union select 1,2
union select 1,3
union select 1,4
...
union select 1,26
),
data_with_row_num (
select a,b,row_number() over(partition by 1) as rn from data
)
select * from data_with_row_num
union all
select * from data_with_row_num
I think the row_number should generate fixed id in the CTE block, I union the temp table(data_with_row_num) twice, and expect to get the same b with same rn. while I get the data such as
a b rn
1 4 1
1 2 1
1 9 2
1 3 2
...
1 19 26
1 24 26
it seems the same rn has different b , my question is why this happen .I think the rn should be generated in the with block, but it seems it is re-generated when it is used.
The reason is the CTE(Common Table Expressions) is just expression, which means it is just sth like a script, it will not materialize the result . I do the below query
and the result is
the random get the different result, if really need to materialize the temp result, cache table(in spark) might be a solution