So either this is an Oracle bug, or I'm being a bit slow today.
This SQL executes fine:
WITH car_paint_options AS (
SELECT 'Escort' car_model, 'red,blue' paint_opts FROM dual UNION
SELECT 'Puma' car_model, 'black' paint_opts FROM dual
)
SELECT row_number() over(order by level) rn, level, ep.car_model,
regexp_substr(ep.paint_opts, '[^,]+', 1, level) paint_opt
FROM car_paint_options ep
CONNECT BY regexp_substr (ep.paint_opts, '[^,]+', 1, level) is not null
However, it gives me the wrong answer (4 lines of data):
rn level car_model paint_opt
--- ----- --------- ---------
1 1 Puma black
2 1 Focus red
3 2 Focus blue
4 2 Focus blue
The desired output is only 3 lines of data as shown here:
rn level car_model paint_opt
--- ----- --------- ---------
1 1 Puma black
2 1 Focus red
3 2 Focus blue
I understand why the problem occurs. Level 2 records are attempting to connect back to level 1 records, and what's happening is that the Focus:blue option is matching back successfully to both Puma:black and Focus:red parent rows.
So now I'm thinking: "fine, simple enough fix, let's constrain the level 2 objects so that they only ever link back to parent objects of the same car_model":
WITH car_paint_options AS (
SELECT 'Escort' car_model, 'red,blue' paint_opts FROM dual UNION
SELECT 'Puma' car_model, 'black' paint_opts FROM dual
)
SELECT row_number() over(order by level) rn, level, ep.car_model,
regexp_substr(ep.paint_opts, '[^,]+', 1, level) paint_opt
FROM car_paint_options ep
CONNECT BY regexp_substr (ep.paint_opts, '[^,]+', 1, level) is not null
AND ep.car_model = prior ep.car_model
But it results in an error:
ORA-01436: CONNECT BY loop in user data
Tested in both Oracle 12c + 19c. Can someone confirm I'm not doing anything silly here? Am I right in thinking it's a bug?
Option 1
The normal "hack" for this is to add something to the
CONNECT BYclause that gives each row a unique value within a filter that always evaluates to true such asSYS_GUID() IS NOT NULLorDBMS_RANDOM.VALUE() IS NOT NULLand prevents the hierarchical query from detecting cycles:Which outputs:
Option 2
Regular expression functions are slow and it is often faster to use simple string functions, even if it means you need to type more (and if simple string functions are faster than regular expressions then they are even faster than regular expressions plus generating a GUID for each row):
Which outputs the same.
Option 3
If you did want to still use regular expressions and want to avoid
SYS_GUIDhacks then, from Oracle 12, you can generate the level values using aLATERALjoin (orCROSS APPLY):Which also outputs the same.
Performance
A comparison of the performance of different options for splitting delimited strings is given in this answer. The summary is that option 2 tends to be the most performant (and the most to type).
fiddle