Oracle query to list all occurrences of a string and its subsequent values from CLOB field

126 Views Asked by At

I have a CLOB field in my Oracle table and would like to extract all occurrences of the string that matches the pattern 'RES_GetResData_Public_ScreenPrint' and its subsequent values (which are separated by '|' ).

Sample Data-

|-1080|0833|RES_GetResData_Public_ScreenPrint010|F28028079|0820|3.3 02/17/14080|080|080|031|00879|[0-0]?[3.3 02/17/14-3.3 02/17/14]?[0833]|RES_GetResData_Public_ScreenPrint011|F28028081|080|080|080|080|032|-1080|032|-1032|-1080|032|-1080|032|-1080|0833|RES_GetResData_Public_ScreenPrint013|F28028007|0820|


For the above-provided example, I used the below query to get all occurrences of the string 'RES_GetResData_Public_ScreenPrint' and its subsequent values (separated by | ) from the CLOB data.

The query is able to get all the occurrences of 'RES_GetResData_Public_ScreenPrint', but returns only the first subsequent value in all the rows.

**Query - **

select objectid, 
    REGEXP_SUBSTR(DATA, 'RES_GetResData_Public_ScreenPrint[^\|]+', 1,column_value) column1, 
    REGEXP_SUBSTR(DATA, '([^\|]+)', 1, 3) AS column2
from test_table cross join table(cast(multiset(select level 
from dual connect by level <= regexp_count(DATA, 'RES_GetResData_Public_ScreenPrint') ) as sys.odcinumberlist))

Actual Result -

objectid column1 column2
12345 RES_GetResData_Public_ScreenPrint010 F28028079
12345 RES_GetResData_Public_ScreenPrint011 F28028079
12345 RES_GetResData_Public_ScreenPrint013 F28028079

Expected Result -

objectid column1 column2
12345 RES_GetResData_Public_ScreenPrint010 F28028079
12345 RES_GetResData_Public_ScreenPrint011 F28028081
12345 RES_GetResData_Public_ScreenPrint013 F28028007

I'm new to writing queries and any suggestion would be helpful.

Thank you !!

1

There are 1 best solutions below

3
MT0 On BEST ANSWER

You do not need regular expression for this; although it is more to type you will probably find that simple string functions are faster than regular expressions:

WITH line_bounds (objectid, data, spos, epos) AS (
  SELECT objectid,
         data,
         1,
         INSTR(data, CHR(10), 1)
  FROM   test_table
UNION ALL
  SELECT objectid,
         data,
         epos + 1,
         INSTR(data, CHR(10), epos + 1)
  FROM   line_bounds
  WHERE  epos > 0
)
SEARCH DEPTH FIRST BY objectid SET orderid,
lines (objectid, line) AS (
  SELECT objectid,
         CASE epos
         WHEN 0
         THEN SUBSTR(data, spos)
         ELSE SUBSTR(data, spos, epos -spos)
         END
  FROM   line_bounds
),
match_bounds (objectid, line, res_spos, res_epos, next_epos) AS (
  SELECT objectid,
         line,
         INSTR(line, '|RES_GetResData_Public_ScreenPrint'),
         INSTR(line, '|', INSTR(line, '|RES_GetResData_Public_ScreenPrint') + 1, 1),
         INSTR(line, '|', INSTR(line, '|RES_GetResData_Public_ScreenPrint') + 1, 2)
  FROM   lines
)
SELECT objectid,
       SUBSTR(line, res_spos + 1, res_epos - res_spos - 1) AS column1,
       SUBSTR(line, res_epos + 1, next_epos - res_epos - 1) AS column2
FROM   match_bounds
WHERE  res_spos > 0;

Which, for the sample data:

CREATE TABLE test_table (objectid, data) AS
SELECT 12345,
       EMPTY_CLOB() || '|-1080|0833|RES_GetResData_Public_ScreenPrint010|F28028079|0820|3.3 02/17/14080|080|080|031|
|00879|[0-0]?[3.3 02/17/14-3.3 02/17/14]?[0833]|RES_GetResData_Public_ScreenPrint011|F28028081|080|080|080|080|032|-1080|032|-1032|-1080|032|-1080|032|
|-1080|0833|RES_GetResData_Public_ScreenPrint013|F28028007|0820|'
FROM   DUAL;

Outputs:

OBJECTID COLUMN1 COLUMN2
12345 RES_GetResData_Public_ScreenPrint010 F28028079
12345 RES_GetResData_Public_ScreenPrint011 F28028081
12345 RES_GetResData_Public_ScreenPrint013 F28028007

If you did want to use regular expressions (please compare the two solutions performance on your data) then you can match the entire pattern and extract the value of the capturing groups:

select objectid, 
       REGEXP_SUBSTR(
         DATA,
         '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|',
         1,
         column_value,
         NULL,
         1
       ) AS column1, 
       REGEXP_SUBSTR(
         DATA,
         '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|',
         1,
         column_value,
         NULL,
         2
       ) AS column2
from   test_table
       cross join table(
         cast(
           multiset(
             select level 
             from   dual
             connect by level <= regexp_count(DATA, '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|')
           ) as sys.odcinumberlist
         )
       )

Which has the same output.

fiddle