Oracle query to list all occurrences of a string from CLOB field

161 Views Asked by At

I have a CLOB field in my Oracle table and would like to extract all occurrences of the string which matches the pattern 'RES_GetResData_Public_ScreenPrint'.

**Example - **

|RES_GetResData_Public_ScreenPrint011| |RES_GetResData_Public_ScreenPrint023| |RES_GetResData_Public_ScreenPrint086| |RES_GetResData_Public_ScreenPrint100|

I used the below query to get all occurrences of the string 'RES_GetResData_Public_ScreenPrint' in CLOB data for the above-provided example. The query is able to get all 4 occurrences but displays only the first occurrence of the string in all 4 rows.

Query -

select objectid,
REGEXP_SUBSTR(DATA, 'RES_GetResData_Public_ScreenPrint[^|]+', 1,1) column1 from perftest.NRS_DATASOURCEDATA_A_Test  cross join
table(cast(multiset(select level from dual
connect by level <= regexp_count(DATA, 'RES_GetResData_Public_ScreenPrint') 
) as sys.odcinumberlist))

Actual Result -

objectid column1
12345 RES_GetResData_Public_ScreenPrint011
12345 RES_GetResData_Public_ScreenPrint011
12345 RES_GetResData_Public_ScreenPrint011
12345 RES_GetResData_Public_ScreenPrint011

Expected Result -

objectid column1
12345 RES_GetResData_Public_ScreenPrint011
12345 RES_GetResData_Public_ScreenPrint023
12345 RES_GetResData_Public_ScreenPrint086
12345 RES_GetResData_Public_ScreenPrint100
1

There are 1 best solutions below

4
kfinity On BEST ANSWER

I think you're missing some of your query? But the issue is the second '1' in regexp_substr. Here's the description of the 4th arg from the docs:

occurrence is a positive integer indicating which occurrence of pattern in source_char Oracle should search for. The default is 1, meaning that Oracle searches for the first occurrence of pattern.

So regexp_count found 4 instances, but each time regexp_substr gets the first one. You'll want to change it from 1 to level, so the regexp_substr is looking at the same occurrence as regexp_count for each row.

Without the rest of the query, I can't say for sure, but I think level probably got re-aliased to column_value by the table/cast/multiset statement, so I think that's probably what you want.

select objectid,

    REGEXP_SUBSTR(DATA, 'RES_GetResData_Public_ScreenPrint[^|]+', 1,column_value) column1
 
from test_table 
cross join
 table(cast(multiset(select level from dual
 
                 connect by level <= regexp_count(DATA, 'RES_GetResData_Public_ScreenPrint')
... more query, probably? maybe just ")))" ...

As a side note, you can maybe rewrite the table/cast/multiset to a simpler inline view - see this question.

--EDIT--

Ok, it sounds like your clob is actually structured data. If you know that there are 3 fields in each row, like this:

RES_GetResData_Public_ScreenPrintXXX F99999999 (blank)

then the column_value that you get back from the cross-join will basically be your row number. Your column2 regexp_substr is now just looking for any field - so if you want the second column, it'll be (row number)*(number of columns)+(the column you want, which is 2). In other words:

(column_value-1)*3+2 will give you column 2. Change the 2 to a 3 if you want column 3.

If your data actually has 4 columns, you'll need to change the 3 to a 4.

select objectid, 
    REGEXP_SUBSTR(DATA, 'RES_GetResData_Public_ScreenPrint[^\|]+', 1,column_value) column1, 
    REGEXP_SUBSTR(DATA, '([^\|]+)', 1, (column_value-1)*3+2) AS column2,
    column_value 
from test_table 
cross join 
    table(cast(multiset(select level from dual connect by level <= regexp_count(DATA, 'RES_GetResData_Public_ScreenPrint') ) as sys.odcinumberlist))