Text cleanup in snowflake

Question

Text cleanup in snowflake

144 Views Asked by Abhigyan Sarma At 02 October 2023 at 21:39

I have a column in a table that contains data about any updates related to changes about a company in the below format -

#=============#==============#================#
| Company ID  |  updated_at  |   updates      |
#=============#==============#================#
| 101         | 2020-11-01   | name:          |
|             |              | -ABC           |
|             |              | -XYZ           |
|             |              | url:           |
|             |              | -www.abc.com   |
|             |              | -www.xyz.com   |
+-------------+--------------+----------------+
| 109         | 2020-10-20   | rating:        |
|             |              | -4.5           |
|             |              | -4.0           |
+-------------+--------------+----------------+

As you can see above, the column updates contains strings that include newlines and describe one or multiple updates. In the above example this means that for company ID 101, the name changed from ABC to XYZ and the url changed from www.abc.com to www.xyz.com. For company ID 109, only the rating changed from 4.5 to 4.0.

However I would like to divide the updates column into 3 columns - one should contain what was changed (url, name etc.), second should have the old value and the 3rd column should have the new value. Something like this -

#============#============#==============#================#
| Company ID |   Field    |  Old Value   |   New Value    |
#============#============#==============#================#
| 101        |   name     | ABC          | XYZ            |
+------------+------------+--------------+----------------+
| 101        |   url      | www.abc.com  | www.xyz.com    |
+------------+------------+--------------+----------------+
| 109        |   rating   | 4.5          | 4.0            |
+------------+------------+--------------+----------------+

I am doing this in Snowflake. I know how to do this in postgres (using a regexp split to table and split_part functions) but snowflake doesn't support regexp split to table and therefore I am kind of stuck. Any help would be appreciated. Thanks!

I tried doing this using a regular (not regexp) split_to_table function, but obviously the results were not correct. I also tried to convert the text to a key-value pair using object_construct but converting that text to a key-value pair is difficult as well

Original Q&A

There are 1 best solutions below

**Mike Walton** · Accepted Answer · 2023-10-02T23:50:01.677000

I think split() or split_to_table() are actually the right way to go, you just need to do some manipulation after the fact. This query doesn't get you exactly what you need, since I added an extra column for my case statement and doesn't include the original records, but this works based on the data you provided:

first section just replicates the updates field into a column:

with x as (
select 'name:
-ABC
-XYZ
url:
-www.abc.com
-www.xyz.com'::string as str
)

Using the split in the lateral flatten below, I can then use a lead() function to get the next 2 records (old value and new value). This assumes your format is consistent inside each record, but does work as you were describing above:

select trim(y.value::string,'-,:') as field, 
       case when right(value,1) = ':' then 'name' else 'value' end as field_type,
       case when field_type = 'name' then
           lead(field,1) over (partition by seq order by index)
       end as old_value,
       case when field_type = 'name' then
           lead(field,2) over (partition by seq order by index)
       end as new_value
from x,
lateral flatten(input=>split(str,'\n')) y
qualify field_type = 'name';

Text cleanup in snowflake

There are 1 best solutions below

Related Questions in REGEX

Related Questions in TEXT

Related Questions in SNOWFLAKE-CLOUD-DATA-PLATFORM

Related Questions in TEXT-PARSING

Trending Questions

Popular # Hahtags

Popular Questions