Text cleanup in snowflake

144 Views Asked by At

I have a column in a table that contains data about any updates related to changes about a company in the below format -

#=============#==============#================#
| Company ID  |  updated_at  |   updates      |
#=============#==============#================#
| 101         | 2020-11-01   | name:          |
|             |              | -ABC           |
|             |              | -XYZ           |
|             |              | url:           |
|             |              | -www.abc.com   |
|             |              | -www.xyz.com   |
+-------------+--------------+----------------+
| 109         | 2020-10-20   | rating:        |
|             |              | -4.5           |
|             |              | -4.0           |
+-------------+--------------+----------------+

As you can see above, the column updates contains strings that include newlines and describe one or multiple updates. In the above example this means that for company ID 101, the name changed from ABC to XYZ and the url changed from www.abc.com to www.xyz.com. For company ID 109, only the rating changed from 4.5 to 4.0.

However I would like to divide the updates column into 3 columns - one should contain what was changed (url, name etc.), second should have the old value and the 3rd column should have the new value. Something like this -

#============#============#==============#================#
| Company ID |   Field    |  Old Value   |   New Value    |
#============#============#==============#================#
| 101        |   name     | ABC          | XYZ            |
+------------+------------+--------------+----------------+
| 101        |   url      | www.abc.com  | www.xyz.com    |
+------------+------------+--------------+----------------+
| 109        |   rating   | 4.5          | 4.0            |
+------------+------------+--------------+----------------+

I am doing this in Snowflake. I know how to do this in postgres (using a regexp split to table and split_part functions) but snowflake doesn't support regexp split to table and therefore I am kind of stuck. Any help would be appreciated. Thanks!

I tried doing this using a regular (not regexp) split_to_table function, but obviously the results were not correct. I also tried to convert the text to a key-value pair using object_construct but converting that text to a key-value pair is difficult as well

1

There are 1 best solutions below

4
Mike Walton On BEST ANSWER

I think split() or split_to_table() are actually the right way to go, you just need to do some manipulation after the fact. This query doesn't get you exactly what you need, since I added an extra column for my case statement and doesn't include the original records, but this works based on the data you provided:

first section just replicates the updates field into a column:

with x as (
select 'name:
-ABC
-XYZ
url:
-www.abc.com
-www.xyz.com'::string as str
)

Using the split in the lateral flatten below, I can then use a lead() function to get the next 2 records (old value and new value). This assumes your format is consistent inside each record, but does work as you were describing above:

select trim(y.value::string,'-,:') as field, 
       case when right(value,1) = ':' then 'name' else 'value' end as field_type,
       case when field_type = 'name' then
           lead(field,1) over (partition by seq order by index)
       end as old_value,
       case when field_type = 'name' then
           lead(field,2) over (partition by seq order by index)
       end as new_value
from x,
lateral flatten(input=>split(str,'\n')) y
qualify field_type = 'name';