I have a table with two string type columns named IL and CL. I have to compare both the strings character by character and wherever there is a question mark in column CL string, I have to replace it with the character found in the other column (IL), in the exact same position.
For example, the CL string after comparing with IL and replacing the question marks would be the New CL as shown below
| CL | IL | NEW CL |
|---|---|---|
| ???-123201-000000-000-??? | 104-234561-644221-123-947 | 104-123201-000000-000-947 |
I have the below code working, where there is a while loop inside of a cursor going through each character of the string, concatenating into a new string, and then doing an update at the end with the new value. However, performance with this piece of code is horribly slow as the table has 100K+ records and it's looping through each row 25 time (string length). I am looking to see if perhaps there is a way to rewrite this logic set based for performance improvement.
declare CharC insensitive cursor for
select ID, IL, CL
from AcctStrings
where CL like '%?%'
open CharC
fetch next from CharC into @ID, @IL, @CL
while @@fetch_status = 0 begin
set @NewCL = ''
set @i = 1
while @i <= 25 begin
set @TestChar = substring(@CL,@i,1)
set @OtherChar = substring(@IL,@i,1)
if (@TestChar = '?') begin
set @NewCL = @NewCL + @OtherChar
end
else
set @NewCL = @NewCL + @TestChar
set @i = @i + 1
end
update AcctStrings
set CL = @NewCL
where ID = @ID
end
fetch next from CharC into @ID, @IL, @CL
end
deallocate CharC
If using SQL Server 2022, you can try the following:
The above uses a subquery to break the
CLstring down into individual characters, apply the transform (when needed), and then reconstruct the updated string.See this db<>fiddle for a demo.
Results (with some additional test data):
???-123201-000000-000-???104-234561-644221-123-947104-123201-000000-000-947111-222222-333333-444-555xxx-xxxxxx-xxxxxx-xxx-xxx?????????????????????????123-654321-123456-456-789123-654321-123456-456-789The null value indicates that no
NewCLvalue was calculated, because the original value contained no "?"s.Addendum:
For SQL Server 2019, the
GENERATE_SERIES()function is unavailable, but there are several alterative techniques available for generating a number sequence - AVALUESsubselect,ROW_NUMBER()applied to a sufficiently large row source, or a recursive CTE (Common Table Expression).The following uses a recursive CTE to generate a number sequence, that is then limited to the CL string length in the
WHEREclause. The remaining logic is the same.See this db<>fiddle.
For 2016, we need to fall back to the old
FOR XML PATH('')technique.Since we are not inserting separators, the
STUFF()can be omitted from this use case. Normally, the expression being concatenated defaults to atext()element (plain text), but since it is a direct column reference, an explicitAS [text()]is needed to avoid the value being formatted as<NewC>x</NewC>. The use of, TYPEand.value('text()[1]', 'nvarchar(max)')is best practice to avoid character encoding issues.See this 2016 fiddle.