Regex with Unicode escapes replacing spaces and tabs


I'm having trouble getting this regex replacement to work "again".

I believe this used to work in .NET, but it never worked in Expresso (meaning it might never have actually worked properly).

I have a text string in which I need to remove any spaces or tabs between the [;] and the [G30], but not the line feeds!

N10 ;  G30

After the replacement, the result should be:

N10 ;G30

Here is what I came up with back in the day:

;\u0020*\u0009*(?:\w+\n)   

If found, the match should be replaced by a single ;
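A likely reason the pattern never did what was intended: (?:\w+\n) is non-capturing, but it still consumes G30 and the line feed, so replacing the whole match with ; deletes them. In addition, \u0020*\u0009* only allows spaces followed by tabs (not a tab and then a space), and \n alone will not match Windows \r\n line endings. The sketch below uses Java's java.util.regex as a stand-in for .NET (these particular constructs should behave the same in both, as far as I know); the class and method names are hypothetical, and the lookahead version is an assumed fix rather than the original code.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares the question's pattern with a lookahead variant.
class QuestionPatternDemo {
    // Pattern from the question: the non-capturing group still consumes "G30" and the line feed.
    static final Pattern ORIGINAL = Pattern.compile(";\\u0020*\\u0009*(?:\\w+\\n)");

    // Assumed fix: any mix of spaces and tabs, then a lookahead that only checks
    // for a word followed by an LF or CRLF line ending without consuming it.
    static final Pattern LOOKAHEAD = Pattern.compile(";[\\u0020\\u0009]*(?=\\w+\\r?\\n)");

    static String original(String text) {
        return ORIGINAL.matcher(text).replaceAll(";");
    }

    static String fixed(String text) {
        return LOOKAHEAD.matcher(text).replaceAll(";");
    }

    // Makes tabs and line endings visible when printing.
    static String show(String s) {
        return s.replace("\t", "\\t").replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String[] samples = {"N10 ;  G30\n", "N10 ;\t G30\n", "N10 ;  G30\r\n"};
        for (String s : samples) {
            System.out.println(show(s) + " -> original: " + show(original(s))
                    + ", lookahead: " + show(fixed(s)));
        }
    }
}
```

For all three samples the lookahead version should give N10 ;G30 with the line ending intact, while the original pattern either drops G30 or matches nothing.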

I attached the Expresso screenshot, as it might make things easier to understand.

Any ideas on how to make this work?


David542

Maybe I'm not understanding the question, but if the goal is to remove whitespace before or after the ;, could you do this?

[\u0020\u0009]*;[\u0020\u0009]*
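A quick check of this pattern, again using Java's java.util.regex as a stand-in for .NET (hypothetical class and method names). Because the class contains only space and tab, line feeds are never touched; note, though, that it also strips the space before the semicolon, giving N10;G30 rather than the N10 ;G30 asked for. The afterOnly variant, which drops the leading class, is an assumed tweak that keeps that space.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the answer's pattern.
class BothSidesDemo {
    // Spaces/tabs on both sides of ';' (the answer's pattern).
    static final Pattern AROUND = Pattern.compile("[\\u0020\\u0009]*;[\\u0020\\u0009]*");
    // Assumed tweak: only the blanks after ';', so the space before it survives.
    static final Pattern AFTER = Pattern.compile(";[\\u0020\\u0009]*");

    static String apply(String text) {
        // The classes hold only space and tab, so line feeds are never matched.
        return AROUND.matcher(text).replaceAll(";");
    }

    static String afterOnly(String text) {
        return AFTER.matcher(text).replaceAll(";");
    }

    public static void main(String[] args) {
        String input = "N10 ;  G30\nN20\t;\tG40\n";
        System.out.print(apply(input));     // both-sides version
        System.out.print(afterOnly(input)); // after-only version
    }
}
```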

I'm not sure about the newline issue, but perhaps Expresso has an option for matching at the start or end of each line versus the whole text (normally this is something like ^...$ or \A...\z).

If you want to match the Excel A1-style references line by line, something like the following might work:

(Screenshot not available.)

Note in the above I am using the 'anchors' of ^ and $.
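The screenshot is not available, so the following is only a hypothetical line-anchored variant in Java, not a reconstruction of it. It assumes each line is a label, a semicolon, optional blanks and one A1-style token such as G30; with MULTILINE, ^ and $ match at every line boundary, and [ \t] keeps the match from crossing a line feed. Class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical line-anchored variant; not a reconstruction of the screenshot.
class AnchoredLineDemo {
    // MULTILINE makes ^ and $ match at each line's start and end, not only the whole input's.
    // Group 1: the line label and the ';' (e.g. "N10 ;"); group 2: an A1-style token such as "G30".
    static final Pattern LINE =
            Pattern.compile("^(\\w+ *;)[ \\t]*([A-Za-z]+\\d+)$", Pattern.MULTILINE);

    static String apply(String text) {
        // Rebuild each matching line without the blanks between the groups.
        return LINE.matcher(text).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.print(apply("N10 ;  G30\nN20 ;\tG40\n"));
    }
}
```

Lines that carry anything after the token (for example "N10 ;  G30 X") fail the $ anchor and are left unchanged.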

Bohemian

Match horizontal whitespace:

;\h*(?:\w+\n)
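\h is available in Java (since Java 8), Perl and PCRE; as far as I know the .NET engine used by Expresso rejects it. The Java sketch below also shows that, as written, the non-capturing group (?:\w+\n) still consumes G30 and the line feed, so replacing the match with ; removes them; a lookahead, an assumed change rather than part of the answer, avoids that. Class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo of \h (horizontal whitespace) in java.util.regex.
class HorizontalWhitespaceDemo {
    // As written in the answer: (?:\w+\n) consumes the word and the line feed.
    static final Pattern AS_WRITTEN = Pattern.compile(";\\h*(?:\\w+\\n)");
    // Assumed variant: the same check as a lookahead, so only ';' and the blanks are replaced.
    static final Pattern LOOKAHEAD = Pattern.compile(";\\h*(?=\\w+\\n)");

    static String asWritten(String text) {
        return AS_WRITTEN.matcher(text).replaceAll(";");
    }

    static String lookahead(String text) {
        return LOOKAHEAD.matcher(text).replaceAll(";");
    }

    public static void main(String[] args) {
        String input = "N10 ; \t G30\n";
        // Print line feeds visibly to show what each version keeps.
        System.out.println(asWritten(input).replace("\n", "\\n"));
        System.out.println(lookahead(input).replace("\n", "\\n"));
    }
}
```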

Or, if \h isn't supported in your regex flavor, use a character-class subtraction to match whitespace except newlines:

;[\s&&[^\n\r]]*(?:\w+\n)
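The && syntax in [\s&&[^\n\r]] is Java's; I believe the .NET spelling of the same subtraction is [\s-[\r\n]]. The Java sketch below (hypothetical names, with the trailing group swapped for a lookahead as an assumption) contrasts it with plain \s, which can swallow the line feed and join two lines.

```java
import java.util.regex.Pattern;

// Hypothetical demo: whitespace minus CR and LF, using Java's class intersection syntax.
class SubtractionDemo {
    // Plain \s also matches line breaks, so it can pull the next line up.
    static final Pattern PLAIN_S = Pattern.compile(";\\s*(?=\\w+\\r?\\n)");
    // The subtracted class keeps CR and LF out of the match.
    static final Pattern NO_BREAKS = Pattern.compile(";[\\s&&[^\\n\\r]]*(?=\\w+\\r?\\n)");

    static String plainS(String text) {
        return PLAIN_S.matcher(text).replaceAll(";");
    }

    static String noBreaks(String text) {
        return NO_BREAKS.matcher(text).replaceAll(";");
    }

    public static void main(String[] args) {
        String brokenLine = "N10 ;\nG30\n";
        System.out.print(plainS(brokenLine));   // lines joined
        System.out.print(noBreaks(brokenLine)); // line feed kept
        System.out.print(noBreaks("N10 ;  G30\n"));
    }
}
```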
The fourth bird

If there should be at least a single word character following and you are using C#, you might use:

;[\p{Zs}\t]+(?=[^\W\d]+\d)
  • ; Match literally
  • [\p{Zs}\t]+ Match 1+ Unicode space separator characters (category Zs) or tabs
  • (?=[^\W\d]+\d) Positive lookahead, assert 1+ word characters other than digits, followed by a digit
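The same pattern works in Java's java.util.regex, which also understands \p{Zs} and [^\W\d]; the port below is a hypothetical sketch, not the answerer's code. Besides ordinary spaces it also removes a no-break space (U+00A0, category Zs), and it leaves the text alone when no letter precedes the digit.

```java
import java.util.regex.Pattern;

// Hypothetical Java port of the answer's pattern.
class SpaceSeparatorDemo {
    static final Pattern P = Pattern.compile(";[\\p{Zs}\\t]+(?=[^\\W\\d]+\\d)");

    static String apply(String text) {
        return P.matcher(text).replaceAll(";");
    }

    public static void main(String[] args) {
        System.out.println(apply("N10 ;  G30"));       // ordinary spaces
        System.out.println(apply("N10 ;\t\u00A0G30")); // tab plus a no-break space
        System.out.println(apply("N10 ;  30"));        // no letter before the digit: unchanged
    }
}
```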

In the replacement, use ;

See a regex demo.

If a newline must follow the word characters:

;[\p{Zs}\t]+(?=[^\W\d]+\d\w*\r?\n)
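A hypothetical Java port of the newline-requiring variant: the first line ends with \r\n and is changed, while the last line, which has no line break after G40, is left as it is.

```java
import java.util.regex.Pattern;

// Hypothetical Java port of the newline-requiring variant.
class NewlineRequiredDemo {
    static final Pattern P = Pattern.compile(";[\\p{Zs}\\t]+(?=[^\\W\\d]+\\d\\w*\\r?\\n)");

    static String apply(String text) {
        return P.matcher(text).replaceAll(";");
    }

    public static void main(String[] args) {
        // First line ends with CRLF and is changed; the last line has no line break and is skipped.
        String out = apply("N10 ;  G30\r\nN20 ;  G40");
        System.out.println(out.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```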

See another regex demo.