I want to find digits followed by "f", "ff", "f." or "ff." to standardize the spelling following given conventions/rules.
I have already tried several regular expressions, but unfortunately I have not found a universal expression that catches all of the cases above (f, ff, f., ff.).
In plain words it seems easy:
- find digits
- followed by an optional whitespace
- then followed by f, ff, f. or ff.
- directly before and after the expression, only whitespace or a non-word character is allowed (no letters or digits attached)
The beginning of the regex is quite easy, but I can’t figure out how to handle the different "f"-cases and the NOT boundaries following.
My best guess so far is:
(?<=\b)(\d+(\h|\b)?f{1,2})\.?
but then the strings followed by a word character are still found.
When I extend the regex to:
(?<=\b)(\d+(\h|\b)?f{1,2})\.?(\W)
the number of false positives decreases, but it is still not the solution.
I prepared lines for testing. The lines containing a plus "+" should be found, at the same time the ones with a minus "-" should not be found.
00f aaa +
00f. aaa +
00ff aaa +
00ff. aaa +
00 f aaa +
00 f. aaa +
00 ff aaa +
00 ff. aaa +
+ aaa 00f aaa +
+ aaa 00f. aaa +
+ aaa 00ff aaa +
+ aaa 00ff. aaa +
+ aaa 00 f aaa +
+ aaa 00 f. aaa +
+ aaa 00 ff aaa +
+ aaa 00 ff. aaa +
+ aaa 00f
+ aaa 00f.
+ aaa 00ff
+ aaa 00ff.
+ aaa 00 f
+ aaa 00 f.
+ aaa 00 ff
+ aaa 00 ff.
00 faaa -
00 f.aaa -
00 ffaaa -
00 ff.aaa -
00af aaa -
00af. aaa -
00aff aaa -
00aff. aaa -
- aaa 00 faaa -
- aaa 00 f.aaa -
- aaa 00 ffaaa -
- aaa 00 ff.aaa -
- aaa 00af aaa -
- aaa 00af. aaa -
- aaa 00aff aaa -
- aaa 00aff. aaa -
- aaa00f
- aaa00f.
- aaa00ff
- aaa00ff.
- aaa 00af
- aaa 00af.
- aaa 00aff
- aaa 00aff.
00faaa -
00f.aaa -
00ffaaa -
00ff.aaa -
00af aaa -
00af. aaa -
00aff aaa -
00aff. aaa -
- aaa00 faaa -
- aaa00 f.aaa -
- aaa00 ffaaa -
- aaa00 ff.aaa -
- aaa00af aaa -
- aaa00af. aaa -
- aaa00aff aaa -
- aaa00aff. aaa -
- aaa00af
- aaa00af.
- aaa00aff
- aaa00aff.
Further, the aim is to group the digits and the "f"-cases in such a way that they can be used in a replacement expression to standardize the spelling to one of these cases:
- 123 ff. (with whitespace, with dot)
- 123 ff (with whitespace, without dot)
- 123ff. (without whitespace, with dot)
- 123ff (without whitespace, without dot)
I suggest
\b(\d+)(\s?)(f{1,2})(?:(\.)\B|\b(?!\.))
See the regex demo
Details
- \b - a word boundary
- (\d+) - Group 1: 1+ digits
- (\s?) - Group 2: an optional whitespace
- (f{1,2}) - Group 3: 1 or 2 f's
- (?:(\.)\B|\b(?!\.)) - either of the two:
  - (\.)\B - a . captured in Group 4 if not followed with a word char
  - | - or
  - \b(?!\.) - a word boundary not followed with a dot.
Then, replacing is easy with:
- 123 ff. (with whitespace, with dot): $1 $3.
- 123 ff (with whitespace, without dot): $1 $3
- 123ff. (without whitespace, with dot): $1$3.
- 123ff (without whitespace, without dot): $1$3
If the captured whitespace (Group 2) and dot (Group 4) are not needed in the replacement patterns, remove those groupings and adjust the IDs in the replacement backreferences.
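To try the pattern against a few of the test lines from the question, here is a minimal sketch in Python. It assumes the pattern assembled from the parts listed under Details. The names PATTERN, STYLES and standardize are made up for this illustration. Python writes backreferences as \1 and \3 where the replacement strings above use $1 and $3.

```python
import re

# Pattern assembled from the parts explained under "Details" (assumption:
# Python's re module, where \s stands in for the optional whitespace).
PATTERN = re.compile(r"\b(\d+)(\s?)(f{1,2})(?:(\.)\B|\b(?!\.))")

# The four target spellings; the keys are hypothetical labels.
STYLES = {
    "space_dot": r"\1 \3.",     # 123 ff.
    "space_nodot": r"\1 \3",    # 123 ff
    "nospace_dot": r"\1\3.",    # 123ff.
    "nospace_nodot": r"\1\3",   # 123ff
}


def standardize(text, style="space_dot"):
    """Rewrite every matched 'digits + f/ff' expression in the chosen style."""
    return PATTERN.sub(STYLES[style], text)


# A few lines taken from the test set in the question.
should_match = ["00f aaa +", "00 ff. aaa +", "+ aaa 00f.", "+ aaa 00 ff"]
should_not_match = ["00 faaa -", "00 f.aaa -", "00af aaa -", "- aaa00ff."]

for line in should_match + should_not_match:
    found = PATTERN.search(line) is not None
    print(f"{line!r:20} match={found}")

print(standardize("+ aaa 00f.", "nospace_nodot"))
print(standardize("00 ff. aaa +", "nospace_dot"))
```

The matched dot is consumed by the pattern whether or not it is captured, so each replacement string decides on its own whether a dot is written back.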