How to perform multiple string pattern replacement without overwriting previous replacements?


I'd like to take algebraic chess notation and convert the file letters (a, b, c, d, e, f, g, h) to the NATO phonetic alphabet (alpha, bravo, charlie, delta, echo, foxtrot, golf, hotel) without later replacements overwriting earlier ones. I'm working in R.

notation <- "1.d4 Nf6 2.c4 e6 3.g3 d5 4.Bg2 Be7 5.Nf3 0-0 6.0-0 dxc4 7.Qc2 a6 8.Qxc4 b5 9.Qc2 Bb7 10.Bd2 Ra7 "

Desired outcome: "1.delta 4 Nfoxtrot 6 2.charlie 4 echo 6 3.golf 3 delta 5" and so on. I do not care about spacing right now.

If I use a naive string replacement method, the replacements will conflict with each other.

Using gsub:

notation <- gsub("a", "alpha", notation)
notation <- gsub("b", "bravo", notation)
notation <- gsub("c", "charlie", notation)
notation <- gsub("d", "delta", notation)
notation <- gsub("e", "echo", notation)
notation <- gsub("f", "foxtrot", notation)
notation <- gsub("g", "golf", notation)
notation <- gsub("h", "hotel", notation)

Yields "1.dechotelolta4 Nfoxtrot6 2.chotelarliechotelo4 echotelo6 3.golf3 dechotelolta5 4.Bgolf2 Bechotelo7 5.Nfoxtrot3 0-0 6.0-0 dechoteloltaxchotelarliechotelo4 7.Qchotelarliechotelo2 alphotela6 8.Qxchotelarliechotelo4 bravo5 9.Qchotelarliechotelo2 Bbravo7 10.Bdechotelolta2 Ralphotela7 "

'd' converts to 'delta', which is good. However, 'delta' contains the letter 'e', so it then becomes 'decholta'. The inserted 'echo' contains an 'h', so the final result is 'dechotelolta'.

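I also tried stri_replace_all_fixed() from the stringi package, but with vectorise_all = FALSE it applies the patterns one after another on the results of the previous ones, so it returns the same kind of cascaded result: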

library(stringi)
stri_replace_all_fixed(notation,
                       c("a", "b", "c", "d", "e", "f", "g", "h"),
                       c("alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"),
                       vectorise_all = FALSE)

I looked around their documentation and several SO questions, but wasn't able to find what I need.

This Python question is close, but it is limited to single-character replacement.

So I am looking for a function/method that will replace multiple patterns, but I do not want the replacement texts to overwrite/alter each other.

My best guess right now is to build a new string by reading notation one character at a time, and appending copies of a single character or substitutions of a-h letters to the new string. But that feels very un-R-like. Does anyone have any suggestions or know of a library function with the desired outcome?
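For comparison, the character-by-character idea does not need an explicit loop in R. The sketch below uses base R only; the `phonetic` lookup vector and the `result` name are illustrative, not from the question. It splits the string into single characters, swaps the file letters a to h through a named-vector lookup, and pastes the pieces back together. Because every original character is looked up exactly once, an inserted word is never rescanned by a later replacement.

```r
notation <- "1.d4 Nf6 2.c4 e6 3.g3 d5 4.Bg2 Be7 5.Nf3 0-0 6.0-0 dxc4 7.Qc2 a6 8.Qxc4 b5 9.Qc2 Bb7 10.Bd2 Ra7 "

# Illustrative lookup table: names are the file letters, values the NATO words
phonetic <- c(a = "alpha", b = "bravo", c = "charlie", d = "delta",
              e = "echo", f = "foxtrot", g = "golf", h = "hotel")

# Split the (single) input string into individual characters
chars <- strsplit(notation, "")[[1]]

# Replace only the characters that are file letters; each character is
# looked up once, so the inserted words are never scanned again
is_file <- chars %in% names(phonetic)
chars[is_file] <- phonetic[chars[is_file]]

result <- paste(chars, collapse = "")
result
```

Uppercase piece letters (N, B, Q, R) and the capture marker x are not names in `phonetic`, so they pass through unchanged.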



Answer by LMc (best answer)
nato <- c("alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel", "india", "juliett", "kilo", "lima", "mike", "november", "oscar", "papa", "quebec", "romeo", "sierra", "tango", "uniform", "victor", "whiskey", "x-ray", "yankee", "zulu")
tr <- setNames(nato, letters)

stringr::str_replace_all(notation, "[a-z]", ~ tr[.x])
# [1] "1.delta4 Nfoxtrot6 2.charlie4 echo6 3.golf3 delta5 4.Bgolf2 Becho7 5.Nfoxtrot3 0-0 6.0-0 deltax-raycharlie4 7.Qcharlie2 alpha6 8.Qx-raycharlie4 bravo5 9.Qcharlie2 Bbravo7 10.Bdelta2 Ralpha7"

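[a-z] matches only lowercase letters, so uppercase piece letters such as N, B, Q and R are left alone. The third argument of str_replace_all is the replacement for each match. A lesser-known feature is that you can supply a function there instead of a string (here written with the formula shorthand ~ tr[.x]). From ?str_replace_all: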

Alternatively, supply a function, which will be called once for each match (from right to left) and its return value will be used to replace the match.
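Because the pattern above is [a-z], the x that marks a capture (as in dxc4) is also turned into x-ray. Below is a sketch that restricts the pattern to the file letters and spells the replacement out as an ordinary function instead of a formula. It assumes stringr is installed; `phonetic` and `result` are illustrative names, not part of the answer.

```r
library(stringr)

notation <- "1.d4 Nf6 2.c4 e6 3.g3 d5 4.Bg2 Be7 5.Nf3 0-0 6.0-0 dxc4 7.Qc2 a6 8.Qxc4 b5 9.Qc2 Bb7 10.Bd2 Ra7 "

# Illustrative lookup for the eight file letters only
phonetic <- c(a = "alpha", b = "bravo", c = "charlie", d = "delta",
              e = "echo", f = "foxtrot", g = "golf", h = "hotel")

# "[a-h]" leaves the capture marker x and all uppercase letters alone.
# The function receives matched text and must return a character vector
# of the same length; unname() drops the lookup names from that vector.
result <- str_replace_all(notation, "[a-h]", function(m) unname(phonetic[m]))
result
```

All matches are located in the original string before any replacement is inserted, which is why the words cannot overwrite each other.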


Alternatively, the mgsub package performs simultaneous substitution and is very succinct:

library(mgsub)

mgsub(notation, letters, nato)
Answer by r2evans

A base R approach using gregexpr and regmatches:

phonetic <- c(a="alpha", b="bravo", c="charlie", d="delta", e="echo", f="foxtrot", g="golf", h="hotel")
gre <- gregexpr(paste0("[", paste(names(phonetic), collapse=""), "]"), notation)
regmatches(notation, gre)[[1]] <- phonetic[ regmatches(notation, gre)[[1]] ]
notation
# [1] "1.delta4 Nfoxtrot6 2.charlie4 echo6 3.golf3 delta5 4.Bgolf2 Becho7 5.Nfoxtrot3 0-0 6.0-0 deltaxcharlie4 7.Qcharlie2 alpha6 8.Qxcharlie4 bravo5 9.Qcharlie2 Bbravo7 10.Bdelta2 Ralpha7 "

I used the more complicated-looking paste(names(phonetic), collapse="") because your example uses only a subset of the phonetic alphabet. If you use the full (a-z) version, that's not necessary:

gre <- gregexpr("[a-z]", notation)
# ...
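The [[1]] in the assignment above means only the first string is rewritten. If you have several games in a character vector, a base R sketch that applies the same lookup to every element is below; `games` is a made-up example input and `phonetic` covers only a to h.

```r
phonetic <- c(a = "alpha", b = "bravo", c = "charlie", d = "delta",
              e = "echo", f = "foxtrot", g = "golf", h = "hotel")

# Hypothetical input: one string per game
games <- c("1.d4 Nf6 2.c4 e6", "6.0-0 dxc4 7.Qc2 a6")

gre <- gregexpr("[a-h]", games)

# regmatches() returns a list with one character vector of matches per game;
# map each vector through the lookup and assign the whole list back
regmatches(games, gre) <- lapply(regmatches(games, gre),
                                 function(m) unname(phonetic[m]))
games
```

Assigning a list to regmatches() replaces the matches of each element in a single pass, so this keeps the non-overlapping behaviour of the original answer.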
Answer by Bastián Olea Herrera

This answer uses dplyr and stringr to do the replacement by converting your chess notation into a dataframe. This approach gives you more control over the replaced text, because you work step by step instead of in a single function call as in the other answers.

First, we split the character string into a vector:

library(dplyr)
library(stringr)

notation <- "1.d4 Nf6 2.c4 e6 3.g3 d5 4.Bg2 Be7 5.Nf3 0-0 6.0-0 dxc4 7.Qc2 a6 8.Qxc4 b5 9.Qc2 Bb7 10.Bd2 Ra7 "

notation_split <- notation |> str_split(pattern = " ") |> unlist()

Then, we create a dataframe with the phonetic dictionary:

phonetic_dictionary <- tribble(~letter, ~word,
                               "a", "alpha",
                               "b", "bravo",
                               "c", "charlie",
                               "d", "delta",
                               "e", "echo",
                               "f", "foxtrot",
                               "g", "golf",
                               "h", "hotel")

We convert the vector of moves into a dataframe, extract from each move the letter to be replaced, attach the corresponding phonetic word using left_join (join_by() requires dplyr 1.1.0 or later), and finally replace the letter with the word, padded with spaces:

replacement_table <- tibble(moves = notation_split) |> 
    mutate(letter = str_extract(moves, "a|b|c|d|e|f|g|h")) |> 
    left_join(phonetic_dictionary, join_by(letter)) |> 
    mutate(moves_phonetic = str_replace(moves, letter, paste0(" ", word, " "))) |> 
    mutate(moves_phonetic = ifelse(is.na(moves_phonetic), moves, moves_phonetic)) |> 
    print(n=Inf)

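Finally, we extract the resulting column and paste it back into a single string. Note that str_extract() and str_replace() act only on the first match in each move, so in a move with two file letters, such as dxc4, the second letter stays unconverted: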

replacement_table |> pull(moves_phonetic) |> paste(collapse = " ")
# [1] "1. delta 4 N foxtrot 6 2. charlie 4  echo 6 3. golf 3  delta 5 4.B golf 2 B echo 7 5.N foxtrot 3 0-0 6.0-0  delta xc4 7.Q charlie 2  alpha 6 8.Qx charlie 4  bravo 5 9.Q charlie 2 B bravo 7 10.B delta 2 R alpha 7 "
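A sketch that keeps the dataframe layout but converts every file letter in a move, including the c in dxc4, is below. Instead of left_join it builds a named lookup vector from the dictionary and passes a function to str_replace_all(), relying on the function-replacement feature quoted from ?str_replace_all in the best answer. `lookup` and `out` are illustrative names; dplyr and stringr are assumed to be installed.

```r
library(dplyr)
library(stringr)

notation <- "1.d4 Nf6 2.c4 e6 3.g3 d5 4.Bg2 Be7 5.Nf3 0-0 6.0-0 dxc4 7.Qc2 a6 8.Qxc4 b5 9.Qc2 Bb7 10.Bd2 Ra7 "
notation_split <- unlist(str_split(notation, pattern = " "))

phonetic_dictionary <- tibble(
  letter = c("a", "b", "c", "d", "e", "f", "g", "h"),
  word   = c("alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel")
)

# Named vector built from the dictionary: names = letters, values = words
lookup <- setNames(phonetic_dictionary$word, phonetic_dictionary$letter)

replacement_table <- tibble(moves = notation_split) |>
  # Replace every file letter in a move, not just the first one;
  # sprintf() pads each word with spaces like the answer above does
  mutate(moves_phonetic = str_replace_all(
    moves, "[a-h]", function(m) sprintf(" %s ", unname(lookup[m]))
  ))

out <- replacement_table |> pull(moves_phonetic) |> paste(collapse = " ")
out
```

Compared with the str_replace() version, moves without any file letter (such as 0-0) need no is.na() fallback, because str_replace_all() simply leaves them unchanged.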