Re-arranging a list of names in R from "SURNAMES first names", to "first initial. SURNAMES"

111 Views Asked by At

I have a list of names that look like this:

c("CASEY Aoife", "CREMEN Margaret", "MORCH-PEDERSEN Marie", 
  "RORVIK Jenny Marie", "MIGUEL GOMES Natalia", "ROHNER Maria-Clara") 

and to display them in a table I would like them to look like this

c("A. CASEY", "M. CREMEN", "M. MORCH-PEDERSEN", 
  "J. RORVIK", "N. MIGUEL GOMES", "M. ROHNER")

There are challenges as there are people with multiple first names and multiple last names etc, as well as dealing with hyphens etc.

I've tried a function as below but not getting my desired output:

convert_name <- function(name) {
  parts <- str_split(name, " ")[[1]]  # Split name into parts
  
  # Extract initials and last name
  initials <- str_extract(parts, "\\b\\p{L}")  # Extract first letter of each part
  last_name <- parts[length(parts)]
  
  # Concatenate initials and last name with space
  converted_name <- paste(initials, last_name, sep = ". ")
  
  return(converted_name)
}
2

There are 2 best solutions below

0
Vons On

sapply over each name a function to shuffle the name.

x=c("CASEY Aoife", "CREMEN Margaret", "MORCH-PEDERSEN Marie", 
  "RORVIK Jenny Marie", "MIGUEL GOMES Natalia", "ROHNER Maria-Clara") 


sapply(strsplit(x, " "), \(y) {
  j = 1
  for (i in 1:length(y)) {
    if (identical(y[i], toupper(y[i]))) {
      j = i
    } else {
      break
    }
  }
  paste0(substr(y[j+1], 1, 1), ". ", paste0(y[1:j], collapse=" "))
  })

Another option without forloop:

sapply(strsplit(x, " "), function(y){
  ix <- y == toupper(y)
  paste0(substr(y[ !ix ][ 1 ], 1, 1),  ". ", paste(y[ ix ], collapse = " "))
  })

Output

[1] "A. CASEY"          "M. CREMEN"         "M. MORCH-PEDERSEN"
[4] "J. RORVIK"         "N. MIGUEL GOMES"   "M. ROHNER" 
2
GKi On

You can use sub like:

sub("(.*[A-Z]) ([A-Z]).*", "\\2. \\1", s)
#[1] "A. CASEY"              "M. CREMEN"             "M. MORCH-PEDERSEN"    
#[4] "J. RORVIK"             "N. MIGUEL GOMES"       "M. ROHNER"            
#[7] "P. FERNANDES-Da-VEIGA" "W. Van-DORP"           "G. De-VITA"           

Where (.*[A-Z]) matches anything ending with an uppercase followed by a space. () stores the match in \\1. Followed by an uppercase, stored in \\2 followed by anything .*.

Data

s <- c("CASEY Aoife", "CREMEN Margaret", "MORCH-PEDERSEN Marie", 
       "RORVIK Jenny Marie", "MIGUEL GOMES Natalia", "ROHNER Maria-Clara",
       "FERNANDES-Da-VEIGA Paulo", "Van-DORP Wianka", "De-VITA Giuseppe")