How to identify all-upper-case strings and move them after the other words

I have created this pandas dataframe:

import pandas as pd

ds = {"col1":["ROSSI Mauro", "Luca Giacomini", "Sonny Crockett"]}
df = pd.DataFrame(data=ds)

Which looks like this:

print(df)
             col1
0     ROSSI Mauro
1  Luca Giacomini
2  Sonny Crockett

Column col1 contains first names and last names, not always in the same order. If a word is in all upper case (like ROSSI in record 0), it is a last name and I need to move it after the word that is not all upper case.

So, the resulting dataframe would look like this:

             col1
0     Mauro ROSSI
1  Luca Giacomini
2  Sonny Crockett

Does anyone know how to identify the all-upper case string in col1 and move it after the non all-upper case string?

Scott Boston (best answer)

We can also use captured groups with regex in str.replace:

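# regex=True is required in pandas >= 2.0, where str.replace treats the pattern as a literal string by default
# \s+ keeps the separating space out of group 2, so the result has no leading space
df['col1 new'] = df['col1'].str.replace(r'^([A-Z]+)\b\s+(.+)', r'\2 \1', regex=True)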

Output:

             col1        col1 new
0     ROSSI Mauro     Mauro ROSSI
1  Luca Giacomini  Luca Giacomini
2  Sonny Crockett  Sonny Crockett

The parentheses create capture groups and \b is a word boundary; \2 and \1 in the replacement then reorder the groups. With more complex data, you will probably have to adjust the regex.
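To see what the capture groups and backreferences do outside pandas, here is a minimal sketch using Python's built-in re module. The anchored pattern and the helper name swap_leading_upper are assumptions made for illustration, not part of the answer above.

```python
import re

# Assumed pattern for illustration: group 1 is a leading all-caps word,
# group 2 is the rest of the string after the separating whitespace.
PATTERN = re.compile(r'^([A-Z]+)\b\s+(.+)$')

def swap_leading_upper(name):
    # \2 and \1 in the replacement refer back to the captured groups,
    # so the two parts are written out in swapped order.
    return PATTERN.sub(r'\2 \1', name)

names = ["ROSSI Mauro", "Luca Giacomini", "Sonny Crockett"]
swapped = [swap_leading_upper(n) for n in names]
print(swapped)  # ['Mauro ROSSI', 'Luca Giacomini', 'Sonny Crockett']
```

"Luca Giacomini" is left alone because [A-Z]+ can only match the single letter "L", and there is no word boundary between "L" and "u".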

mozway

You can use str.replace with a custom function:

# regex=True is required for a callable replacement in pandas >= 2.0
df['col1'] = df['col1'].str.replace(r'(\S+)\s*(\S+)',
                                    lambda m: f'{m.group(2)} {m.group(1)}'
                                    if m.group(1).isupper() else m.group(0),
                                    regex=True)

Or use a temporary DataFrame from str.extract and boolean indexing with str.isupper:

tmp = df['col1'].str.extract(r'(\S+)\s*(\S+)')

df.loc[tmp[0].str.isupper(), 'col1'] = tmp[1] + ' ' + tmp[0]

NB. This assumes that each name consists of exactly two words; if not, you need to adapt the regex accordingly.

Output:

             col1
0     Mauro ROSSI
1  Luca Giacomini
2  Sonny Crockett
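If names can have more than two words, one possible alternative to adapting the regex is to split on whitespace and test each word with str.isupper. This is only a sketch: the helper name move_upper_last and the extra sample name "DE ROSSI Gian Luca" are made up for illustration, and it assumes every all-upper-case word belongs to the last name.

```python
def move_upper_last(name):
    """Return name with its all-upper-case words moved after the other words."""
    tokens = name.split()
    upper = [t for t in tokens if t.isupper()]
    other = [t for t in tokens if not t.isupper()]
    # Leave the string untouched unless it contains both kinds of word
    if not upper or not other:
        return name
    return " ".join(other + upper)

# Hypothetical sample data, including one name with more than two words
names = ["ROSSI Mauro", "DE ROSSI Gian Luca", "Luca Giacomini", "Sonny Crockett"]
result = [move_upper_last(n) for n in names]
print(result)
```

On a dataframe this could be applied with df['col1'].map(move_upper_last). Note that str.isupper is also true for initials such as "J.", so those would be moved as well.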