Natural sorting with R differs on deployment (maybe OS/Locale issue)


I am using the "naturalsort" package (https://github.com/kos59125/naturalsort). As far as I know, natural sorting is not implemented well anywhere else in R, so I was happy to find this package.

I use the function naturalsort to sort file names the way Windows Explorer does, which works great locally.

But when I use it in my production environment, deployed with Docker on Google Cloud Run, the sort order changes. I don't know whether this is due to a different locale (I am from Denmark) or to OS differences between my Windows PC and the Docker/Google Cloud Run deployment.

I have created an example that is ready to run in R:

######## Code start ###########
library(plumber)
library(naturalsort) # for natural sorting of file names

#* Retrieve sorted string list
#* @get /sortstrings
#* @param nothing
function(nothing) {
  
  print(nothing)
  
  test <- c("0.jpg", "file (4_5_1).jpeg", "1 tall thin image.jpeg",
            "8.jpeg", "8.jpg", "file (2.1.2).jpeg", "file (0).jpeg", "3.jpeg",
            "file (1).jpeg", "file (2.1.1).jpeg", "file (0) (3).jpeg", "file (2).jpeg",
            "file (2.1).jpeg", "file (4_5).jpeg", "file (4).jpeg", "file (39).jpeg")
  
  print("Direct sort")
  print(naturalsort(text = test))
  
  sorted_strings <- naturalsort(text = test)
  
  return(sorted_strings) 
}
######## Code end ###########

I would expect it to sort the file names as shown below, which it does locally, both when run directly in the script and when called through the plumber API:

c("0.jpg",
"1 tall thin image.jpeg",
"3.jpeg",
"8.jpeg",
"8.jpg",
"file (0) (3).jpeg",
"file (0).jpeg",
"file (1).jpeg",
"file (2).jpeg",
"file (2.1).jpeg",
"file (2.1.1).jpeg",
"file (2.1.2).jpeg",
"file (4).jpeg",
"file (4_5).jpeg",
"file (4_5_1).jpeg",
"file (39).jpeg")

But in the deployed environment it sorts them like this instead:

c("0.jpg",
"1 tall thin image.jpeg",
"3.jpeg",
"8.jpeg",
"8.jpg",
"file (0) (3).jpeg",
"file (0).jpeg",
"file (1).jpeg",
"file (2.1.1).jpeg",
"file (2.1.2).jpeg",
"file (2.1).jpeg",
"file (2).jpeg",
"file (4_5_1).jpeg",
"file (4_5).jpeg",
"file (4).jpeg",
"file (39).jpeg")

This does not match the order Windows Explorer uses.

Best answer, by Mikael Jagan

Try fixing the collating sequence prior to the naturalsort call. It varies by locale and can affect how strings are compared (and therefore sorted).

## Get initial value
lcc <- Sys.getlocale("LC_COLLATE")

## Use fixed value
Sys.setlocale("LC_COLLATE", "C")

sorted_strings <- naturalsort(text = test)

## Restore initial value
Sys.setlocale("LC_COLLATE", lcc)
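
The same set-and-restore pattern can be wrapped in a small helper so the original collation is restored even if the sorting call fails. This is only a sketch: the name with_c_collate is hypothetical (it is not part of naturalsort or base R), and base sort() stands in for naturalsort() so the example runs without extra packages.

```r
# Hypothetical helper: evaluate an expression with LC_COLLATE set to "C",
# then restore whatever collation was active before, even on error.
with_c_collate <- function(expr) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  # expr is a lazily evaluated promise, so it only runs here,
  # after the collation has been switched
  force(expr)
}

collate_before <- Sys.getlocale("LC_COLLATE")

# In the "C" locale strings are compared by character code,
# so uppercase letters sort before lowercase ones
sorted_c <- with_c_collate(sort(c("b", "A", "a", "B")))
print(sorted_c)

collate_after <- Sys.getlocale("LC_COLLATE")
```

Inside the plumber endpoint, the call would then presumably look like with_c_collate(naturalsort(text = test)).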

You can find some details in ?sort, ?Comparison, and ?locales.
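
To check whether the two environments really differ in collation, a short diagnostic along these lines could be run both locally and in the container. It is a sketch rather than a definitive test; the variable names are made up for illustration, and the probe characters are the ones that change places between the two orderings in the question.

```r
# Report the settings that influence string comparison in this session
cat("LC_COLLATE:", Sys.getlocale("LC_COLLATE"), "\n")
cat("OS:", Sys.info()[["sysname"]], "\n")

# Characters whose relative order differs between the two results
probe <- c("_", ".", ")")

# Order under the session's current collation (may differ per machine)
locale_order <- sort(probe)
print(locale_order)

# method = "radix" always compares character vectors as in the C locale,
# so this result should be identical on every machine
radix_order <- sort(probe, method = "radix")
print(radix_order)
```

If locale_order differs between the two machines while radix_order stays the same, that points to the collating sequence rather than the OS as such.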