How to sort 'WindowsPath' object files naturally


I am iterating through files in a directory using Path().glob(), and the files do not come back in natural order. For example, the iteration looks like this:

[WindowsPath('C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/P1_Cor.csv'),
 WindowsPath('C:/Users/HP/Desktop/P10/dataP10/SAMPLED_NORMALIZED/P10_Cor.csv'),
 WindowsPath('C:/Users/HP/Desktop/P11/dataP11/SAMPLED_NORMALIZED/P11_Cor.csv'),
 WindowsPath('C:/Users/HP/Desktop/P12/dataP12/SAMPLED_NORMALIZED/P12_Cor.csv'),
# ...and so on from P1 to P30

whereas I want it to iterate like this: P1, P2, P3 and so on.

I have tried using the code below but it gives me an error:

from pathlib import Path

file_path = r'C:/Users/HP/Desktop'

files = Path(file_path).glob(file)
sorted(files, key=lambda name: int(name[10:]))

where 10 is just an arbitrary number I picked while trying out the code.

The error:

TypeError: 'WindowsPath' object is not subscriptable

Ultimately, what I want is to iterate through the files and do something with each file:

from pathlib import Path

for i, fl in enumerate(Path(file_path).glob(file)):
    # do something

I have even tried the library natsort but it's not ordering the files correctly in the iteration. I have tried:

from natsort import natsort_keygen, ns

# First attempt: lower-case each item before building the key
natsort_key1 = natsort_keygen(key=lambda y: y.lower())

# Second attempt: case-insensitive algorithm
natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)

Both snippets above still give me P1, P10, P11 and so on.

Any help would really be appreciated.


There are 3 best solutions below

tdelaney On BEST ANSWER

If you want to sort by the digits in the file name, you can use the Path.name attribute and a regular expression that extracts the digits.

from pathlib import Path
import re

file_path = r'C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/'

def _p_file_sort_key(path):
    """Given a Path whose name has the form P(digits)_Cor.csv, return the digits as an int."""
    return int(re.match(r"P(\d+)", path.name).group(1))

files = sorted(Path(file_path).glob("P*_Cor.csv"), key=_p_file_sort_key)
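
The same idea can be tried without the real files. The sketch below is only an illustration: it builds a throwaway tree in a temporary directory that mimics the folder layout shown in the question (the temporary root, the three sample numbers, and the multi-level glob pattern are assumptions, not the asker's actual data) and then loops over the matches in numeric order. The helper p_file_sort_key is a hypothetical variant of the key function above that raises a clearer error when a name does not match.

```python
import re
import tempfile
from pathlib import Path


def p_file_sort_key(path):
    """Return the digits of a name like P12_Cor.csv as an int."""
    match = re.match(r"P(\d+)", path.name)
    if match is None:
        raise ValueError(f"unexpected file name: {path.name}")
    return int(match.group(1))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # hypothetical stand-in for the Desktop folder
    for n in (10, 2, 1):
        folder = root / f"P{n}" / f"dataP{n}" / "SAMPLED_NORMALIZED"
        folder.mkdir(parents=True)
        (folder / f"P{n}_Cor.csv").touch()

    # One pattern segment per directory level, sorted by the number in the name.
    pattern = "P*/dataP*/SAMPLED_NORMALIZED/P*_Cor.csv"
    files = sorted(root.glob(pattern), key=p_file_sort_key)
    names = [fl.name for fl in files]

    for i, fl in enumerate(files):
        print(i, fl.name)  # do something with each file here
```

Because sorted() returns a list, the result can be passed straight to enumerate() as in the loop from the question.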
Eric Truett On

You can call str on the Path object or you can use as_posix().

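from pathlib import Path

file_path = r'C:/Users/HP/Desktop'  # directory from the question

# Option 1: convert each Path to a plain string before sorting
for fn in sorted([str(p) for p in Path(file_path).glob('*.csv')]):
    print(fn)  # do something with fn

# Option 2: use forward-slash strings via as_posix()
for fn in sorted([p.as_posix() for p in Path(file_path).glob('*.csv')]):
    print(fn)  # do something with fn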
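
Note that strings are subscriptable, so the TypeError goes away, but sorted() on plain strings still compares character by character and keeps P10 ahead of P2. One possible way to get natural order from the strings is a key that splits out the digit runs. The following is a minimal sketch using only the standard library; natural_key is a hypothetical helper, and the relative sample paths are made up to resemble the ones in the question.

```python
import re
from pathlib import PurePosixPath


def natural_key(text):
    """Split text on digit runs; digit chunks compare as ints, the rest as lower-case text."""
    chunks = re.split(r"(\d+)", text)
    # With a capturing group, re.split puts the digit runs at the odd indices.
    return [int(c) if i % 2 else c.lower() for i, c in enumerate(chunks)]


# Made-up sample paths shaped like the ones in the question.
paths = [
    PurePosixPath(f"Desktop/P{n}/dataP{n}/SAMPLED_NORMALIZED/P{n}_Cor.csv")
    for n in (1, 10, 11, 2)
]

plain = sorted(p.as_posix() for p in paths)
natural = sorted((p.as_posix() for p in paths), key=natural_key)

print([PurePosixPath(s).name for s in plain])
print([PurePosixPath(s).name for s in natural])
```

The first print shows the character-by-character order, the second the natural order.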

SethMMorton On

natsort can sort this data, but you have to tell it how to extract a string from each Path object (it doesn't do this by default, for performance reasons).

In [2]: from pathlib import Path

In [3]: import natsort

In [4]: a = [Path('C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/P1_Cor.csv'),
             Path('C:/Users/HP/Desktop/P10/dataP10/SAMPLED_NORMALIZED/P10_Cor.csv'),
             Path('C:/Users/HP/Desktop/P2/dataP2/SAMPLED_NORMALIZED/P2_Cor.csv')]

In [5]: natsort.natsorted(a, key=str)
Out[5]: 
[PosixPath('C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/P1_Cor.csv'),
 PosixPath('C:/Users/HP/Desktop/P2/dataP2/SAMPLED_NORMALIZED/P2_Cor.csv'),
 PosixPath('C:/Users/HP/Desktop/P10/dataP10/SAMPLED_NORMALIZED/P10_Cor.csv')]

In [6]: natsort.natsorted(a, alg=natsort.PATH)
Out[6]: 
[PosixPath('C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/P1_Cor.csv'),
 PosixPath('C:/Users/HP/Desktop/P2/dataP2/SAMPLED_NORMALIZED/P2_Cor.csv'),
 PosixPath('C:/Users/HP/Desktop/P10/dataP10/SAMPLED_NORMALIZED/P10_Cor.csv')]

The first option converts each Path object to a string, which natsort knows how to handle. This works for your data.

The second option switches on natsort's PATH algorithm, which will automatically handle Path objects correctly, and also adds more robust handling for corner-cases common in file system paths.


Full disclosure, I am the natsort author.