I have a directory with 3 million+ files in it (which I should have avoided creating in the first place). Using os.scandir() to simply print out the names,
import os

for f in os.scandir():
    print(f)
takes 0.004 seconds per item for the first ~200,000 files, but then drastically slows down to 0.3 seconds per item. Running it again, it did the same thing: fast for the first ~200,000 files, then it slowed way down.
After waiting an hour and running it again, this time it was fast for the first ~400,000 files but then slowed down in the same way.
The files all start with a year between 1908 and 1963, so I've tried reorganizing the files using bash commands like
for i in {1908..1963}; do
    mkdir ../test-folders/$i
    mv $i* ../test-folders/$i/
done
But it just hangs and never seems to make any progress...
Any advice on how to reorganize this huge folder or more efficiently list the files in the directory?
It sounds like using an iterator (a function that returns one item at a time instead of loading everything into memory) would be best.
The glob library has the function iglob.
Documentation: https://docs.python.org/3/library/glob.html#glob.iglob
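As a minimal sketch of the idea (assuming the script is run from inside the large directory and that printing the names is still the goal):

import glob

# iglob returns a lazy iterator, so names are yielded one at a time
# rather than being collected into one giant list first.
for name in glob.iglob('*'):
    print(name)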
Related question and answer: https://stackoverflow.com/a/17020892/7838574
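If the goal is still to split the files into per-year folders, a single lazy pass over the directory should beat the bash loop, since mv $i* forces the shell to expand a glob against all 3 million+ entries once per year. Below is a rough sketch, not a tested solution; it assumes the script runs from inside the large directory, that the destination layout matches the bash loop above (../test-folders/<year>/), and that the relevant files start with a four-digit year between 1908 and 1963:

import os
import shutil

dest_root = '../test-folders'   # same layout as the bash loop above

# Create the per-year folders up front so the main loop only moves files.
for year in range(1908, 1964):
    os.makedirs(os.path.join(dest_root, str(year)), exist_ok=True)

# os.scandir yields entries lazily, so the directory is walked once
# without building the full multi-million-entry listing in memory.
with os.scandir('.') as entries:
    for entry in entries:
        prefix = entry.name[:4]
        if entry.is_file() and prefix.isdigit() and 1908 <= int(prefix) <= 1963:
            shutil.move(entry.path, os.path.join(dest_root, prefix, entry.name))

This examines and moves each file once instead of re-scanning the whole directory for every year. One caveat: what os.scandir yields when entries are removed mid-iteration is unspecified, so it may be safer to process files in batches and re-run the script until the source directory is empty.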