How can I limit os.walk results for a single file?


I am trying to search a given directory for a specific file, and if that file does not exist I want the code to say "File does not exist". Currently I can get this to work with os.walk, but it hits on every single file that isn't the specified file and prints "File does not exist" each time. I know that this is how os.walk functions, but I am not sure whether there is a way to make it print only once, whether the file is found or not.

Folder structure:

root folder
|
|Project Folder
    |file.xml
    |other files/subfolders

I want the code to go inside "Project Folder", do a recursive search for "file.xml", and print "Found" once if the file is found, otherwise print "Not found" once.

The code is:

def check_file(x): #x = root folder dir
   for d in next(os.walk(x))[1]: #if I understand correctly, [1] will be Project Folder
        for root, directories, files in os.walk(x):
            for name in files:
                if "file.xml" not in name:
                    print("found")
                else:
                    print("File Missing")

If I change the code to

            for name in files:
                if "file.xml" in name:
                    print("found")
                else:
                    pass

The code technically works as intended, but it does nothing to point out when the file isn't there, so this isn't a good solution. It would be easier if I could give the code a specific path to look in. However, the user can place the 'root folder' anywhere on their machine, and the 'project folder' has a different name depending on the project, so I don't think I can give the code a specific location.

Is there a way to get this to work with os.walk, or would another method work best?

slothrop On BEST ANSWER

The glob module is very convenient for this kind of wildcard-based recursive search. In particular, the ** wildcard matches a directory tree of arbitrary depth, so you can find a file anywhere in the descendants of your root directory.

For example:

import glob

def check_file(x):  # where x is the root directory for the search
    files = glob.glob('**/file.xml', root_dir=x, recursive=True)
    if files:
        print(f"Found {len(files)} matching files")
    else:
        print("Did not find a matching file")
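
Note that the root_dir parameter of glob.glob was added in Python 3.10. On older versions, a rough equivalent can be sketched with pathlib.Path.rglob, which also searches recursively. The helper name check_file_pathlib and its default argument are made up for illustration; this is one possible variant, not the only way to do it:

```python
from pathlib import Path


def check_file_pathlib(x, name="file.xml"):
    """Print one message and return True if `name` exists anywhere below x."""
    # rglob yields matches lazily, so next() stops at the first hit
    # instead of scanning the whole tree.
    first = next(Path(x).rglob(name), None)
    if first is not None:
        print(f"Found: {first}")
        return True
    print("Did not find a matching file")
    return False
```

Because the function stops at the first match, it prints exactly one line per call regardless of how many copies of the file exist.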
CristiFati On

Listing [Python.Docs]: os.walk(top, topdown=True, onerror=None, followlinks=False).

You don't need 2 nested loops. On each iteration, you only need to check whether the base file name is present in the 3rd member of the tuple that os.walk produces (the list of file names in the current directory).
The search_file implementation below handles the case of a file being present in multiple directories. If you only need to print the file once (no matter how many times it's present in the directory tree), there's the function search_file_once.

code00.py:

#!/usr/bin/env python

import os
import sys


def search_file(root_dir, base_name):
    found = 0
    for root, dirs, files in os.walk(root_dir):
        if base_name in files:
            print("Found: {:s}".format(os.path.join(root, base_name)))
            found += 1
    if not found:
        print("Not found")


# @TODO - cfati: Only care if file is found once
def search_file_once(root_dir, base_name):
    for root, dirs, files in os.walk(root_dir):
        if base_name in files:
            print("Found: {:s}".format(os.path.join(root, base_name)))
            break
    else:
        print("Not found")


def main(*argv):
    root = os.path.dirname(os.path.abspath(__file__))
    files = (
        "once.xml",
        "multiple.xml",
        "notpresent.xml",
    )
    for file in files:
        print("\nSearching recursively for {:s} in {:s}".format(file, root))
        search_file(root, file)


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.\n")
    sys.exit(rc)

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackExchange\StackOverflow\q076383189]> sopr.bat
### Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ###

[prompt]> tree /a /f
Folder PATH listing for volume SSD0-WORK
Volume serial number is AE9E-72AC
E:.
|   code00.py
|
\---dir0
    +---dir00
    +---dir01
    |       multiple.xml
    |       once.xml
    |
    \---dir02
        \---dir020
                multiple.xml


[prompt]>
[prompt]> "e:\Work\Dev\VEnvs\py_pc064_03.10_test0\Scripts\python.exe" ./code00.py
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)] 064bit on win32


Searching recursively for once.xml in e:\Work\Dev\StackExchange\StackOverflow\q076383189
Found: e:\Work\Dev\StackExchange\StackOverflow\q076383189\dir0\dir01\once.xml

Searching recursively for multiple.xml in e:\Work\Dev\StackExchange\StackOverflow\q076383189
Found: e:\Work\Dev\StackExchange\StackOverflow\q076383189\dir0\dir01\multiple.xml
Found: e:\Work\Dev\StackExchange\StackOverflow\q076383189\dir0\dir02\dir020\multiple.xml

Searching recursively for notpresent.xml in e:\Work\Dev\StackExchange\StackOverflow\q076383189
Not found

Done.

This is just one of the multiple ways possible of doing this. Check [SO]: How do I list all files of a directory? (@CristiFati's answer) for more details.
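
If the caller should decide what to print, the same loop can return the first matching path instead of printing it. A minimal sketch, assuming two hypothetical helpers named find_first and report (neither is part of code00.py above):

```python
import os


def find_first(root_dir, base_name):
    """Return the full path of the first file named base_name, or None."""
    for root, dirs, files in os.walk(root_dir):
        if base_name in files:
            return os.path.join(root, base_name)
    return None  # the whole tree was walked without a match


def report(root_dir, base_name):
    """Print exactly one line: either the path found or 'Not found'."""
    path = find_first(root_dir, base_name)
    if path is not None:
        print("Found: {:s}".format(path))
    else:
        print("Not found")
```

Separating the search from the printing keeps the "print once" logic in a single place, which is what the question asks for.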

Vincent Laufer On

I have written a function like this, and several similar ones, in the past. I am providing them all for context; some will work for your case with minimal to no modification.

## Find ALL matches (not just one):
## Example Usage:  findAll('file.xml', '/path/to/dir')

import os
def findAll(name, path):
    result = []
    for root, dirs, files in os.walk(path):
        if name in files:  # exact file name match, no wildcards
            result.append(os.path.join(root, name))
    return result  # return only after the whole tree has been walked

## A function that keeps going until all target files are found
def findProjectFiles(Folder, targetFiles):
    import os
    remaining = set(targetFiles)  # file names still to be located
    filePaths = []
    for root, dirs, files in os.walk(Folder):
        for f in files:
            if f in remaining:
                filePaths.append(os.path.abspath(os.path.join(root, f)))
                remaining.discard(f)
        if not remaining:
            break  # every target has been found, stop walking
    return filePaths

# find the path of the first match in folder (returns None if there is none):

def findPaths(name, path):
    import os
    for root, dirs, files in os.walk(path):
        if name in files:
            return os.path.join(root, name)

## can search the object returned for the string you want to find easily

## Similar, but this will match a pattern (i.e. does not have to be exact file name match).

import os, fnmatch
def findMatch(pattern, path):
    result = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result.append(os.path.join(root, name))
    return result  # return after the walk so every match is collected
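
As a usage sketch for the pattern-based search, the snippet below builds a throwaway directory and looks for every .xml file in it. It uses fnmatch.filter, which applies the pattern to a whole list of names at once. The helper name find_match_filter and the file names are made up for illustration:

```python
import fnmatch
import os
import tempfile


def find_match_filter(pattern, path):
    """Collect every file below path whose name matches the shell-style pattern."""
    result = []
    for root, dirs, files in os.walk(path):
        # fnmatch.filter returns the subset of names that match the pattern.
        for name in fnmatch.filter(files, pattern):
            result.append(os.path.join(root, name))
    return result


# Build a small temporary tree and search it.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.xml", os.path.join("sub", "b.xml"), "notes.txt"):
        open(os.path.join(tmp, rel), "w").close()
    matches = find_match_filter("*.xml", tmp)
    print(sorted(os.path.basename(m) for m in matches))
```

An empty list means nothing matched, so a single `if matches:` check is enough to print "Found" or "Not found" once.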