The Python Oracle

How to use glob() to find files recursively?

--

Full question
https://stackoverflow.com/questions/2186...

Accepted answer links:
[pathlib.Path.rglob]: https://docs.python.org/3/library/pathli...
[pathlib]: https://docs.python.org/3/library/pathli...
[glob.glob('**/*.c')]: https://docs.python.org/3/library/glob.h...
[os.walk]: https://docs.python.org/2/library/os.htm...
[fnmatch.filter]: https://docs.python.org/2/library/fnmatc...

Answer 2 links:
[3.5]: https://docs.python.org/3.5/library/glob...
[Python 3 Demo]: https://trinket.io/python3/e69fe22eff

Answer 4 links:
https://github.com/miracle2k/python-glob.../

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #path #filesystems #glob #fnmatch



ACCEPTED ANSWER

Score 1828


There are a couple of ways:

pathlib.Path().rglob()

Use pathlib.Path().rglob() from the pathlib module, which was introduced in Python 3.4.

from pathlib import Path

for path in Path('src').rglob('*.c'):
    print(path.name)

glob.glob()

If you don't want to use pathlib, use glob.glob() with recursive=True (supported since Python 3.5):

from glob import glob

for filename in glob('src/**/*.c', recursive=True):
    print(filename)

Note that glob.glob() skips files whose names begin with a dot (.), such as hidden files on Unix-based systems. To match those as well, use the os.walk() solution below.

os.walk()

For older Python versions, use os.walk() to recursively walk a directory and fnmatch.filter() to match against a simple expression:

import fnmatch
import os

matches = []
for root, dirnames, filenames in os.walk('src'):
    for filename in fnmatch.filter(filenames, '*.c'):
        matches.append(os.path.join(root, filename))

This version may also be faster, depending on how many files you have, because the pathlib module adds some overhead on top of os.walk().
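
To illustrate the dot-file caveat mentioned for glob.glob(), here is a minimal sketch that runs both approaches against a throwaway directory. The helper names find_with_glob and find_with_walk, and the file names main.c and .hidden.c, are made up for this example. It relies on the default behaviour of glob.glob(), where a * wildcard does not match a leading dot.

```python
import fnmatch
import glob
import os
import tempfile


def find_with_glob(root, pattern):
    """Recursive glob; names starting with a dot are skipped by default."""
    return sorted(glob.glob(os.path.join(root, '**', pattern), recursive=True))


def find_with_walk(root, pattern):
    """os.walk() plus fnmatch.filter(); dot-prefixed names are matched too."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


# Build a temporary tree containing src/main.c and src/.hidden.c
# (hypothetical file names, used only for this demonstration).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'src')
    os.makedirs(src)
    for name in ('main.c', '.hidden.c'):
        open(os.path.join(src, name), 'w').close()

    glob_names = [os.path.basename(p) for p in find_with_glob(src, '*.c')]
    walk_names = [os.path.basename(p) for p in find_with_walk(src, '*.c')]

print(glob_names)  # ['main.c']
print(walk_names)  # ['.hidden.c', 'main.c']
```

The os.walk() variant finds the hidden file because fnmatch does not treat a leading dot specially.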




ANSWER 2

Score 123


Similar to other solutions, but using fnmatch.fnmatch() instead of glob, since os.walk() has already listed the filenames:

import os, fnmatch


def find_files(directory, pattern):
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename


for filename in find_files('src', '*.c'):
    print('Found C source:', filename)

Also, using a generator allows you to process each file as it is found, instead of finding all the files first and then processing them.
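
As a rough illustration of that laziness, the sketch below repeats the generator and uses itertools.islice to take only the first two matches, so the generator is not advanced after the second match. The temporary directory and the file names are assumptions made for the example.

```python
import fnmatch
import os
import tempfile
from itertools import islice


def find_files(directory, pattern):
    """Yield matching paths one at a time while walking the tree."""
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                yield os.path.join(root, basename)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files: three C sources and one unrelated text file.
    for name in ('a.c', 'b.c', 'c.c', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()

    # islice() stops pulling from the generator after two matches.
    first_two = list(islice(find_files(tmp, '*.c'), 2))

print(len(first_two))  # 2
```

Which two files come back depends on the order in which os.walk() lists the directory, so only the count is shown here.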




ANSWER 3

Score 93


I've modified the glob module to support ** for recursive globbing, e.g.:

>>> import glob2
>>> all_c_files = glob2.glob('src/**/*.c')

https://github.com/miracle2k/python-glob2/

Useful when you want to provide your users with the ability to use the ** syntax, and thus os.walk() alone is not good enough.
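
On Python 3.5 and later, a similar effect can be sketched with the standard library alone, since glob.glob() accepts recursive=True. The example below is not glob2. It is a stand-in that passes a user-supplied pattern straight to the standard glob module, and the function name expand_pattern and the directory layout are hypothetical.

```python
import glob
import os
import tempfile


def expand_pattern(base_dir, user_pattern):
    """Expand a user-supplied pattern (which may contain **) under base_dir."""
    full_pattern = os.path.join(base_dir, user_pattern)
    return sorted(glob.glob(full_pattern, recursive=True))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: src/a.c, src/lib/b.c and src/lib/b.h
    os.makedirs(os.path.join(tmp, 'src', 'lib'))
    for parts in (('src', 'a.c'), ('src', 'lib', 'b.c'), ('src', 'lib', 'b.h')):
        open(os.path.join(tmp, *parts), 'w').close()

    # The pattern would normally come from the user, e.g. a command-line argument.
    found = [os.path.relpath(p, tmp) for p in expand_pattern(tmp, 'src/**/*.c')]

print(found)
```

The list contains src/a.c and src/lib/b.c, written with the platform's path separator.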




ANSWER 4

Score 79


Starting with Python 3.4, one can use the glob() method of one of the Path classes in the new pathlib module, which supports ** wildcards. For example:

from pathlib import Path

for file_path in Path('src').glob('**/*.c'):
    print(file_path)  # do whatever you need with these files

Update: Starting with Python 3.5, the same syntax is also supported by glob.glob().
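
As a quick check that these spellings agree, the sketch below builds a small temporary tree and compares Path.glob('**/*.c'), Path.rglob('*.c') and glob.glob(..., recursive=True). The directory layout and variable names are made up for the example, and paths are normalised to POSIX-style relative strings so the three results can be compared directly.

```python
import glob
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: src/main.c, src/lib/util.c and src/lib/util.h
    src = Path(tmp) / 'src'
    (src / 'lib').mkdir(parents=True)
    for rel in ('main.c', 'lib/util.c', 'lib/util.h'):
        (src / rel).touch()

    via_path_glob = sorted(
        p.relative_to(src).as_posix() for p in src.glob('**/*.c')
    )
    via_rglob = sorted(
        p.relative_to(src).as_posix() for p in src.rglob('*.c')
    )
    via_glob_module = sorted(
        Path(name).relative_to(src).as_posix()
        for name in glob.glob(str(src / '**' / '*.c'), recursive=True)
    )

print(via_path_glob)  # ['lib/util.c', 'main.c']
print(via_path_glob == via_rglob == via_glob_module)  # True
```

The three results only match here because the tree contains no dot-prefixed names.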