Python recursive folder read
Full question
https://stackoverflow.com/questions/2212...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #scripting #fileio
ACCEPTED ANSWER
Score 470
Make sure you understand the three return values of os.walk:
for root, subdirs, files in os.walk(rootdir):
has the following meaning:
root: Current path which is "walked through"
subdirs: Files in root of type directory
files: Files in root (not in subdirs) of type other than directory
And please use os.path.join instead of concatenating with a slash! Your problem is filePath = rootdir + '/' + file: you must join with the currently "walked" folder (root) instead of the topmost folder (rootdir). So that must be filePath = os.path.join(root, file). BTW, "file" is a builtin in Python 2, so you don't normally use it as a variable name.
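To make the root-versus-rootdir point concrete, here is a minimal sketch. The helper names list_file_paths and demo, and the throwaway tree built with tempfile, are assumptions for illustration and are not part of the original question.

```python
import os
import tempfile


def list_file_paths(top):
    """Return the full path of every file below top (hypothetical helper)."""
    paths = []
    for root, subdirs, files in os.walk(top):
        for filename in files:
            # Join with root (the folder currently being walked), not with top.
            paths.append(os.path.join(root, filename))
    return sorted(paths)


def demo():
    """Build a tiny throwaway tree and compare the right and the wrong join."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, 'sub'))
        nested = os.path.join(top, 'sub', 'nested.txt')
        with open(nested, 'w') as f:
            f.write('hello')
        found_correctly = list_file_paths(top) == [nested]
        # Joining with the topmost folder points at a file that does not exist.
        wrong_path_exists = os.path.exists(os.path.join(top, 'nested.txt'))
        return found_correctly, wrong_path_exists


if __name__ == '__main__':
    print(demo())  # (True, False)
```

The file only exists under sub/, so a path built from the topmost folder misses it, while a path built from root finds it.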
Another problem is your loops, which should look like this, for example:
import os
import sys
walk_dir = sys.argv[1]
print('walk_dir = ' + walk_dir)
# If your current working directory may change during script execution, it's recommended to
# immediately convert program arguments to an absolute path. Then the variable root below will
# be an absolute path as well. Example:
# walk_dir = os.path.abspath(walk_dir)
print('walk_dir (absolute) = ' + os.path.abspath(walk_dir))
for root, subdirs, files in os.walk(walk_dir):
    print('--\nroot = ' + root)
    list_file_path = os.path.join(root, 'my-directory-list.txt')
    print('list_file_path = ' + list_file_path)
    with open(list_file_path, 'wb') as list_file:
        for subdir in subdirs:
            print('\t- subdirectory ' + subdir)
        for filename in files:
            file_path = os.path.join(root, filename)
            print('\t- file %s (full path: %s)' % (filename, file_path))
            with open(file_path, 'rb') as f:
                f_content = f.read()
                list_file.write(('The file %s contains:\n' % filename).encode('utf-8'))
                list_file.write(f_content)
                list_file.write(b'\n')
If you didn't know, the with statement for files is a shorthand:
with open('filename', 'rb') as f:
    dosomething()
# is effectively the same as
f = open('filename', 'rb')
try:
    dosomething()
finally:
    f.close()
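A small sketch to check that equivalence for yourself: the file object reports closed after the with block, whether the block finishes normally or raises. The function names and the temporary file are assumptions made for this illustration.

```python
import os
import tempfile


def closed_after_normal_exit(path):
    """Return True if the file is closed once the with block ends normally."""
    with open(path, 'rb') as f:
        f.read()
    return f.closed


def closed_after_exception(path):
    """Return True if the file is closed even though the block raised."""
    try:
        with open(path, 'rb') as f:
            raise ValueError('simulated failure inside the block')
    except ValueError:
        pass
    return f.closed


def demo():
    """Run both checks against a throwaway temporary file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return closed_after_normal_exit(path), closed_after_exception(path)
    finally:
        os.remove(path)


if __name__ == '__main__':
    print(demo())  # (True, True)
```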
ANSWER 2
Score 281
If you are using Python 3.5 or above, you can get this done in 1 line.
import glob
# root_dir needs a trailing slash (e.g. /root/dir/)
for filename in glob.iglob(root_dir + '**/*.txt', recursive=True):
    print(filename)
As mentioned in the documentation:
If recursive is true, the pattern '**' will match any files and zero or more directories and subdirectories.
If you want every file, you can use:
import glob
# Note: this pattern yields directories as well as files
for filename in glob.iglob(root_dir + '**/**', recursive=True):
    print(filename)
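If you would rather not depend on the trailing slash, one possible variation is to build the pattern with os.path.join. This is a sketch only: iter_txt_files and demo are hypothetical names, and glob.escape is added so that special characters in the root folder name are not read as pattern syntax.

```python
import glob
import os
import tempfile


def iter_txt_files(root_dir):
    """Lazily yield every .txt path below root_dir (hypothetical helper)."""
    # os.path.join supplies the separator, so root_dir needs no trailing slash.
    pattern = os.path.join(glob.escape(root_dir), '**', '*.txt')
    return glob.iglob(pattern, recursive=True)


def demo():
    """Create a small throwaway tree and list its .txt files relative to the top."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, 'sub'))
        for rel in ('a.txt', os.path.join('sub', 'b.txt'), os.path.join('sub', 'c.md')):
            open(os.path.join(top, rel), 'w').close()
        return sorted(os.path.relpath(p, top) for p in iter_txt_files(top))


if __name__ == '__main__':
    print(demo())
```

Because '**' also matches zero directories, a.txt at the top level is found along with sub/b.txt.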
ANSWER 3
Score 45
Agree with Dave Webb, os.walk will yield an item for each directory in the tree. Fact is, you just don't have to care about subFolders.
Code like this should work:
import os
import sys
rootdir = sys.argv[1]
for folder, subs, files in os.walk(rootdir):
    with open(os.path.join(folder, 'python-outfile.txt'), 'w') as dest:
        for filename in files:
            with open(os.path.join(folder, filename), 'r') as src:
                dest.write(src.read())
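One thing to watch with this approach: on a second run, the python-outfile.txt left by the first run is itself listed in files, so the loop would open it for reading while it is being rewritten. A hedged variation that skips it is sketched below; the function names, the out_name parameter and the sorted() call (added only for a predictable order) are assumptions, not part of the answer.

```python
import os
import tempfile


def concat_per_folder(rootdir, out_name='python-outfile.txt'):
    """In every folder below rootdir, write the folder's files into out_name."""
    for folder, subs, files in os.walk(rootdir):
        with open(os.path.join(folder, out_name), 'w') as dest:
            for filename in sorted(files):
                if filename == out_name:
                    # Skip the output file left behind by an earlier run.
                    continue
                with open(os.path.join(folder, filename), 'r') as src:
                    dest.write(src.read())


def demo():
    """Run the concatenation twice and return the resulting output file."""
    with tempfile.TemporaryDirectory() as top:
        for name, text in (('a.txt', 'A\n'), ('b.txt', 'B\n')):
            with open(os.path.join(top, name), 'w') as f:
                f.write(text)
        concat_per_folder(top)
        concat_per_folder(top)  # the second run must give the same result
        with open(os.path.join(top, 'python-outfile.txt')) as f:
            return f.read()


if __name__ == '__main__':
    print(repr(demo()))  # 'A\nB\n'
```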
ANSWER 4
Score 41
TL;DR: These are equivalents to find -type f, to go over all files in all folders below and including the current one:
folder = '.'
import os
for currentpath, folders, files in os.walk(folder):
    for file in files:
        print(os.path.join(currentpath, file))
## or:
import glob
for pathstr in glob.iglob(glob.escape(folder) + '/**/*', recursive=True):
    print(pathstr)
Comparing the two methods:
- os.walk is about 3× faster
- os.walk used slightly more memory in my test because the files array held 82k entries whereas glob returns an iterator and streams the results. The piecemeal handling of each result (more calls and less buffering going on) likely explains the speed difference
    - If you forget the i in glob.iglob(), it will return a list rather than an iterator and potentially use a lot more memory
- os.walk will not silently give incomplete results or unexpectedly interpret a name as a matching pattern
- glob doesn't show empty directories
- glob needs to escape directory and file names using glob.escape(name) because they can contain special characters
- glob excludes directories and files starting with a dot (e.g., ~/.bashrc or ~/.vim) and include_hidden does not solve that (it includes hidden folders only; you need to specify a second pattern for dotfiles)
- glob doesn't tell you what is a file and what a directory
- glob walks into symlinks and may lead to you enumerating a lot of files in completely different places (which may be what you want; in that case, os.walk has followlinks=True as an option)
- os.walk lets you modify which paths to walk down by modifying the folders array while it's running, though personally this feels a bit messy and I'm not sure I would recommend that
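Two of the differences listed above can be reproduced with a short sketch: glob skipping dotfiles, and os.walk letting you prune folders by editing the folders list in place. The helper names, the skip_dirs parameter and the example tree are made up for this illustration.

```python
import glob
import os
import tempfile


def walk_files(top, skip_dirs=()):
    """List files below top with os.walk, not descending into skip_dirs."""
    found = []
    for currentpath, folders, files in os.walk(top):
        # Slice assignment edits the very list os.walk uses to decide where to
        # go next, so the skipped folders are never entered.
        folders[:] = [name for name in folders if name not in skip_dirs]
        for name in files:
            found.append(os.path.relpath(os.path.join(currentpath, name), top))
    return sorted(found)


def glob_files(top):
    """List files below top with glob; names starting with a dot are skipped."""
    pattern = os.path.join(glob.escape(top), '**', '*')
    return sorted(
        os.path.relpath(path, top)
        for path in glob.iglob(pattern, recursive=True)
        if os.path.isfile(path)  # glob itself does not separate files from folders
    )


def demo():
    """Build a throwaway tree with a dotfile and a folder we want to skip."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, 'cache'))
        for rel in ('.hidden', 'visible.txt', os.path.join('cache', 'big.bin')):
            open(os.path.join(top, rel), 'w').close()
        return walk_files(top), walk_files(top, skip_dirs=('cache',)), glob_files(top)


if __name__ == '__main__':
    for result in demo():
        print(result)
```

In this tree, os.walk reports .hidden while the glob pattern does not, and passing skip_dirs=('cache',) keeps os.walk out of the cache folder entirely.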
Other answers already mentioned os.walk(), but it could be explained better. It's quite simple! Let's walk through this tree:
docs/
└── doc1.odt
pics/
todo.txt
With this code:
for currentpath, folders, files in os.walk('.'):
    print(currentpath)
The currentpath is the current folder it is looking at. This will output:
.
./docs
./pics
So it loops three times, because there are three folders: the current one, docs, and pics. In every loop, it fills the variables folders and files with the folders and files directly inside the current folder. Let's show them:
for currentpath, folders, files in os.walk('.'):
    print(currentpath, folders, files)
This shows us:
# currentpath  folders           files
.              ['pics', 'docs']  ['todo.txt']
./pics         []                []
./docs         []                ['doc1.odt']
So in the first line, we see that we are in folder ., that it contains two folders, namely pics and docs, and that there is one file, namely todo.txt. You don't have to do anything to recurse into those folders: as you can see, os.walk recurses automatically and gives you the files in each subfolder, and in any subfolders of those (though we don't have any in the example).
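The order in which subfolders show up (pics before docs above) depends on the file system. If you want a predictable order, a sketch along these lines should work, relying on the fact that os.walk (in its default top-down mode) honours in-place changes to the folders list. The names walk_sorted and demo are assumptions, and the tree is recreated in a temporary folder.

```python
import os
import tempfile


def walk_sorted(top):
    """Walk top in alphabetical order, returning (path, folders, files) rows."""
    rows = []
    for currentpath, folders, files in os.walk(top):
        folders.sort()  # in place, so os.walk visits the subfolders alphabetically
        files.sort()
        rows.append((os.path.relpath(currentpath, top), list(folders), list(files)))
    return rows


def demo():
    """Recreate the docs/, pics/, todo.txt tree from above and walk it."""
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, 'docs'))
        os.makedirs(os.path.join(top, 'pics'))
        open(os.path.join(top, 'docs', 'doc1.odt'), 'w').close()
        open(os.path.join(top, 'todo.txt'), 'w').close()
        return walk_sorted(top)


if __name__ == '__main__':
    for row in demo():
        print(row)
```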
If you just want to loop through all files, the equivalent of find -type f, you can do this:
for currentpath, folders, files in os.walk('.'):
    for file in files:
        print(os.path.join(currentpath, file))
This outputs:
./todo.txt
./docs/doc1.odt