The Python Oracle

Find a file in python

--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Realization

--

Chapters
00:00 Find A File In Python
00:18 Accepted Answer Score 347
00:42 Answer 2 Score 52
01:11 Answer 3 Score 28
01:39 Answer 4 Score 8
02:11 Answer 5 Score 4
02:37 Thank you

--

Full question
https://stackoverflow.com/questions/1724...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python

#avk47



ACCEPTED ANSWER

Score 347


os.walk is the answer, this will find the first match:

import os

def find(name, path):
    for root, dirs, files in os.walk(path):
        if name in files:
            return os.path.join(root, name)

And this will find all matches:

def find_all(name, path):
    result = []
    for root, dirs, files in os.walk(path):
        if name in files:
            result.append(os.path.join(root, name))
    return result

And this will match a pattern:

import os, fnmatch
def find(pattern, path):
    result = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result.append(os.path.join(root, name))
    return result

find('*.txt', '/path/to/dir')



ANSWER 2

Score 28


I used a version of os.walk and on a larger directory got times around 3.5 sec. I tried two random solutions with no great improvement, then just did:

paths = [line[2:] for line in subprocess.check_output("find . -iname '*.txt'", shell=True).splitlines()]

While it's POSIX-only, I got 0.25 sec.

From this, I believe it's entirely possible to optimise whole searching a lot in a platform-independent way, but this is where I stopped the research.




ANSWER 3

Score 8


If you are using Python on Ubuntu and you only want it to work on Ubuntu a substantially faster way is the use the terminal's locate program like this.

import subprocess

def find_files(file_name):
    command = ['locate', file_name]

    output = subprocess.Popen(command, stdout=subprocess.PIPE).communicate()[0]
    output = output.decode()

    search_results = output.split('\n')

    return search_results

search_results is a list of the absolute file paths. This is 10,000's of times faster than the methods above and for one search I've done it was ~72,000 times faster.




ANSWER 4

Score 4


If you are working with Python 2 you have a problem with infinite recursion on windows caused by self-referring symlinks.

This script will avoid following those. Note that this is windows-specific!

import os
from scandir import scandir
import ctypes

def is_sym_link(path):
    # http://stackoverflow.com/a/35915819
    FILE_ATTRIBUTE_REPARSE_POINT = 0x0400
    return os.path.isdir(path) and (ctypes.windll.kernel32.GetFileAttributesW(unicode(path)) & FILE_ATTRIBUTE_REPARSE_POINT)

def find(base, filenames):
    hits = []

    def find_in_dir_subdir(direc):
        content = scandir(direc)
        for entry in content:
            if entry.name in filenames:
                hits.append(os.path.join(direc, entry.name))

            elif entry.is_dir() and not is_sym_link(os.path.join(direc, entry.name)):
                try:
                    find_in_dir_subdir(os.path.join(direc, entry.name))
                except UnicodeDecodeError:
                    print "Could not resolve " + os.path.join(direc, entry.name)
                    continue

    if not os.path.exists(base):
        return
    else:
        find_in_dir_subdir(base)

    return hits

It returns a list with all paths that point to files in the filenames list. Usage:

find("C:\\", ["file1.abc", "file2.abc", "file3.abc", "file4.abc", "file5.abc"])