The Python Oracle

filecmp.cmp() ignoring differing os.stat() signatures?

--

Full question
https://stackoverflow.com/questions/8045...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #filecomparison




ACCEPTED ANSWER

Score 8


You're misunderstanding the documentation. Line #2 says:

Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal.

Files with identical os.stat() signatures are taken to be equal, but the logical inverse is not true: files with unequal os.stat() signatures are not necessarily taken to be unequal. Rather, when the signatures differ the files may or may not be equal, so the actual file contents are compared. Since the file contents are found to be identical, filecmp.cmp() returns True.
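
A minimal sketch of this behaviour, assuming a writable temporary directory. The file names and the helper demo_same_content_different_mtime() are made up for illustration. It writes the same text to two files, forces different modification times with os.utime(), and compares the files with the default shallow=True:

```python
import filecmp
import os
import tempfile

def demo_same_content_different_mtime():
    """Return (signatures_differ, files_equal) for two same-content files."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")  # hypothetical file names
        b = os.path.join(tmp, "b.txt")
        for path in (a, b):
            with open(path, "w") as f:
                f.write("same contents\n")
        # Force different modification times so the os.stat() signatures differ.
        os.utime(a, (1_000_000_000, 1_000_000_000))
        os.utime(b, (1_500_000_000, 1_500_000_000))
        signatures_differ = os.stat(a).st_mtime != os.stat(b).st_mtime
        files_equal = filecmp.cmp(a, b)  # shallow defaults to True
        return signatures_differ, files_equal

print(demo_same_content_different_mtime())
```

The signatures differ, yet the files still compare equal because filecmp.cmp() falls back to reading the contents.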

As per the third clause, once it determines that the files are equal, it will cache that result and not bother re-reading the file contents if you ask it to compare the same files again, so long as those files' os.stat() signatures don't change.
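
The order of checks described above can be sketched roughly as follows. This is a simplified paraphrase written for illustration, not the standard library's source: cmp_sketch(), _signature(), _same_bytes() and _cache are hypothetical names, and the signature is assumed to consist of file type, size, and modification time.

```python
import os
import stat

_cache = {}  # hypothetical cache keyed by both paths and both signatures

def _signature(path):
    # Assumed signature: (file type, size, modification time).
    st = os.stat(path)
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)

def _same_bytes(a, b, bufsize=8192):
    # Read both files in chunks and stop at the first difference.
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            chunk_a = fa.read(bufsize)
            chunk_b = fb.read(bufsize)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:
                return True

def cmp_sketch(a, b, shallow=True):
    sa, sb = _signature(a), _signature(b)
    if sa[0] != stat.S_IFREG or sb[0] != stat.S_IFREG:
        return False          # only regular files can compare equal
    if shallow and sa == sb:
        return True           # identical signatures: taken to be equal
    if sa[1] != sb[1]:
        return False          # different sizes: contents cannot match
    key = (a, b, sa, sb)
    if key not in _cache:     # a changed signature yields a new key
        _cache[key] = _same_bytes(a, b)
    return _cache[key]
```

Note that differing signatures never produce False on their own in this sketch; only a size mismatch or a byte-level difference does.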




ANSWER 2

Score 1


It seems that 'rolling your own' is indeed what is required to produce a desirable result. It would simply be nice if the documentation were clear enough to make a casual reader reach that conclusion.

Here's the function I am presently using:

import os

def cmp_stat_weak(a, b):
    # Compare only size and modification time; file contents are never read.
    sa = os.stat(a)
    sb = os.stat(b)
    return sa.st_size == sb.st_size and sa.st_mtime == sb.st_mtime
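
As a hedged usage sketch, the following shows where a stat-only check like cmp_stat_weak() and a full content comparison disagree. The file names, the timestamp, and the helper demo_weak_comparison() are invented for the example; cmp_stat_weak() is repeated so the block runs on its own.

```python
import filecmp
import os
import tempfile

def cmp_stat_weak(a, b):
    sa = os.stat(a)
    sb = os.stat(b)
    return sa.st_size == sb.st_size and sa.st_mtime == sb.st_mtime

def demo_weak_comparison():
    """Return (weak_result, content_result) for two same-size files."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")  # hypothetical file names
        b = os.path.join(tmp, "b.txt")
        with open(a, "w") as f:
            f.write("abc")
        with open(b, "w") as f:
            f.write("xyz")  # same length, different contents
        # Give both files the same (arbitrary) modification time.
        os.utime(a, (1_000_000_000, 1_000_000_000))
        os.utime(b, (1_000_000_000, 1_000_000_000))
        weak = cmp_stat_weak(a, b)
        content = filecmp.cmp(a, b, shallow=False)  # always reads contents
        return weak, content

print(demo_weak_comparison())
```

The stat-only check reports a match because size and modification time agree, while the shallow=False comparison reads the bytes and reports a difference. Which of the two is "desirable" depends on whether identical metadata or identical contents is what you actually care about.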