The Python Oracle

filecmp.cmp() ignoring differing os.stat() signatures?

--

Full question
https://stackoverflow.com/questions/8045...

Question links:
[filecmp()]: https://docs.python.org/2/library/filecm...
[os.stat()]: https://docs.python.org/2/library/os.htm...
[documentation]: https://docs.python.org/3/library/filecm...
[os.stat()]: https://docs.python.org/3/library/os.htm...

Accepted answer links:
[logical inverse]: http://en.wikipedia.org/wiki/Inverse_%28...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #filecomparison



ACCEPTED ANSWER

Score 8


You're misunderstanding the documentation. Line #2 says:

Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal.

Files with identical os.stat() signatures are taken to be equal, but the logical inverse is not true: files with unequal os.stat() signatures are not necessarily taken to be unequal. Rather, they may or may not be equal, so the actual file contents are compared. Since the file contents are found to be identical, filecmp.cmp() returns True.

As per the third clause, once it determines that the files are equal, it caches that result and does not re-read the file contents if you ask it to compare the same files again, so long as those files' os.stat() signatures don't change.
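
To make the behaviour described above concrete, here is a minimal sketch. The function name demo_shallow_compare, the file names, and the timestamps are all made up for illustration. It assumes Python 3.4 or later (for filecmp.clear_cache()) and a filesystem that stores the modification times set by os.utime().

```python
import filecmp
import os
import tempfile


def demo_shallow_compare():
    """Return three filecmp.cmp() results for hand-crafted temporary files."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")
        b = os.path.join(tmp, "b.txt")
        c = os.path.join(tmp, "c.txt")
        # a and b hold the same bytes; c has the same size but different bytes.
        contents = ((a, b"same bytes"), (b, b"same bytes"), (c, b"diff bytes"))
        for path, data in contents:
            with open(path, "wb") as fh:
                fh.write(data)

        # Arbitrary timestamps: a and b get different mtimes, a and c the same.
        os.utime(a, (1_000_000_000, 1_000_000_000))
        os.utime(b, (1_500_000_000, 1_500_000_000))
        os.utime(c, (1_000_000_000, 1_000_000_000))

        filecmp.clear_cache()  # start from an empty comparison cache

        # Signatures differ (mtime), so the contents are read and compared.
        same_bytes_diff_mtime = filecmp.cmp(a, b)
        # Signatures match (type, size, mtime), so the contents are never read.
        diff_bytes_same_sig = filecmp.cmp(a, c)
        # shallow=False forces a content comparison regardless of signature.
        diff_bytes_deep = filecmp.cmp(a, c, shallow=False)

        return same_bytes_diff_mtime, diff_bytes_same_sig, diff_bytes_deep


if __name__ == "__main__":
    print(demo_shallow_compare())
```

The first result matches the case in the question: the signatures differ, the contents are identical, and filecmp.cmp() returns True. The second is the other side of the shallow check: two files with different contents are reported as equal because their signatures happen to match.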




ANSWER 2

Score 1


It seems that 'rolling your own' is indeed what is required to produce a desirable result. It would simply be nice if the documentation were clear enough to make a casual reader reach that conclusion.

Here's the function I am presently using:

import os

def cmp_stat_weak(a, b):
    # Compare only size and modification time; file contents are never read.
    sa = os.stat(a)
    sb = os.stat(b)
    return sa.st_size == sb.st_size and sa.st_mtime == sb.st_mtime