Difficulty comparing generated and Google Cloud Storage provided CRC32C checksums
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Lost Meadow
--
Chapters
00:00 Question
01:27 Accepted answer (Score 11)
02:32 Answer 2 (Score 1)
03:05 Answer 3 (Score 0)
03:45 Thank you
--
Full question
https://stackoverflow.com/questions/3736...
Question links:
[blob.crc32c]: https://gcloud-python.readthedocs.io/en/...
[crcmod]: http://crcmod.sourceforge.net/crcmod.pre...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #googlecloudstorage #crc32 #gcloudpython #googlecloudpython
#avk47
ACCEPTED ANSWER
Score 12
Here's an example md5 and crc32c for the gsutil public tarball:
$ gsutil ls -L gs://pub/gsutil.tar.gz | grep Hash
Hash (crc32c): vHI6Bw==
Hash (md5): ph7W3cCoEgMQWvA45Z9y9Q==
I'll copy it locally to work with:
$ gsutil cp gs://pub/gsutil.tar.gz /tmp/
Copying gs://pub/gsutil.tar.gz...
Downloading file:///tmp/gsutil.tar.gz: 2.59 MiB/2.59 MiB
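CRC values are usually displayed as unsigned 32-bit integers. To convert the base64 value that gsutil reports into that form, decode it and unpack the four bytes as a big-endian unsigned integer: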
>>> import base64
>>> import struct
>>> struct.unpack('>I', base64.b64decode('vHI6Bw=='))
(3161602567,)
To obtain the same from the crcmod library:
>>> file_bytes = open('/tmp/gsutil.tar.gz', 'rb').read()
>>> import crcmod
>>> crc32c = crcmod.predefined.Crc('crc-32c')
>>> crc32c.update(file_bytes)
>>> crc32c.crcValue
3161602567
If you want to convert the value from crcmod to the same base64 format used by gcloud/gsutil:
>>> base64.b64encode(crc32c.digest()).decode('utf-8')
'vHI6Bw=='
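The two conversions above can be wrapped in a pair of small helpers. This is a minimal sketch using only the standard library; the names crc32c_b64_to_int and crc32c_int_to_b64 are hypothetical and are not part of gsutil or crcmod.

```python
import base64
import struct


def crc32c_b64_to_int(b64_value):
    """Decode a base64 CRC32C (as printed by gsutil) to an unsigned integer."""
    # '>I' means big-endian unsigned 32-bit integer.
    (value,) = struct.unpack(">I", base64.b64decode(b64_value))
    return value


def crc32c_int_to_b64(value):
    """Encode an unsigned 32-bit CRC32C integer in the base64 form gsutil prints."""
    return base64.b64encode(struct.pack(">I", value)).decode("ascii")


# Values taken from the gsutil output shown above.
print(crc32c_b64_to_int("vHI6Bw=="))
print(crc32c_int_to_b64(3161602567))
```

Either direction works for a comparison; converting both sides to integers avoids any worry about string padding or bytes versus str.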
ANSWER 2
Score 2
In 2022 I still had trouble finding a definitive answer. Here's what I came up with that seems to work with large files.
import base64
import collections

import google_crc32c


def generate_file_crc32c(path, blocksize=2**20):
    """
    Generate a base64-encoded CRC32C checksum for a file, to compare with Google Cloud Storage.
    Returns a string like "4jvPnQ=="
    Compare with a Google Cloud Storage blob instance:
    blob.crc32c == generate_file_crc32c("path/to/local/file.txt")
    """
    crc = google_crc32c.Checksum()
    # Open in binary mode; the context manager closes the file even on error.
    with open(path, "rb") as read_stream:
        # consume() yields each chunk as it reads it; a zero-length deque
        # drains the generator without keeping any chunk in memory.
        collections.deque(crc.consume(read_stream, blocksize), maxlen=0)
    return base64.b64encode(crc.digest()).decode("utf-8")
ANSWER 3
Score 1
From the linked documentation: "CRC32c checksum, as described in RFC 4960, Appendix B; encoded using base64 in big-endian byte order"
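It looks like you are not decoding the base64 string before comparing it with the integer from crcmod.
Also, if you are on a Windows machine, you would need to open the file in binary mode ('rb'), because text mode translates line endings and changes the bytes that get checksummed.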
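To see what the quoted documentation means without installing anything, here is a rough sketch of CRC32C in pure Python. It assumes the usual reflected Castagnoli polynomial (0x82F63B78) with an all-ones initial value and final XOR. The names crc32c, crc32c_base64 and file_crc32c_base64 are made up for illustration, and the byte-at-a-time loop is far slower than crcmod or google_crc32c, so treat it as a way to understand the format rather than something to run on large files.

```python
import base64
import struct

# Reflected form of the Castagnoli polynomial 0x1EDC6F41.
_POLY = 0x82F63B78


def _build_table():
    """Precompute the CRC of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data, crc=0):
    """Return the CRC32C of data; pass a previous result as crc to continue it."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_base64(data):
    """CRC32C packed as 4 big-endian bytes, then base64-encoded."""
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")


def file_crc32c_base64(path, blocksize=2**20):
    """Same as crc32c_base64, reading the file in binary mode chunk by chunk."""
    crc = 0
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(blocksize), b""):
            crc = crc32c(chunk, crc)
    return base64.b64encode(struct.pack(">I", crc)).decode("ascii")
```

A quick sanity check is the standard CRC-32C check value: crc32c(b"123456789") should equal 0xE3069283. If that holds, the string returned by file_crc32c_base64 for a local file can be compared directly with blob.crc32c.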