The Python Oracle

Can a hash algorithm such as MD5/SHA-1 generate an ID with a lower probability of collision than a purely random number?


Music by Eric Matyas
https://www.soundimage.org
Track title: Cosmic Puzzle

--

Full question
https://stackoverflow.com/questions/5123...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #random #md5 #uuid


ACCEPTED ANSWER

Score 2


Generating X bytes of random data gives the same collision probability as applying a hash function with an X-byte output to some IDs...

ASSUMING...

  1. The columns you're using the hash function on are themselves unique.
  2. You haven't made any mistakes in ensuring #1.
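For a sense of scale, the birthday bound gives the approximate probability that at least two of n uniformly random X-byte IDs collide: p ≈ 1 - exp(-n(n-1) / (2 * 2^(8X))). Under the two assumptions above, the same estimate applies to an ideal hash of unique inputs. A minimal sketch, assuming the hash behaves ideally (real MD5 does not hold up against deliberately crafted collisions); the function name `collision_probability` is hypothetical:

```python
import math


def collision_probability(num_ids: int, num_bytes: int) -> float:
    """Approximate chance that at least two of num_ids uniformly random
    IDs of num_bytes bytes collide (birthday-bound approximation)."""
    space = 2 ** (8 * num_bytes)  # number of distinct possible IDs
    # p ~= 1 - exp(-n(n-1) / (2N)); expm1 keeps precision when p is tiny
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * space))


# MD5 digests are 16 bytes, SHA-1 digests are 20 bytes
for nbytes in (16, 20):
    print(nbytes, collision_probability(10**9, nbytes))
```

The formula depends only on the number of IDs and the output size, not on whether the bytes came from a random number generator or a hash, which is the point of the answer.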

I would recommend using the system's cryptographic random number provider, because you've probably made mistakes. Here's an easy one to make:

Your system: concatenate column 1 and column 2, and hash the result. You can guarantee you'll never do this with those same values of column 1 and column 2 again. NEVER.

What about when:

  1. Column 1 = "abc"
  2. Column 2 = "def"

vs

  1. Column 1 = "ab"
  2. Column 2 = "cdef"

Both pairs concatenate to "abcdef", so they produce the same hash.
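The collision is easy to reproduce. The sketch below uses Python's standard `hashlib` to hash both pairs naively, then again with each field prefixed by its byte length, which is one common way (not taken from the answer) to keep the column boundary in the hashed input. The function names `naive_id` and `prefixed_id` are hypothetical:

```python
import hashlib


def naive_id(col1: str, col2: str) -> str:
    # Concatenate, then hash: the boundary between the columns is lost
    return hashlib.sha1((col1 + col2).encode("utf-8")).hexdigest()


def prefixed_id(col1: str, col2: str) -> str:
    # Prefix each field with its 8-byte length so the boundary is encoded
    h = hashlib.sha1()
    for field in (col1, col2):
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


print(naive_id("abc", "def") == naive_id("ab", "cdef"))        # True
print(prefixed_id("abc", "def") == prefixed_id("ab", "cdef"))  # False
```

Even with this fix, the IDs are only as unique as the column values themselves, so the answer's advice below still stands.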

So who would you trust more to give you random data? Yourself? Or a team of operating system developers, including cryptography experts, with decades of research and experience behind them? :)

Go with the system's cryptographic random function.
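In Python, the system's cryptographic random source is exposed through the standard-library `secrets` module (built on `os.urandom`), and `uuid.uuid4()` also draws its bytes from `os.urandom`. A minimal sketch of generating an ID this way, assuming a 16-byte (128-bit) ID is long enough for your use:

```python
import secrets
import uuid

# 16 bytes (128 bits) from the OS CSPRNG, rendered as 32 hex characters
token = secrets.token_hex(16)

# A version-4 UUID: 122 of its 128 bits are random, the rest encode version/variant
random_uuid = uuid.uuid4()

print(token)
print(random_uuid)
```

`secrets.token_bytes(16)` returns the raw bytes instead of a hex string if you store IDs in a binary column.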