Perform Set Difference on RDDs in Spark Python
Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Hypnotic Orient Looping
--
Chapters
00:00 Question
01:33 Accepted answer (Score 6)
01:57 Thank you
--
Full question
https://stackoverflow.com/questions/3284...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #apachespark #rdd #setdifference
#avk47
--
ACCEPTED ANSWER
Score 6
This seems like something you can solve with subtractByKey:

filteredA = a.subtractByKey(b)

To convert an RDD into key-value pairs first:

keyValRDD = rdd.map(lambda x: (x[0], x[1:]))

*Note that my Python is weak and there might be better ways to split the values
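For readers without a Spark cluster handy, here is a plain-Python sketch of the semantics subtractByKey implements: keep every (key, value) pair from the first dataset whose key does not appear in the second. The function name and sample data are illustrative, not part of the Spark API.

```python
def subtract_by_key(a, b):
    """Keep pairs from `a` whose key does not appear anywhere in `b`."""
    b_keys = {k for k, _ in b}          # set of keys present in b
    return [(k, v) for k, v in a if k not in b_keys]

rows_a = [("x", 1), ("y", 2), ("z", 3)]
rows_b = [("y", 9)]
print(subtract_by_key(rows_a, rows_b))  # [('x', 1), ('z', 3)]
```

In Spark the same thing is a.subtractByKey(b), evaluated lazily and distributed across partitions; only the keys of b matter, its values are ignored.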