The Python Oracle

Perform Set Difference on RDDs in Spark Python

Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzle Game 5 Looping

--

Chapters
00:00 Perform Set Difference On Rdds In Spark Python
00:58 Accepted Answer Score 6
01:15 Thank you

--

Full question
https://stackoverflow.com/questions/3284...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #apachespark #rdd #setdifference

#avk47



ACCEPTED ANSWER

Score 6


This looks like something you can solve with subtractByKey, which keeps the pairs of a whose keys do not appear in b:

filteredA = a.subtractByKey(b)
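
To illustrate the semantics only (this is a plain-Python model on lists, not Spark itself), subtractByKey drops every pair of the first dataset whose key occurs in the second; the values in the second dataset are ignored. The helper name subtract_by_key and the sample data are hypothetical:

```python
def subtract_by_key(a, b):
    """Local model of RDD.subtractByKey on lists of (key, value) pairs."""
    keys_in_b = {key for key, _ in b}  # only the keys of b matter
    return [(key, value) for key, value in a if key not in keys_in_b]


a = [("x", 1), ("y", 2), ("z", 3)]
b = [("y", 99)]

filtered = subtract_by_key(a, b)
print(filtered)  # [('x', 1), ('z', 3)]
```

On a real RDD the result is distributed, so the order of the surviving pairs is not guaranteed.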

Both RDDs must hold (key, value) pairs. To convert an RDD of records into that shape, using the first field as the key:

keyValRDD = rdd.map(lambda x: (x[0], x[1:]))

*Note that my Python is weak, so there may be better ways to split the values.
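
One possible way to make the split more explicit is a small named function. This is a sketch that assumes each record is a non-empty sequence whose first field is the key; the name to_key_value and the sample records are hypothetical. The same function could be passed to rdd.map in place of the lambda:

```python
def to_key_value(record):
    """Split a record into (first field, tuple of the remaining fields)."""
    key, *rest = record
    return key, tuple(rest)


records = [("id1", "a", "b"), ("id2", "c", "d")]

# Applied locally here; on an RDD this would be rdd.map(to_key_value).
pairs = [to_key_value(r) for r in records]
print(pairs)  # [('id1', ('a', 'b')), ('id2', ('c', 'd'))]
```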