The Python Oracle

SparkContext Error - File not found /tmp/spark-events does not exist

--------------------------------------------------
Hire the world's top talent on demand or become one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------

Take control of your privacy with Proton's trusted, Swiss-based, secure services.
Choose what you need and safeguard your digital life:
Mail: https://go.getproton.me/SH1CU
VPN: https://go.getproton.me/SH1DI
Password Manager: https://go.getproton.me/SH1DJ
Drive: https://go.getproton.me/SH1CT


Music by Eric Matyas
https://www.soundimage.org
Track title: Sunrise at the Stream

--

Chapters
00:00 SparkContext Error - File Not Found /tmp/spark-events Does Not Exist
01:21 Accepted Answer Score 45
01:38 Answer 2 Score 10
02:32 Answer 3 Score 5
02:52 Answer 4 Score 1
03:14 Thank you

--

Full question
https://stackoverflow.com/questions/3835...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #amazonwebservices #apachespark #amazonec2 #pyspark

#avk47



ACCEPTED ANSWER

Score 45


/tmp/spark-events is the location where Spark stores event logs. Just create this directory on the master machine and you're set.

$ mkdir /tmp/spark-events
$ sudo /root/spark-ec2/copy-dir /tmp/spark-events/
RSYNC'ing /tmp/spark-events to slaves...
ec2-54-175-163-32.compute-1.amazonaws.com
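
The same fix can be made idempotent from the driver side. A minimal sketch (assuming the default file:/tmp/spark-events location; the helper name is my own, not part of Spark) that ensures the directory exists before the SparkContext starts:

```python
import os

# Default event-log location; adjust if spark.eventLog.dir is customized.
DEFAULT_EVENT_LOG_DIR = "/tmp/spark-events"

def ensure_event_log_dir(path=DEFAULT_EVENT_LOG_DIR):
    """Create the event-log directory if it is missing (no-op when it exists)."""
    os.makedirs(path, exist_ok=True)
    return path
```

Calling this before constructing the SparkContext avoids the startup failure on a freshly launched master.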



ANSWER 2

Score 10


While trying to set up my Spark history server on my local machine, I hit the same 'File file:/tmp/spark-events does not exist.' error. I had customized my log directory to a non-default path. To resolve this, I needed to do two things:

  1. Edit $SPARK_HOME/conf/spark-defaults.conf and add these two lines:
     spark.history.fs.logDirectory /mycustomdir
     spark.eventLog.enabled true
  2. Create a link from /tmp/spark-events to /mycustomdir:
     ln -fs /mycustomdir /tmp/spark-events

Ideally, step 1 would have solved my issue entirely, but I still needed to create the link, so I suspect there was one other setting I missed. Anyhow, once I did this, I was able to run my history server and see new jobs logged in its web UI.
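
The two lines from step 1 can be sanity-checked with a small stand-alone parser. This is a sketch of my own, not part of Spark, assuming the usual whitespace-separated spark-defaults.conf format:

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf text into a dict; blanks and '#' comments are skipped."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # key and value are whitespace-separated
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf

sample = """\
# event logging for the history server
spark.history.fs.logDirectory /mycustomdir
spark.eventLog.enabled true
"""
```

Running `parse_spark_defaults(sample)` should show both settings present before you restart the history server.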



ANSWER 3

Score 5


Use spark.eventLog.dir for the client/driver program:

spark.eventLog.dir=/usr/local/spark/history

and use spark.history.fs.logDirectory for the history server:

spark.history.fs.logDirectory=/usr/local/spark/history

as mentioned in: How to enable spark-history server for standalone cluster non hdfs mode

This applies at least as of Spark version 2.2.1.
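
The split between the two properties can be made explicit in code. A hypothetical helper (the function name and default path are mine, taken from the example above) that keeps the writer and the reader pointed at the same directory:

```python
def history_server_configs(log_dir="/usr/local/spark/history"):
    """Return matching config dicts: the driver writes event logs where the history server reads."""
    driver_conf = {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": log_dir,  # written by the client/driver program
    }
    history_conf = {
        "spark.history.fs.logDirectory": log_dir,  # read by the history server
    }
    return driver_conf, history_conf
```

Generating both dicts from one `log_dir` argument makes it impossible for the two settings to drift apart.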




ANSWER 4

Score 1


I just created /tmp/spark-events on the {master} node and then distributed it to the other nodes in the cluster to get it working.

mkdir /tmp/spark-events
rsync -a /tmp/spark-events {slaves}:/tmp/spark-events

my spark-defaults.conf:

spark.history.ui.port=18080
spark.eventLog.enabled=true
spark.history.fs.logDirectory=hdfs:///home/elon/spark/events
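
The mkdir-plus-rsync step above can be scripted. A sketch assuming a plain list of worker hostnames (the host names are placeholders) that builds the commands without running them:

```python
def distribution_commands(workers, path="/tmp/spark-events"):
    """Build the mkdir/rsync commands that mirror the event-log dir to each worker."""
    cmds = [["mkdir", "-p", path]]
    for host in workers:
        # Trailing slashes make rsync copy directory contents, not the dir itself.
        cmds.append(["rsync", "-a", path + "/", f"{host}:{path}/"])
    return cmds  # e.g. run each with subprocess.run(cmd, check=True)
```

Keeping the commands as argument lists (rather than shell strings) avoids quoting issues when a path contains spaces.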