In C++, How to read one file with multiple threads?
Full question
https://stackoverflow.com/questions/1797...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
ACCEPTED ANSWER
Score 2
Reading a file requires a syscall in any language or OS. Your program asks the underlying operating system to put the contents of the file into memory for you (assuming you pass the OS's security checks and all that) and waits for it to finish. Multi-threading the read of a single file would indeed tend to slow you down rather than speed you up: you'd be making more syscalls, each of which pops you out of program execution and hands control over to the operating system, and every thread is still reading from the same disk.
As such, the best suggestion is hyde's: if need be, split the parsing of the file across multiple threads, not the reading. If you're able to parse a file that large in a matter of a few seconds, though, I'd say it's not really worth it. The exception is responsiveness: if you're running a graphical application, for example, you definitely want a separate thread for file loading so you don't freeze your UI.
On the matter of speed, I'd guess there are two primary issues. First, I suspect Python reads its files through a memory buffer by default, which would speed execution. If you can buffer your file reads in C++ as well (so you make fewer syscalls), you might see some performance gains. The other issue is which data structures you're using in Python and C++ to load and parse the data. Without seeing your code I can't suggest anything specific, but taking a little time to research and think about the different data structures applicable to your program might be useful. Keep in mind that Python's and C++'s data structures have very different performance profiles, so one that works well in Python may be a much worse choice in C++.
Edit: A simple example of buffered file reading with the C++ standard library, adapted from http://www.cplusplus.com/reference/ (it reads the whole file into memory with a single sgetn() call):
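// read a file into a buffer - sgetn() example
#include <fstream>   // std::ifstream
#include <iostream>  // std::cout, std::streambuf, std::streamsize
#include <vector>    // std::vector

int main () {
  // Binary mode, so the size found by seeking matches the bytes sgetn() returns.
  std::ifstream istr ("test.txt", std::ios::binary);
  if (istr) {
    std::streambuf * pbuf = istr.rdbuf();
    // Seek to the end to learn the file size, then rewind to the start.
    std::streamsize size = pbuf->pubseekoff(0, std::ios::end, std::ios::in);
    if (size < 0) return 1;  // seeking failed
    pbuf->pubseekoff(0, std::ios::beg, std::ios::in);
    // std::vector releases the buffer automatically; no delete[] needed.
    std::vector<char> contents(size);
    // One sgetn() call copies the whole file out of the stream buffer.
    std::streamsize got = pbuf->sgetn(contents.data(), size);
    istr.close();
    std::cout.write(contents.data(), got);
  }
  return 0;
}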