How to run functions in parallel?
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Book End
--
Chapters
00:00 Question
01:25 Accepted answer (Score 273)
02:26 Answer 2 (Score 61)
03:14 Answer 3 (Score 33)
04:54 Answer 4 (Score 20)
06:09 Thank you
--
Full question
https://stackoverflow.com/questions/7207...
Accepted answer links:
[threading]: http://docs.python.org/library/threading...
[multiprocessing]: http://docs.python.org/library/multiproc...
[peculiarities of CPython]: http://en.wikipedia.org/wiki/Global_Inte...
Answer 2 links:
[ThreadPoolExecutor]: https://docs.python.org/3/library/concur...
[multiprocessing]: https://docs.python.org/3/library/multip...
Answer 3 links:
[Ray]: https://github.com/ray-project/ray
[multiprocessing]: https://docs.python.org/2/library/multip...
[this related post]: https://stackoverflow.com/questions/2054...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #multithreading #multiprocessing
#avk47
ACCEPTED ANSWER
Score 273
You could use threading or multiprocessing.
Due to peculiarities of CPython, threading is unlikely to achieve true parallelism. For this reason, multiprocessing is generally a better bet.
Here is a complete example:
from multiprocessing import Process

def func1():
    print("func1: starting")
    for i in range(10000000):
        pass
    print("func1: finishing")

def func2():
    print("func2: starting")
    for i in range(10000000):
        pass
    print("func2: finishing")

if __name__ == "__main__":
    p1 = Process(target=func1)
    p1.start()
    p2 = Process(target=func2)
    p2.start()
    p1.join()
    p2.join()
The mechanics of starting/joining child processes can easily be encapsulated into a function along the lines of your runBothFunc:
def runInParallel(*fns):
    proc = []
    for fn in fns:
        p = Process(target=fn)
        p.start()
        proc.append(p)
    for p in proc:
        p.join()

runInParallel(func1, func2)
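To see the CPython limitation mentioned above in action, here is a minimal timing sketch (the spin and time_two helpers are illustrative names, not part of the original answer) that runs the same busy loop twice, first with threads and then with processes:

import time
from threading import Thread
from multiprocessing import Process

def spin():
    # The same CPU-bound busy loop as func1/func2 above.
    for i in range(10000000):
        pass

def time_two(worker_cls):
    # Start two workers of the given class and measure the total wall time.
    start = time.perf_counter()
    workers = [worker_cls(target=spin) for _ in range(2)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"threads:   {time_two(Thread):.2f}s")   # GIL-bound: roughly the serial time
    print(f"processes: {time_two(Process):.2f}s")  # loops genuinely overlap

On a multi-core machine the process version should finish in roughly half the time of the threaded one, since the two busy loops actually run at once.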
ANSWER 2
Score 61
If your functions are mainly doing I/O work (and less CPU work) and you have Python 3.2+, you can use a ThreadPoolExecutor:
from concurrent.futures import ThreadPoolExecutor

def run_io_tasks_in_parallel(tasks):
    with ThreadPoolExecutor() as executor:
        running_tasks = [executor.submit(task) for task in tasks]
        for running_task in running_tasks:
            running_task.result()

run_io_tasks_in_parallel([
    lambda: print('IO task 1 running!'),
    lambda: print('IO task 2 running!'),
])
If your functions are mainly doing CPU work (and less I/O work) and you have Python 3.2+, you can use a ProcessPoolExecutor:
from concurrent.futures import ProcessPoolExecutor

def run_cpu_tasks_in_parallel(tasks):
    with ProcessPoolExecutor() as executor:
        running_tasks = [executor.submit(task) for task in tasks]
        for running_task in running_tasks:
            running_task.result()

def task_1():
    print('CPU task 1 running!')

def task_2():
    print('CPU task 2 running!')

if __name__ == '__main__':
    run_cpu_tasks_in_parallel([
        task_1,
        task_2,
    ])
Alternatively, if you only have Python 2.6+, you can use the multiprocessing module directly:
from multiprocessing import Process

def run_cpu_tasks_in_parallel(tasks):
    running_tasks = [Process(target=task) for task in tasks]
    for running_task in running_tasks:
        running_task.start()
    for running_task in running_tasks:
        running_task.join()

def task_1():
    print('CPU task 1 running!')

def task_2():
    print('CPU task 2 running!')

if __name__ == '__main__':
    run_cpu_tasks_in_parallel([
        task_1,
        task_2,
    ])
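One detail worth noting about the executor pattern above: executor.submit() returns a concurrent.futures.Future, and calling result() both returns the task's value and re-raises any exception the task raised in the worker. A minimal sketch, with square as a hypothetical task:

from concurrent.futures import ProcessPoolExecutor

def square(n):
    return n * n

if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        # result() blocks until the task is done, returns its value,
        # and re-raises any exception raised in the worker process.
        futures = [executor.submit(square, n) for n in range(4)]
        print([f.result() for f in futures])  # [0, 1, 4, 9]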
ANSWER 3
Score 33
This can be done elegantly with Ray, a system that allows you to easily parallelize and distribute your Python code.
To parallelize your example, you'd need to define your functions with the @ray.remote decorator, and then invoke them with .remote.
import ray

ray.init()

dir1 = 'C:\\folder1'
dir2 = 'C:\\folder2'
filename = 'test.txt'
addFiles = [25, 5, 15, 35, 45, 25, 5, 15, 35, 45]

# Define the functions.
# You need to pass every global variable used by the function as an argument.
# This is needed because each remote function runs in a different process,
# and thus it does not have access to the global variables defined in
# the current process.

@ray.remote
def func1(filename, addFiles, dir):
    pass  # func1() code here...

@ray.remote
def func2(filename, addFiles, dir):
    pass  # func2() code here...

# Start two tasks in the background and wait for them to finish.
ray.get([func1.remote(filename, addFiles, dir1), func2.remote(filename, addFiles, dir2)])
If you pass the same argument to both functions and the argument is large, the more efficient approach is to use ray.put(). This avoids serializing the large argument twice and creating two in-memory copies of it:
largeData_id = ray.put(largeData)
ray.get([func1.remote(largeData_id), func2.remote(largeData_id)])
Important: if func1() and func2() return results, you need to rewrite the code as follows:
ret_id1 = func1.remote(filename, addFiles, dir1)
ret_id2 = func2.remote(filename, addFiles, dir2)
ret1, ret2 = ray.get([ret_id1, ret_id2])
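Here ray.get() blocks until both remote tasks have finished and returns their results as a list, in the same order as the object references passed in.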
There are a number of advantages of using Ray over the multiprocessing module. In particular, the same code will run on a single machine as well as on a cluster of machines. For more advantages of Ray see this related post.
ANSWER 4
Score 20
It seems like you have a single function that you need to call with several different parameters. This can be done elegantly with a combination of concurrent.futures and map on Python 3.2+:
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def sleep_secs(seconds):
    time.sleep(seconds)
    print(f'{seconds} has been processed')
    return seconds  # return the value so the results can be collected below

secs_list = [2, 4, 6, 8, 10, 12]
Now, if your operation is I/O bound, you can use the ThreadPoolExecutor like this:
with ThreadPoolExecutor() as executor:
    results = executor.map(sleep_secs, secs_list)
Note how map is used here to map your function to the list of arguments.
Now, if your function is CPU bound, you can use the ProcessPoolExecutor (as in the answers above, guard the entry point with if __name__ == '__main__' when using processes):
with ProcessPoolExecutor() as executor:
    results = executor.map(sleep_secs, secs_list)
If you are not sure, you can simply try both and see which one gives you better results.
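A concrete way to "try both" is to time each executor on the same workload. A minimal sketch reusing sleep_secs and secs_list from above (time_executor is an illustrative helper, not part of the original answer):

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def time_executor(executor_cls, fn, args):
    # Run fn over args with the given executor and return the wall time.
    start = time.perf_counter()
    with executor_cls() as executor:
        list(executor.map(fn, args))  # drain the iterator so all tasks finish
    return time.perf_counter() - start

if __name__ == '__main__':
    print(f'threads:   {time_executor(ThreadPoolExecutor, sleep_secs, secs_list):.2f}s')
    print(f'processes: {time_executor(ProcessPoolExecutor, sleep_secs, secs_list):.2f}s')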
Finally, if you are looking to print out your results, you can simply do this:
with ThreadPoolExecutor() as executor:
    results = executor.map(sleep_secs, secs_list)
    for result in results:
        print(result)
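Because executor.map yields results in input order and sleep_secs (as written above) returns its argument, this loop prints 2, 4, 6, 8, 10 and 12, each value appearing as soon as the corresponding task finishes.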