Memory usage keeps growing with Python's multiprocessing.Pool
Here's the program:
    #!/usr/bin/python
    import multiprocessing

    def dummy_func(r):
        pass

    def worker():
        pass

    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=16)
        for index in range(0, 100000):
            pool.apply_async(worker, callback=dummy_func)

        # clean up
        pool.close()
        pool.join()
I found that memory usage (both VIRT and RES) kept growing right up until close()/join(). Is there any solution to get rid of this? I tried maxtasksperchild with 2.7, but it didn't help either.
I have a more complicated program that calls apply_async() ~6M times, and at around the ~1.5M point I already had 6G+ RES; to avoid other factors, I simplified the program to the version above.
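For reference, this is roughly how I passed maxtasksperchild (a minimal sketch; the value 1000 is arbitrary). It recycles each worker process after that many tasks, which bounds per-worker growth, but it didn't stop the parent process's usage from climbing in my case:

    #!/usr/bin/python
    import multiprocessing

    def dummy_func(r):
        pass

    def worker():
        pass

    if __name__ == '__main__':
        # maxtasksperchild=1000 is arbitrary; each worker process is replaced
        # by a fresh one after completing that many tasks.
        pool = multiprocessing.Pool(processes=16, maxtasksperchild=1000)
        for index in range(0, 100000):
            pool.apply_async(worker, callback=dummy_func)
        pool.close()
        pool.join()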
Edit:
It turned out this version works better, thanks to everyone's input:
    #!/usr/bin/python
    import multiprocessing

    ready_list = []

    def dummy_func(index):
        global ready_list
        ready_list.append(index)

    def worker(index):
        return index

    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=16)
        result = {}
        for index in range(0, 1000000):
            result[index] = pool.apply_async(worker, (index,), callback=dummy_func)
            # drop results that are already done so their AsyncResult objects can be freed
            for ready in ready_list:
                result[ready].wait()
                del result[ready]
            ready_list = []

        # clean up
        pool.close()
        pool.join()
I didn't put a lock there, as I believe the main process is single threaded (the callback is more or less an event-driven thing, per the docs I read).
I changed v1's index range to 1,000,000, the same as v2, and ran some tests. It's weird to me that v2 is even ~10% faster than v1 (33s vs 37s); maybe v1 was doing too many internal list maintenance jobs. v2 is definitely the winner on memory usage: it never went over 300M (VIRT) and 50M (RES), while v1 used 370M/120M, with its best being 330M/85M. All numbers are from only 3~4 rounds of testing, for reference only.
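A related idea, just as a sketch rather than what I ended up using: cap how many outstanding apply_async() results exist at any time by submitting work in fixed-size batches and draining each batch before submitting the next. The batch size of 10000 here is arbitrary:

    #!/usr/bin/python
    import multiprocessing

    def worker(index):
        return index

    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=16)
        total = 1000000
        batch_size = 10000  # arbitrary; caps how many AsyncResults exist at once
        for start in range(0, total, batch_size):
            batch = [pool.apply_async(worker, (i,))
                     for i in range(start, min(start + batch_size, total))]
            # drain the batch before submitting the next one
            for r in batch:
                r.wait()
        pool.close()
        pool.join()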
I had memory issues recently, since I was calling my multiprocessing function multiple times: it kept spawning processes and leaving them in memory.
Here's the solution I'm using now:
    from multiprocessing import Pool
    from contextlib import closing

    def myparallelprocess(ahugearray):
        # closing() guarantees pool.close() is called when the block exits
        with closing(Pool(15)) as p:
            # chunksize of 100 hands work to the workers in batches
            res = p.imap_unordered(simple_matching, ahugearray, 100)
            return res
I ❤ with.
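A minimal end-to-end sketch of the same pattern, with a hypothetical simple_matching stand-in (not from the original post). As a small variation, the results are materialised with list() inside the with block, so everything has been read before the pool is closed:

    from multiprocessing import Pool
    from contextlib import closing

    def simple_matching(item):
        # hypothetical stand-in for the real matching function
        return item * 2

    def myparallelprocess(ahugearray):
        with closing(Pool(15)) as p:
            # consume the iterator before the pool is closed
            return list(p.imap_unordered(simple_matching, ahugearray, 100))

    if __name__ == '__main__':
        results = myparallelprocess(range(1000))
        print(results[:5])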