@dboyliao
Last active February 24, 2021 16:30
multiprocessing using a global variable as read-only data (bad idea)
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from __future__ import print_function

import multiprocessing as mp
import sys
from queue import Empty  # get_nowait() raises queue.Empty

import numpy as np

# Module-level "shared" data. This only appears to work with the "fork"
# start method: each child process really gets its own copy.
SHARE_DATA = None


def worker(in_queue):
    pid = mp.current_process().pid
    while True:
        try:
            i = in_queue.get_nowait()
            # Reads this process's copy of SHARE_DATA, not shared memory.
            print("[{}] {}".format(pid, SHARE_DATA[i]), flush=True)
            in_queue.task_done()
        except Empty:
            pass  # busy-wait until an index arrives


def main():
    in_queue = mp.JoinableQueue()
    for _ in range(5):
        p = mp.Process(target=worker, args=(in_queue,))
        p.daemon = True
        p.start()
    print(SHARE_DATA, flush=True)
    for i in range(5):
        in_queue.put(i)
    in_queue.join()
    return 0


if __name__ == "__main__":
    SHARE_DATA = np.arange(10).reshape(5, 2)
    sys.exit(main())
@dboyliao (Author):
Sorry, it's a bad example ~
Basically, processes do not 'share' global variables; each process gets its own copy.
I simply forgot this fact, my bad.
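To see the copy in action, here is a minimal sketch of my own (not from the original gist): a child process writes into a module-level array, and the parent's copy stays untouched.

import multiprocessing as mp
import numpy as np

DATA = np.zeros(3)

def mutate():
    # In-place write: this touches only the child process's copy of DATA.
    DATA[:] = 1
    print("child sees:", DATA, flush=True)

if __name__ == "__main__":
    p = mp.Process(target=mutate)
    p.start()
    p.join()
    print("parent still sees:", DATA, flush=True)  # prints [0. 0. 0.]

This holds under both start methods: with "fork" the child gets a copy-on-write copy, and with "spawn" the child rebuilds DATA by re-importing the module.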

Maybe slice the data I need and only put the slices in the queue? As far as I know, multiprocessing will pickle the arguments before passing them to the worker function, so each worker receives only its own slice. I think this eases the memory issue somewhat, but it doesn't solve it entirely.
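As a rough sketch of that idea (my code, mirroring the structure of the gist above), you could enqueue the rows themselves, so each worker receives only a pickled copy of the slice it needs:

import multiprocessing as mp
from queue import Empty

import numpy as np

def worker(in_queue):
    pid = mp.current_process().pid
    while True:
        try:
            row = in_queue.get_nowait()  # a pickled copy of one slice
            print("[{}] {}".format(pid, row), flush=True)
            in_queue.task_done()
        except Empty:
            pass

if __name__ == "__main__":
    data = np.arange(10).reshape(5, 2)
    in_queue = mp.JoinableQueue()
    for _ in range(5):
        p = mp.Process(target=worker, args=(in_queue,))
        p.daemon = True
        p.start()
    for row in data:
        in_queue.put(row)  # the slice is pickled on put()
    in_queue.join()

The parent still holds the full array, but no worker ever copies more than one row at a time.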

The memory-efficient ways I can think of right now are using a database or threading.
With threading, all threads can share read-only data in memory.
Or load the data inside the worker function.
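A quick sketch of the threading variant (again my own code): since all threads live in one process, they genuinely read the same array and nothing is copied.

import threading
from queue import Queue

import numpy as np

SHARE_DATA = np.arange(10).reshape(5, 2)

def worker(in_queue):
    name = threading.current_thread().name
    while True:
        i = in_queue.get()
        if i is None:  # sentinel: tell the thread to stop
            in_queue.task_done()
            break
        # All threads read the very same SHARE_DATA object.
        print("[{}] {}".format(name, SHARE_DATA[i]), flush=True)
        in_queue.task_done()

if __name__ == "__main__":
    in_queue = Queue()
    threads = [threading.Thread(target=worker, args=(in_queue,)) for _ in range(5)]
    for t in threads:
        t.start()
    for i in range(5):
        in_queue.put(i)
    for _ in threads:
        in_queue.put(None)  # one sentinel per thread
    in_queue.join()
    for t in threads:
        t.join()

The usual caveat: because of the GIL this saves memory but does not add CPU parallelism, so it fits I/O-bound work best (though NumPy releases the GIL for many heavy operations).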

That's it.
