import concurrent.futures
import multiprocessing
import sys
import uuid

def globalize(func):
    def result(*args, **kwargs):
        return func(*args, **kwargs)
    result.__name__ = result.__qualname__ = uuid.uuid4().hex
    setattr(sys.modules[result.__module__], result.__name__, result)
    return result

def main():
    @globalize
    def func1(x):
        return x
    func2 = globalize(lambda x: x)
    with multiprocessing.Pool() as pool:
        print(pool.map(func1, range(10)))
        print(pool.map(func2, range(10)))
    with concurrent.futures.ThreadPoolExecutor() as executor:
        print(list(executor.map(func1, range(10))))
        print(list(executor.map(func2, range(10))))

if __name__ == '__main__':
    main()
@Antyos My pleasure!
Regarding your main() suggestion, you could do something like:
class main:
    def func(x):
        return x
However, there's no need to globalize(main.func): as long as the class main is visible when the script is imported as a module, multiprocessing and concurrent.futures can find its member main.func() just fine.
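One way to check this without starting any worker processes is to round-trip main.func through pickle, which is the mechanism multiprocessing uses to send a function to a worker. This is only a minimal sketch using the class from above; the variable names are illustrative.

```python
import pickle


class main:
    def func(x):
        return x


# main.func has the qualified name 'main.func'. pickle stores that name and
# resolves it on load by looking up 'main' in the module, then 'func' on it.
payload = pickle.dumps(main.func)
restored = pickle.loads(payload)

print(main.func.__qualname__, restored is main.func, restored(3))
```

Because the lookup goes through the module, this only works while the class main is reachable as a module-level name.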
Regarding name collision, all lambda functions are called <lambda>, and we may also want to distinguish between lambda functions with the same code, so a smarter way of name mangling would be needed.
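To illustrate the collision, here is a small sketch. The helper code_digest is hypothetical and only stands in for one possible hash-based naming scheme (hashing the compiled bytecode); it is not something the gist provides.

```python
import hashlib

f = lambda x: x
g = lambda x: x


def code_digest(func):
    # Hypothetical naming scheme: hash the compiled bytecode of the function.
    return hashlib.sha256(func.__code__.co_code).hexdigest()


# Both lambdas share the name '<lambda>' ...
same_name = f.__name__ == g.__name__ == '<lambda>'
# ... and, having identical bodies, they also share the same bytecode hash.
same_digest = code_digest(f) == code_digest(g)

print(same_name, same_digest, f is g)
```

Two distinct function objects would end up with the same generated name under such a scheme, which is why a random name such as uuid.uuid4().hex is the simpler choice here.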
This is a clever idea, but beware that the locals are captured and retained indefinitely by the module namespace, leading to a memory leak if it is called repeatedly. A variation that avoids the memory leak is to use a context manager rather than a decorator, like so:
from contextlib import contextmanager
import concurrent.futures
import multiprocessing
import sys
import uuid

@contextmanager
def globalized(func):
    namespace = sys.modules[func.__module__]
    name, qualname = func.__name__, func.__qualname__
    func.__name__ = func.__qualname__ = f'_{name}_{uuid.uuid4().hex}'
    setattr(namespace, func.__name__, func)
    try:
        yield
    finally:
        delattr(namespace, func.__name__)
        func.__name__, func.__qualname__ = name, qualname

def main():
    def func1(x):
        return x
    func2 = lambda x: x
    with globalized(func1), globalized(func2), multiprocessing.Pool() as pool:
        print(pool.map(func1, range(10)))
        print(pool.map(func2, range(10)))
    with concurrent.futures.ThreadPoolExecutor() as executor:
        print(list(executor.map(func1, range(10))))
        print(list(executor.map(func2, range(10))))

if __name__ == '__main__':
    main()
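The cleanup can be observed by checking the module namespace during and after the with block. The following is a minimal sketch that reuses globalized from this comment; make_adder is a hypothetical local-function factory, and pickle is used directly in place of a pool so that nothing depends on the start method.

```python
from contextlib import contextmanager
import pickle
import sys
import uuid


@contextmanager
def globalized(func):
    # Same as the context manager above: register under a temporary name.
    namespace = sys.modules[func.__module__]
    name, qualname = func.__name__, func.__qualname__
    func.__name__ = func.__qualname__ = f'_{name}_{uuid.uuid4().hex}'
    setattr(namespace, func.__name__, func)
    try:
        yield
    finally:
        delattr(namespace, func.__name__)
        func.__name__, func.__qualname__ = name, qualname


def make_adder(n):
    # Hypothetical local function that closes over n.
    def add(x):
        return x + n
    return add


add = make_adder(5)
namespace = sys.modules[add.__module__]

with globalized(add):
    temp_name = add.__name__
    registered_inside = hasattr(namespace, temp_name)
    # The function is picklable only while the temporary name exists.
    round_trip = pickle.loads(pickle.dumps(add))

# After the block the temporary name is gone and the original names are back.
registered_after = hasattr(namespace, temp_name)
print(registered_inside, round_trip is add, registered_after, add.__qualname__)
```

Since the temporary attribute is deleted on exit, the module no longer holds a reference to the closure, so repeated calls do not accumulate entries.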
@EdwinChan Thanks for sharing this. Sorry to bother you, but could you clarify why exactly this works? I'm not sure I've got this right: are objects at the global level still pickled and unpickled, or does this solution rely on the process being forked (on Unix systems), so that the function is directly accessible as a global object in the forked process?
It seems that, as far as pickling functions goes, only the name gets pickled, so making the function global just allows the name to be pickled and unpickled. So this solution relies heavily on the function existing in the memory of the replicated process, either because of forking or because the function is deterministically recreated (Windows). Do I have that right?
@pk1234dva As far as I understand it, multiprocessing pickles the function name so that the worker processes know where to start. The trick here simply automates the creation of wrappers that can be pickled. Nothing else is pickled, neither in the global scope nor in the local scope, so the module must still be available in some form for the program to work. Whether the worker processes inherit the module in memory as a result of a fork or import the module anew is mostly an implementation detail.
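The claim that only the name is pickled can be checked with pickle alone. This is a minimal sketch under the assumption that globalize is the decorator from the gist; make_local is a hypothetical helper added for the demonstration.

```python
import pickle
import sys
import uuid


def globalize(func):
    # Same idea as the gist: publish a wrapper under a unique module-level name.
    def result(*args, **kwargs):
        return func(*args, **kwargs)
    result.__name__ = result.__qualname__ = uuid.uuid4().hex
    setattr(sys.modules[result.__module__], result.__name__, result)
    return result


def make_local():
    # Hypothetical helper: returns a function that only exists as a local.
    def local(x):
        return x + 1
    return local


local = make_local()

# A local function cannot be pickled: its qualified name
# ('make_local.<locals>.local') cannot be looked up from the module.
try:
    pickle.dumps(local)
    local_pickles = True
except (pickle.PicklingError, AttributeError):
    local_pickles = False

wrapped = globalize(local)
payload = pickle.dumps(wrapped)

# The payload carries the generated name, not the function's code.
name_in_payload = wrapped.__name__.encode() in payload

# Unpickling looks the name up in the module and returns the same object.
restored = pickle.loads(payload)
print(local_pickles, name_in_payload, restored is wrapped, restored(1))
```

The lookup on load is why the generated name has to exist in the module as the receiving process sees it.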
@EdwinChan Thank you.
@EdwinChan, Thanks for the fix! I vaguely understand how it works, though the more I delve into programming, the more I find Windows to be a pain for "reasons".
It's unfortunate that the nature of Windows puts a significant limitation on the usefulness of your globalize() decorator, but thankfully there's always WSL.
I wonder if it would be possible to create a local function in main(), then globalize it outside, e.g.
Also, to address the possible namespace collisions, maybe you could use some hashing algorithm from hashlib like md5 or sha256.