- misc tips (building, testing, editor setup)
- module/import system
- byte code
- compilation
- execution
- generator/coroutine/async/await implementation
- thread management
- interpreter initialization
- acquire/release GIL
- object/class system
- memory management
# Building out of tree
mkdir -p __out__ && cd __out__
../configure --with-pydebug --prefix="$PWD/__prefix__" CC=clang CXX=clang++
make -j
make install
# Testing (cf. Tools/scripts/run_tests.py, Lib/test/libregrtest)
make test TESTOPTS=--help # via `make test`
./python -m test --help # directly run `Lib/test` module
./python -m test -v test_generators -m GeneratorTest # filter by module and test case name
- ignore directories, e.g. __out__, Doc, Misc, PC, PCbuild, Tools/msi, Mac
- .vscode/settings.json
{
"python.defaultInterpreterPath": "${workspaceFolder}/__out__/python",
"python.pythonPath": "${workspaceFolder}/__out__/python"
}
.vscode/c_cpp_properties.json
{
"configurations": [
{
"name": "Linux",
"includePath": [
"${workspaceFolder}/Include/internal",
"${workspaceFolder}/Include",
"${workspaceFolder}/Objects",
"${workspaceFolder}/Python",
"${workspaceFolder}",
"${workspaceFolder}/__out__"
],
"defines": [],
"compilerPath": "/usr/bin/clang",
"cStandard": "c17",
"cppStandard": "c++14",
"intelliSenseMode": "linux-clang-x64"
}
],
"version": 4
}
.vscode/launch.json
(for Linux)
{
"version": "0.2.0",
"configurations": [
{
"name": "cppdbg",
"type": "cppdbg",
"request": "launch",
"program": "${workspaceFolder}/__out__/python",
"args": [],
"stopAtEntry": false,
"cwd": "${fileDirname}",
"environment": [],
"externalConsole": false,
"MIMode": "gdb",
"miDebuggerPath": "/usr/bin/gdb"
}
]
}
- with lldb
lldb ./python -o "b _PyEval_EvalFrameDefault" -o "r"
- with vscode
- use launch.json above
_PyRuntimeState (type of the global variable `_PyRuntime` defined in `pylifecycle.c`)
main_thread (id of main thread)
pyinterpreters interpreters
PyInterpreterState* main, head (as linked list)
_gilstate_runtime_state gilstate
PyThreadState* tstate_current (atomic address)
PyInterpreterState*
_ceval_runtime_state
_gil_runtime_state (synchronization primitives e.g. locked state variable, condition variable for unlock notification)
PyInterpreterState
_PyRuntimeState* (back pointer)
pythreads threads
PyThreadState* (as doubly linked list)
_ceval_state ceval
atomic int gil_drop_request
_gc_runtime_state gc (TODO)
PyThreadState
PyInterpreterState* (back pointer)
CFrame* cframe
PyObject** datastack_top
pymain_main =>
pymain_init => Py_InitializeFromConfig =>
_PyRuntime_Initialize => _PyRuntimeState_Init => init_runtime =>
_PyEval_InitRuntimeState => _gil_initialize (initialize `_gil_runtime_state`)
pyinit_core => pyinit_config =>
pycore_init_runtime
pycore_create_interpreter =>
_PyGILState_Init
PyInterpreterState_New =>
alloc_interpreter (on `_PyRuntimeState.interpreters.main`)
init_interpreter
PyThreadState_New =>
new_threadstate (allocate on `PyInterpreterState.threads`)
_PyThreadState_SetCurrent (store current `PyThreadState` in thread-specific storage keyed by `_PyRuntimeState.gilstate.autoTSSKey`)
PyThreadState_Swap => _PyThreadState_Swap (set `_PyRuntimeState.gilstate.tstate_current` global)
init_interp_create_gil =>
_PyEval_InitGIL =>
create_gil (on `_PyRuntimeState.ceval.gil` global) =>
initialize e.g. condition variable, mutex, then set locked = 0
take_gil(tstate) =>
(while locked)
wait for gil condition variable with timeout
if timed out, SET_GIL_DROP_REQUEST (set PyInterpreterState.ceval.gil_drop_request = 1)
(otherwise)
set locked = 1
pycore_interp_init(tstate) =>
pycore_init_global_objects
_PyGC_Init
pycore_init_types (define builtin types e.g. `type`, `tuple`, `Exception`)
pycore_init_builtins => _PyBuiltin_Init (define `builtins` module)
init_importlib =>
PyImport_ImportFrozenModule("_frozen_importlib") (aka importlib/_bootstrap.py)
_PyImport_BootstrapImp (aka "_imp" C extension (cf. PyInit__imp))
PyObject_CallMethod (call "_frozen_importlib._install") (append default importers to `sys.meta_path`)
pyinit_main => init_interp_main =>
init_importlib_external (call "_frozen_importlib._install_external_importers") =>
import _frozen_importlib_external (aka importlib/_bootstrap_external.py)
_frozen_importlib_external._install =>
append `FileFinder` path hook to `sys.path_hooks`
append `PathFinder` to `sys.meta_path`
add_main_module
Py_RunMain => pymain_run_python =>
pymain_run_module (when "python -m ...") =>
invoke "runpy._run_module_as_main" via C API
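The result of the two importlib install steps above can be inspected from a running interpreter. A minimal sketch, assuming a stock CPython where no tool has removed the default finders (extra finders added by test runners or packaging tools may also show up):

```python
import sys
from importlib.machinery import BuiltinImporter, FrozenImporter, PathFinder

# Finders registered during startup by _frozen_importlib._install and
# _frozen_importlib_external._install.
meta_path_names = [getattr(f, "__name__", type(f).__name__) for f in sys.meta_path]
print(meta_path_names)

# sys.path_hooks holds callables that build a path entry finder
# (e.g. a FileFinder) for a given sys.path entry.
print(sys.path_hooks)

has_default_finders = all(
    finder in sys.meta_path
    for finder in (BuiltinImporter, FrozenImporter, PathFinder)
)
print(has_default_finders)
```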
- default finder/loader (importlib/_bootstrap_external.py)
[python]
importlib._bootstrap._find_and_load (cf. `TARGET(IMPORT_NAME)` in ceval.c below) =>
_find_and_load_unlocked =>
_find_spec => PathFinder.find_spec => _get_spec
FileFinder.find_spec =>
(iterate loaders e.g. SourceFileLoader)
_get_spec => spec_from_file_location
_load_unlocked =>
module_from_spec (setup module.__name__, etc...)
SourceFileLoader.exec_module (_LoaderBasics.exec_module) =>
get_code (SourceLoader.get_code) =>
cache_from_source (return __pycache__/(name).cpython-39.pyc)
(if .pyc is found) _compile_bytecode => marshal.loads
(otherwise)
source_to_code => compile (with "exec" flag)
_cache_bytecode
exec(code, module.__dict__)
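The spec / module / exec steps above can be replayed by hand with the public `importlib.util` helpers. This is only a sketch of the happy path for a source file; the module name `demo_mod` and the temporary file are made up for illustration, and the real `_find_and_load` also handles locking, `sys.modules`, and parent packages.

```python
import importlib.util
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_mod.py")  # hypothetical module file
    with open(path, "w") as f:
        f.write("VALUE = 6 * 7\n")

    # Roughly what FileFinder.find_spec produces for a source file.
    spec = importlib.util.spec_from_file_location("demo_mod", path)
    # module_from_spec sets __name__, __spec__, __loader__, ...
    module = importlib.util.module_from_spec(spec)
    # SourceFileLoader.exec_module: get_code, then exec(code, module.__dict__).
    spec.loader.exec_module(module)

    loader_name = type(spec.loader).__name__
    # Where the loader would look for (and cache) the compiled bytecode.
    pyc_path = importlib.util.cache_from_source(path)

print(module.VALUE, loader_name, pyc_path)
```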
- https://github.com/python/cpython/blob/main/Lib/opcode.py
- https://github.com/python/cpython/blob/main/Python/compile.c
- https://github.com/python/cpython/blob/main/Python/symtable.c
- static analysis for generator's "yield" and coroutine's "await"
symtable_visit_expr
compute_code_flags
_PyAST_Compile =>
_PyAST_Optimize
_PySymtable_Build =>
(1st pass: traverse AST with `symtable_visit_???` and construct `PySTEntryObject`)
symtable_enter_block
symtable_visit_stmt/expr =>
(e.g. ClassDef)
symtable_add_def(<class name>, DEF_LOCAL)
symtable_enter_block
VISIT_SEQ(st, stmt, <class body>) => ...
symtable_exit_block
symtable_exit_block
(2nd pass: traverse `PySTEntryObject` by `analyze_block` and `analyze_child_block`)
symtable_analyze =>
analyze_block
analyze_name
analyze_child_block
compiler_mod =>
(traverse AST with `compiler_???`)
compiler_body => compiler_visit_stmt =>
(e.g. FunctionDef)
compiler_function => ...
assemble (returns PyCodeObject) =>
insert_prefix_instructions (GEN_START opcode for generator like function)
normalization, optimization, ...?
makecode =>
compute_code_flags (e.g. CO_GENERATOR, CO_COROUTINE, ...)
...
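The outcome of `compute_code_flags` is visible from Python on `__code__.co_flags`. A small sketch using the flag constants exported by `inspect`; the helper `flag_names` is made up for this example.

```python
import inspect

def plain():
    return 1

def gen():
    yield 1

async def coro():
    return 1

async def agen():
    yield 1

def flag_names(func):
    """Return which generator-like CO_* flags are set on func's code object."""
    flags = func.__code__.co_flags
    candidates = ("CO_GENERATOR", "CO_COROUTINE", "CO_ASYNC_GENERATOR")
    return {name for name in candidates if flags & getattr(inspect, name)}

for f in (plain, gen, coro, agen):
    print(f.__name__, sorted(flag_names(f)))
```

The flags are decided at compile time from the symbol table (`ste_generator`, `ste_coroutine`), so they are set before the function is ever called.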
- Example:
FunctionDef
compiler_function =>
compiler_default_arguments =>
compiler_visit_defaults =>
VISIT_SEQ (compiler_visit_expr for default argument expressions)
ADDOP_I(... BUILD_TUPLE ...) => compiler_addop_i =>
compiler_addop_i_line (construct `struct instr` on `compiler ~> compiler_unit ~> basicblock`)
compiler_enter_scope =>
(construct `compiler_unit` on `compiler ~> compiler_unit`)
compiler_new_block
VISIT_IN_SCOPE (compiler_visit_stmt for body)
assemble => ...
compiler_exit_scope
compiler_make_closure =>
ADDOP_LOAD_CONST (for PyCodeObject)
ADDOP_I MAKE_FUNCTION
compiler_nameop (Store `name`) => STORE_NAME
- Example:
ClassDef
compiler_visit_stmt => compiler_class =>
compiler_enter_scope
... compiler_body and assemble to PyCodeObject ...
compiler_exit_scope
opcode LOAD_BUILD_CLASS
compiler_make_closure with class body PyCodeObject => ... opcode MAKE_FUNCTION
compiler_call_helper => ... (see compiler_call below)
- Example:
expr_ty
(symtable)
symtable_visit_expr =>
(if `yield` or `yield from`)
set PySTEntryObject.ste_generator
(if `await`)
set PySTEntryObject.ste_coroutine
(compiler)
compiler_visit_expr =>
compiler_visit_expr1 =>
(if function call)
compiler_call =>
visit callee expression
compiler_call_helper =>
visit argument expressions
ADDOP_I CALL_FUNCTION (with number of arguments)
(for "star" argument (e.g. *args, **kwargs) CALL_FUNCTION_EX)
(if await expression)
visit the operand of `await`
ADDOP GET_AWAITABLE
ADDOP_LOAD_CONST Py_None
ADDOP YIELD_FROM
(if yield expression)
visit the operand of `yield`
ADDOP YIELD_VALUE
(if identifier (aka Name_kind))
compiler_nameop
- LOAD_NAME
- CALL_FUNCTION
- MAKE_FUNCTION
- RETURN_VALUE
- LOAD_BUILD_CLASS
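The opcodes listed above can be seen with the `dis` module. A sketch only: opcode names change between CPython versions (for example the call opcode is `CALL_FUNCTION` in the version these notes follow but is named differently in later releases), so the code below only relies on names that are assumed to be stable.

```python
import dis

source = "def f(x):\n    return g(x)\n"
module_code = compile(source, "<demo>", "exec")

# Module level: build the function object and bind it to the name `f`.
module_ops = [ins.opname for ins in dis.get_instructions(module_code)]

# The function body is a nested PyCodeObject stored in co_consts.
func_code = next(c for c in module_code.co_consts if hasattr(c, "co_code"))
func_ops = [ins.opname for ins in dis.get_instructions(func_code)]

print(module_ops)
print(func_ops)
```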
PyEval_EvalCode (given PyCodeObject) =>
_PyThreadState_GET
_PyFunction_FromConstructor
_PyEval_Vector =>
_PyEvalFramePushAndInit
_PyEval_EvalFrame =>
_PyEvalFramePushAndInit (take `InterpreterFrame` from `.datastack_top`)
_PyEval_EvalFrameDefault => ...
_PyEvalFrameClearAndPop
_PyEval_EvalFrameDefault
local variables: opcode, oparg, retval, next_instr, stack_pointer, ...
initialize `next_instr` by `InterpreterFrame.f_lasti`
initialize `stack_pointer` by `_PyFrame_GetStackPointer` (stack machine with `PyObject`s on its stack)
DISPATCH =>
NEXTOPARG =>
set `opcode` and `oparg` from next_instr
INSTRUCTION_START =>
DISPATCH_GOTO (goto based on `opcode_targets` array e.g. TARGET_CALL_FUNCTION)
TARGET(LOAD_NAME) =>
INSTRUCTION_START (update `InterpreterFrame.f_lasti` and `next_instr`)
load `name` from `locals` dict
PUSH (push found PyObject to stack_pointer)
DISPATCH => ...
TARGET(CALL_FUNCTION) =>
PEEK (get callable PyObject from stack at `oparg + 1`)
PyObject_Vectorcall (with passing `stack_pointer` as arguments) => ...
PUSH (push result to stack_pointer)
(if result == NULL) goto error (to propagate exception)
TARGET(RETURN_VALUE) =>
POP to `retval`
(if InterpreterFrame.depth > 0 i.e. "internal function call" without growing c stack e.g. via `CALL_FUNCTION_PY_SIMPLE`)
_PyFrame_StackPush (push `retval` to the previous frame's stacktop)
_PyEvalFrameClearAndPop
goto resume_frame
(otherwise)
return retval
TARGET(LOAD_BUILD_CLASS)
push builtin function "__build_class__"
which will be called with class body closure
TARGET(MAKE_FUNCTION) =>
POP PyCodeObject from stack
PyFunction_New => PyFunction_NewWithQualName =>
construct PyFunctionObject with vectorcall = _PyFunction_Vectorcall
...
_PyFunction_Vectorcall =>
_PyEval_Vector (actually the same one called from `PyEval_EvalCode`) =>
(if CO_GENERATOR, CO_COROUTINE, etc...) make_coro =>
make_coro_frame (not sure what's special)
_Py_MakeCoro => make_gen (with PyGen_Type, PyAsyncGen_Type, or PyCoro_Type) =>
set `InterpreterFrame.generator`
(otherwise)
_PyEvalFramePushAndInit
_PyEval_EvalFrame => ...
_PyEvalFrameClearAndPop
TARGET(GET_AWAITABLE) =>
_PyCoro_GetAwaitableIter =>
(if coroutine) return coroutine
(otherwise) call `tp_as_async->am_await` (whose result must pass `PyIter_Check`)
SET_TOP iterator
...
TARGET(IMPORT_NAME) => import_name =>
PyImport_ImportModuleLevelObject (if `builtins.__import__` not changed) =>
import_get_module (check if already in sys.modules)
import_find_and_load (delegate to "importlib._bootstrap._find_and_load")
...
(occasionally jump to `check_eval_breaker` for `gil_drop_request` etc...)
eval_frame_handle_pending =>
(if gil_drop_request)
drop_gil
take_gil (will block if other threads took gil first)
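The fetch / dispatch / push pattern of `_PyEval_EvalFrameDefault` can be imitated with a toy stack machine. This is not how ceval.c is written (no frames, no computed goto, no error handling, no eval breaker); the instruction format and the `run` function are invented for illustration.

```python
def run(code, names):
    """Evaluate a list of (opname, oparg) pairs against a namespace dict."""
    stack = []
    pc = 0  # plays the role of next_instr
    while True:
        opname, oparg = code[pc]  # NEXTOPARG
        pc += 1                   # INSTRUCTION_START
        if opname == "LOAD_NAME":
            stack.append(names[oparg])          # PUSH
        elif opname == "LOAD_CONST":
            stack.append(oparg)
        elif opname == "CALL_FUNCTION":
            # The callable sits below its `oparg` arguments on the stack.
            args = stack[len(stack) - oparg:]
            del stack[len(stack) - oparg:]
            func = stack.pop()
            stack.append(func(*args))
        elif opname == "RETURN_VALUE":
            return stack.pop()                  # POP to retval
        else:
            raise ValueError(f"unknown opcode {opname}")

program = [
    ("LOAD_NAME", "max"),
    ("LOAD_CONST", 3),
    ("LOAD_CONST", 7),
    ("CALL_FUNCTION", 2),
    ("RETURN_VALUE", None),
]
result = run(program, {"max": max})
print(result)
```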
References
Summary
- user defined class
- user defined slots (e.g. `__init__`)
- construction
- constructor (`tp_init`, `tp_new` via `type_call`)
- destruction
- destructor (`tp_dealloc`, `tp_finalize`, `tp_free`, ...)
- memory layout
- `PyObject_HEAD`
- `PyObject_VAR_HEAD`
- `PyGC_Head` for GC types
[ Data structure ]
PyObject
ob_refcnt
ob_type
PyVarObject
...
PyTypeObject
...
tp_basicsize
tp_itemsize
[ Demo code to debug into ]
class C:
def __init__(self):
self.x = 0
x = C()
del x
[ Example: defining class (cf. `compiler_class` and `LOAD_BUILD_CLASS` above)]
builtin___build_class__ =>
set metaclass from 1. "metaclass" kwarg; 2. type of the first base class; 3. `PyType_Type`
_PyEval_Vector (call class body closure)
PyObject_VectorcallDict (call metaclass e.g. PyType_Type) => ... =>
type_call =>
type_new (as PyType_Type.tp_new) =>
type_new_get_bases (if no bases, then use `PyBaseObject_Type`)
type_new_impl =>
type_new_init =>
type_new_alloc =>
PyType_GenericAlloc (as metatype->tp_alloc (TODO: how was this setup?))
setup slots e.g.
tp_flags (Py_TPFLAGS_HAVE_GC)
tp_dealloc (subtype_dealloc)
PyType_Ready => type_ready =>
type_ready_set_new (e.g. copy tp_new from base class (e.g. PyBaseObject_Type))
fixup_slot_dispatchers =>
update_one_slot (e.g. for __init__) =>
find_name_in_mro (find "__init__" method in the class)
if found, set `tp_init` to `slot_tp_init`
type_new_init_subclass (calls the parent's `__init_subclass__` through a `PySuper_Type` instance)
type_init (as PyType_Type.tp_init)
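The metaclass selection and the `__init_subclass__` step in the trace above can be observed from Python. A sketch with made-up class names (`Meta`, `Base`, `Child`, `Dynamic`):

```python
class Meta(type):
    created = []

    def __new__(mcls, name, bases, ns, **kwds):
        # Delegates to type_new, which also triggers __init_subclass__.
        cls = super().__new__(mcls, name, bases, ns, **kwds)
        Meta.created.append(name)
        return cls

class Base(metaclass=Meta):          # 1. explicit "metaclass" kwarg
    registry = []

    def __init_subclass__(cls, **kwds):
        super().__init_subclass__(**kwds)
        Base.registry.append(cls.__name__)

class Child(Base):                   # 2. metaclass taken from type(Base)
    pass

# 3. no kwarg and no bases: plain `type`, with `object` as the implicit base.
Dynamic = type("Dynamic", (), {"x": 0})

print(type(Base), type(Child), type(Dynamic))
print(Meta.created, Base.registry, Dynamic.__mro__)
```

`Base.__init_subclass__` runs for `Child` but not for `Base` itself, since the hook is looked up on the parent of the class being created.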
[ Example: instantiating user defined class (here `callable` is user defined class)]
PyObject_Vectorcall(callable) => _PyObject_VectorcallTstate => _PyObject_MakeTpCall =>
Py_TYPE(callable)->tp_call(callable, ...) (i.e. PyType_Type.tp_call (type_call) for user defined type) =>
type_call =>
object_new =>
PyType_GenericAlloc (as tp_alloc) =>
_PyType_AllocNoTrack
_PyObject_GC_TRACK (for GC type)
_PyObject_InitializeDict => PyDict_New
slot_tp_init (object_init if not overridden) =>
_PyObject_Call (for user defined "__init__" function)
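In Python terms, `type_call` amounts to "call `__new__`, then `__init__` if an instance of the class came back". A rough emulation, where `call_type` is a hypothetical helper and not a CPython API:

```python
class C:
    def __init__(self, x=0):
        self.x = x

def call_type(cls, *args, **kwargs):
    """Approximate what type_call does for `cls(*args, **kwargs)`."""
    obj = cls.__new__(cls, *args, **kwargs)   # tp_new (object_new here)
    if isinstance(obj, cls):
        cls.__init__(obj, *args, **kwargs)    # tp_init (slot_tp_init here)
    return obj

a = call_type(C)
b = call_type(C, 5)
print(a.x, b.x)
```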
[ Example: object destruction ]
TARGET(DELETE_NAME) => PyObject_DelItem (from local namespace dict) => ... => delitem_common =>
Py_DECREF =>
(if refcnt = 0) _Py_Dealloc =>
subtype_dealloc (for heap types) => ... see "memory management"
References
[ Allocation ]
PyType_GenericAlloc (via tp_alloc) =>
_PyType_AllocNoTrack =>
PyObject_Malloc
_PyObject_GC_Link =>
increment `gc_generation.count`
if it exceeds `gc_generation.threshold`, then run `gc_collect_generations` (see below)
_PyObject_GC_TRACK =>
insert PyGC_Head (cast from PyObject) to
`PyInterpreterState._gc_runtime_state.generation0` linked list
[ Deallocation (after refcnt = 0) ]
subtype_dealloc =>
PyObject_GC_UnTrack (remove `PyObject` from the linked list)
_PyObject_FreeInstanceAttributes =>
`Py_XDECREF` for each value of `_PyObject_ValuesPointer`
object_dealloc (as basedealloc) =>
PyObject_GC_Del (as tp_free) =>
decrement `gc_generation.count`
PyObject_Free
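Reference-count driven deallocation is easy to watch from Python. A sketch that relies on CPython behaviour (other implementations may delay finalization); the `events` list is only there to record when `__del__` runs.

```python
import sys

events = []

class C:
    def __del__(self):
        # Runs as the finalizer while the object is being deallocated.
        events.append("finalized")

x = C()
y = x                              # second reference to the same object
refs_before = sys.getrefcount(x)   # also counts temporary references
del x                              # refcount drops, object survives
after_first_del = list(events)
del y                              # last reference gone: freed right away
after_second_del = list(events)

print(refs_before, after_first_del, after_second_del)
```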
[ Garbage collection for container types ]
gc_collect_generations => gc_collect_with_callback => gc_collect_main =>
deduce_unreachable(base, ...) =>
update_refs
subtract_refs =>
run tp_traverse with visit_decref.
the objects with refcnt = 0 are not reachable from "outside".
note that those might still be reachable from other `base` which is reachable from "outside".
move_unreachable => ...
finalize_garbage
handle_resurrected_objects
delete_garbage
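Objects in a reference cycle never reach refcount 0, which is the case the collector above handles. A small sketch with the `gc` module; `Node` is a made-up class, and automatic collection is switched off temporarily so the timing is deterministic.

```python
import gc
import weakref

class Node:
    pass

was_enabled = gc.isenabled()
gc.disable()                       # keep automatic collection out of the demo
try:
    a, b = Node(), Node()
    a.other, b.other = b, a        # cycle: each keeps the other alive
    probe = weakref.ref(a)
    del a, b                       # unreachable now, but refcounts stay > 0
    alive_before = probe() is not None
    collected = gc.collect()       # full collection over all generations
    alive_after = probe() is not None
finally:
    if was_enabled:
        gc.enable()

thresholds = gc.get_threshold()    # the per-generation thresholds
print(alive_before, collected, alive_after, thresholds)
```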
[parent thread]
Thread.start (py) => _thread.start_new_thread (with `_bootstrap` callback)=>
thread_PyThread_start_new_thread (c) =>
PyMem_NEW (allocate `bootstate`)
_PyThreadState_Prealloc => new_threadstate (allocate new PyThreadState)
PyThread_start_new_thread (with `thread_run` callback) => pthread_create
[child thread]
thread_run (c) =>
setup PyThreadState (e.g. set thread_id)
_PyThreadState_SetCurrent
PyEval_AcquireThread =>
take_gil (see `interpreter startup` above)
_PyThreadState_Swap
PyObject_Call (python callable `_bootstrap`) =>
_bootstrap (py) => _bootstrap_inner =>
_set_tstate_lock =>
(c) thread__set_sentinel =>
newlockobject
setup PyThreadState.on_delete_data for release_sentinel
add self to global `_active` dict
run (run callable given by a user)
_delete (remove self from global `_active` dict)
PyThreadState_Clear =>
PyThreadState.on_delete => release_sentinel (release lock)
[parent thread]
Thread.join (py) => _wait_for_tstate_lock =>
self._tstate_lock.acquire =>
(c) lock_PyThread_acquire_lock => acquire_timed =>
Py_BEGIN_ALLOW_THREADS ~> PyEval_SaveThread =>
_PyThreadState_Swap (set global `tstate_current` to NULL)
drop_gil =>
set locked = 0
notify waiting thread on `take_gil` via condition variable
PyThread_acquire_lock_timed => sem_clockwait (this should block until `release_sentinel` in child thread above)
Py_END_ALLOW_THREADS ~> PyEval_RestoreThread =>
take_gil => ...
_PyThreadState_Swap (restore `tstate_current` as before)
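The Python-level entry points of the trace above are just `Thread.start` and `Thread.join`. A minimal sketch; the worker function and thread name are made up, and `sys.getswitchinterval()` is shown because it is the timeout a thread waits in `take_gil` before requesting a GIL drop.

```python
import sys
import threading

results = []

def worker(n):
    # Runs in the child thread, after thread_run has taken the GIL.
    results.append((threading.current_thread().name, n * n))

t = threading.Thread(target=worker, args=(7,), name="demo-worker")
t.start()                      # _thread.start_new_thread under the hood
t.join()                       # blocks with the GIL released until the child exits
alive_after_join = t.is_alive()

switch_interval = sys.getswitchinterval()
print(results, alive_after_join, switch_interval)
```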
References
- https://www.python.org/dev/peps/pep-0255/ (yield)
- https://www.python.org/dev/peps/pep-0380/ (yield from and StopIteration)
- The value of the `yield from` expression is the first argument to the StopIteration exception raised by the iterator when it terminates.
- `return expr` in a generator causes `StopIteration(expr)`
[initialization]
_Py_MakeCoro => ...
[next (i.e. send(None))]
gen_iternext (PyGen_Type.tp_iternext) =>
gen_send_ex2 =>
_PyFrame_StackPush (push "sent" value to stacktop)
_PyEval_EvalFrame (will resume from `InterpreterFrame.f_lasti` from the last "yield")
(if result but not _PyFrameHasCompleted)
return PYGEN_NEXT
(otherwise)
clear frame and return PYGEN_RETURN
(if PYGEN_RETURN)
_PyGen_SetStopIterationValue
Py_CLEAR(result) (i.e. `retval` is NULL)
[yield]
TARGET(YIELD_VALUE) =>
retval = POP()
set InterpreterFrame.f_state to FRAME_SUSPENDED
save current stack_pointer as InterpreterFrame.stacktop
goto exiting
[yield from]
TARGET(YIELD_FROM) =>
(if tp_iternext defined)
type.tp_iternext => ...
(otherwise)
_PyObject_CallMethodIdOneArg(... "send" ...)
(if PYGEN_RETURN (i.e. `retval == NULL` and _PyGen_FetchStopIterationValue))
SET_TOP(retval)
DISPATCH => ...
(otherwise PYGEN_NEXT)
decrement InterpreterFrame.f_lasti to run the same byte code again
goto exiting;
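The `send` / `yield` / `yield from` behaviour traced above, seen from the Python side. A small sketch with made-up generator names:

```python
def inner():
    received = yield 1
    return received * 2            # surfaces as StopIteration(received * 2)

def outer():
    result = yield from inner()    # value of `yield from` is StopIteration.value
    yield result

g = outer()
first = next(g)                    # send(None): runs to the yield inside inner
second = g.send(21)                # resumes inner; its return value reaches outer
try:
    next(g)
    stop_value = "not finished"
except StopIteration as exc:
    stop_value = exc.value         # outer has no return statement

# Driving inner directly exposes the StopIteration carrying the return value.
h = inner()
next(h)
try:
    h.send(5)
    returned = None
except StopIteration as exc:
    returned = exc.value

print(first, second, stop_value, returned)
```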
References
- https://github.com/python/cpython/tree/main/Lib/asyncio
- https://github.com/python/cpython/blob/main/Modules/_asynciomodule.c (Future, Task implemented in C for optimization)
- https://github.com/python/cpython/blob/main/Objects/genobject.c
- https://www.python.org/dev/peps/pep-3156/
- https://www.python.org/dev/peps/pep-0492/
Examples
- grpc python binding
Goals
- GET_AWAITABLE opcode
- "async def" return value
- coroutine to task/future
- leaf of "await" chain (aka Future)
- asyncio.futures.wrap_future
event loop
run initial task (e.g. via tasks.ensure_future(coroutine)) =>
e.g.
- schedule more tasks via `loop.call_soon`
- yield from ... (down the road, it will reach `await <future>` whose callback will resume this generator)
- return value (internally via StopIteration)
run next scheduled task from initial task => ...
...
- main event loop
[top level]
asyncio.run =>
events.new_event_loop =>
get_event_loop_policy => _init_event_loop_policy =>
import DefaultEventLoopPolicy (i.e. _UnixDefaultEventLoopPolicy)
_UnixDefaultEventLoopPolicy.new_event_loop =>
_UnixSelectorEventLoop.__init__ => BaseSelectorEventLoop.__init__ => selectors.DefaultSelector
_UnixSelectorEventLoop.run_until_complete => BaseEventLoop.run_until_complete =>
tasks.ensure_future =>
(if coroutine) BaseEventLoop.create_task => tasks.Task =>
BaseEventLoop.call_soon(Task.__step) => _call_soon =>
_ready.append(events.Handle(...))
BaseEventLoop.run_forever =>
events._set_running_loop
(while True) _run_once =>
(...compute select timeout based on _scheduled...)
DefaultSelector.select(timeout)
BaseSelectorEventLoop._process_events => BaseEventLoop._add_callback => _ready.append
(...move _scheduled to _ready if timeout exceeded)
(iterate over _ready) Handle._run => Context.run => ???
Task
(coroutine as `Future`)
# python implementation
[Task.__step]
coro.send (i.e. progressing generator frame)
(if StopIteration exception)
Future.set_result
(if yielded result with "_asyncio_future_blocking" (i.e. met with `Future.__await__` at the end of await chain))
_asyncio_future_blocking = False
Future.add_done_callback(Task.__wakeup ...)
[Task.__wakeup]
(if future's result is ready) self.__step => ...
Future
(a leaf of `await` chain)
[Future.__await__]
self._asyncio_future_blocking = True
yield self (see above for how `Task.__step` handles yielded result)
return self.result() (this can return "resolved" result since this generator resumes via `Task.__wakeup` called from this future's `done_callback`)
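The step / wakeup interplay between `Task` and `Future` fits in a few dozen lines. This is a toy model, not asyncio's implementation: `ToyFuture`, `ToyTask`, `ready`, and `run_until_complete` are invented names, and there is no error handling, cancellation, or I/O selector.

```python
from collections import deque

ready = deque()  # stands in for BaseEventLoop._ready

class ToyFuture:
    def __init__(self):
        self.done = False
        self.value = None
        self.callbacks = []

    def set_result(self, value):
        self.done, self.value = True, value
        for cb in self.callbacks:
            ready.append(lambda cb=cb: cb(self))   # like loop.call_soon

    def add_done_callback(self, cb):
        if self.done:
            ready.append(lambda: cb(self))
        else:
            self.callbacks.append(cb)

    def __await__(self):
        if not self.done:
            yield self        # travels up the await chain to ToyTask.step
        return self.value

class ToyTask(ToyFuture):
    def __init__(self, coro):
        super().__init__()
        self.coro = coro
        ready.append(self.step)

    def step(self):
        try:
            awaited = self.coro.send(None)         # advance the coroutine frame
        except StopIteration as exc:
            self.set_result(exc.value)
        else:
            # Resume this task once the awaited future completes (wakeup).
            awaited.add_done_callback(lambda fut: self.step())

def run_until_complete(coro):
    task = ToyTask(coro)
    while ready:
        ready.popleft()()
    return task.value

async def child(fut):
    return (await fut) + 1

async def main():
    fut = ToyFuture()
    ready.append(lambda: fut.set_result(41))       # pretend I/O finishes later
    return await child(fut)

answer = run_until_complete(main())
print(answer)
```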
[initialize coroutine]
make_coro => ... make_gen(&PyCoro_Type, ...)
[yield from coroutine (aka await) cf. `TARGET(YIELD_FROM)` above]
"send" => gen_send => ...
[Main thread]
BaseEventLoop.run_in_executor (returns asyncio.Future)=>
concurrent.futures.ThreadPoolExecutor.submit (returns concurrent.futures.Future)
wrap_future =>
BaseEventLoop.create_future (i.e. asyncio.Future)
_chain_future (concurrent.futures.Future to asyncio.Future) =>
concurrent.futures.Future.add_done_callback(_call_set_state)
(callback will be executed in ThreadPoolExecutor's worker)
[Worker thread of ThreadPoolExecutor]
_call_set_state => BaseEventLoop.call_soon_threadsafe(_set_state) (mostly equivalent to call_soon)
[Main thread]
_set_state => ... => asyncio.Future.set_result(concurrent.futures.Future.result())
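From user code the whole chain above is hidden behind `loop.run_in_executor`. A minimal sketch, assuming the default executor (passing `None`); `blocking_work` is a made-up function.

```python
import asyncio
import threading

def blocking_work(n):
    # Executed in a ThreadPoolExecutor worker, not in the event loop thread.
    on_main = threading.current_thread() is threading.main_thread()
    return n * 2, on_main

async def main():
    loop = asyncio.get_running_loop()
    # Returns an asyncio.Future wrapping the concurrent.futures.Future.
    return await loop.run_in_executor(None, blocking_work, 21)

value, ran_on_main = asyncio.run(main())
print(value, ran_on_main)
```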