cpython

Author	SHA1	Message	Date
Ken Jin	4fa80ce74c	gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as a lot of trace too long aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md . This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work gh-140277 The optimizer stack space check is disabled, as it's no longer valid to deal with underflow. 
Pros: * Ignoring the generated tracer code as it's automatically created, this is only an additional 1k lines of code. The maintenance burden is handled by the DSL and code generator. * `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace. * The new JIT frontend is able to handle a lot more control-flow than the old one. * Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing. * Better handling of polymorphism. We leverage the specializing interpreter for this. Cons: * (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962 Design: * After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests. * The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering. * The tracing version behaves nearly identically to the normal interpreter; in fact, it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory. * The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force re-specialization when tracing a deopt. * The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested. * Because we do not share interpreter dispatch, there should be no significant slowdown to the original specializing interpreter on tailcall and computed gotos with JIT disabled. 
With JIT enabled, there might be a slowdown in the form of the JIT trying to trace. * Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.	2025-11-13 18:08:32 +00:00
Victor Stinner	b99db92dde	gh-139653: Add PyUnstable_ThreadState_SetStackProtection() (#139668) Add PyUnstable_ThreadState_SetStackProtection() and PyUnstable_ThreadState_ResetStackProtection() functions to set the stack base address and stack size of a Python thread state. Co-authored-by: Petr Viktorin <encukou@gmail.com>	2025-11-13 17:30:50 +01:00
Neil Schemenauer	c98c5b3449	gh-131253: free-threaded build support for pystats (gh-137189) Allow the --enable-pystats build option to be used with free-threading. The stats are now stored on a per-interpreter basis, rather than process global. For free-threaded builds, the stats structure is allocated per-thread and then periodically merged into the per-interpreter stats structure (on thread exit or when the reporting function is called). Most of the pystats related code has been moved into the file Python/pystats.c.	2025-11-03 11:36:37 -08:00
Peter Bierma	2cefa70eb9	gh-140544: Always assume that thread locals are available (GH-140690) Python has required thread local support since 3.12 (see GH-103324). By assuming that thread locals are always supported, we can improve the performance of third-party extensions by allowing them to access the attached thread and interpreter states directly.	2025-10-28 09:07:19 -04:00
alm	1753ccb432	gh-138050: [WIP] JIT - Streamline MAKE_WARM - move coldness check to executor creation (GH-138240)	2025-10-27 16:37:37 +00:00
Kumar Aditya	ef4665f918	gh-140544: store pointer to interpreter state as a thread local for fast access (#140573)	2025-10-25 19:56:07 +05:30
Kumar Aditya	ebf9938496	gh-140544: cleanup `HAVE_THREAD_LOCAL` checks in pystate.c (#140547)	2025-10-24 14:23:06 +00:00
Shamil	a615fb49c9	gh-140301: Fix memory leak in subinterpreter `PyConfig` cleanup (#140303) Co-authored-by: Kumar Aditya <kumaraditya@python.org>	2025-10-20 09:29:23 +00:00
Kumar Aditya	58c44c2bf2	gh-140067: Fix memory leak in sub-interpreter creation (#140111) (#140261) Fix memory leak in sub-interpreter creation caused by overwriting of the previously used `_malloced` field. Now the pointer is stored in the first word of the memory block to avoid it being overwritten accidentally. Co-authored-by: Kumar Aditya <kumaraditya@python.org>	2025-10-18 16:36:58 +05:30
Shamil	c8729c9909	gh-140257: fix data race on eval_breaker during finalization (#140265)	2025-10-18 16:31:53 +05:30
Peter Bierma	0bcb1c25f7	Revert "gh-140067: Fix memory leak in sub-interpreter creation (#140111)" (#140140) This reverts commit `59547a251f`.	2025-10-15 07:16:43 +05:30
Shamil	59547a251f	gh-140067: Fix memory leak in sub-interpreter creation (#140111) Fix memory leak in sub-interpreter creation caused by overwriting of the previously used `_malloced` field. Now the pointer is stored in the first word of the memory block to avoid it being overwritten accidentally. Co-authored-by: Kumar Aditya <kumaraditya@python.org>	2025-10-14 14:42:17 +00:00
Peter Bierma	9243a4b933	gh-126016: Remove bad assertion in `PyThreadState_Clear` (GH-139158) In the _interpreters module, we use PyEval_EvalCode() to run Python code in another interpreter. However, when the process receives a KeyboardInterrupt, PyEval_EvalCode() will jump straight to finalization rather than returning. This prevents us from cleaning up and marking the thread as "not running main", which triggers an assertion in PyThreadState_Clear() on debug builds. Since everything else works as intended, remove that assertion.	2025-09-19 12:17:05 +00:00
Peter Bierma	2191497933	gh-136003: Execute pre-finalization callbacks in a loop (GH-136004)	2025-09-18 08:29:12 -04:00
Donghee Na	d873fb42f3	gh-137838: Move _PyUOpInstruction buffer to PyInterpreterState (gh-138918)	2025-09-17 18:50:16 +01:00
Hood Chatham	2629ee4eb0	gh-128627: Use __builtin_wasm_test_function_pointer_signature for Emscripten trampoline (#137470) With https://github.com/llvm/llvm-project/pull/150201 being merged, there is now a better way to generate the Emscripten trampoline, instead of including hand-generated binary WASM content. Requires Emscripten 4.0.12.	2025-09-17 15:33:55 +01:00
Sam Gross	90fe3250f8	gh-137433: Fix deadlock with stop-the-world and daemon threads (gh-137735) There was a deadlock originally seen by Memray when a daemon thread enabled or disabled profiling while the interpreter was shutting down. I think this could also happen with garbage collection, but I haven't seen that in practice. The daemon thread could be hung while trying to acquire the global rwmutex that prevents overlapping global and per-interpreter stop-the-world events. Since it already held the main interpreter's stop-the-world lock, it also deadlocked the main thread, which is trying to perform interpreter finalization. Swap the order of lock acquisition to prevent this deadlock. Additionally, refactor `_PyParkingLot_Park` so that the global buckets hashtable is left in a clean state if the thread is hung in `PyEval_AcquireThread`.	2025-09-16 09:21:58 +01:00
Kumar Aditya	b9bcd02e9f	gh-137384: fix crash when accessing warnings state late in runtime shutdown (#138027)	2025-08-22 19:10:43 +05:30
Mark Shannon	a8d9d94784	GH-137959: Replace shim code in jitted code with a single trampoline function. (GH-137961)	2025-08-21 10:40:53 +01:00
Sam Gross	a10152f8fd	gh-137400: Fix thread-safety issues when profiling all threads (gh-137518) There were a few thread-safety issues when profiling or tracing all threads via PyEval_SetProfileAllThreads or PyEval_SetTraceAllThreads: * The loop over thread states could crash if a thread exits concurrently (in both the free threading and default build) * The modification of `c_profilefunc` and `c_tracefunc` wasn't thread-safe on the free threading build.	2025-08-13 14:15:12 -04:00
Peter Bierma	082f370cdd	gh-137514: Add a free-threading wrapper for mutexes (GH-137515) Add `FT_MUTEX_LOCK`/`FT_MUTEX_UNLOCK`, which call `PyMutex_Lock` and `PyMutex_Unlock` on the free-threaded build, and no-op otherwise.	2025-08-07 11:24:50 -04:00
Mark Shannon	e7b55f564d	GH-136410: Faster side exits by using a cold exit stub (GH-136411)	2025-08-01 16:26:07 +01:00
Kumar Aditya	f183996eb7	gh-136870: fix data race in `PyThreadState_Clear` on `sys_tracing_threads` (#136951) In free-threading, multiple threads can be cleared concurrently, so the modifications on `sys_tracing_threads` should be done while holding the profile lock; otherwise they can race with other threads setting up profiling.	2025-07-21 20:35:25 +00:00
Eric Snow	269e19e0a7	gh-132775: Fix Interpreter.call() __main__ Visibility (gh-135595) As noted in the new tests, there are a few situations we must carefully accommodate for functions that get pickled during interp.call(). We do so by running the script from the main interpreter's __main__ module in a hidden module in the other interpreter. That hidden module is used as the function __globals__.	2025-06-17 13:16:59 -06:00
Pablo Galindo Salgado	42b25ad4d3	gh-91048: Refactor and optimize remote debugging module (#134652) Completely refactor Modules/_remote_debugging_module.c with improved code organization, replacing scattered reference counting and error handling with centralized goto error paths. This cleanup improves maintainability and reduces code duplication throughout the module while preserving the same external API. Implement memory page caching optimization in Python/remote_debug.h to avoid repeated reads of the same memory regions during debugging operations. The cache stores previously read memory pages and reuses them for subsequent reads, significantly reducing system calls and improving performance. Add code object caching mechanism with a new code_object_generation field in the interpreter state that tracks when code object caches need invalidation. This allows efficient reuse of parsed code object metadata and eliminates redundant processing of the same code objects across debugging sessions. Optimize memory operations by replacing multiple individual structure copies with single bulk reads for the same data structures. This reduces the number of memory operations and system calls required to gather debugging information from the target process. Update Makefile.pre.in to include Python/remote_debug.h in the headers list, ensuring that changes to the remote debugging header force proper recompilation of dependent modules and maintain build consistency across the codebase. Also, make the module compatible with the free threading build as an extra :) Co-authored-by: Łukasz Langa <lukasz@langa.pl>	2025-05-25 20:19:29 +00:00
Peter Bierma	b8998fe2d8	gh-131185: Use a proper thread-local for cached thread states (gh-132510) Switches over to a _Py_thread_local in place of autoTssKey, and also fixes a few other checks regarding PyGILState_Ensure after finalization. Note that this doesn't fix concurrent use of PyGILState_Ensure with Py_Finalize; I'm pretty sure zapthreads doesn't work at all, and that needs to be fixed separately.	2025-05-21 07:01:25 -06:00
Victor Stinner	e79f640eb6	Simplify interp_look_up_id() (#134257) Don't use PyInterpreterState_GetID() but get directly the interpreter 'id' member which cannot fail.	2025-05-19 18:09:10 +00:00
b-pass	f2de1e6861	gh-134144: Fix use-after-free in zapthreads() (#134145)	2025-05-18 20:32:29 +05:30
Mark Shannon	ac7d5ba96e	GH-133231: Changes to executor management to support proposed `sys._jit` module (GH-133287) * Track the current executor, not the previous one, on the thread-state. * Batch executors for deallocation to avoid having to constantly incref executors; this is an ad-hoc form of deferred reference counting.	2025-05-04 10:05:35 +01:00
Mark Shannon	44e4c479fb	GH-124715: Move trashcan mechanism into `Py_Dealloc` (GH-132280)	2025-04-30 11:37:53 +01:00
Mark Shannon	ccf1b0b1c1	GH-132508: Use tagged integers on the evaluation stack for the last instruction offset (GH-132545)	2025-04-29 18:00:35 +01:00
Petr Viktorin	0c26dbd16e	gh-133079: Remove Py_C_RECURSION_LIMIT & PyThreadState.c_recursion_remaining (GH-133080) Both were added in 3.13, are undocumented, and don't make sense in 3.14 due to changes in the stack overflow detection machinery (gh-112282). PEP 387 exception for skipping a deprecation period: https://github.com/python/steering-council/issues/288	2025-04-29 12:56:20 +02:00
Eric Snow	fe462f5a91	gh-132775: Drop PyUnstable_InterpreterState_GetMainModule() (gh-132978) We replace it with _Py_GetMainModule(), and add _Py_CheckMainModule(), but both in the internal-only C-API. We also add _PyImport_GetModulesRef(), which is the equivalent of _PyImport_GetModules(), but which increfs before the lock is released. This is used by a later change related to pickle and handling __main__.	2025-04-28 12:46:22 -06:00
Bénédikt Tran	427e7fc099	gh-132399: ensure correct alignment of `PyInterpreterState` (#132428)	2025-04-19 11:03:06 +02:00
Victor Stinner	61317074d4	gh-131238: Add pycore_interpframe_structs.h header (#131553) Add an explicit include to pycore_interpframe_structs.h in pycore_runtime_structs.h to fix a dependency cycle.	2025-03-21 17:19:47 +00:00
Victor Stinner	b69da006a4	gh-131238: Remove includes from pycore_interp.h (#131495) Remove also now unused includes in C files.	2025-03-20 11:35:23 +00:00
Victor Stinner	20c5f969dd	gh-131238: Remove more includes from pycore_interp.h (#131480)	2025-03-19 23:01:32 +01:00
Victor Stinner	22706843e0	gh-131238: Remove many includes from pycore_interp.h (#131472)	2025-03-19 17:46:24 +00:00
Victor Stinner	0453e494b6	gh-131238: Convert pycore_pystate.h static inline to functions (#131352) Convert static inline functions to functions: * _Py_IsMainThread() * _PyInterpreterState_Main() * _Py_IsMainInterpreterFinalizing() * _Py_GetMainConfig()	2025-03-17 12:31:55 +01:00
Mark Shannon	a1aeec61c4	GH-131238: Core header refactor (GH-131250) * Moves most structs in pycore_ header files into pycore_structs.h and pycore_runtime_structs.h * Removes many cross-header dependencies	2025-03-17 09:19:04 +00:00
Sam Gross	052cb717f5	gh-124878: Fix race conditions during interpreter finalization (#130649) The PyThreadState struct gains a reference count field to avoid issues with PyThreadState being a dangling pointer to freed memory. The refcount starts with a value of two: one reference is owned by the interpreter's linked list of thread states and one reference is owned by the OS thread. The reference count is decremented when the thread state is removed from the interpreter's linked list and before the OS thread calls `PyThread_hang_thread()`. The thread that decrements it to zero frees the `PyThreadState` memory. The `holds_gil` field is moved out of the `_status` bit field, to avoid a data race where one thread calls `PyThreadState_Clear()`, modifying the `_status` bit field while the OS thread reads `holds_gil` when attempting to acquire the GIL. The `PyThreadState.state` field now has `_Py_THREAD_SHUTTING_DOWN` as a possible value. This corresponds to the `_PyThreadState_MustExit()` check. This avoids race conditions in the free threading build when checking `_PyThreadState_MustExit()`.	2025-03-06 10:38:34 -05:00
Mark Shannon	78d50e91ff	GH-127705: better double free message. (GH-130785) * Add location information when accessing already closed stackref * Add #def option to track closed stackrefs to provide precise information for use after free and double frees.	2025-03-05 14:00:42 +00:00
Sam Gross	7aeaa5af2c	gh-130091: Reorder `_PyThreadState_Attach` to avoid data race (gh-130092) This moves `tstate_activate()` down to avoid a data race in the free threading build on the `_PyRuntime`'s thread-local `autoTSSkey`. This key is deleted during runtime finalization, which may happen concurrently with a call to `_PyThreadState_Attach`. The earlier `tstate_try/wait_attach` ensures that the thread is blocked before it attempts to access the deleted `autoTSSkey`. This fixes a TSAN reported data race in `test_threading.test_import_from_another_thread`.	2025-02-27 13:57:19 -05:00
Sam Gross	d027787c8d	gh-130421: Fix data race on timebase initialization (gh-130592) Windows and macOS require precomputing a "timebase" in order to convert OS timestamps into nanoseconds. Retrieve and compute this value during runtime initialization to avoid data races when accessing the time.	2025-02-27 13:27:54 +00:00
Mark Shannon	014223649c	GH-130396: Use computed stack limits on linux (GH-130398) * Implement C recursion protection with limit pointers for Linux, MacOS and Windows * Remove calls to PyOS_CheckStack * Add stack protection to parser * Make tests more robust to low stacks * Improve error messages for stack overflow	2025-02-25 09:24:48 +00:00
Petr Viktorin	ef29104f7d	GH-91079: Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now (GH-130413) Unfortunately, the change broke some buildbots. This reverts commit `2498c22fa0`.	2025-02-24 11:16:08 +01:00
Sam Gross	ca22147547	gh-111924: Fix data races when swapping allocators (gh-130287) CPython currently temporarily changes `PYMEM_DOMAIN_RAW` to the default allocator during initialization and shutdown. The motivation is to ensure that core runtime structures are allocated and freed using the same allocator. However, modifying the current allocator changes global state and is not thread-safe even with the GIL. Other threads may be allocating or freeing objects using PYMEM_DOMAIN_RAW; they are not required to hold the GIL to call PyMem_RawMalloc/PyMem_RawFree. This adds new internal-only functions like `_PyMem_DefaultRawMalloc` that aren't affected by calls to `PyMem_SetAllocator()`, so they're appropriate for Python runtime initialization and finalization. Use these calls in places where we previously swapped to the default raw allocator.	2025-02-20 11:31:15 -05:00
Mark Shannon	2498c22fa0	GH-91079: Implement C stack limits using addresses, not counters. (GH-130007) * Implement C recursion protection with limit pointers * Remove calls to PyOS_CheckStack * Add stack protection to parser * Make tests more robust to low stacks * Improve error messages for stack overflow	2025-02-19 11:44:57 +00:00
Kumar Aditya	0d68b14a0d	gh-128002: use per threads tasks linked list in asyncio (#128869) Co-authored-by: Łukasz Langa <lukasz@langa.pl>	2025-02-06 19:51:07 +01:00
Brandt Bucher	828b27680f	GH-126599: Remove the PyOptimizer API (GH-129194)	2025-01-28 16:10:51 -08:00
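
The two-owner reference count described in commit 052cb717f5 (gh-124878) can be illustrated with a minimal C sketch. This is not CPython's code: `demo_tstate`, `demo_tstate_new`, `demo_tstate_decref`, and `demo_freed` are hypothetical names, and the real `PyThreadState` carries far more state. The sketch only shows the idea that the count starts at two, one reference per owner, and that whichever owner drops the last reference frees the memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyThreadState; only the refcount is modeled. */
typedef struct {
    atomic_int refcount;
} demo_tstate;

/* Counts how many demo_tstate objects have been freed (demo bookkeeping). */
static atomic_int demo_freed;

/* Allocate a state with two references: one for the interpreter's linked
 * list of thread states, one for the OS thread that runs with it. */
static demo_tstate *demo_tstate_new(void)
{
    demo_tstate *ts = malloc(sizeof(*ts));
    if (ts == NULL) {
        return NULL;
    }
    atomic_init(&ts->refcount, 2);
    return ts;
}

/* Drop one reference. Returns true if this call released the last
 * reference and therefore freed the memory. */
static bool demo_tstate_decref(demo_tstate *ts)
{
    /* atomic_fetch_sub returns the value held before the subtraction. */
    int old = atomic_fetch_sub(&ts->refcount, 1);
    assert(old > 0);
    if (old == 1) {
        free(ts);
        atomic_fetch_add(&demo_freed, 1);
        return true;
    }
    return false;
}
```

Because the decrement is atomic, exactly one of the two owners observes the previous value 1, so the memory is freed once regardless of which owner finishes last.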

569 Commits