34 Commits

Author SHA1 Message Date
Emma Smith
f262297d52 gh-139877: Use PyBytesWriter in pycore_blocks_output_buffer.h (#139976)
Previously, the _BlocksOutputBuffer code created a list of bytes objects to hold the output data from compression libraries. This was slow because the output buffer code had to copy each bytes element of the list into the final bytes object buffer at the end of compression.

The new PyBytesWriter API introduced in PEP 782 is an ergonomic and fast way to write data into a buffer that later becomes a bytes object. Benchmarks show that using the PyBytesWriter API makes decompression 10-30% faster across a variety of settings. The gains are greatest when the decompressor itself is very fast, such as Zstandard (and likely zlib-ng). Otherwise the decompressor is the bottleneck and the gains are more modest, but still sizable (e.g. 10% faster for zlib).

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-10-14 10:03:55 -07:00
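The difference between the two output strategies can be sketched at the Python level. This is a simplified analogy, not the C code: `produce_into()` is a hypothetical stand-in for a decompressor step that fills a caller-provided buffer, and the block size and the doubling growth policy are assumptions. `blocks_strategy()` mirrors the old list-of-blocks approach, where every block is copied again when the result is assembled; `writer_strategy()` mirrors the PyBytesWriter pattern of writing straight into the tail of one growing buffer and trimming it to the written size at the end.

```python
def produce_into(dst: memoryview, src: bytes, pos: int) -> int:
    """Hypothetical decompressor step: write up to len(dst) bytes of
    src[pos:] into dst and return how many bytes were written."""
    n = min(len(dst), len(src) - pos)
    dst[:n] = src[pos:pos + n]
    return n


def blocks_strategy(src: bytes, block_size: int = 64) -> bytes:
    """Old approach: fill separate blocks, keep them in a list, join at the end."""
    blocks = []
    pos = 0
    while pos < len(src):
        block = bytearray(max(block_size, 1))
        with memoryview(block) as view:
            n = produce_into(view, src, pos)
        del block[n:]  # drop the unused tail of a partial block
        blocks.append(block)
        pos += n
    # The final join copies every block into the result a second time.
    return b"".join(blocks)


def writer_strategy(src: bytes, initial_size: int = 64) -> bytes:
    """Writer-style approach: one buffer, written in place, grown on demand."""
    buf = bytearray(max(initial_size, 1))
    size = 0  # number of bytes actually written so far
    pos = 0
    while pos < len(src):
        if size == len(buf):
            buf.extend(bytes(len(buf)))  # assumed policy: double the capacity
        with memoryview(buf) as view:
            n = produce_into(view[size:], src, pos)
        size += n
        pos += n
    del buf[size:]  # trim to the written size, like PyBytesWriter_FinishWithSize()
    return bytes(buf)
```

In C, finishing a PyBytesWriter is meant to turn the buffer it already owns into the bytes object (possibly after a resize) instead of copying each block again; the Python `bytes(buf)` above makes one extra copy, so this sketch shows the data flow rather than the performance.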
AN Long
6393068bde fix some typos (#138977) 2025-09-16 18:33:39 +05:30
Adam Turner
918e3ba6c0 GH-137623: Use an AC decorator for docstring line length enforcement (#137690) 2025-08-18 18:29:00 +01:00
Rogdham
57eab1b8f7 gh-132983: Use `Py_UNREACHABLE` in `_zstd_load_impl()` (#137320) 2025-08-04 02:36:12 +01:00
Rogdham
676748d4da gh-132983: Fix docstrings in `ZstdDict` (#137321)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
2025-08-03 15:04:45 +00:00
Adam Turner
b6d3242244 GH-136975: Emend a spelling error (algorthm -> algorithm) (#136999) 2025-07-22 13:48:58 +00:00
Emma Smith
4b44b3409a gh-134938: Add set_pledged_input_size() to ZstdCompressor (GH-135010) 2025-06-05 14:31:49 +03:00
Serhiy Storchaka
b595237166 gh-132983: Minor fixes and clean up for the _zstd module (GH-134930) 2025-06-01 11:22:15 +03:00
Jelle Zijlstra
45c6c48afc gh-134885: zstd: Use Py_XSETREF (GH-134886) 2025-05-30 11:30:05 +02:00
Sam James
2f2bee2111 gh-134768: Fix definition of mt_continue_should_break() (#134769)
In 121ed71f4e, `mt_continue_should_break` was changed to be guarded by `Py_DEBUG`, but it is used in `compress_mt_continue_lock_held` only inside an `assert`, so it must also be available whenever `NDEBUG` is undefined.

Since `Py_DEBUG` implies that `NDEBUG` is undefined, checking `NDEBUG` alone is sufficient.

Fixes: 121ed71f4e
2025-05-30 04:42:19 +00:00
Adam Turner
11f7a939de gh-132983: Split `_zstd_set_c_parameters` (#133921) 2025-05-28 14:45:08 +00:00
Adam Turner
f2ce4bbdfd gh-132983: Convert dict_content to take Py_buffer in `ZstdDict()` (#133924) 2025-05-26 14:48:41 +00:00
Emma Smith
973b8f69d3 gh-132983: Make _zstd C code PEP 7 compliant (GH-134605)
2025-05-23 19:03:21 -07:00
Emma Smith
f478331f98 gh-132983: Slightly tweak error messages for _zstd compressor/decompressor options dict (#134601)
Slightly tweak error messages for options dict
2025-05-23 14:51:41 -07:00
Emma Smith
8dbc119719 gh-133885: Use locks instead of critical sections for _zstd (gh-134289)
Move from using critical sections to locks for the (de)compression methods.
Since the methods allow other threads to run, we should use a lock rather
than a critical section.
2025-05-22 23:30:10 -04:00
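The reasoning in this commit can be illustrated with a Python analogy. In the free-threaded build, a critical section may be suspended when its thread detaches (for example around `Py_BEGIN_ALLOW_THREADS`), so it does not guarantee exclusive access for the whole method, whereas a mutex stays held. The sketch below is not the `_zstd` code: `Decompressor`, its `total_out` counter, the `hammer()` helper and the `time.sleep()` call (standing in for libzstd work done while other threads run) are all hypothetical.

```python
import threading
import time


class Decompressor:
    """Hypothetical object whose method lets other threads run mid-update."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.total_out = 0

    def decompress(self, n: int) -> int:
        # Hold the lock for the entire call, including the part where other
        # threads are allowed to run, so the read-modify-write stays atomic.
        with self._lock:
            before = self.total_out
            time.sleep(0.001)  # stand-in for work done with the GIL released
            self.total_out = before + n
            return self.total_out


def hammer(obj: Decompressor, threads: int = 8, calls: int = 10) -> int:
    """Call obj.decompress(1) from several threads and return the final total."""
    def worker() -> None:
        for _ in range(calls):
            obj.decompress(1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return obj.total_out
```

Without the `with self._lock:` line, the `sleep()` lets threads interleave between reading and writing `total_out`, and updates are lost.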
Emma Smith
fb68776591 gh-132983: Fix refleak in zstd dictionary functions (gh-134459) 2025-05-21 19:09:34 +00:00
Emma Smith
c64a21454b gh-132983: Refactor shared code in train_dict and finalize_dict (GH-134432)
2025-05-21 08:53:13 -07:00
Petr Viktorin
e575190abb gh-132983: Call Py_XDECREF rather than PyObject_GC_Del in failed __new__ (GH-133962)
This will call tp_dealloc and clear all members.
2025-05-13 11:11:52 +02:00
Erlend E. Aasland
121ed71f4e gh-132983: Fix compiler warning about unused function `mt_continue_should_break()` (#133947) 2025-05-12 20:23:40 +01:00
Adam Turner
d29ddbd90c gh-132983: Convert zstd `__new__` methods to Argument Clinic (#133860) 2025-05-12 08:51:53 +00:00
Rogdham
878e0fb8b4 gh-132983: Remove leftovers from EndlessZstdDecompressor (#133856)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
2025-05-11 02:04:25 +00:00
Adam Turner
1a87b6e9ae gh-132983: Make zstd types immutable (#133784) 2025-05-10 22:37:17 +00:00
Adam Turner
1a548c0a50 gh-132983: Reduce the size of `_zstdmodule.h` (#133793) 2025-05-10 22:25:22 +01:00
Adam Turner
1978904a2f GH-132983: PEP 7 and Argument Clinic changes for zstd (#133791) 2025-05-10 00:33:45 +00:00
Adam Turner
98e2c3af47 GH-132983: remove empty_bytes from _zstd module state (#133785) 2025-05-09 20:17:12 +00:00
Adam Turner
bbe9c31edc gh-132983: Simplify `_zstd_exec()` (#133775) 2025-05-09 20:15:19 +01:00
Adam Turner
c2a5d4b383 gh-132983: Clean-ups for `_zstd` (#133670) 2025-05-09 15:08:51 +01:00
Adam Turner
bd7c5859c6 GH-132983: Remove subclassing support from zstd types (#133694)
For consistency with ``bz2``, ``lzma``, and ``zlib``.
2025-05-08 18:35:22 +00:00
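What removing subclassing support means in practice can be checked from Python: a type without the `Py_TPFLAGS_BASETYPE` flag rejects any class that names it as a base. The sketch below probes `zlib`'s compressor type, one of the modules this commit cites for consistency; the helper name `can_subclass()` is hypothetical.

```python
import zlib


def can_subclass(cls: type) -> bool:
    """Return True if Python allows creating a subclass of `cls`."""
    try:
        type("Probe", (cls,), {})
    except TypeError:  # "... is not an acceptable base type"
        return False
    return True


CompressType = type(zlib.compressobj())

print(CompressType.__name__, can_subclass(CompressType))
print("bytearray", can_subclass(bytearray))
```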
Adam Turner
6f6f48d289 gh-103092: Support subinterpreters in `_zstd` (#133674) 2025-05-08 19:11:34 +01:00
Rogdham
2cc6de77bd gh-132983: Remove pyzstd in identifiers (#133535) 2025-05-08 01:47:42 +01:00
Adam Turner
f8691901d7 GH-132983: Remove zstd version check in the header file (#133502) 2025-05-06 15:04:50 +03:00
Emma Smith
c273f59fb3 gh-132983: Add the compression.zstd package and tests (#133365)
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
Co-authored-by: Rogdham <contact@rogdham.net>
2025-05-06 01:38:08 +01:00
Adam Turner
e6f8e0a035 GH-132983: Build `_zstd` on Windows (#133366) 2025-05-06 00:58:47 +01:00
Emma Smith
3b4333583f gh-132983: Introduce _zstd bindings module (GH-133027)
* Add _zstd module for https://peps.python.org/pep-0784/

This commit introduces the `_zstd` module, with bindings to libzstd from
the pyzstd project. It also includes the Unix build system configuration.
Windows build system support will be integrated independently as it
depends on integration with cpython-source-deps.

* Add _zstd to modules

* Fix path for compression.zstd module

* Ignore _zstd module like _io

* Expand module state macros to improve code quality

Also removes module state references from the classes in the _zstd
module and instead uses PyType_GetModuleState()

* Remove backticks suggested in review

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>

* Use critical sections to lock object state

This should avoid races and deadlocks.

* Remove compress/decompress and mark module as not reliant on the GIL

The `compress`/`decompress` functions will be moved to Python code for simplicity.
C implementations can always be re-added in the future.

Also, mark _zstd as not requiring the GIL.

* Lift critical section to avoid clang warning

* Respond to comments by picnixz

* Call out pyzstd explicitly in license description

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

* Use a much more robust implementation for `get_zstd_state_from_type`

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

* Use PyList_GetItemRef for thread safety purposes

* Use a macro for the minimum supported version

* Remove const from primitive types

* Use PyMem_New in another spot

* Simplify error handling in _get_frame_size

* Another simplification of error handling in get_frame_info

* Rename _module_state to mod_state

* Rewrite comment explaining the context of the code

* Add link to pyzstd

* Add TODO about refactoring dict training code

* Use PyModule_AddObjectRef over PyModule_AddObject

PyModule_AddObject is soft-deprecated, so we should use PyModule_AddObjectRef

* Check result of OutputBufferGrow

* Simplify return logic in `add_constant_to_type`

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

* Ignore return value of _zstd_clear()

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>

* Remove redundant comments

* Remove __reduce__ from ZstdDict

We should instead document that to pickle a dictionary a user should use
the `.dict_content` attribute.

* Use PyUnicode_FromFormat instead of a buffer

* Don't use C constants/types in error messages

* Make error messages easier to understand for Python users

* Lower minimum required version to 1.4.0

* Use casts and make slot function signatures correct

* Be consistent with CPython on const usage

* Make else clauses in line with PEP 7

* Fix over-indented blocks in argument clinic

* Add critical section around ZSTD_DCtx_setParameter

* Add a TODO about refactoring critical sections

* Use Py_UNREACHABLE

* Move bytes operations out of Py_BEGIN_ALLOW_THREADS

* Add TODO about ensuring a lock is held

* Remove asserts that may not be correct

* Add TODO to make ZstdDict and others GC objects

* Make objects GC tracked

* Remove unused include

* Fix some memory issues

* Fix refleaks on module and in ZstdDict

* Update configure to check for ZDICT_finalizeDictionary

* Properly check version in configure

* exit(1) if check fails

* Use AC_RUN_IFELSE

* Use a define() to re-use version check

* Actually properly set _zstd module status based on version

---------

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-05-04 01:29:55 +00:00
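The bullet about moving `compress`/`decompress` to Python code describes a common pattern: one-shot functions written in Python on top of the incremental compressor and decompressor objects. The sketch below shows that pattern with `zlib` as a stand-in for `_zstd`; it is not the `compression.zstd` implementation. The decompress loop, which restarts on `unused_data` to handle concatenated streams, is similar in shape to `bz2.decompress()` in the standard library.

```python
import zlib


def compress(data: bytes, level: int = -1) -> bytes:
    """One-shot compression built on an incremental compressor object."""
    comp = zlib.compressobj(level)
    return comp.compress(data) + comp.flush()


def decompress(data: bytes) -> bytes:
    """One-shot decompression that also accepts concatenated streams."""
    results = []
    while True:
        decomp = zlib.decompressobj()
        results.append(decomp.decompress(data))
        if not decomp.eof:
            raise ValueError("compressed data ended before the end-of-stream marker")
        data = decomp.unused_data  # bytes left over after this stream
        if not data:
            return b"".join(results)
```

Keeping these wrappers in Python leaves the C module responsible only for the stateful objects, which is the simplification the commit message gives as its reason.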