path: root/src
Commit message  Author  Age  Files  Lines
* Implement 6D parallelization with 1D and no tiling  Marat Dukhan  2020-12-05  4  -0/+555
|
* Use __STDC_NO_ATOMICS__ to detect C11 compilers without stdatomic.h  Marat Dukhan  2020-12-05  1  -1/+1
|   Replace MSVC-specific check from #10
* Support pre-C11 GCC intrinsics for atomics  Marat Dukhan  2020-12-05  1  -0/+104
|
* Fix MSVC build (#10)  peterjc123  2020-10-05  1  -1/+1
|   Fix MSVC build
* Use cpuinfo_get_current_uarch_index_with_default for parallelization with uarch index  Marat Dukhan  2020-05-26  2  -12/+12
* 3D/4D/5D parallelization functions with 1D or no tiling  Marat Dukhan  2020-05-26  4  -0/+1333
|
* Guard against generating ARM yield instruction for unsupporting processors  Marat Dukhan  2020-05-16  1  -1/+1
|
* Reorder C11 atomics before MSVC x64 atomics  Marat Dukhan  2020-05-08  1  -80/+80
|   clang-cl, which supports both, should prefer C11 atomics
* Use platform-specific yield/pause instructions  Marat Dukhan  2020-05-08  4  -20/+38
|
* MSVC-compatible FPU state functions  Marat Dukhan  2020-05-07  1  -9/+26
|
* Thumb-1 compatible assembly for disable_fpu_denormals  Marat Dukhan  2020-05-07  1  -6/+15
|
* Avoid including stdatomic.h in any WAsm builds  Marat Dukhan  2020-05-04  1  -1/+1
|
* Fast path using atomic decrement instead of atomic compare-and-swap  Marat Dukhan  2020-05-02  3  -37/+998
|   50% higher throughput on x86 (disabled on other platforms)
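The difference between the two approaches named in this commit can be illustrated with a minimal C11 sketch. The function names `try_decrement_cas` and `try_decrement_fast` are hypothetical and are not taken from the pthreadpool sources; the sketch only assumes a shared counter of remaining work items.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CAS-based variant: decrement the counter only while it is
 * nonzero. Under contention the compare-and-swap may fail and retry. */
static bool try_decrement_cas(atomic_size_t *remaining) {
    size_t current = atomic_load_explicit(remaining, memory_order_relaxed);
    while (current != 0) {
        if (atomic_compare_exchange_weak_explicit(
                remaining, &current, current - 1,
                memory_order_relaxed, memory_order_relaxed)) {
            return true;
        }
        /* On failure, `current` now holds the latest value; loop again. */
    }
    return false;
}

/* Hypothetical decrement-based variant: a single unconditional atomic
 * decrement on a signed counter. The counter may go negative once the
 * work is exhausted, so the caller checks the value seen before the
 * decrement. */
static bool try_decrement_fast(atomic_long *remaining) {
    return atomic_fetch_sub_explicit(remaining, 1, memory_order_relaxed) > 0;
}
```

The decrement-based variant never retries, which is one plausible reason for a throughput gain on x86. It relies on the number of failed attempts being small enough that the signed counter cannot wrap around.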
* Reorder C11 atomics before MSVC atomics  Marat Dukhan  2020-04-22  1  -117/+117
|   clang-cl, which supports both, should prefer C11 atomics
* Recognize Cygwin as Windows  Marat Dukhan  2020-04-16  1  -1/+1
|
* Use load-acquire + store-release on synchronization variables  Marat Dukhan  2020-04-14  3  -21/+149
|   Synchronization using relaxed atomics + fences instead of LA/SR violates the C11/C++11 memory model and causes failures under thread sanitizer
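The load-acquire + store-release pairing mentioned in this commit can be sketched as a simple hand-off between two threads. This is an illustration only, assuming POSIX threads are available; `payload`, `ready`, `producer`, `consume`, and `run_handoff` are hypothetical names, not identifiers from pthreadpool.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* plain data published by the producer */
static atomic_int ready;   /* synchronization variable, zero-initialized */

/* Producer: write the data, then publish it with a store-release. */
static void *producer(void *arg) {
    (void) arg;
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Consumer: spin on a load-acquire; once it observes the flag, the
 * earlier write to `payload` is guaranteed to be visible. */
static int consume(void) {
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        /* busy-wait */
    }
    return payload;
}

/* Runs one hand-off and returns the value the consumer observed,
 * or -1 if the producer thread could not be created. */
static int run_handoff(void) {
    pthread_t thread;
    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    if (pthread_create(&thread, NULL, producer, NULL) != 0) {
        return -1;
    }
    int seen = consume();
    pthread_join(thread, NULL);
    return seen;
}
```

Because the ordering is attached to the accesses of the synchronization variable itself, no separate fence is involved in this pattern.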
* Support Windows on ARM/ARM64  Marat Dukhan  2020-04-10  3  -13/+197
|
* Replace atomic fetch_sub with decrement_fetch primitive  Marat Dukhan  2020-04-10  4  -32/+28
|   Decrement-fetch is a closer match to the primitive used in implementation
* Add compiler barriers to MSVC atomics implementation  Marat Dukhan  2020-04-10  1  -2/+6
|
* Fix race condition in Windows implementation  Marat Dukhan  2020-04-10  1  -5/+13
|   The command event for the next command must be reset before write-release of the new command, because as soon as the worker threads observe the new command, they may complete it and switch to waiting on the next command event
* Rewrite work spreading between threads  Marat Dukhan  2020-04-10  7  -70/+57
|   - Avoid word x word -> doubleword multiplication
|   - Avoid doubleword / word -> word division
|   - Replace remaining division with multiplication via FXdiv
|   - Improve portability through removal of platform-dependent multiply_divide function
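One way to spread items between threads using only word-sized arithmetic is sketched below. It is an assumption about the general technique, not a copy of the pthreadpool code: `range_t` and `thread_range` are hypothetical names, and plain `/` and `%` stand in for the FXdiv-based division the commit refers to.

```c
#include <stddef.h>

/* Hypothetical half-open range [start, end) of items for one thread. */
typedef struct {
    size_t start;
    size_t end;
} range_t;

/* Splits `items` among `threads` (must be nonzero) and returns the range
 * owned by thread `tid` (0 <= tid < threads). The quotient and remainder
 * are computed once; each thread's start is then a single word-sized
 * multiply plus an add. Since quotient * tid <= items, the product
 * cannot overflow size_t. */
static range_t thread_range(size_t items, size_t threads, size_t tid) {
    const size_t quotient = items / threads;
    const size_t remainder = items % threads;
    /* The first `remainder` threads each take one extra item. */
    const size_t extra = tid < remainder ? tid : remainder;
    range_t r;
    r.start = quotient * tid + extra;
    r.end = r.start + quotient + (tid < remainder ? 1 : 0);
    return r;
}
```

For example, with 10 items and 4 threads this yields the ranges [0, 3), [3, 6), [6, 8), and [8, 10).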
* Direct implementation pthreadpool_try_decrement_relaxed_size_t  Marat Dukhan  2020-04-10  1  -59/+62
|   Replace implementation of pthreadpool_try_decrement_relaxed_size_t on top of emulated pthreadpool_compare_exchange_weak_relaxed_size_t with a direct implementation using platform intrinsics
* Return static thread pool pointer in shim implementation  Marat Dukhan  2020-04-10  1  -0/+10
|   Makes pthreadpool tests pass in WebAssembly builds
* Minor fixes in Windows implementation  Marat Dukhan  2020-04-07  1  -2/+1
|
* Windows implementation using Events  Marat Dukhan  2020-04-07  10  -32/+630
|
* Fix erroneous narrowing in pthreadpool_fetch_sub_relaxed_size_t  Marat Dukhan  2020-04-05  1  -4/+4
|
* Optimized pthreadpool_parallelize_* functions  Marat Dukhan  2020-04-05  2  -399/+1032
|   Eliminate function call and division per each processed item in the multi-threaded case
* Implementation using Grand Central Dispatch  Marat Dukhan  2020-04-01  3  -5/+176
|
* Refactor pthreadpool implementation  Marat Dukhan  2020-04-01  9  -717/+826
|   Split implementation into two types of components:
|   - Components dependent on threading API
|   - Portable components
* Remove unused per-thread wakeup_condvar  Marat Dukhan  2020-04-01  1  -5/+0
|
* Microarchitecture-aware parallelization functions  Marat Dukhan  2020-03-26  2  -10/+497
|
* Refactor multi-threaded case of parallelization functions  Marat Dukhan  2020-03-26  1  -105/+142
|   - Extract multi-threaded setup logic into a generalized pthreadpool_parallelize function
|   - Call into pthreadpool_parallelize directly from tiled and 2+-dimensional functions
* Implement atomic_decrement with LL-SC on ARM/ARM64  Marat Dukhan  2020-03-23  1  -7/+19
|
* Minor refactoring in pthreadpool_destroy  Marat Dukhan  2020-03-23  1  -5/+6
|
* Fix race conditions in non-futex implementation  Marat Dukhan  2020-03-23  1  -11/+20
|
* Futex-based WebAssembly+Threads implementation  Marat Dukhan  2020-03-23  1  -1/+23
|
* Support WebAssembly+Threads build  Marat Dukhan  2020-03-23  2  -63/+240
|   - Abstract away atomic operations and data type from the source file
|   - Polyfill atomic operations for Clang targeting WAsm+Threads
|   - Set Emscripten link options for WebAssembly+Threads builds
* Remove redundant barriers  Marat Dukhan  2020-03-23  1  -15/+13
|
* Simplify parallel task initialization  Marat Dukhan  2020-03-23  1  -4/+8
|
* Avoid spinning thread-pool when the task has only one item  Marat Dukhan  2020-03-23  1  -9/+9
|
* Remove Native Client support  Marat Dukhan  2020-03-05  1  -22/+0
|
* PTHREADPOOL_FLAG_YIELD_WORKERS flag to bypass spin-wait  Marat Dukhan  2020-03-05  1  -12/+16
|   Makes it possible to signal the last operation in a sequence of computations, so pthreadpool workers don't spin in vain.
* Minor cleanup  Marat Dukhan  2020-03-05  1  -4/+4
|
* Build on Windows/mingw64 (#6)  mattn  2020-03-01  1  -2/+27
|   Support Windows/mingw64 build
* Switch to C11 atomics for synchronization  Marat Dukhan  2019-10-19  1  -172/+185
|
* Make inline assembly compatible with old toolchain  Marat Dukhan  2019-10-08  1  -1/+1
|   Fix #4
* Fix typo in comment  Marat Dukhan  2019-09-30  1  -1/+1
|
* Enable spin-wait in the main thread  Marat Dukhan  2019-09-30  1  -7/+24
|
* New pthreadpool_parallelize_* API  Marat Dukhan  2019-09-30  4  -145/+830
|
* Enable spin-wait in worker threads  Marat Dukhan  2019-09-30  1  -18/+43
|