| | Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|---|
| * | Implement 6D parallelization with 1D and no tiling | Marat Dukhan | 2020-12-05 | 4 | -0/+555 |
| * | Use __STDC_NO_ATOMICS__ to detect C11 compilers without stdatomic.h | Marat Dukhan | 2020-12-05 | 1 | -1/+1 |
| | Replace MSVC-specific check from #10 | | | | |
| * | Support pre-C11 GCC intrinsics for atomics | Marat Dukhan | 2020-12-05 | 1 | -0/+104 |
| * | Fix MSVC build (#10) | peterjc123 | 2020-10-05 | 1 | -1/+1 |
| * | Use cpuinfo_get_current_uarch_index_with_default for parallelization with uarch index | Marat Dukhan | 2020-05-26 | 2 | -12/+12 |
| * | 3D/4D/5D parallelization functions with 1D or no tiling | Marat Dukhan | 2020-05-26 | 4 | -0/+1333 |
| * | Guard against generating the ARM yield instruction for processors that do not support it | Marat Dukhan | 2020-05-16 | 1 | -1/+1 |
| * | Reorder C11 atomics before MSVC x64 atomics | Marat Dukhan | 2020-05-08 | 1 | -80/+80 |
| | clang-cl, which supports both, should prefer C11 atomics | | | | |
| * | Use platform-specific yield/pause instructions | Marat Dukhan | 2020-05-08 | 4 | -20/+38 |
| * | MSVC-compatible FPU state functions | Marat Dukhan | 2020-05-07 | 1 | -9/+26 |
| * | Thumb-1 compatible assembly for disable_fpu_denormals | Marat Dukhan | 2020-05-07 | 1 | -6/+15 |
| * | Avoid including stdatomic.h in any WAsm builds | Marat Dukhan | 2020-05-04 | 1 | -1/+1 |
| * | Fast path using atomic decrement instead of atomic compare-and-swap | Marat Dukhan | 2020-05-02 | 3 | -37/+998 |
| | 50% higher throughput on x86 (disabled on other platforms) | | | | |
| * | Reorder C11 atomics before MSVC atomics | Marat Dukhan | 2020-04-22 | 1 | -117/+117 |
| | clang-cl, which supports both, should prefer C11 atomics | | | | |
| * | Recognize Cygwin as Windows | Marat Dukhan | 2020-04-16 | 1 | -1/+1 |
| * | Use load-acquire + store-release on synchronization variables | Marat Dukhan | 2020-04-14 | 3 | -21/+149 |
| | Synchronization using relaxed atomics + fences instead of LA/SR violates the C11/C++11 memory model and causes failures under ThreadSanitizer | | | | |
| * | Support Windows on ARM/ARM64 | Marat Dukhan | 2020-04-10 | 3 | -13/+197 |
| * | Replace atomic fetch_sub with decrement_fetch primitive | Marat Dukhan | 2020-04-10 | 4 | -32/+28 |
| | Decrement-fetch is a closer match to the primitive used in the implementation | | | | |
| * | Add compiler barriers to MSVC atomics implementation | Marat Dukhan | 2020-04-10 | 1 | -2/+6 |
| * | Fix race condition in Windows implementation | Marat Dukhan | 2020-04-10 | 1 | -5/+13 |
| | The command event for the next command must be reset before the write-release of the new command: as soon as the worker threads observe the new command, they may complete it and switch to waiting on the next command event | | | | |
| * | Rewrite work spreading between threads | Marat Dukhan | 2020-04-10 | 7 | -70/+57 |
| | Avoid word x word -> doubleword multiplication; avoid doubleword / word -> word division; replace remaining division with multiplication via FXdiv; improve portability through removal of the platform-dependent multiply_divide function | | | | |
| * | Direct implementation of pthreadpool_try_decrement_relaxed_size_t | Marat Dukhan | 2020-04-10 | 1 | -59/+62 |
| | Replace the implementation of pthreadpool_try_decrement_relaxed_size_t on top of the emulated pthreadpool_compare_exchange_weak_relaxed_size_t with a direct implementation using platform intrinsics | | | | |
| * | Return static thread pool pointer in shim implementation | Marat Dukhan | 2020-04-10 | 1 | -0/+10 |
| | Makes pthreadpool tests pass in WebAssembly builds | | | | |
| * | Minor fixes in Windows implementation | Marat Dukhan | 2020-04-07 | 1 | -2/+1 |
| * | Windows implementation using Events | Marat Dukhan | 2020-04-07 | 10 | -32/+630 |
| * | Fix erroneous narrowing in pthreadpool_fetch_sub_relaxed_size_t | Marat Dukhan | 2020-04-05 | 1 | -4/+4 |
| * | Optimized pthreadpool_parallelize_* functions | Marat Dukhan | 2020-04-05 | 2 | -399/+1032 |
| | Eliminate a function call and a division for each processed item in the multi-threaded case | | | | |
| * | Implementation using Grand Central Dispatch | Marat Dukhan | 2020-04-01 | 3 | -5/+176 |
| * | Refactor pthreadpool implementation | Marat Dukhan | 2020-04-01 | 9 | -717/+826 |
| | Split implementation into two types of components: components dependent on the threading API; portable components | | | | |
| * | Remove unused per-thread wakeup_condvar | Marat Dukhan | 2020-04-01 | 1 | -5/+0 |
| * | Microarchitecture-aware parallelization functions | Marat Dukhan | 2020-03-26 | 2 | -10/+497 |
| * | Refactor multi-threaded case of parallelization functions | Marat Dukhan | 2020-03-26 | 1 | -105/+142 |
| | Extract multi-threaded setup logic into a generalized pthreadpool_parallelize function; call into pthreadpool_parallelize directly from tiled and 2+-dimensional functions | | | | |
| * | Implement atomic_decrement with LL-SC on ARM/ARM64 | Marat Dukhan | 2020-03-23 | 1 | -7/+19 |
| * | Minor refactoring in pthreadpool_destroy | Marat Dukhan | 2020-03-23 | 1 | -5/+6 |
| * | Fix race conditions in non-futex implementation | Marat Dukhan | 2020-03-23 | 1 | -11/+20 |
| * | Futex-based WebAssembly+Threads implementation | Marat Dukhan | 2020-03-23 | 1 | -1/+23 |
| * | Support WebAssembly+Threads build | Marat Dukhan | 2020-03-23 | 2 | -63/+240 |
| | Abstract away atomic operations and data type from the source file; polyfill atomic operations for Clang targeting WAsm+Threads; set Emscripten link options for WebAssembly+Threads builds | | | | |
| * | Remove redundant barriers | Marat Dukhan | 2020-03-23 | 1 | -15/+13 |
| * | Simplify parallel task initialization | Marat Dukhan | 2020-03-23 | 1 | -4/+8 |
| * | Avoid spinning the thread pool when the task has only one item | Marat Dukhan | 2020-03-23 | 1 | -9/+9 |
| * | Remove Native Client support | Marat Dukhan | 2020-03-05 | 1 | -22/+0 |
| * | PTHREADPOOL_FLAG_YIELD_WORKERS flag to bypass spin-wait | Marat Dukhan | 2020-03-05 | 1 | -12/+16 |
| | Makes it possible to signal the last operation in a sequence of computations, so pthreadpool workers don't spin in vain. | | | | |
| * | Minor cleanup | Marat Dukhan | 2020-03-05 | 1 | -4/+4 |
| * | Build on Windows/mingw64 (#6) | mattn | 2020-03-01 | 1 | -2/+27 |
| | Support Windows/mingw64 build | | | | |
| * | Switch to C11 atomics for synchronization | Marat Dukhan | 2019-10-19 | 1 | -172/+185 |
| * | Make inline assembly compatible with older toolchains | Marat Dukhan | 2019-10-08 | 1 | -1/+1 |
| | Fix #4 | | | | |
| * | Fix typo in comment | Marat Dukhan | 2019-09-30 | 1 | -1/+1 |
| * | Enable spin-wait in the main thread | Marat Dukhan | 2019-09-30 | 1 | -7/+24 |
| * | New pthreadpool_parallelize_* API | Marat Dukhan | 2019-09-30 | 4 | -145/+830 |
| * | Enable spin-wait in worker threads | Marat Dukhan | 2019-09-30 | 1 | -18/+43 |
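
Several commits in this log (switching to C11 atomics, detecting missing `stdatomic.h` via `__STDC_NO_ATOMICS__`, supporting pre-C11 GCC intrinsics, load-acquire + store-release, decrement-fetch, and a direct try-decrement) revolve around a small set of atomic primitives. The following is a minimal sketch of such a layer, not pthreadpool's actual code: the `sketch_*` names are hypothetical, and it assumes either a C11 compiler with `<stdatomic.h>` or GCC/Clang `__atomic` builtins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch_* wrappers; names are illustrative, not pthreadpool's API. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
typedef atomic_size_t sketch_atomic_size_t;

static inline size_t sketch_load_acquire(sketch_atomic_size_t* p) {
    return atomic_load_explicit(p, memory_order_acquire);
}

static inline void sketch_store_release(sketch_atomic_size_t* p, size_t value) {
    atomic_store_explicit(p, value, memory_order_release);
}

/* Decrements and returns the new value (decrement-fetch, release order). */
static inline size_t sketch_decrement_fetch_release(sketch_atomic_size_t* p) {
    const size_t old = atomic_fetch_sub_explicit(p, 1, memory_order_release);
    assert(old != 0); /* callers must never decrement past zero */
    return old - 1;
}

/* Decrements only if the value is non-zero; returns whether it did. */
static inline bool sketch_try_decrement_relaxed(sketch_atomic_size_t* p) {
    size_t actual = atomic_load_explicit(p, memory_order_relaxed);
    while (actual != 0) {
        /* On failure, `actual` is refreshed with the current value. */
        if (atomic_compare_exchange_weak_explicit(p, &actual, actual - 1,
                memory_order_relaxed, memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}
#elif defined(__GNUC__)
/* Pre-C11 GCC/Clang: a plain size_t accessed only through __atomic builtins. */
typedef size_t sketch_atomic_size_t;

static inline size_t sketch_load_acquire(sketch_atomic_size_t* p) {
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

static inline void sketch_store_release(sketch_atomic_size_t* p, size_t value) {
    __atomic_store_n(p, value, __ATOMIC_RELEASE);
}

static inline size_t sketch_decrement_fetch_release(sketch_atomic_size_t* p) {
    const size_t old = __atomic_fetch_sub(p, 1, __ATOMIC_RELEASE);
    assert(old != 0); /* callers must never decrement past zero */
    return old - 1;
}

static inline bool sketch_try_decrement_relaxed(sketch_atomic_size_t* p) {
    size_t actual = __atomic_load_n(p, __ATOMIC_RELAXED);
    while (actual != 0) {
        /* weak = true; on failure `actual` receives the current value. */
        if (__atomic_compare_exchange_n(p, &actual, actual - 1, true,
                __ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
            return true;
        }
    }
    return false;
}
#else
#error "sketch: no supported atomics implementation"
#endif
```

Try-decrement returns false instead of wrapping around when the counter is already zero, which is why this sketch writes it as a compare-and-swap loop rather than a plain fetch-sub.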
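
The "Rewrite work spreading between threads" commit describes dividing a range among threads without word x word -> doubleword multiplication or doubleword / word division. One way to get that property is sketched below, assuming each thread receives a contiguous chunk and chunk sizes differ by at most one. The `sketch_*` names are hypothetical, and where the commit replaces the remaining division with FXdiv multiplication, this sketch simply uses plain `/` and `%`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk descriptor: items [start, start + length). */
struct sketch_chunk {
    size_t start;
    size_t length;
};

/* Splits [0, range) into threads_count contiguous chunks.
 * One division and one remainder are computed; the product
 * thread_index * base never exceeds range, so no double-width
 * multiplication is needed. */
static struct sketch_chunk sketch_thread_chunk(size_t range, size_t threads_count, size_t thread_index) {
    assert(threads_count != 0 && thread_index < threads_count);
    const size_t base = range / threads_count;
    const size_t remainder = range % threads_count;
    struct sketch_chunk chunk;
    /* The first `remainder` threads each take one extra item. */
    chunk.start = thread_index * base + (thread_index < remainder ? thread_index : remainder);
    chunk.length = base + (thread_index < remainder ? 1 : 0);
    return chunk;
}
```

For example, splitting 10 items across 3 threads yields chunks starting at 0, 4, and 7 with lengths 4, 3, and 3.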
| | | |||||
