| Commit message | Author | Age | Files | Lines |
| |
On 32-bit, the VA space can be exhausted very easily. Change the
jemalloc defaults to only retain on 64-bit to avoid this exhaustion.
The performance of traces does get slightly worse, but most stay about
the same.
This should only affect devices that use malloc svelte; all other
devices are on scudo.
Bug: 142556796
Bug: 140079007
Test: Ran traces and verified va space is much lower.
Test: Compared benchmarks with and without retaining.
Test: Ran bionic unit tests.
Test: Ran jemalloc tests.
Test: Ran malloc stress tests.
Change-Id: Iaec8276582f880145a1ca5ebbaa65789f46d2bf2
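The retain default described in this message can be pictured as a choice keyed on pointer width. The following is only a sketch under that assumption: the function names are hypothetical and do not come from the jemalloc or bionic sources, and the real change may well be a compile-time default rather than a function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: retain unused virtual memory only when pointers
 * are 64 bits wide, so a 32-bit address space is not exhausted. */
static bool sketch_retain_default_for_bits(unsigned pointer_bits) {
    assert(pointer_bits == 32 || pointer_bits == 64);
    return pointer_bits == 64;
}

/* Default for the platform this file is compiled for. */
static bool sketch_retain_default(void) {
    return sketch_retain_default_for_bits((unsigned)(sizeof(void *) * 8));
}
```

On a 64-bit build `sketch_retain_default()` reports that retaining stays enabled; on a 32-bit build it reports that it is turned off.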
| |
Removing the stats makes the whole cache structure fit in a single page.
Bug: 131362671
Test: Verified that all bionic malloc benchmarks are still the same.
Test: It turns out that the malloc_sql benchmarks seem to get faster.
Test: Verified that after this change, it saves about 2K PSS per thread.
Change-Id: I4dcd633543f05f1a9d47db175f9977ddb42188a9
| |
We don't currently use this and it causes libc.a to have a dependency
on libdl because it interposes pthread_create with dlsym.
Test: treehugger
Bug: None
Change-Id: I259ed5eb8e72045430aee90df1124c1906512fcd
| |
Change the minimum size of a map to make sure that the entropy of
allocations is no worse than jemalloc4. At a future point, we should see
how increasing entropy affects performance.
Test: All unit tests pass.
Change-Id: I644f4fc84fa4ad80ce37fecbe48accbd6cb1034e
| |
Add support for svelte.
Add je_iterate support.
Update some of the internals so that bad pointers in je_iterate do not
crash.
Test: Ran new bionic unit tests, ran libmemunreachable tests, booted system.
Change-Id: I04171cf88df16d8dc2c2ebb60327e58b915b9d83
| |
Bug: 62621531
Bug: 110158834
Test: Ran unit tests and benchmarks using libc.
Change-Id: Ie13ab8510c42f96b58496b0ab7e4f8c3a9cd2c6d
| |
This does not add any android specific changes. Those will come in a
follow-up cl.
Test: Builds, and all unit tests pass on a hikey.
Change-Id: Ibac11b324afeac93a0c93d19689be48458d56f56
| |
This dodges a warning emitted by the FreeBSD system gcc when compiling
libc for architectures which don't use clang as the system compiler.
| |
Looking at the thread counts in our services, jemalloc's background thread
is useful, but mostly idle. Add a config option to tune down the number of threads.
| |
The szind and slab bits are read on the fast path, where the compiler previously
generated two separate memory loads for them. Operate on the bits manually to
avoid the extra memory load.
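As an illustration of reading both fields with one load, here is a minimal sketch. The bit layout (size-class index in the top 16 bits, slab flag in bit 0) and all names are assumptions made for the example, not the layout used in the jemalloc sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SZIND_SHIFT 48            /* assumed position of szind */
#define SKETCH_SLAB_MASK ((uint64_t)1)   /* assumed position of the slab bit */

/* Pack a pointer-like value, a size-class index and a slab flag into one word. */
static uint64_t sketch_pack(uint64_t ptr_bits, unsigned szind, bool slab) {
    assert((ptr_bits >> SKETCH_SZIND_SHIFT) == 0 && (ptr_bits & 1) == 0);
    assert(szind < (1u << 16));
    return ((uint64_t)szind << SKETCH_SZIND_SHIFT) | ptr_bits |
        (slab ? SKETCH_SLAB_MASK : 0);
}

/* Read the word once, then derive both fields from the local copy. */
static void sketch_szind_slab_read(const uint64_t *word, unsigned *szind,
    bool *slab) {
    uint64_t bits = *word; /* the only memory load */
    *szind = (unsigned)(bits >> SKETCH_SZIND_SHIFT);
    *slab = (bits & SKETCH_SLAB_MASK) != 0;
}
```

Extracting both values from the same local copy leaves the compiler no reason to touch memory twice.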
| |
Add a cast, since read / write have an unsigned return type on Windows.
| |
This is needed for things like mutex stats in table mode.
| |
The emitter can be used to produce structured json or tabular output. For now
it has no uses; in subsequent commits, I'll begin transitioning stats printing
code over.
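To illustrate the idea of one emitter serving both output styles, here is a minimal sketch. The enum, the function name and the exact output formats are hypothetical; the message does not show the emitter's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { SKETCH_EMIT_JSON, SKETCH_EMIT_TABLE } sketch_emit_mode_t;

/* Write one key/value pair into buf, formatted for the selected mode.
 * Returns what snprintf returns: the length of the formatted text. */
static int sketch_emit_kv(char *buf, size_t len, sketch_emit_mode_t mode,
    const char *json_key, const char *table_label, unsigned long value) {
    assert(buf != NULL && len > 0);
    if (mode == SKETCH_EMIT_JSON) {
        return snprintf(buf, len, "\"%s\": %lu", json_key, value);
    }
    return snprintf(buf, len, "%s: %lu\n", table_label, value);
}
```

The caller supplies both a JSON key and a human-readable label once, and the mode decides which one is printed.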
| |
"always" marks all user mappings as MADV_HUGEPAGE, while "never" marks all
mappings as MADV_NOHUGEPAGE. The default setting "default" does not change any
settings. Note that all the madvise calls are part of the default extent hooks
by design, so that customized extent hooks have complete control over the
mappings including hugepage settings.
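The three settings map naturally onto madvise(2) advice values. The sketch below assumes hypothetical names (`sketch_thp_parse`, `sketch_thp_apply`) and is not the code from the default extent hooks; on platforms without MADV_HUGEPAGE it degrades to a no-op.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef enum {
    SKETCH_THP_DEFAULT,
    SKETCH_THP_ALWAYS,
    SKETCH_THP_NEVER,
    SKETCH_THP_INVALID
} sketch_thp_mode_t;

/* Translate the option string into a mode. */
static sketch_thp_mode_t sketch_thp_parse(const char *s) {
    assert(s != NULL);
    if (strcmp(s, "always") == 0) return SKETCH_THP_ALWAYS;
    if (strcmp(s, "never") == 0) return SKETCH_THP_NEVER;
    if (strcmp(s, "default") == 0) return SKETCH_THP_DEFAULT;
    return SKETCH_THP_INVALID;
}

/* Apply the mode to a fresh mapping; "default" leaves it untouched. */
static int sketch_thp_apply(void *addr, size_t len, sketch_thp_mode_t mode) {
#if defined(MADV_HUGEPAGE) && defined(MADV_NOHUGEPAGE)
    if (mode == SKETCH_THP_ALWAYS) return madvise(addr, len, MADV_HUGEPAGE);
    if (mode == SKETCH_THP_NEVER) return madvise(addr, len, MADV_NOHUGEPAGE);
#endif
    (void)addr; (void)len; (void)mode;
    return 0;
}
```

Because the call sits in the default hook path only, a custom extent hook that never calls such a helper keeps full control over its mappings.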
| |
GCC on its own isn't quite able to turn the ticker subtract into a memory
operation followed by a js.
| |
On glibc and Android's bionic, strerror_r returns char* when
_GNU_SOURCE is defined.
Add a configure check for this rather than assume glibc is the
only libc that behaves this way.
| |
The arena-associated stats are now all prefixed with arena_stats_, and live in
their own file. Likewise, malloc_bin_stats_t -> bin_stats_t, also in its own
file.
| |
This lives in the cache_bin module; just a typo.
| |
In the process, kill arena_bin_index, which is unused. To follow are several
diffs continuing this separation.
| |
When purging, large allocations are usually the ones that cross the npages_limit
threshold, simply because they are large. This means we often leave the large
extent around for a while, which has two downsides: 1) high RSS and 2) a greater
chance of the extent getting fragmented. Given that such extents are not likely
to be reused very soon (LRU), over-purge by 1 extent (which is often large and
not reused frequently).
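One way to read the policy change is sketched below: the conservative variant refuses to purge an extent if that would drop the total below the limit, while the over-purging variant keeps going as long as the total is still above the limit. This is an interpretation of the message, with hypothetical names and a plain array standing in for the LRU list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* extent_pages[] is ordered least recently used first.
 * Returns how many extents get purged. */
static size_t sketch_purge_count(const size_t *extent_pages, size_t nextents,
    size_t npages_limit, bool over_purge) {
    assert(extent_pages != NULL || nextents == 0);
    size_t total = 0;
    for (size_t i = 0; i < nextents; i++) {
        total += extent_pages[i];
    }
    size_t purged = 0;
    while (purged < nextents && total > npages_limit) {
        size_t next = extent_pages[purged];
        /* Conservative policy: stop before crossing the limit. */
        if (!over_purge && total - next < npages_limit) {
            break;
        }
        total -= next;
        purged++;
    }
    return purged;
}
```

With extents of 4, 100 and 8 pages and a limit of 50, the conservative policy purges only the 4-page extent and leaves the 100-page one in place, whereas over-purging also releases the large extent.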
| |
According to the RISC-V toolchain conventions, __riscv__ is the old
spelling of this definition. __riscv should be used going forward.
https://github.com/riscv/riscv-toolchain-conventions#cc-preprocessor-definitions
| |
When coalescing, we should take both extents off the LRU list; otherwise decay
can grab the existing outer extent through extents_evict.
| |
When allocating from dirty extents (which we always prefer if available), large
active extents can get split even if the new allocation is much smaller, in
which case the introduced fragmentation causes lasting damage. This new
option controls the threshold for reusing and splitting an existing active
extent. We avoid using a large extent for much smaller sizes, in order to reduce
fragmentation. In some workloads, adding the threshold improves virtual memory
usage by >10x.
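The threshold can be thought of as a maximum ratio between the extent's size and the requested size. The sketch below assumes the ratio is expressed as a base-2 logarithm; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* An active extent may be split for this request only if it is at most
 * 2^lg_max_active_fit times larger than the requested size. */
static bool sketch_extent_may_be_reused(size_t extent_size, size_t alloc_size,
    unsigned lg_max_active_fit) {
    assert(lg_max_active_fit < sizeof(size_t) * 8);
    return (extent_size >> lg_max_active_fit) <= alloc_size;
}
```

For example, with a logarithm of 6 (a factor of 64), a 4 KiB request may reuse a 64 KiB extent but not a 1 MiB one.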
| |
While working on #852, I noticed the prng state is atomic. This is the only
atomic use of prng in all of jemalloc. Instead, use a thread-local prng
state if possible to avoid unnecessary cache line contention.
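A minimal sketch of the thread-local approach follows. It uses a generic 64-bit linear congruential generator with Knuth's MMIX constants and C11 `_Thread_local`; the generator, the seed and the names are assumptions for illustration, not jemalloc's prng.

```c
#include <assert.h>
#include <stdint.h>

/* Each thread owns its state, so no atomics and no shared cache line. */
static _Thread_local uint64_t sketch_prng_state = 0x9e3779b97f4a7c15ULL;

/* Advance the per-thread state and return it. */
static uint64_t sketch_prng_next(void) {
    sketch_prng_state = sketch_prng_state * 6364136223846793005ULL +
        1442695040888963407ULL;
    return sketch_prng_state;
}

/* Value in [0, range), taken from the higher-quality upper bits. */
static uint64_t sketch_prng_range(uint64_t range) {
    assert(range > 0);
    return (sketch_prng_next() >> 32) % range;
}
```

In real use each thread would seed its state differently; the fixed seed here only keeps the example short.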
| |
Added proper synchronization for switching to using THP in auto mode. Also
fixed stats for number of THPs used.
| |
Added an upper bound on how many pages we can decay during the current run.
Without this, the number of stashed pages could grow without bound during decay,
since other threads could add new pages into the extents.
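The effect of the bound can be sketched as a second exit condition on the stashing loop. The names below are hypothetical, and the single-threaded array does not model other threads adding pages concurrently, which is exactly the case the bound is there for.

```c
#include <assert.h>
#include <stddef.h>

/* Stash extents (oldest first) while the total is above npages_limit,
 * but stop once npages_decay_max pages have been stashed in this run.
 * Returns the number of pages stashed. */
static size_t sketch_stash_decayed(const size_t *extent_pages, size_t nextents,
    size_t npages_limit, size_t npages_decay_max) {
    assert(extent_pages != NULL || nextents == 0);
    size_t total = 0;
    for (size_t i = 0; i < nextents; i++) {
        total += extent_pages[i];
    }
    size_t nstashed = 0;
    size_t i = 0;
    while (nstashed < npages_decay_max && i < nextents &&
        total > npages_limit) {
        total -= extent_pages[i];
        nstashed += extent_pages[i];
        i++;
    }
    return nstashed;
}
```

The bound is checked before each extent is taken, so a run can overshoot it by at most one extent.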
| |
This option controls the maximum size used by grow_retained. This is useful when we
have customized extent hooks reserving physical memory (e.g. 1G huge pages).
Without this feature, the default increasing sequence could result in fragmented
and wasted physical memory.
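A capped growth sequence might look like the sketch below. It assumes simple doubling, which is a simplification chosen for the example; the actual increasing sequence and the names used here are not taken from the source.

```c
#include <assert.h>
#include <stddef.h>

/* Next retained-growth size: double the current size, never exceeding
 * the configured limit. */
static size_t sketch_next_grow_size(size_t current, size_t limit) {
    assert(current > 0 && current <= limit);
    if (current > limit / 2) {
        return limit; /* doubling would overshoot (or overflow) */
    }
    return current * 2;
}
```

With the limit set to the size of one reserved huge page, growth stops there instead of requesting ever larger regions that the hook cannot back without waste.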
| |
We observed that arena 0 can have much more metadata allocated compared to
other arenas. Tune the auto mode to only switch to huge pages on the 5th block
(instead of the 3rd, as previously) for a0.
| |
Currently, this is unused (i.e. all extents are always marked dumpable). In the
future, we'll begin using this functionality.
| |
This will, eventually, enable us to avoid dumping eden regions.
| |
On x86 Linux, we define our own MADV_FREE if madvise(2) is available, but no
MADV_FREE is detected. This allows the feature to be built in and enabled with
runtime detection.
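A sketch of the combination of a build-time fallback definition and a runtime probe is given below. The value 8 is MADV_FREE in the Linux UAPI headers; the function name and the fall-back to MADV_DONTNEED are assumptions for the example and not necessarily how the change itself reacts to an older kernel.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Older libc headers may lack MADV_FREE even though the kernel has it. */
#if defined(__linux__) && (defined(__i386__) || defined(__x86_64__)) && \
    !defined(MADV_FREE)
#define MADV_FREE 8
#endif

/* Try lazy purging first; if the running kernel rejects MADV_FREE,
 * fall back to MADV_DONTNEED. Returns 0 on success. */
static int sketch_purge_lazy(void *addr, size_t len) {
    assert(addr != NULL && len > 0);
#ifdef MADV_FREE
    if (madvise(addr, len, MADV_FREE) == 0) {
        return 0;
    }
#endif
    return madvise(addr, len, MADV_DONTNEED);
}
```

A real allocator would probe once at startup and cache the result rather than retrying on every purge.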
| |
Quoting from https://github.com/jemalloc/jemalloc/issues/761 :
[...] reading the Power ISA documentation[1], the assembly in [the CPU_SPINWAIT
macro] isn't correct anyway (as @marxin points out): the setting of the
program-priority register is "sticky", and we never undo the lowering.
We could do something similar, but given that we don't have testing here in the
first place, I'm inclined to simply not try. I'll put something up reverting the
problematic commit tomorrow.
[1] Book II, chapter 3 of the 2.07B or 3.0B ISA documents.
| |
There does not seem to be any overlap between usage of
extent_avail and extent_heap, so we can use the same hook.
The only remaining usage of rb trees is in the profiling code,
which has some 'interesting' iteration constraints.
Fixes #888
| |
Dodge a name-conflict with the math.h logarithm function. D'oh.
| |
In userspace ARM on Linux, zeroing the high bits is the correct way to do this.
This doesn't fix the fact that we currently set LG_VADDR to 48 on ARM, when in
fact larger virtual address sizes are coming soon. We'll cross that bridge when
we come to it.
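Zeroing the high bits amounts to masking with the number of significant address bits. The sketch below hard-codes 48 to match the LG_VADDR value mentioned above; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_LG_VADDR 48 /* significant virtual address bits (assumed) */

/* Clear every bit above the significant address bits. */
static uint64_t sketch_zero_high_bits(uint64_t bits) {
    const uint64_t mask = ((uint64_t)1 << SKETCH_LG_VADDR) - 1;
    return bits & mask;
}
```

An address that already fits in 48 bits passes through unchanged, while anything stored in the upper 16 bits is dropped.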
|