author | Anton Blanchard <anton@samba.org> | 2015-01-21 12:27:38 +1100 |
---|---|---|
committer | Michael Ellerman <mpe@ellerman.id.au> | 2015-01-23 14:02:55 +1100 |
commit | 15c2d45d17418cc4a712608c78ff3b5f0583d83b (patch) | |
tree | 53e4ee00f5e0b604ee7451ee6e229751043ae0f6 /arch/powerpc/lib/string.S | |
parent | a113de373bcb7651196e29a49483c8e24e1e6aa9 (diff) | |
powerpc: Add 64bit optimised memcmp

I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop

- For large (at least 32 byte) comparisons that are also 8 byte
  aligned, use an unrolled modulo scheduled loop using 8 byte
  loads. This is similar to our glibc memcmp.
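
The fast-path idea above can be sketched in C. This is only an illustration under stated assumptions, not the assembly added by this commit: `memcmp_sketch` and `bytewise_cmp` are hypothetical names, the sketch does no manual unrolling or modulo scheduling (that is left to the compiler), and it returns -1/0/1 rather than a byte difference.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time compare: the general shape of the loop being replaced. */
static int bytewise_cmp(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

/* Hypothetical name. Illustrates the 8-byte-load fast path only. */
int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1;
    const unsigned char *b = s2;

    /* Fast path: at least 32 bytes and both pointers 8-byte aligned.
     * OR-ing the addresses lets a single mask test cover both. */
    if (n >= 32 && ((((uintptr_t)a) | ((uintptr_t)b)) & 7) == 0) {
        while (n >= 8) {
            uint64_t x, y;

            memcpy(&x, a, sizeof x);    /* typically a single 8-byte load */
            memcpy(&y, b, sizeof y);
            if (x != y)
                break;                  /* byte loop below finds the exact byte */
            a += 8;
            b += 8;
            n -= 8;
        }
    }

    /* Tail bytes, unaligned or short inputs, and mismatch resolution. */
    return bytewise_cmp(a, b, n);
}
```

Resolving a mismatching 8-byte word with the byte loop keeps the sketch independent of endianness, at the cost of re-reading up to 8 bytes.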

A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:

baseline:	29.93 s
modified:	 1.70 s

Just over 17x faster.
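
The benchmark source is not shown on this page. A userspace harness with the same parameters might look roughly like the sketch below. `time_memcmp` is a hypothetical helper, it times the C library's `memcmp` rather than the kernel routine, and the figures it reports will not match the ones quoted above.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: time `iters` calls of memcmp over two identical
 * `len`-byte buffers. Returns CPU seconds, or -1.0 if allocation fails. */
double time_memcmp(size_t len, unsigned long iters)
{
    /* Calling through a volatile function pointer stops the compiler
     * from hoisting the call out of the loop. */
    int (*volatile cmp)(const void *, const void *, size_t) = memcmp;
    unsigned char *a = malloc(len);
    unsigned char *b = malloc(len);
    volatile int sink = 0;

    if (!a || !b) {
        free(a);
        free(b);
        return -1.0;
    }

    /* Identical contents, so each call has to scan every byte. */
    memset(a, 0xa5, len);
    memset(b, 0xa5, len);

    clock_t start = clock();
    for (unsigned long i = 0; i < iters; i++)
        sink += cmp(a, b, len);
    clock_t end = clock();

    free(a);
    free(b);
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_memcmp(8192, 10000000UL)` would match the buffer size and iteration count given in the commit message.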

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rldicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5, we have plans to use that CR field for fast local
  atomics.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Diffstat (limited to 'arch/powerpc/lib/string.S')
-rw-r--r-- | arch/powerpc/lib/string.S | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S
index 1b5a0a09d609..c80fb49ce607 100644
--- a/arch/powerpc/lib/string.S
+++ b/arch/powerpc/lib/string.S
@@ -93,6 +93,7 @@ _GLOBAL(strlen)
 	subf	r3,r3,r4
 	blr
 
+#ifdef CONFIG_PPC32
 _GLOBAL(memcmp)
 	PPC_LCMPI 0,r5,0
 	beq-	2f
@@ -106,6 +107,7 @@ _GLOBAL(memcmp)
 	blr
 2:	li	r3,0
 	blr
+#endif
 
 _GLOBAL(memchr)
 	PPC_LCMPI 0,r5,0