path: root/compiler/optimizing/code_generator_x86.h
* Move mirror::ArtMethod to native  [Mathieu Chartier, 2015-06-02, 1 file, -2/+4]
      Optimizing + quick tests are passing, devices boot.

      TODO: Test and fix bugs in mips64.

      Saves 16 bytes per most ArtMethod, 7.5MB reduction in system PSS.
      Some of the savings are from removal of the virtual methods and
      direct methods object arrays.

      Bug: 19264997
      (cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33)
      Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d

      Fix some ArtMethod related bugs

      Added root visiting for runtime methods, not currently required
      since the GcRoots in these methods are null. Added missing
      GetInterfaceMethodIfProxy in GetMethodLine; fixes --trace run-tests
      005 and 044. Fixed an optimizing compiler bug where we used a normal
      stack location instead of a double on ARM64; this fixes the
      debuggable tests.

      TODO: Fix JDWP tests.

      Bug: 19264997
      Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3

      ART: Fix casts for 64-bit pointers on 32-bit compiler.

      Bug: 19264997
      Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457

      Fix JDWP tests after ArtMethod change

      Fixes Throwable::GetStackDepth for exception event detection after
      the internal stack trace representation change. Adds a missing
      ArtMethod::GetInterfaceMethodIfProxy call in the case of a proxy
      method.

      Bug: 19264997
      Change-Id: I363e293796848c3ec491c963813f62d868da44d2

      Fix accidental IMT and root marking regression

      Was always using the conflict trampoline. Also included a fix for a
      regression in GC time caused by extra roots; most of the regression
      was IMT. Fixed a bug in DumpGcPerformanceInfo where we would get a
      SIGABRT due to a detached thread.

      EvaluateAndApplyChanges: from ~2500 -> ~1980
      GC time: 8.2s -> 7.2s, due to 1s less of MarkConcurrentRoots

      Bug: 19264997
      Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0

      Fix bogus image test assert

      Previously we were comparing the size of the non moving space to the
      size of the image file. Now we properly compare the size of the
      image space against the size of the image file.

      Bug: 19264997
      Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a

      [MIPS64] Fix art_quick_invoke_stub argument offsets.

      The ArtMethod reference's size got bigger, so we need to move the
      other args and leave enough space for the ArtMethod* and the 'this'
      pointer. This fixes the mips64 boot.

      Bug: 19264997
      Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
* Revert "Revert "Revert "Revert "[optimizing] Improve x86 shifts""""Mark P Mendell2015-05-041-0/+3
      This reverts commit 2a7a1d7808f003bea908023ebd11eb442d2fca39.

      Fixes a bug where a long long >> 63 produced the wrong answer: a shr
      (logical shift) was emitted instead of a sar (arithmetic shift).

      Change-Id: I0327f79c718016ddec9272a605fc50ec15ec4566
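
      For context: an arithmetic shift (sar) replicates the sign bit,
      while a logical shift (shr) shifts in zeros, so the two differ
      exactly when the shifted value is negative. A minimal standalone
      sketch of the difference (a hypothetical illustration, not ART code;
      assumes the usual GCC/Clang arithmetic-shift behaviour for signed >>):

        #include <cstdint>
        #include <cstdio>

        int main() {
          int64_t x = -2;
          // sar semantics: sign bits shift in, so any negative >> 63 is -1.
          int64_t arithmetic = x >> 63;
          // shr semantics: zero bits shift in, so the result is 0 or 1.
          int64_t logical =
              static_cast<int64_t>(static_cast<uint64_t>(x) >> 63);
          // Prints "sar: -1  shr: 1".
          printf("sar: %lld  shr: %lld\n",
                 static_cast<long long>(arithmetic),
                 static_cast<long long>(logical));
          return 0;
        }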
* Refactor InvokeDexCallingConventionVisitor in Optimizing.  [Roland Levillain, 2015-04-29, 1 file, -12/+7]
      Change-Id: I7ede0f59d5109644887bf5d39201d4e1bf043f34
* Merge "Revert "Revert "Optimizing: Fix long-to-fp conversion on x86."""Roland Levillain2015-04-211-1/+3
|\
| * Revert "Revert "Optimizing: Fix long-to-fp conversion on x86.""Roland Levillain2015-04-201-1/+3
      This reverts commit 386ce406f150645158d6067c4e0a36565aefc44f.

      Bug: 20413424
      Change-Id: I6e93ff132907f2653f1ae12d6676ff2298f62ca1
* | Opt compiler: Implement parallel move resolver without using swap.  [Zheng Xu, 2015-04-17, 1 file, -2/+2]
      The algorithm of ParallelMoveResolverNoSwap() is almost the same as
      ParallelMoveResolverWithSwap(), except for the way the circular
      dependency is resolved. NoSwap() uses an additional scratch register
      to break the cycle. For example, (0->1) (1->2) (2->0) is performed
      as (2->scratch) (1->2) (0->1) (scratch->0).

      On architectures without swap-register support, NoSwap() can reduce
      the number of moves from 3x(N-1) to (N+1) when there is a circular
      dependency of N moves.

      The NoSwap() algorithm also does not depend on architecture register
      layout information, which means it can support register pairs on
      arm32, and X/W, D/S registers on arm64, without additional
      modification.

      Change-Id: Idf56bd5469bb78c0e339e43ab16387428a082318
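
      To make the cycle-breaking concrete, here is a minimal sketch of the
      scratch-register approach, using plain integers to stand in for
      registers (a hypothetical illustration; the real resolver operates
      on Location operands):

        #include <cstdio>

        // Resolve the cyclic permutation (0->1) (1->2) (2->0) without a
        // swap instruction: free one destination into a scratch slot,
        // then perform the remaining moves in reverse dependency order.
        int main() {
          int regs[3] = {10, 20, 30};  // values currently in r0, r1, r2
          int scratch;

          scratch = regs[2];   // (2 -> scratch) frees r2
          regs[2] = regs[1];   // (1 -> 2)
          regs[1] = regs[0];   // (0 -> 1)
          regs[0] = scratch;   // (scratch -> 0) completes the cycle

          // Prints "r0=30 r1=10 r2=20": each value reached its
          // destination in N+1 = 4 moves.
          printf("r0=%d r1=%d r2=%d\n", regs[0], regs[1], regs[2]);
          return 0;
        }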
* Merge "Revert "[optimizing] Improve x86 parallel moves/swaps""Calin Juravle2015-04-161-1/+0
|\
| * Revert "[optimizing] Improve x86 parallel moves/swaps"Guillaume Sanchez2015-04-151-1/+0
      This reverts commit a5c19ce8d200d68a528f2ce0ebff989106c4a933.

      This commit introduced a 30% performance regression on CaffeineLogic.

      Change-Id: I917e206e249d44e1748537bc1b2d31054ea4959d
* | Merge "Revert "Optimizing: Fix long-to-fp conversion on x86.""Nicolas Geoffray2015-04-131-3/+1
|\ \
| * | Revert "Optimizing: Fix long-to-fp conversion on x86."Nicolas Geoffray2015-04-131-3/+1
      The test fails on arm.

      This reverts commit 2d45b4df3838d9c0e5a213305ccd1d7009e01437.

      Change-Id: Id2864917b52f7ffba459680303a2d15b34f16a4e
* | | Merge "Optimizing: Fix long-to-fp conversion on x86."Roland Levillain2015-04-131-1/+3
|\| |
| * | Optimizing: Fix long-to-fp conversion on x86.  [Serguei Katkov, 2015-04-13, 1 file, -1/+3]
      The long-to-fp conversion implemented using SSE loses precision.
      A test is included. This CL uses the x87 FPU to produce the correct
      result.

      Change-Id: I8eaf3c46819a8cb52642a7e7d7c4e3e0edbc88db
      Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>
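
      The precision loss is a double-rounding problem: converting
      long -> double -> float rounds twice, while the x87 fild/fstp path
      keeps the full 64-bit integer in extended precision and rounds once.
      A standalone sketch with a value chosen to expose the difference
      (a hypothetical example, assuming IEEE-754 round-to-nearest-even):

        #include <cstdint>
        #include <cstdio>

        int main() {
          // Just above the midpoint between the consecutive floats 2^62
          // and 2^62 + 2^39; the tie-breaking low bit is lost when
          // rounding to double first.
          int64_t x = (INT64_C(1) << 62) + (INT64_C(1) << 38) + 1;

          float once = static_cast<float>(x);    // rounds up: 2^62 + 2^39
          float twice = static_cast<float>(
              static_cast<double>(x));           // ties-to-even: 2^62

          printf("rounded once:  %.1f\n", once);   // 4611686568183201792.0
          printf("rounded twice: %.1f\n", twice);  // 4611686018427387904.0
          return 0;
        }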
* | | Merge "Revert "[optimizing] Improve x86 shifts""Nicolas Geoffray2015-04-101-3/+0
|\ \ \
| * | | Revert "[optimizing] Improve x86 shifts"Roland Levillain2015-04-101-3/+0
      This reverts commit 222fcf96c9b73bbb739012575e7e413caf9348ec.

      Reverting this CL as it is breaking a few tests (see
      http://build.chromium.org/p/client.art/builders/host-x86/builds/3251/steps/test%20optimizing/logs/stdio).
      Will investigate ASAP.

      Change-Id: Iddd8363e83a24aa49fbdf0f0c9dc12e63b4848de
* | | | Merge "Follow up of "div/rem on x86 and x86_64", to tidy up the code a little."Calin Juravle2015-04-101-1/+1
|\ \ \ \
| |_|_|/
|/| | |
| * | | Follow up of "div/rem on x86 and x86_64", to tidy up the code a little.  [Guillaume Sanchez, 2015-04-10, 1 file, -1/+1]
      Change-Id: Ibf39cbc8ac1d773599d70be2cb1e941674b60f1d
* | | | [optimizing] Improve x86 parallel moves/swaps  [Mark Mendell, 2015-04-10, 1 file, -0/+1]
      Add a new constructor to ScratchRegisterScope that will supply a
      register if there is a free one, but won't spill to force one. Use
      this to generate alternate code that doesn't use a temporary, as the
      spill/restore of a register generates extra instructions that aren't
      necessary on x86.

      Here is the benefit for a 32-bit memory-to-memory exchange with no
      free registers:

        < 50        push eax
        < 53        push ebx
        < 8B44244C  mov eax, [esp + 76]
        < 8B5C246C  mov ebx, [esp + 108]
        < 8944246C  mov [esp + 108], eax
        < 895C244C  mov [esp + 76], ebx
        < 5B        pop ebx
        < 58        pop eax
        ---
        > FF742444  push [esp + 68]
        > FF742468  push [esp + 104]
        > 8F44244C  pop [esp + 72]
        > 8F442468  pop [esp + 100]

      Avoid using the xchg instruction, as it is slow on smaller
      processors.

      Change-Id: Id29ee3abd998577baaee552d55d23e60ae0c7871
      Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
* | | [optimizing] Improve x86 shifts  [Mark Mendell, 2015-04-10, 1 file, -0/+3]
      Support memory operands for integer shifts. Generate better code
      for long shifts by constants.

      Change-Id: Icc92fa1b59cc280d4894af6f054e19b01977d5ce
      Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
* | | Merge "Speedup div/rem by constants on x86 and x86_64"Calin Juravle2015-04-091-0/+3
|\| |
| |/
|/|
| * Speedup div/rem by constants on x86 and x86_64  [Guillaume Sanchez, 2015-04-09, 1 file, -0/+3]
      This is done using the algorithms in Hacker's Delight chapter 10.

      Change-Id: I7bacefe10067569769ed31a1f7834f796fb41119
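
      As background: the Hacker's Delight technique replaces the division
      with a multiply-high plus shift/add corrections. A sketch for signed
      32-bit division by 7 (the magic constant 0x92492493 and shift 2 come
      from the book's tables; this is an illustration, not the code the
      compiler emits):

        #include <cassert>
        #include <cstdint>

        // n / 7 without a divide instruction (Hacker's Delight, ch. 10).
        // Assumes arithmetic >> for signed values, as on GCC/Clang.
        int32_t Div7(int32_t n) {
          const int64_t kMagic = static_cast<int32_t>(0x92492493);
          int32_t q =
              static_cast<int32_t>((kMagic * n) >> 32);  // multiply-high
          q += n;   // correction because the magic constant is negative
          q >>= 2;  // the shift amount paired with this magic constant
          q += static_cast<uint32_t>(n) >> 31;  // negative n: round to 0
          return q;
        }

        int main() {
          for (int32_t n : {-2147483647, -100, -7, -1, 0, 1, 6, 7, 8,
                            100, 2147483647}) {
            assert(Div7(n) == n / 7);
          }
          return 0;
        }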
* | Merge "[optimizing] Implement x86/x86_64 math intrinsics"Andreas Gampe2015-04-021-1/+8
|\ \
| * | [optimizing] Implement x86/x86_64 math intrinsics  [Mark Mendell, 2015-04-01, 1 file, -1/+8]
      Implement floor/ceil/round/RoundFloat on x86 and x86_64, and
      RoundDouble on x86_64.

      Add support for roundss and roundsd on both architectures, and
      support them in the disassembler as well.

      Add the instruction set features for x86, as the 'round' instruction
      is only supported if SSE4.1 is supported.

      Fix the tests to handle the addition of passing the instruction set
      features to x86 and x86_64.

      Add assembler tests for roundsd and roundss to the x86_64 assembler
      tests.

      Change-Id: I9742d5930befb0bbc23f3d6c83ce0183ed9fe04f
      Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
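
      For reference, roundss/roundsd take an immediate operand selecting
      the rounding mode, which is how one instruction covers floor, ceil,
      and rint. A minimal sketch via the compiler intrinsics (assumes an
      SSE4.1-capable CPU and compilation with -msse4.1; not the ART
      assembler API):

        #include <smmintrin.h>  // SSE4.1 intrinsics (roundss/roundsd)
        #include <cstdio>

        int main() {
          __m128 v = _mm_set_ss(2.5f);
          // roundss with the round-toward-negative-infinity immediate
          // computes floor.
          float down = _mm_cvtss_f32(
              _mm_round_ss(v, v, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC));
          // The round-toward-positive-infinity immediate computes ceil.
          float up = _mm_cvtss_f32(
              _mm_round_ss(v, v, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC));
          // Prints "floor(2.5) = 2.0, ceil(2.5) = 3.0".
          printf("floor(2.5) = %.1f, ceil(2.5) = %.1f\n", down, up);
          return 0;
        }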
* / Revert "Revert "Deoptimization-based bce.""Mingyao Yang2015-04-011-0/+4
      This reverts commit 0ba627337274ccfb8c9cb9bf23fffb1e1b9d1430.

      Change-Id: I1ca10d15bbb49897a0cf541ab160431ec180a006
* Intrinsify String.compareTo.  [Nicolas Geoffray, 2015-03-27, 1 file, -0/+19]
      Change-Id: Ia540df98755ac493fe61bd63f0bd94f6d97fbb57
* [optimizing] Implement X86 intrinsic support  [Mark Mendell, 2015-03-26, 1 file, -0/+17]
      Implement the supported intrinsics for X86.

      Enhance the graph visualizer to print <U> for unallocated locations,
      to allow calling the graph dumper from within register allocation
      for debugging purposes.

      Change-Id: I3b0319eb70a9a4ea228f67065b4c52d13a1ae775
      Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
* Revert "Deoptimization-based bce."Andreas Gampe2015-03-241-4/+0
      This breaks compiling the core image:

        Error after BCE: art::SSAChecker: Instruction 219 in block 1 does
        not dominate use 221 in block 1.

      This reverts commit e295e6ec5beaea31be5d7d3c996cd8cfa2053129.

      Change-Id: Ieeb48797d451836ed506ccb940872f1443942e4e
* Deoptimization-based bce.  [Mingyao Yang, 2015-03-23, 1 file, -0/+4]
      A mechanism is introduced by which a runtime method can be called
      from code compiled with the optimizing compiler in order to
      deoptimize into the interpreter. This can be used to establish
      invariants in the managed code: if an invariant does not hold at
      runtime, we deoptimize and continue execution in the interpreter.
      This allows optimizing the managed code as if the invariant was
      proven at compile time, while exceptions are still thrown according
      to the semantics demanded by the spec.

      The invariant and optimization included in this patch are based on
      the length of an array. Given a set of array accesses with constant
      indices {c1, ..., cn}, we can optimize away all bounds checks iff
      0 <= min(ci) and max(ci) < array-length. The first condition can be
      proven statically; the second can be established with a
      deoptimization-based invariant. This replaces n bounds checks with
      one invariant check (plus slow-path code).

      Change-Id: I8c6e34b56c85d25b91074832d13dba1db0a81569
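
      To illustrate the shape of the transformation, here is a
      hypothetical analogue in plain C++ (invented names; in ART the guard
      triggers deoptimization to the interpreter rather than a checked
      slow path):

        #include <cstdio>
        #include <vector>

        // Three checked accesses at constant indices collapse into a
        // single guard on the largest index, max(ci) = 2.
        int SumFirstThree(const std::vector<int>& a) {
          if (a.size() <= 2) {  // one invariant check: max(ci) < length
            // Slow path: stands in for the interpreter, which re-executes
            // and throws at the exact out-of-bounds access, preserving
            // the required semantics. .at() keeps per-access checks.
            return a.at(0) + a.at(1) + a.at(2);
          }
          // Fast path: no per-access bounds checks.
          return a[0] + a[1] + a[2];
        }

        int main() {
          std::vector<int> v = {1, 2, 3, 4};
          printf("%d\n", SumFirstThree(v));  // 6, via the fast path
          return 0;
        }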
* Revert "Revert "[optimizing] Enable x86 long support.""Nicolas Geoffray2015-03-111-0/+2
      This reverts commit 154552e666347d41d95d7619c6ee56249ff4feca.

      Change-Id: Idc726551c249a888b7ff5fde8508ae50e81b2e13
* Revert "[optimizing] Enable x86 long support."Nicolas Geoffray2015-03-061-2/+0
      A few libcore failures.

      This reverts commit b4ba354cf8d22b261205494875cc014f18587b50.

      Change-Id: I4a28d853e730dff9b69aec9555505803cf2fcd63
* [optimizing] Enable x86 long support.  [Nicolas Geoffray, 2015-03-06, 1 file, -0/+2]
      Change-Id: I9006972a65a1f191c45691104a960366747f9d16
* Avoid generating jmp +0.  [Nicolas Geoffray, 2015-02-18, 1 file, -1/+1]
      When a block branches to a non-following block, but the blocks in
      between do branch to it, we can avoid emitting the branch.

      Change-Id: I9b343f662a4efc718cd4b58168f93162a24e1219
* Merge "[optimizing compiler] Support x86 hard float ABI"Nicolas Geoffray2015-01-291-1/+5
|\
| * [optimizing compiler] Support x86 hard float ABI  [Mark Mendell, 2015-01-28, 1 file, -1/+5]
      Add support for the new ABI, passing FP parameters in XMM0-XMM3.
      This allows us to optimize x86 methods that don't use 'long'.

      Change-Id: Ic79a24767173451e7d7095ccc2a00b307593a868
      Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
* | Small optimization for recursive calls: avoid dex cache.  [Nicolas Geoffray, 2015-01-29, 1 file, -0/+3]
      Change-Id: I044757a2f06e535cdc1480c4fc8182b89635baf6
* Revert "Revert "ART: Implement X86 hard float (Quick/JNI/Baseline)""Mark P Mendell2015-01-271-3/+8
      This reverts commit 949c91fb91f40a4a80b2b492913cf8541008975e.

      This time, don't clobber EBX before saving it. Redo some of the
      macros to make register usage explicit.

      Change-Id: I8db8662877cd006816e16a28f42444ab7c36bfef
* Revert "ART: Implement X86 hard float (Quick/JNI/Baseline)"Vladimir Marko2015-01-271-8/+3
      And the 3 Mac build fixes. Fix conflicts in context_x86.* .

      This reverts commits 3d2c8e74c27efee58e24ec31441124f3f21384b9,
      34eda1dd66b92a361797c63d57fa19e83c08a1b4,
      f601d1954348b71186fa160a0ae6a1f4f1c5aee6 and
      bc503348a1da573488503cc2819c9e30807bea31.

      Bug: 19150481
      Change-Id: I6650ee30a7d261159380fe2119e14379e4dc9970
* ART: Implement X86 hard float (Quick/JNI/Baseline)  [Mark Mendell, 2015-01-23, 1 file, -3/+8]
      Use XMM0-XMM3 as parameter registers for float/double on X86.
      X86_64 already uses XMM0-XMM7 for parameters.

      Change the 'hidden' argument register from XMM0 to XMM7 to avoid a
      conflict. Add support for FPR save/restore in runtime/arch/x86.

      Minimal support for the Optimizing baseline compiler.

      Bump the version in runtime/oat.h because this is an ABI change.

      Change-Id: Ia6fe150e8488b9e582b0178c0dda65fc81d5a8ba
      Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
* Support callee save floating point registers on x64.  [Nicolas Geoffray, 2015-01-23, 1 file, -2/+0]
      - Share the computation of core_spill_mask and fpu_spill_mask
        between backends.
      - Remove explicit stack overflow check support: we would need to
        adjust them, and since they are not tested, they will easily
        bitrot.

      Change-Id: I0b619b8de4e1bdb169ea1ae7c6ede8df0d65837a
* Enable core callee-save on x64.  [Nicolas Geoffray, 2015-01-21, 1 file, -1/+1]
      Will work on other architectures and FP support in other CLs.

      Change-Id: I8cef0343eedc7202d206f5217fdf0349035f0e4d
* [optimizing compiler] Implement inline x86 FP '%'  [Mark Mendell, 2015-01-21, 1 file, -0/+3]
      Replace the calls to fmod/fmodf by inline code, as is done in the
      Quick compiler. Remove the quick fmod/fmodf runtime entries, as they
      are no longer in use.

      The 64-bit code generator's Move() routine needed to be enhanced to
      handle constants, as Location::Any() allows them to be generated.

      Change-Id: I6b6a42f6faeed4b0b3c940453e487daf5b25d184
      Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
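
      For context, the classic inline expansion of fmod on x87 is a fprem
      loop that iterates until the C2 status flag clears; this is the
      long-standing glibc-style idiom, shown here as a sketch (assumes a
      GCC/Clang-style inline-asm toolchain targeting x86; not the code ART
      actually emits):

        #include <cmath>
        #include <cstdio>

        double FmodX87(double x, double y) {
          double result;
          __asm__("1: fprem\n\t"     // partial remainder of st(0) by st(1)
                  "fnstsw %%ax\n\t"  // copy the FPU status word into AX
                  "sahf\n\t"         // move AH into EFLAGS (C2 lands in PF)
                  "jp 1b"            // loop while C2 says more reduction
                  : "=t"(result)     // st(0) holds the remainder on exit
                  : "0"(x), "u"(y)   // x in st(0), y in st(1)
                  : "ax", "cc");
          return result;
        }

        int main() {
          // Both print 1.5.
          printf("%f %f\n", FmodX87(5.5, 2.0), fmod(5.5, 2.0));
          return 0;
        }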
* Add implicit null checks for the optimizing compiler  [Calin Juravle, 2015-01-16, 1 file, -1/+5]
      - For backends: arm, arm64, x86, x86_64.
      - Fixed parameter passing for CodeGenerator.
      - The 003-omnibus-opcodes test verifies that NullPointerExceptions
        work as expected.

      Change-Id: I1b302acd353342504716c9169a80706cf3aba2c8
* [optimizing compiler] Compute live spill size  [Mark Mendell, 2015-01-15, 1 file, -0/+5]
      The current stack frame calculation assumes that each live register
      to be saved/restored has the word size of the machine. This fails
      for X86, where a double in an XMM register takes up 8 bytes. Change
      the calculation to keep track of the number of core registers and
      the number of fp registers separately, to handle this distinction.

      This is slightly pessimal, as the registers may not be active at the
      same time, but the only way to handle that would be to allocate both
      classes of registers simultaneously, or to remember all the active
      intervals, match them up and compute the size of each safepoint
      interval.

      Change-Id: If7860aa319b625c214775347728cdf49a56946eb
      Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
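
      As a concrete illustration of the fix, a hypothetical helper with
      32-bit x86 sizes assumed (4-byte core registers, 8-byte XMM spill
      slots):

        #include <cstdio>

        // Old scheme: every live register was assumed to need one word.
        int SpillSizeOld(int live_regs, int word_size) {
          return live_regs * word_size;
        }

        // New scheme: core and FP registers are counted separately, since
        // an XMM register holding a double needs an 8-byte slot.
        int SpillSizeNew(int core_regs, int fp_regs,
                         int word_size, int fp_size) {
          return core_regs * word_size + fp_regs * fp_size;
        }

        int main() {
          // 2 live core registers + 2 live XMM doubles on 32-bit x86:
          printf("old: %d bytes\n", SpillSizeOld(4, 4));        // 16: too small
          printf("new: %d bytes\n", SpillSizeNew(2, 2, 4, 8));  // 24: correct
          return 0;
        }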
* Implement double and float support for arm in register allocator.  [Nicolas Geoffray, 2015-01-08, 1 file, -0/+4]
      The basic approach is:
      - An instruction that needs two registers gets two intervals.
      - When allocating the low part, we also allocate the high part.
      - When splitting a low (or high) interval, we also split the high
        (or low) equivalent.
      - Allocation follows the (S/D register) requirement that low
        registers are always even and the high equivalent is low + 1.

      Change-Id: I06a5148e05a2ffc7e7555d08e871ed007b4c2797
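
      For intuition: on ARM VFP, a 64-bit value in S registers occupies an
      even/odd pair that aliases one D register (D<n> overlaps S<2n> and
      S<2n+1>, e.g. S0/S1 alias D0), which is where the even/low+1
      constraint comes from. A tiny check of the pairing rule (helper
      names invented for illustration):

        #include <cassert>

        // Map a D register to its S-register pair (VFP aliasing, D0-D15).
        int LowSRegOf(int d_reg) { return 2 * d_reg; }
        int HighSRegOf(int d_reg) { return 2 * d_reg + 1; }

        int main() {
          for (int d = 0; d < 16; ++d) {
            assert(LowSRegOf(d) % 2 == 0);              // low reg is even
            assert(HighSRegOf(d) == LowSRegOf(d) + 1);  // high is low + 1
          }
          return 0;
        }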
* [optimizing compiler] Add support for volatile  [Calin Juravle, 2014-12-19, 1 file, -0/+6]
      - For backends: arm, x86, x86_64.
      - Added the necessary instructions to the assemblers.
      - Cleaned up code gen for field set/get.
      - Fixed InstructionDataEquals for some instructions.
      - Fixed comments in compiler_enums.
      - The 003-opcode test verifies basic volatile functionality.

      Change-Id: I144393efa312dfb2c332cb84056b00edffee338a
* [optimizing compiler] Add shifts  [Calin Juravle, 2014-11-24, 1 file, -0/+5]
      Added SHL, SHR, USHR for arm, x86, x86_64.

      Change-Id: I971f594e270179457e6958acf1401ff7630df07e
* Consistently use k{InstructionSet}WordSize.  [Nicolas Geoffray, 2014-11-19, 1 file, -1/+2]
      These constants were defined prior to k{InstructionSet}PointerSize,
      so use them consistently in optimizing as a first step. We can
      discuss whether we should remove them in a second step.

      Change-Id: If129de1a3bb8b65f8d9c816a8ad466815fb202e6
* [optimizing compiler] Add REM_INT, REM_LONG  [Calin Juravle, 2014-11-17, 1 file, -0/+1]
      - For arm, x86, x86_64.
      - Minor cleanup/fix in the div tests.

      Change-Id: I240874010206a5a9b3aaffbc81a885b94c248f93
* Merge "Do a parallel move in BoundsCheckSlowPath."Nicolas Geoffray2014-11-131-1/+1
|\
| * Do a parallel move in BoundsCheckSlowPath.  [Nicolas Geoffray, 2014-11-12, 1 file, -1/+1]
      The two locations of the index and length could overlap, so we need
      a parallel move. Also factorize the code for doing a parallel move
      based on two locations.

      Change-Id: Iee8b3459e2eed6704d45e9a564fb2cd050741ea4
* | Implement and/or/xor in optimizing.  [Nicolas Geoffray, 2014-11-12, 1 file, -1/+3]
      Change-Id: I7cf6da1fd334a7177a5580931b8f174dd40b7cec