path: root/compiler/optimizing/code_generator_x86_64.h
Commit message | Author | Age | Files | Lines
* Move mirror::ArtMethod to native | Mathieu Chartier | 2015-06-02 | 1 | -2/+4
|
|   Optimizing + quick tests are passing, devices boot.
|   TODO: Test and fix bugs in mips64.
|   Saves 16 bytes per most ArtMethod, 7.5MB reduction in system PSS. Some of the savings are from removal of virtual methods and direct methods object arrays.
|   Bug: 19264997
|   (cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33)
|   Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d
|
|   Fix some ArtMethod related bugs
|   Added root visiting for runtime methods, not currently required since the GcRoots in these methods are null.
|   Added missing GetInterfaceMethodIfProxy in GetMethodLine, fixes --trace run-tests 005, 044.
|   Fixed optimizing compiler bug where we used a normal stack location instead of double on ARM64, this fixes the debuggable tests.
|   TODO: Fix JDWP tests.
|   Bug: 19264997
|   Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3
|
|   ART: Fix casts for 64-bit pointers on 32-bit compiler.
|   Bug: 19264997
|   Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457
|
|   Fix JDWP tests after ArtMethod change
|   Fixes Throwable::GetStackDepth for exception event detection after internal stack trace representation change.
|   Adds missing ArtMethod::GetInterfaceMethodIfProxy call in case of proxy method.
|   Bug: 19264997
|   Change-Id: I363e293796848c3ec491c963813f62d868da44d2
|
|   Fix accidental IMT and root marking regression
|   Was always using the conflict trampoline. Also included fix for regression in GC time caused by extra roots. Most of the regression was IMT.
|   Fixed bug in DumpGcPerformanceInfo where we would get SIGABRT due to detached thread.
|   EvaluateAndApplyChanges: From ~2500 -> ~1980
|   GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots
|   Bug: 19264997
|   Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0
|
|   Fix bogus image test assert
|   Previously we were comparing the size of the non moving space to size of the image file.
|   Now we properly compare the size of the image space against the size of the image file.
|   Bug: 19264997
|   Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a
|
|   [MIPS64] Fix art_quick_invoke_stub argument offsets.
|   ArtMethod reference's size got bigger, so we need to move other args and leave enough space for ArtMethod* and 'this' pointer.
|   This fixes mips64 boot.
|   Bug: 19264997
|   Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
* [optimizing] Tune some x86_64 moves | Mark Mendell | 2015-05-07 | 1 | -0/+3
|
|   Generate Moves of constant FP values by loading from the constant table.
|   Use 'movl' to load a 64 bit register for positive 32-bit values, saving a byte in the generated code by taking advantage of the implicit zero extension.
|   Change a couple of xorq(reg, reg) to xorl to (potentially) save a byte of code per xor.
|   Change-Id: I5b2a807f0d3b29294fd4e7b8ef6d654491fa0b01
|   Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
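The encoding choice described in this commit message can be pictured with a minimal sketch. The enum and function names below are hypothetical and are not taken from the ART sources. Treating the whole range up to 0xFFFFFFFF as eligible for `movl` is an assumption based on the zero extension that 32-bit register writes perform on x86-64; the message itself only says "positive 32-bit values".

```cpp
#include <cstdint>

// Hypothetical names for the three ways this sketch loads a constant
// into a 64-bit general-purpose register.
enum class LoadImm64Kind {
  kXorl,  // xorl reg, reg: a 32-bit write, so the upper half is zeroed too
  kMovl,  // movl $imm32, reg: upper 32 bits are implicitly zero-extended
  kMovq,  // movq $imm, reg: needed for negative or wider values
};

// Picks the shortest form that still leaves the full 64-bit register
// holding `value`.
LoadImm64Kind SelectLoadImm64(int64_t value) {
  if (value == 0) {
    return LoadImm64Kind::kXorl;
  }
  // Assumption: any value whose upper 32 bits are zero can use movl.
  if (value > 0 && value <= INT64_C(0xFFFFFFFF)) {
    return LoadImm64Kind::kMovl;
  }
  return LoadImm64Kind::kMovq;
}
```

A negative value such as -1 cannot use the `movl` form, because zero extension would turn it into 0x00000000FFFFFFFF instead of sign-extending it.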
* Refactor InvokeDexCallingConventionVisitor in Optimizing. | Roland Levillain | 2015-04-29 | 1 | -12/+7
|
|   Change-Id: I7ede0f59d5109644887bf5d39201d4e1bf043f34
* Replace String CharArray with internal uint16_t array. | Jeff Hao | 2015-04-27 | 1 | -1/+1
|
|   Summary of high level changes:
|   - Adds compiler inliner support to identify string init methods
|   - Adds compiler support (quick & optimizing) with new invoke code path that calls method off the thread pointer
|   - Adds thread entrypoints for all string init methods
|   - Adds map to verifier to log when receiver of string init has been copied to other registers. used by compiler and interpreter
|   Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
* Opt compiler: Implement parallel move resolver without using swap. | Zheng Xu | 2015-04-17 | 1 | -2/+2
|
|   The algorithm of ParallelMoveResolverNoSwap() is almost the same as ParallelMoveResolverWithSwap(), except for the way we resolve the circular dependency. NoSwap() uses an additional scratch register to resolve the circular dependency. For example, (0->1) (1->2) (2->0) will be performed as (2->scratch) (1->2) (0->1) (scratch->0).
|   On architectures without swap register support, NoSwap() can reduce the number of moves from 3x(N-1) to (N+1) when there is a circular dependency with N moves.
|   Also, the NoSwap() algorithm does not depend on architecture register layout information, which means it can support register pairs on arm32 and X/W, D/S registers on arm64 without additional modification.
|   Change-Id: Idf56bd5469bb78c0e339e43ab16387428a082318
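The scratch-register idea in this message can be sketched as follows. This is not the ART ParallelMoveResolverNoSwap implementation; `Move`, `SequentializeWithScratch`, and the integer register numbering are hypothetical, and the sketch assumes that all destinations are distinct and that the scratch register does not appear in the input.

```cpp
#include <cstddef>
#include <vector>

// A register-to-register move; registers are plain integers in this sketch.
struct Move {
  int src;
  int dst;
};

// Turns a set of parallel moves into an equivalent sequence.
// Assumes all destinations are distinct and `scratch` is otherwise unused.
std::vector<Move> SequentializeWithScratch(std::vector<Move> pending, int scratch) {
  std::vector<Move> emitted;
  while (!pending.empty()) {
    bool progressed = false;
    for (std::size_t i = 0; i < pending.size(); ++i) {
      // A move is blocked while another pending move still reads its destination.
      bool blocked = false;
      for (std::size_t j = 0; j < pending.size(); ++j) {
        if (j != i && pending[j].src == pending[i].dst) {
          blocked = true;
          break;
        }
      }
      if (!blocked) {
        if (pending[i].src != pending[i].dst) {  // drop no-op self moves
          emitted.push_back(pending[i]);
        }
        pending.erase(pending.begin() + static_cast<std::ptrdiff_t>(i));
        progressed = true;
        break;
      }
    }
    if (!progressed) {
      // Every remaining move is part of a cycle: save one source in the
      // scratch register and let that move read from scratch instead.
      Move& last = pending.back();
      emitted.push_back({last.src, scratch});
      last.src = scratch;
    }
  }
  return emitted;
}
```

For the cycle (0->1) (1->2) (2->0) with scratch register 99, this produces (2->99) (1->2) (0->1) (99->0), which is N+1 = 4 moves for N = 3.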
* Merge "Revert "[optimizing] Improve x86 parallel moves/swaps"" | Calin Juravle | 2015-04-16 | 1 | -1/+0
|\
| * Revert "[optimizing] Improve x86 parallel moves/swaps" | Guillaume Sanchez | 2015-04-15 | 1 | -1/+0
| |
| |   This reverts commit a5c19ce8d200d68a528f2ce0ebff989106c4a933.
| |   This commit introduces a performance regression on CaffeineLogic of 30%.
| |   Change-Id: I917e206e249d44e1748537bc1b2d31054ea4959d
* | Merge "Follow up of "div/rem on x86 and x86_64", to tidy up the code a little." | Calin Juravle | 2015-04-10 | 1 | -1/+1
|\ \
| |/
|/|
| * Follow up of "div/rem on x86 and x86_64", to tidy up the code a little. | Guillaume Sanchez | 2015-04-10 | 1 | -1/+1
| |
| |   Change-Id: Ibf39cbc8ac1d773599d70be2cb1e941674b60f1d
* | [optimizing] Improve x86 parallel moves/swaps | Mark Mendell | 2015-04-10 | 1 | -0/+1
| |
| |   Add a new constructor to ScratchRegisterScope that will supply a register if there is a free one, but not spill to force one. Use this to generate alternate code that doesn't use a temporary, as the spill/restore of a register generates extra instructions that aren't necessary on x86.
| |
| |   Here is the benefit for a 32 bit memory-to-memory exchange with no free registers:
| |
| |   < 50          push eax
| |   < 53          push ebx
| |   < 8B44244C    mov eax, [esp + 76]
| |   < 8B5C246C    mov ebx, [esp + 108]
| |   < 8944246C    mov [esp + 108], eax
| |   < 895C244C    mov [esp + 76], ebx
| |   < 5B          pop ebx
| |   < 58          pop eax
| |   ---
| |   > FF742444    push [esp + 68]
| |   > FF742468    push [esp + 104]
| |   > 8F44244C    pop [esp + 72]
| |   > 8F442468    pop [esp + 100]
| |
| |   Avoid using xchg instruction, as it is slow on smaller processors.
| |   Change-Id: Id29ee3abd998577baaee552d55d23e60ae0c7871
| |   Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
* | [optimizing] Address x86_64 RIP patch comments | Mark Mendell | 2015-04-10 | 1 | -1/+1
| |
| |   Nicolas had some comments after the patch https://android-review.googlesource.com/#/c/144100 had merged. Fix the problems that he found.
| |   Change-Id: I40e8a4273997860db7511dc8f1986281b72bead2
| |   Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
* | [optimizing] Add RIP support for x86_64 | Mark Mendell | 2015-04-09 | 1 | -0/+14
| |
| |   Support a constant area addressed using RIP on x86_64. Use it for FP operations to avoid loading constants into a CPU register and moving to a XMM register.
| |   Change-Id: I58421759ef2a8475538876c20e696ec787015a72
| |   Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
* | Merge "Speedup div/rem by constants on x86 and x86_64" | Calin Juravle | 2015-04-09 | 1 | -0/+3
|\|
| * Speedup div/rem by constants on x86 and x86_64 | Guillaume Sanchez | 2015-04-09 | 1 | -0/+3
| |
| |   This is done using the algorithms in Hacker's Delight chapter 10.
| |   Change-Id: I7bacefe10067569769ed31a1f7834f796fb41119
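As an illustration of the kind of technique that chapter covers, here is a minimal sketch of signed 32-bit division by a constant using a multiply-high and a shift. It is not the code from this commit. The struct and function names are hypothetical, the magic numbers for 3, 5 and 7 are the commonly tabulated values, and the routine that derives them for an arbitrary divisor is omitted. The sketch assumes that right-shifting a negative signed integer is an arithmetic shift, which holds on mainstream compilers.

```cpp
#include <cstdint>

// Hypothetical container for a precomputed magic multiplier and shift.
struct MagicConstant {
  int32_t divisor;
  int32_t magic;
  int shift;
};

// Commonly tabulated signed 32-bit magic numbers.
constexpr MagicConstant kDivBy3{3, 0x55555556, 0};
constexpr MagicConstant kDivBy5{5, 0x66666667, 1};
constexpr MagicConstant kDivBy7{7, static_cast<int32_t>(0x92492493u), 2};

// Computes n / c.divisor (truncating toward zero) without a divide instruction.
int32_t DivideByConstant(int32_t n, const MagicConstant& c) {
  // High 32 bits of the 64-bit product magic * n.
  int32_t q = static_cast<int32_t>((static_cast<int64_t>(c.magic) * n) >> 32);
  // Correct for a magic number whose sign differs from the divisor's.
  if (c.divisor > 0 && c.magic < 0) {
    q += n;
  } else if (c.divisor < 0 && c.magic > 0) {
    q -= n;
  }
  q >>= c.shift;  // assumed arithmetic shift
  // Add the sign bit so that negative quotients round toward zero.
  q += static_cast<int32_t>(static_cast<uint32_t>(q) >> 31);
  return q;
}
```

For example, `DivideByConstant(100, kDivBy7)` and `DivideByConstant(-100, kDivBy7)` agree with the built-in `100 / 7` and `-100 / 7`.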
* | Merge "[optimizing] Implement x86/x86_64 math intrinsics" | Andreas Gampe | 2015-04-02 | 1 | -1/+8
|\ \
| * | [optimizing] Implement x86/x86_64 math intrinsics | Mark Mendell | 2015-04-01 | 1 | -1/+8
| |/
| |
| |   Implement floor/ceil/round/RoundFloat on x86 and x86_64. Implement RoundDouble on x86_64.
| |   Add support for roundss and roundsd on both architectures. Support them in the disassembler as well.
| |   Add the instruction set features for x86, as the 'round' instruction is only supported if SSE4.1 is supported.
| |   Fix the tests to handle the addition of passing the instruction set features to x86 and x86_64.
| |   Add assembler tests for roundsd and roundss to x86_64 assembler tests.
| |   Change-Id: I9742d5930befb0bbc23f3d6c83ce0183ed9fe04f
| |   Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
* / Revert "Revert "Deoptimization-based bce."" | Mingyao Yang | 2015-04-01 | 1 | -0/+4
|/
|
|   This reverts commit 0ba627337274ccfb8c9cb9bf23fffb1e1b9d1430.
|   Change-Id: I1ca10d15bbb49897a0cf541ab160431ec180a006
* Intrinsify String.compareTo. | Nicolas Geoffray | 2015-03-27 | 1 | -0/+19
|
|   Change-Id: Ia540df98755ac493fe61bd63f0bd94f6d97fbb57
* Revert "Deoptimization-based bce." | Andreas Gampe | 2015-03-24 | 1 | -4/+0
|
|   This breaks compiling the core image:
|   Error after BCE: art::SSAChecker: Instruction 219 in block 1 does not dominate use 221 in block 1.
|   This reverts commit e295e6ec5beaea31be5d7d3c996cd8cfa2053129.
|   Change-Id: Ieeb48797d451836ed506ccb940872f1443942e4e
* Deoptimization-based bce. | Mingyao Yang | 2015-03-23 | 1 | -0/+4
|
|   A mechanism is introduced by which a runtime method can be called from code compiled with the optimizing compiler to deoptimize into the interpreter. This can be used to establish invariants in the managed code. If the invariant does not hold at runtime, we will deoptimize and continue execution in the interpreter. This allows the managed code to be optimized as if the invariant was proven during compile time. However, the exception will be thrown according to the semantics demanded by the spec.
|   The invariant and optimization included in this patch are based on the length of an array. Given a set of array accesses with constant indices {c1, ..., cn}, we can optimize away all bounds checks iff all 0 <= min(ci) and max(ci) < array-length. The first can be proven statically. The second can be established with a deoptimization-based invariant. This replaces n bounds checks with one invariant check (plus slow-path code).
|   Change-Id: I8c6e34b56c85d25b91074832d13dba1db0a81569
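The two conditions in this message split naturally into a compile-time part and a runtime part, which the following minimal sketch separates. All names here are hypothetical and none come from the ART sources. The sketch only models the decision; it does not model the deoptimization itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Result of analysing a set of constant array indices at compile time.
struct ConstantIndexPlan {
  bool eligible;      // true when 0 <= min(ci) was established statically
  int32_t max_index;  // the one value left to compare against array-length
};

// Compile-time step: check the lower bound and remember max(ci).
ConstantIndexPlan PlanConstantIndexChecks(const std::vector<int32_t>& indices) {
  if (indices.empty()) {
    return {false, 0};
  }
  const auto range = std::minmax_element(indices.begin(), indices.end());
  return {*range.first >= 0, *range.second};
}

// Runtime step: the single invariant check that replaces n bounds checks.
// Returns true when the invariant max(ci) < array-length does not hold,
// i.e. when execution would have to fall back to the interpreter.
bool ShouldDeoptimize(const ConstantIndexPlan& plan, int32_t array_length) {
  assert(plan.eligible);  // only meaningful when the static part succeeded
  return plan.max_index >= array_length;
}
```

With indices {0, 3, 7}, an array of length 8 passes the single check, while an array of length 7 would trigger the fallback path.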
* Avoid generating jmp +0. | Nicolas Geoffray | 2015-02-18 | 1 | -1/+1
|
|   When a block branches to a non-following block, but blocks in-between do branch to it, we can avoid doing the branch.
|   Change-Id: I9b343f662a4efc718cd4b58168f93162a24e1219
* Small optimization for recursive calls: avoid dex cache. | Nicolas Geoffray | 2015-01-29 | 1 | -0/+1
|
|   Change-Id: I044757a2f06e535cdc1480c4fc8182b89635baf6
* ART: Arm64 optimizing compiler intrinsics | Andreas Gampe | 2015-01-28 | 1 | -2/+0
|
|   Implement most intrinsics for the optimizing compiler for Arm64.
|   Change-Id: Idb459be09f0524cb9aeab7a5c7fccb1c6b65a707
* Support callee save floating point registers on x64. | Nicolas Geoffray | 2015-01-23 | 1 | -2/+0
|
|   - Share the computation of core_spill_mask and fpu_spill_mask between backends.
|   - Remove explicit stack overflow check support: we need to adjust them and since they are not tested, they will easily bitrot.
|   Change-Id: I0b619b8de4e1bdb169ea1ae7c6ede8df0d65837a
* Enable core callee-save on x64. | Nicolas Geoffray | 2015-01-21 | 1 | -1/+1
|
|   Will work on other architectures and FP support in other CLs.
|   Change-Id: I8cef0343eedc7202d206f5217fdf0349035f0e4d
* [optimizing compiler] Implement inline x86 FP '%' | Mark Mendell | 2015-01-21 | 1 | -0/+3
|
|   Replace the calls to fmod/fmodf by inline code as is done in the Quick compiler.
|   Remove the quick fmod/fmodf runtime entries, as they are no longer in use.
|   64 bit code generator Move() routine needed to be enhanced to handle constants, as Location::Any() allows them to be generated.
|   Change-Id: I6b6a42f6faeed4b0b3c940453e487daf5b25d184
|   Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
* Add implicit null checks for the optimizing compiler | Calin Juravle | 2015-01-16 | 1 | -1/+4
|
|   - for backends: arm, arm64, x86, x86_64
|   - fixed parameter passing for CodeGenerator
|   - 003-omnibus-opcodes test verifies that NullPointerExceptions work as expected
|   Change-Id: I1b302acd353342504716c9169a80706cf3aba2c8
* ART: Optimizing compiler intrinsics | Andreas Gampe | 2015-01-15 | 1 | -1/+18
|
|   Add intrinsics infrastructure to the optimizing compiler.
|   Add almost all intrinsics supported by Quick to the x86-64 backend. Further intrinsics require more assembler support.
|   Change-Id: I48de9b44c82886bb298d16e74e12a9506b8e8807
* [optimizing compiler] Compute live spill size | Mark Mendell | 2015-01-15 | 1 | -0/+4
|
|   The current stack frame calculation assumes that each live register to be saved/restored has the word size of the machine. This fails for X86, where a double in an XMM register takes up 8 bytes. Change the calculation to keep track of the number of core registers and number of fp registers to handle this distinction.
|   This is slightly pessimal, as the registers may not be active at the same time, but the only way to handle this would be to allocate both classes of registers simultaneously, or remember all the active intervals, matching them up and compute the size of each safepoint interval.
|   Change-Id: If7860aa319b625c214775347728cdf49a56946eb
|   Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
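The difference between the two calculations in this message can be shown with a small arithmetic sketch. The function names and parameters are hypothetical and are not the ones used in the ART code generator. The 4-byte and 8-byte sizes in the usage note are the 32-bit x86 case that the message mentions.

```cpp
#include <cstddef>

// Old assumption: every live register occupies one machine word.
std::size_t WordSizedSpillBytes(std::size_t live_core_registers,
                                std::size_t live_fp_registers,
                                std::size_t word_bytes) {
  return (live_core_registers + live_fp_registers) * word_bytes;
}

// Per-class calculation: count core and floating-point registers
// separately and size each class on its own.
std::size_t PerClassSpillBytes(std::size_t live_core_registers,
                               std::size_t live_fp_registers,
                               std::size_t core_register_bytes,
                               std::size_t fp_register_bytes) {
  return live_core_registers * core_register_bytes +
         live_fp_registers * fp_register_bytes;
}
```

With 3 live core registers and 2 live XMM registers holding doubles on 32-bit x86, the word-sized formula reserves (3 + 2) * 4 = 20 bytes, while the per-class formula reserves 3 * 4 + 2 * 8 = 28 bytes, which is what saving the doubles actually needs.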
* Implement double and float support for arm in register allocator. | Nicolas Geoffray | 2015-01-08 | 1 | -0/+4
|
|   The basic approach is:
|   - An instruction that needs two registers gets two intervals.
|   - When allocating the low part, we also allocate the high part.
|   - When splitting a low (or high) interval, we also split the high (or low) equivalent.
|   - Allocation follows the (S/D register) requirement that low registers are always even and the high equivalent is low + 1.
|   Change-Id: I06a5148e05a2ffc7e7555d08e871ed007b4c2797
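The last rule in the list above (an even low register paired with low + 1) can be sketched as a simple search over a blocked-register table. This is only an illustration of the pairing constraint, not the ART register allocator; `FindFreeRegisterPair` and the `std::vector<bool>` representation are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Returns the even-numbered low register of the first free (low, low + 1)
// pair, or -1 when no such pair is free. blocked[i] is true when
// single-precision register i is already in use.
int FindFreeRegisterPair(const std::vector<bool>& blocked) {
  for (std::size_t low = 0; low + 1 < blocked.size(); low += 2) {
    if (!blocked[low] && !blocked[low + 1]) {
      return static_cast<int>(low);
    }
  }
  return -1;
}
```

Stepping by two keeps the low register even, so registers 1 and 2 are never offered as a pair even when both are free.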
* [optimizing compiler] Add support for volatile | Calin Juravle | 2014-12-19 | 1 | -0/+6
|
|   - for backends: arm, x86, x86_64
|   - added necessary instructions to assemblies
|   - clean up code gen for field set/get
|   - fixed InstructionDataEquals for some instructions
|   - fixed comments in compiler_enums
|   * 003-opcode test verifies basic volatile functionality
|   Change-Id: I144393efa312dfb2c332cb84056b00edffee338a
* [optimizing compiler] Add shifts | Calin Juravle | 2014-11-24 | 1 | -0/+2
|
|   Added SHL, SHR, USHR for arm, x86, x86_64.
|   Change-Id: I971f594e270179457e6958acf1401ff7630df07e
* Consistently use k{InstructionSet}WordSize. | Nicolas Geoffray | 2014-11-19 | 1 | -1/+2
|
|   These constants were defined prior to k{InstructionSet}PointerSize. So use them consistently in optimizing as a first step. We can discuss whether we should remove them in a second step.
|   Change-Id: If129de1a3bb8b65f8d9c816a8ad466815fb202e6
* [optimizing compiler] Add REM_INT, REM_LONG | Calin Juravle | 2014-11-17 | 1 | -0/+1
|
|   - for arm, x86, x86_64
|   - minor cleanup/fix in div tests
|   Change-Id: I240874010206a5a9b3aaffbc81a885b94c248f93
* Merge "Do a parallel move in BoundsCheckSlowPath." | Nicolas Geoffray | 2014-11-13 | 1 | -1/+1
|\
| * Do a parallel move in BoundsCheckSlowPath. | Nicolas Geoffray | 2014-11-12 | 1 | -1/+1
| |
| |   The two locations of the index and length could overlap, so we need a parallel move. Also factorize the code for doing a parallel move based on two locations.
| |   Change-Id: Iee8b3459e2eed6704d45e9a564fb2cd050741ea4
* | Implement and/or/xor in optimizing. | Nicolas Geoffray | 2014-11-12 | 1 | -1/+3
|/
|
|   Change-Id: I7cf6da1fd334a7177a5580931b8f174dd40b7cec
* Implement try/catch/throw in optimizing. | Nicolas Geoffray | 2014-11-06 | 1 | -26/+30
|
|   - We currently don't run optimizations in the presence of a try/catch.
|   - We therefore implement Quick's mapping table.
|   - Also fix a missing null check on array-length.
|   Change-Id: I6917dfcb868e75c1cf6eff32b7cbb60b6cfbd68f
* Implement CONST_CLASS in optimizing compiler. | Nicolas Geoffray | 2014-11-04 | 1 | -0/+2
|
|   Change-Id: Ia8c8dfbef87cb2f7893bfb6e178466154eec9efd
* Add support for static fields in optimizing compiler. | Nicolas Geoffray | 2014-10-29 | 1 | -2/+2
|
|   Change-Id: Id2f010589e2bd6faf42c05bb33abf6816ebe9fa9
* Implement register allocator for floating point registers. | Nicolas Geoffray | 2014-10-21 | 1 | -2/+6
|
|   Also:
|   - Fix misuses of emitting the rex prefix in the x86_64 assembler.
|   - Fix movaps code generation in the x86_64 assembler.
|   Change-Id: Ib6dcf6e7c4a9c43368cfc46b02ba50f69ae69cbe
* Don't use assembler classes in code_generator.h. | Nicolas Geoffray | 2014-10-16 | 1 | -1/+11
|
|   The arm64 backend uses its own assembler and does not share the same classes as the other backends. To avoid conflicts or unnecessary mappings, just don't use those classes in the shared part of the code generator.
|   Change-Id: I9e5fa40c1021d2e83a4ef14c52cd1ccd03f2f73d
* Cleanup baseline register allocator. | Nicolas Geoffray | 2014-10-10 | 1 | -15/+2
|
|   - Use three arrays for blocking registers instead of one and computing offsets in that array.
|   - Don't pass blocked_registers_ to methods, just use the field.
|   Change-Id: Ib698564c31127c59b5a64c80f4262394b8394dc6
* Fix code generation of materialized conditions. | Nicolas Geoffray | 2014-10-09 | 1 | -2/+2
|
|   Move the logic for knowing if a condition needs to be materialized in an optimization pass (so that the information does not change as a side effect of another optimization).
|   Also clean-up arm and x86_64 codegen:
|   - arm: ldr and str are for power-users when a constant is in play. We should use LoadFromOffset and StoreToOffset.
|   - x86_64: fix misuses of movq instead of movl.
|   Change-Id: I01a03b91803624be2281a344a13ad5efbf4f3ef3
* Stop converting from Location to ManagedRegister. | Nicolas Geoffray | 2014-10-09 | 1 | -1/+1
|
|   Now the source of truth is the Location object that knows which register (core, pair, fpu) it needs to refer to.
|   Change-Id: I62401343d7479ecfb24b5ed161ec7829cda5a0b1
* Add support for floats and doubles. | Nicolas Geoffray | 2014-10-07 | 1 | -4/+14
|
|   - Follows Quick conventions.
|   - Currently only works with baseline register allocator.
|   Change-Id: Ie4b8e298f4f5e1cd82364da83e4344d4fc3621a3
* Optimize suspend checks in optimizing compiler. | Nicolas Geoffray | 2014-09-25 | 1 | -0/+5
|
|   - Remove the ones added during graph build (they were added for the baseline code generator).
|   - Emit them at loop back edges after phi moves, so that the test can directly jump to the loop header.
|   - Fix x86 and x86_64 suspend check by using cmpw instead of cmpl.
|   Change-Id: I6fad5795a55705d86c9e1cb85bf5d63dadfafa2a
* Support for saving and restoring live registers in a slow path. | Nicolas Geoffray | 2014-09-23 | 1 | -0/+2
|
|   And use it in suspend check slow paths.
|   Change-Id: I79caf28f334c145a36180c79a6e2fceae3990c31
* Implement invoke virtual in optimizing compiler. | Nicolas Geoffray | 2014-09-17 | 1 | -0/+2
|
|   Also refactor 004 tests to make them work with both Quick and Optimizing.
|   Change-Id: I87e275cb0ae0258fc3bb32b612140000b1d2adf8
* Implement array get and array put in optimizing. | Nicolas Geoffray | 2014-07-28 | 1 | -1/+4
|
|   Also fix a couple of assembler/disassembler issues.
|   Change-Id: I705c8572988c1a9c4df3172b304678529636d5f6