Optimizing + quick tests are passing, devices boot.
TODO: Test and fix bugs in mips64.
Saves 16 bytes for most ArtMethods, a 7.5MB reduction in system PSS.
Some of the savings come from the removal of the virtual-methods and
direct-methods object arrays.
Bug: 19264997
(cherry picked from commit e401d146407d61eeb99f8d6176b2ac13c4df1e33)
Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d
Fix some ArtMethod-related bugs
Added root visiting for runtime methods; not currently required
since the GcRoots in these methods are null.
Added missing GetInterfaceMethodIfProxy in GetMethodLine; this fixes
--trace run-tests 005 and 044.
Fixed an optimizing compiler bug where we used a normal stack location
instead of a double one on ARM64; this fixes the debuggable tests.
TODO: Fix JDWP tests.
Bug: 19264997
Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3
ART: Fix casts for 64-bit pointers on 32-bit compiler.
Bug: 19264997
Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457
Fix JDWP tests after ArtMethod change
Fixes Throwable::GetStackDepth for exception event detection after the
internal stack trace representation change.
Adds a missing ArtMethod::GetInterfaceMethodIfProxy call in the case of
a proxy method.
Bug: 19264997
Change-Id: I363e293796848c3ec491c963813f62d868da44d2
Fix accidental IMT and root marking regression
We were always using the conflict trampoline. Also includes a fix for a
regression in GC time caused by the extra roots; most of the regression
was from the IMT.
Fixed a bug in DumpGcPerformanceInfo where we would get a SIGABRT due
to a detached thread.
EvaluateAndApplyChanges: ~2500 -> ~1980.
GC time: 8.2s -> 7.2s, due to 1s less spent in MarkConcurrentRoots.
Bug: 19264997
Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0
Fix bogus image test assert
Previously we were comparing the size of the non-moving space to the
size of the image file.
Now we properly compare the size of the image space against the size
of the image file.
Bug: 19264997
Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a
[MIPS64] Fix art_quick_invoke_stub argument offsets.
The ArtMethod reference's size got bigger, so we need to move the other
arguments and leave enough space for the ArtMethod* and the 'this'
pointer.
This fixes mips64 boot.
Bug: 19264997
Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab
Generate Moves of constant FP values by loading from the constant table.
Use 'movl' to load a 64-bit register for positive 32-bit values, saving
a byte in the generated code by taking advantage of the implicit zero
extension.
Change a couple of xorq(reg, reg) instructions to xorl to (potentially)
save a byte of code per xor.
Change-Id: I5b2a807f0d3b29294fd4e7b8ef6d654491fa0b01
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
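
As an aside, here is a minimal sketch of the encoding decision above,
using a hypothetical helper name rather than ART's x86_64 assembler API:
writing a 32-bit register on x86_64 implicitly zero-extends into the
full 64-bit register, so a suitable immediate can be loaded with 'movl'
and skip the REX.W prefix byte that 'movq' needs.

#include <cstdint>
#include <limits>

// Hypothetical helper, not ART's assembler API: decide whether a 64-bit
// immediate can be materialized with 'movl' (a 32-bit move that the CPU
// implicitly zero-extends) instead of the longer 'movq' encoding.
bool CanUseMovlForImmediate(int64_t value) {
  // The commit restricts this to positive 32-bit values; movl could in
  // fact encode anything up to UINT32_MAX, since the high bits are zeroed.
  return value >= 0 && value <= std::numeric_limits<int32_t>::max();
}
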
Change-Id: I7ede0f59d5109644887bf5d39201d4e1bf043f34
Summary of high-level changes:
- Adds compiler inliner support to identify string init methods.
- Adds compiler support (Quick & Optimizing) with a new invoke code path
that calls the method off the thread pointer.
- Adds thread entrypoints for all string init methods.
- Adds a map to the verifier to log when the receiver of a string init
has been copied to other registers; used by the compiler and
interpreter.
Change-Id: I797b992a8feb566f9ad73060011ab6f51eb7ce01
The algorithm of ParallelMoveResolverNoSwap() is almost the same as
ParallelMoveResolverWithSwap(), except for the way we resolve the
circular dependency: NoSwap() uses an additional scratch register
instead. For example, (0->1) (1->2) (2->0) will be performed
as (2->scratch) (1->2) (0->1) (scratch->0).
On architectures without swap-register support, NoSwap() can reduce the
number of moves from 3x(N-1) to (N+1) when there is a circular
dependency of N moves.
The NoSwap() algorithm also does not depend on architecture
register-layout information, which means it can support register pairs
on arm32, and X/W and D/S registers on arm64, without additional
modification.
Change-Id: Idf56bd5469bb78c0e339e43ab16387428a082318
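
To make the cycle-breaking concrete, here is a minimal standalone
sketch; it is not ART's actual ParallelMoveResolverNoSwap (which works
on Locations and parallel-move instructions), it just shows how one
scratch location resolves an N-move cycle with N+1 moves:

#include <cstddef>
#include <vector>

// 'regs' holds the contents of numbered registers. 'cycle' lists a circular
// dependency r0->r1->...->rk->r0, meaning the value in r_i must end up in
// r_{i+1}, wrapping around at the end.
void ResolveCycleWithScratch(std::vector<int>& regs,
                             const std::vector<size_t>& cycle) {
  // (last -> scratch): free the destination of the wrap-around move,
  // e.g. (2 -> scratch) for the cycle (0->1) (1->2) (2->0).
  int scratch = regs[cycle.back()];
  // Perform the remaining moves in reverse order: (1->2), then (0->1).
  for (size_t i = cycle.size() - 1; i > 0; --i) {
    regs[cycle[i]] = regs[cycle[i - 1]];
  }
  // (scratch -> first): complete the wrap-around move (2->0).
  regs[cycle.front()] = scratch;
}
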
This reverts commit a5c19ce8d200d68a528f2ce0ebff989106c4a933.
This commit introduces a 30% performance regression on CaffeineLogic.
Change-Id: I917e206e249d44e1748537bc1b2d31054ea4959d
Change-Id: Ibf39cbc8ac1d773599d70be2cb1e941674b60f1d
Add a new constructor to ScratchRegisterScope that will supply a
register if there is a free one, but will not spill to force one. Use
this to generate alternate code that doesn't use a temporary, as the
spill/restore of a register generates extra instructions that aren't
necessary on x86.
Here is the benefit for a 32-bit memory-to-memory exchange with no
free registers:
< 50 push eax
< 53 push ebx
< 8B44244C mov eax, [esp + 76]
< 8B5C246C mov ebx, [esp + 108]
< 8944246C mov [esp + 108], eax
< 895C244C mov [esp + 76], ebx
< 5B pop ebx
< 58 pop eax
---
> FF742444 push [esp + 68]
> FF742468 push [esp + 104]
> 8F44244C pop [esp + 72]
> 8F442468 pop [esp + 100]
Avoid using the xchg instruction, as it is slow on smaller processors.
Change-Id: Id29ee3abd998577baaee552d55d23e60ae0c7871
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
Nicolas had some comments after the patch
https://android-review.googlesource.com/#/c/144100 was merged. Fix the
problems that he found.
Change-Id: I40e8a4273997860db7511dc8f1986281b72bead2
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
Support a constant area addressed using RIP on x86_64. Use it for FP
operations to avoid loading constants into a CPU register and then
moving them to an XMM register.
Change-Id: I58421759ef2a8475538876c20e696ec787015a72
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
This is done using the algorithms in Hacker's Delight chapter 10.
Change-Id: I7bacefe10067569769ed31a1f7834f796fb41119
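
For reference, this is the flavor of code such strength reduction
produces; the constants below are the standard Hacker's Delight values
for dividing a signed 32-bit integer by 7 (the log does not show the
magic numbers ART actually computes), and the sketch assumes arithmetic
right shift of negative values:

#include <cstdint>

// Signed 32-bit division by the constant 7 without a divide instruction:
// multiply by a precomputed magic number, keep the high 32 bits, then
// correct and shift (Hacker's Delight, chapter 10).
int32_t DivideBy7(int32_t n) {
  const int32_t kMagic = static_cast<int32_t>(UINT32_C(0x92492493));
  int32_t q = static_cast<int32_t>((static_cast<int64_t>(kMagic) * n) >> 32);
  q += n;                                       // Correct for the negative magic number.
  q >>= 2;                                      // Shift amount for divisor 7.
  return q + (static_cast<uint32_t>(q) >> 31);  // Add 1 if the quotient is negative.
}
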
Implement floor/ceil/round/RoundFloat on x86 and x86_64.
Implement RoundDouble on x86_64.
Add support for roundss and roundsd on both architectures. Support them
in the disassembler as well.
Add the instruction set features for x86, as the 'round' instruction is
only supported if SSE4.1 is supported.
Fix the tests to handle the addition of passing the instruction set
features to x86 and x86_64.
Add assembler tests for roundsd and roundss to x86_64 assembler tests.
Change-Id: I9742d5930befb0bbc23f3d6c83ce0183ed9fe04f
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
This reverts commit 0ba627337274ccfb8c9cb9bf23fffb1e1b9d1430.
Change-Id: I1ca10d15bbb49897a0cf541ab160431ec180a006
Change-Id: Ia540df98755ac493fe61bd63f0bd94f6d97fbb57
This breaks compiling the core image:
Error after BCE: art::SSAChecker: Instruction 219 in block 1 does not dominate use 221 in block 1.
This reverts commit e295e6ec5beaea31be5d7d3c996cd8cfa2053129.
Change-Id: Ieeb48797d451836ed506ccb940872f1443942e4e
A mechanism is introduced so that a runtime method can be called from
code compiled with the optimizing compiler in order to deoptimize into
the interpreter. This can be used to establish invariants in the
managed code. If an invariant does not hold at runtime, we deoptimize
and continue execution in the interpreter. This allows us to optimize
the managed code as if the invariant had been proven at compile time.
However, the exception will be thrown according to the semantics
demanded by the spec.
The invariant and optimization included in this patch are based on the
length of an array. Given a set of array accesses with constant indices
{c1, ..., cn}, we can optimize away all bounds checks iff 0 <= min(ci)
and max(ci) < array-length. The first condition can be proven
statically; the second can be established with a deoptimization-based
invariant. This replaces n bounds checks with one invariant check (plus
slow-path code).
Change-Id: I8c6e34b56c85d25b91074832d13dba1db0a81569
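
A minimal C++ sketch of the transformation described above (ART really
operates on its HIR and a runtime deoptimization entry point; both are
only stood in for here):

#include <cstddef>
#include <stdexcept>

// Stand-in for the runtime method: in ART, control would deoptimize into the
// interpreter, which then throws the exception with the exact semantics the
// spec demands.
[[noreturn]] void DeoptimizeToInterpreter() {
  throw std::out_of_range("bounds check failed; deoptimizing");
}

// Before: every constant-index access carries its own bounds check.
int SumFirstThreeChecked(const int* array, size_t length) {
  if (length == 0) DeoptimizeToInterpreter();
  int sum = array[0];
  if (length <= 1) DeoptimizeToInterpreter();
  sum += array[1];
  if (length <= 2) DeoptimizeToInterpreter();
  sum += array[2];
  return sum;
}

// After: min(ci) = 0 is provably non-negative, so one guard on max(ci) = 2
// replaces all three checks; if the guard fails, we deoptimize.
int SumFirstThreeGuarded(const int* array, size_t length) {
  if (length <= 2) DeoptimizeToInterpreter();
  return array[0] + array[1] + array[2];
}
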
When a block branches to a non-following block, but blocks
in-between do branch to it, we can avoid doing the branch.
Change-Id: I9b343f662a4efc718cd4b58168f93162a24e1219
Change-Id: I044757a2f06e535cdc1480c4fc8182b89635baf6
Implement most intrinsics for the optimizing compiler for Arm64.
Change-Id: Idb459be09f0524cb9aeab7a5c7fccb1c6b65a707
- Share the computation of core_spill_mask and fpu_spill_mask
between backends.
- Remove explicit stack overflow check support: we would need to adjust
them, and since they are not tested, they will easily bitrot.
Change-Id: I0b619b8de4e1bdb169ea1ae7c6ede8df0d65837a
Will work on other architectures and FP support in other CLs.
Change-Id: I8cef0343eedc7202d206f5217fdf0349035f0e4d
Replace the calls to fmod/fmodf with inline code, as is done in the
Quick compiler.
Remove the quick fmod/fmodf runtime entries, as they are no longer in
use.
The 64-bit code generator's Move() routine needed to be enhanced to
handle constants, as Location::Any() allows them to be generated.
Change-Id: I6b6a42f6faeed4b0b3c940453e487daf5b25d184
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
- for backends: arm, arm64, x86, x86_64
- fixed parameter passing for CodeGenerator
- 003-omnibus-opcodes test verifies that NullPointerExceptions work as
expected
Change-Id: I1b302acd353342504716c9169a80706cf3aba2c8
Add intrinsics infrastructure to the optimizing compiler.
Add almost all intrinsics supported by Quick to the x86-64 backend.
Further intrinsics require more assembler support.
Change-Id: I48de9b44c82886bb298d16e74e12a9506b8e8807
The current stack frame calculation assumes that each live register to
be saved/restored has the word size of the machine. This fails for X86,
where a double in an XMM register takes up 8 bytes. Change the
calculation to keep track of the number of core registers and number of
fp registers to handle this distinction.
This is slightly pessimal, as the registers may not be active at the
same time, but the only way to handle this would be to allocate both
classes of registers simultaneously, or to remember all the active
intervals, match them up, and compute the size of each safepoint
interval.
Change-Id: If7860aa319b625c214775347728cdf49a56946eb
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>
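
A simplified sketch of the corrected accounting (a hypothetical helper,
not ART's actual frame-size code): core and FP callee-saves are counted
separately, so the 8-byte XMM spill slots on x86 are no longer assumed
to be word-sized.

#include <cstddef>

// Hypothetical helper: size the spill area from separate counts of live core
// and FP registers. On x86 word_size is 4, but an XMM double needs an 8-byte
// slot, which a single word-sized count under-estimates.
size_t ComputeSpillAreaSize(size_t number_of_core_registers,
                            size_t number_of_fp_registers,
                            size_t word_size,
                            size_t fp_spill_size) {
  return number_of_core_registers * word_size +
         number_of_fp_registers * fp_spill_size;
}
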
The basic approach is:
- An instruction that needs two registers gets two intervals.
- When allocating the low part, we also allocate the high part.
- When splitting a low (or high) interval, we also split the high
(or low) equivalent.
- Allocation follows the (S/D register) requirement that low
registers are always even and the high equivalent is low + 1.
Change-Id: I06a5148e05a2ffc7e7555d08e871ed007b4c2797
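
As a small illustration of the pairing rule in the last point
(hypothetical helpers, not the register allocator's real interface):
the low half of a pair must live in an even-numbered register, and its
high equivalent is always the next register up, mirroring how D
registers map onto even/odd S-register pairs.

#include <cassert>

// Hypothetical helpers: derive and validate the high half of a register pair.
int HighRegisterOf(int low_reg) {
  assert(low_reg % 2 == 0);  // Low registers are always even.
  return low_reg + 1;
}

bool IsValidLowHighPair(int low_reg, int high_reg) {
  return (low_reg % 2 == 0) && (high_reg == low_reg + 1);
}
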
- for backends: arm, x86, x86_64
- added the necessary instructions to the assemblers
- cleaned up code gen for field set/get
- fixed InstructionDataEquals for some instructions
- fixed comments in compiler_enums
* the 003-opcode test verifies basic volatile functionality
Change-Id: I144393efa312dfb2c332cb84056b00edffee338a
Added SHL, SHR, USHR for arm, x86, x86_64.
Change-Id: I971f594e270179457e6958acf1401ff7630df07e
These constants were defined prior to k{InstructionSet}PointerSize. So
use them consistently in optimizing as a first step. We can discuss
whether we should remove them in a second step.
Change-Id: If129de1a3bb8b65f8d9c816a8ad466815fb202e6
- for arm, x86, x86_64
- minor cleanup/fix in div tests
Change-Id: I240874010206a5a9b3aaffbc81a885b94c248f93
The two locations of the index and length could overlap,
so we need a parallel move. Also factor out the code for
doing a parallel move based on two locations.
Change-Id: Iee8b3459e2eed6704d45e9a564fb2cd050741ea4
Change-Id: I7cf6da1fd334a7177a5580931b8f174dd40b7cec
- We currently don't run optimizations in the presence of a try/catch.
- We therefore implement Quick's mapping table.
- Also fix a missing null check on array-length.
Change-Id: I6917dfcb868e75c1cf6eff32b7cbb60b6cfbd68f
Change-Id: Ia8c8dfbef87cb2f7893bfb6e178466154eec9efd
Change-Id: Id2f010589e2bd6faf42c05bb33abf6816ebe9fa9
Also:
- Fix misuses of emitting the rex prefix in the x86_64 assembler.
- Fix movaps code generation in the x86_64 assembler.
Change-Id: Ib6dcf6e7c4a9c43368cfc46b02ba50f69ae69cbe
The arm64 backend uses its own assembler and does not share
the same classes as the other backends. To avoid conflicts
or unnecessary mappings, just don't use those classes in the
shared part of the code generator.
Change-Id: I9e5fa40c1021d2e83a4ef14c52cd1ccd03f2f73d
- Use three arrays for blocking registers instead of one array and
computing offsets into it.
- Don't pass blocked_registers_ to methods; just use the field.
Change-Id: Ib698564c31127c59b5a64c80f4262394b8394dc6
Move the logic for knowing if a condition needs to be materialized into
an optimization pass (so that the information does not change as a side
effect of another optimization).
Also clean up the arm and x86_64 codegen:
- arm: ldr and str are for power users when a constant is in play; we
should use LoadFromOffset and StoreToOffset.
- x86_64: fix misuses of movq instead of movl.
Change-Id: I01a03b91803624be2281a344a13ad5efbf4f3ef3
Now the source of truth is the Location object that knows
which register (core, pair, fpu) it needs to refer to.
Change-Id: I62401343d7479ecfb24b5ed161ec7829cda5a0b1
- Follows Quick conventions.
- Currently only works with the baseline register allocator.
Change-Id: Ie4b8e298f4f5e1cd82364da83e4344d4fc3621a3
- Remove the ones added during graph build (they were added
for the baseline code generator).
- Emit them at loop back edges after phi moves, so that the test
can directly jump to the loop header.
- Fix x86 and x86_64 suspend check by using cmpw instead of cmpl.
Change-Id: I6fad5795a55705d86c9e1cb85bf5d63dadfafa2a
And use it in suspend check slow paths.
Change-Id: I79caf28f334c145a36180c79a6e2fceae3990c31
Also refactor 004 tests to make them work with both Quick and
Optimizing.
Change-Id: I87e275cb0ae0258fc3bb32b612140000b1d2adf8
Also fix a couple of assembler/disassembler issues.
Change-Id: I705c8572988c1a9c4df3172b304678529636d5f6