Commit message | Author | Age | Files | Lines
* android: Enable llvmpipe when using the swrast driver [HEAD, replicant-9] | Ricardo 'Grim' Cabrita | 2019-07-22 | 4 | -2/+50

  (v1) original patches by Wu Zhen
  https://patchwork.freedesktop.org/series/6451/
  https://patchwork.freedesktop.org/series/17611/

  (v2) resolution of conflicts and support for Rob Herring's
  "Android: push driver build details to driver makefiles" also by Wu Zhen
  https://github.com/maurossi/mesa/commit/f7c2a171a4320a5a713861440f5f3df2611d7b1e.patch

  (v3) compatibility with Android 9 by the Replicant project

  Changes from v1 to v2 of the patch:
    Android.mk
      add swrast.HAVE_GALLIUM_LLVMPIPE, MESA_ENABLE_LLVM rules
    src/gallium/Android.mk
      resolution of conflicts
    src/gallium/drivers/llvmpipe/Android.mk
      added GALLIUM_LIBS rules
    src/gallium/targets/dri/Android.mk
      resolution of conflicts

  Changes from v2 to v3 of the patch:
    src/gallium/drivers/llvmpipe/Android.mk
      Added module libmesa_winsys_sw_kms_dri
      Call mesa-build-with-llvm instead of directly assigning
      LOCAL_SHARED_LIBRARIES
    src/gallium/targets/dri/Android.mk
      Include libLLVM as a shared library when building the gallium_dri module

  Co-authored-by: Wu Zhen <wuzhen@jidemail.com>
  Signed-off-by: Ricardo 'Grim' Cabrita <grimkriegor@krutt.org>
* android: Switch LLVM shared lib according to Android version | Ricardo 'Grim' Cabrita | 2019-07-22 | 1 | -5/+9

  Android versions 9+ renamed LLVM's shared library module from
  `libLLVM` to `libLLVM_android`.

  This commit sets `libLLVM_android` as the new default and allows the
  `mesa-build-with-llvm` function to switch to `libLLVM` when dealing
  with Android versions between 6 and 8.

  Signed-off-by: Ricardo 'Grim' Cabrita <grimkriegor@krutt.org>
  Acked-by: Mauro Rossi <issor.oruam@gmail.com>
* egl/dri2: Allow using kms_swrast with exynos driver | Ricardo 'Grim' Cabrita | 2019-07-22 | 1 | -2/+3

  Signed-off-by: Ricardo 'Grim' Cabrita <grimkriegor@krutt.org>
* Use DRM render node for software rendering | Joonas Kylmälä | 2019-07-22 | 1 | -1/+1

  Our kernel only allows dumb buffer creation on render nodes, so with
  the current setup this is required.

  Signed-off-by: Joonas Kylmälä <joonas.kylmala@iki.fi>
* intel/compiler: Use nir_opt_conditional_discard | Caio Marcelo de Oliveira Filho | 2019-07-22 | 1 | -0/+1

  anv vkpipeline-db results for SKL:

    total instructions in shared programs: 3622461 -> 3611281 (-0.31%)
    instructions in affected programs: 396452 -> 385272 (-2.82%)
    helped: 2062
    HURT: 1

    total cycles in shared programs: 1458144669 -> 1458105320 (<.01%)
    cycles in affected programs: 4171830 -> 4132481 (-0.94%)
    helped: 1874
    HURT: 180

    total loops in shared programs: 2437 -> 2437 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 8745 -> 8748 (0.03%)
    spills in affected programs: 8 -> 11 (37.50%)
    helped: 1
    HURT: 1

    total fills in shared programs: 23392 -> 23395 (0.01%)
    fills in affected programs: 8 -> 11 (37.50%)
    helped: 1
    HURT: 1

    LOST: 0
    GAINED: 1

  No changes to shader-db on i965 or iris. The glsl compiler already
  does a similar optimization.

  Improvement suggested by Daniel Schürmann.

  Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
  Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
* pan/decode: Disable magic divisor debugging | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+2

  Memory corruption (for both legitimate and illegitimate reasons)
  causes this to hang pantrace.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* pan/midgard: Report spills:fills to shader-db | Alyssa Rosenzweig | 2019-07-22 | 3 | -2/+12

  Route this info through so we can track how we're doing on register
  spilling.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* panfrost/midgard: Reenable pipeline register creation | Alyssa Rosenzweig | 2019-07-22 | 1 | -10/+9

  This was disabled to permit regression-free RA work. Now that the
  spill code is in place, we can reenable, with some caveats about
  efficacy.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* panfrost/midgard: Report tls_size | Alyssa Rosenzweig | 2019-07-22 | 4 | -0/+13

  Pipe through the number of bytes of spilled memory used from the
  compiler into the main driver, where it will be used to allocate the
  Thread Local Storage buffer.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* panfrost: Set `initialized` in more cases | Alyssa Rosenzweig | 2019-07-22 | 2 | -10/+9

  Indirect linear writes were not being marked as initialized, causing
  the back blit to be dropped, breaking the listed tests.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* panfrost/ci: Update expectations | Alyssa Rosenzweig | 2019-07-22 | 1 | -4/+0

  We've fixed some shader tests.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* panfrost/midgard: Promote to *move*, not rewrite for non-SSA | Alyssa Rosenzweig | 2019-07-22 | 1 | -2/+9

  Fixes promoted uniform loads to registers.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* panfrost/midgard: Dump MIR of RA failure | Alyssa Rosenzweig | 2019-07-22 | 1 | -1/+3

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* pan/midgard; Dump successor graph when printing MIR | Alyssa Rosenzweig | 2019-07-22 | 1 | -2/+12

  We just use the pointers of the midgard_block*, which is crude, but it
  gets the point across and will help debug successor related issues.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* pan/midgard: Remove debug statement | Alyssa Rosenzweig | 2019-07-22 | 1 | -2/+0

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* panfrost/midgard: Implement register spilling | Alyssa Rosenzweig | 2019-07-22 | 4 | -54/+158

  Now that we run RA in a loop, before each iteration after a failed
  allocation we choose a spill node and spill it to Thread Local Storage
  using st_int4/ld_int4 instructions (for spills and fills respectively).

  This allows us to compile complex shaders that normally would not fit
  within the 16-work-register limit, although it comes at a fairly steep
  performance penalty.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
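The retry loop described above (attempt allocation; on failure pick a spill node, spill it, and try again) can be sketched as a toy model. This is not Midgard's code: the `toy_*` names are hypothetical, lifetimes are straight-line intervals, a spilled value is simply treated as needing no register (ignoring the short-lived temporaries around each st_int4/ld_int4), and the "longest live range" spill choice is an assumed heuristic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy value: live on the half-open interval [def, end)
 * of a straight-line program. */
typedef struct {
    int def;
    int end;
    bool spilled;
} toy_value;

/* "Allocation" attempt: returns the first program point where more than
 * num_regs unspilled values are live, or -1 if every point fits. With
 * interval lifetimes, fitting at every point means a colouring exists. */
static int toy_try_allocate(const toy_value *v, size_t n, int num_regs,
                            int num_points)
{
    for (int p = 0; p < num_points; p++) {
        int live = 0;
        for (size_t i = 0; i < n; i++)
            if (!v[i].spilled && v[i].def <= p && p < v[i].end)
                live++;
        if (live > num_regs)
            return p;
    }
    return -1;
}

/* Assumed heuristic: spill the value with the longest live range among
 * those live at the failing point. */
static size_t toy_choose_spill(const toy_value *v, size_t n, int point)
{
    size_t best = n;
    for (size_t i = 0; i < n; i++) {
        if (v[i].spilled || point < v[i].def || point >= v[i].end)
            continue;
        if (best == n || v[i].end - v[i].def > v[best].end - v[best].def)
            best = i;
    }
    return best;
}

/* Retry loop: on each failure, spill one value and try again.
 * Returns the number of values spilled. */
static int toy_allocate_with_spilling(toy_value *v, size_t n, int num_regs,
                                      int num_points)
{
    assert(num_regs >= 0);
    int spills = 0;
    int fail;
    while ((fail = toy_try_allocate(v, n, num_regs, num_points)) >= 0) {
        size_t victim = toy_choose_spill(v, n, fail);
        assert(victim < n);
        /* A real compiler would rewrite the def and uses into stores and
         * loads to Thread Local Storage here; the toy only marks it. */
        v[victim].spilled = true;
        spills++;
    }
    return spills;
}
```

With three overlapping values and two registers, one spill is enough; each iteration spills one more value, so the loop always terminates.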
* panfrost/midgard: Add mir_has_arg helper | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+12

  Helps scan the MIR for uses of an index.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* panfrost/midgard: Check write-before-read in liveness analysis | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+13

  If we write to an index before reading it, the old copy we're checking
  liveness for isn't live in this block, even if it does get read later.

  Fixes abnormally high register pressure in shaders with loops.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
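The write-before-read rule above can be illustrated on a single block. This is a simplified sketch, not the Midgard liveness pass: `toy_instr` and `toy_live_in` are hypothetical, each instruction is assumed to have one destination and up to three sources, and sources are assumed to be read before the destination is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NO_INDEX (-1)

/* Hypothetical instruction: one destination and up to three sources. */
typedef struct {
    int dest;
    int src[3];
} toy_instr;

/* Returns true if the value of `index` on entry to the block is needed:
 * either it is read before any write in the block, or the block never
 * writes it and it is live on exit. */
static bool toy_live_in(const toy_instr *ins, size_t count, int index,
                        bool live_out)
{
    assert(index != TOY_NO_INDEX);
    for (size_t i = 0; i < count; i++) {
        /* Sources are read before the destination is written, so an
         * instruction such as `x = x + 1` still counts as a read. */
        for (size_t s = 0; s < 3; s++)
            if (ins[i].src[s] == index)
                return true;

        /* Written before any read: the incoming copy is dead here,
         * even if the new value is read later in the block. */
        if (ins[i].dest == index)
            return false;
    }
    return live_out;
}
```

Without the early `return false`, a value that is overwritten at the top of a loop body would appear live across the whole loop, inflating register pressure as the commit describes.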
* panfrost/midgard/disasm: Check for certain tag errors | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+18

  Midgard bundles contain a tag, as well as a copy of the tag of the
  next bundle to facilitate prefetch. Do some simple static analysis to
  detect certain tag errors (particularly on shaders without branching).

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
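For a shader without branching, the check described above reduces to comparing each bundle's copy of the next tag with the tag actually found in the following bundle. A minimal sketch, assuming bundles have already been decoded into a hypothetical `toy_bundle` struct; the tag values in the test are arbitrary numbers, not real Midgard encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded bundle header: the bundle's own tag and the copy
 * of the following bundle's tag used for prefetch. */
typedef struct {
    unsigned tag;
    unsigned next_tag;
} toy_bundle;

/* For straight-line code, returns the index of the first bundle whose
 * next_tag disagrees with the tag of the bundle after it, or -1. */
static long toy_check_tags(const toy_bundle *b, size_t count)
{
    assert(b != NULL || count == 0);
    for (size_t i = 0; i + 1 < count; i++)
        if (b[i].next_tag != b[i + 1].tag)
            return (long)i;
    return -1;
}
```

With branches, the "next" bundle is no longer necessarily the one that follows in memory, which is why this kind of check is most reliable on branch-free shaders.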
* pan/midgard: Add OP_IS_CSEL helper | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+7

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* pan/midgard: Add mir_rewrite_index_src_single helper | Alyssa Rosenzweig | 2019-07-22 | 2 | -6/+13

  Rather than rewriting an index away across the whole block, we expose
  finer (per-instruction) granularity for rewrites.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
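The difference between a block-wide rewrite and the per-instruction granularity described above can be sketched as follows. The names and the struct layout are hypothetical; they do not reproduce the real signature of mir_rewrite_index_src_single.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical MIR-like instruction with up to three source indices. */
typedef struct {
    int dest;
    int src[3];
} toy_ins;

/* Per-instruction rewrite: replace `old` with `new_idx` in the sources
 * of a single instruction only. */
static void toy_rewrite_src_single(toy_ins *ins, int old, int new_idx)
{
    assert(ins != NULL);
    for (size_t s = 0; s < 3; s++)
        if (ins->src[s] == old)
            ins->src[s] = new_idx;
}

/* Block-wide rewrite, expressed in terms of the per-instruction one. */
static void toy_rewrite_src_block(toy_ins *block, size_t count, int old,
                                  int new_idx)
{
    for (size_t i = 0; i < count; i++)
        toy_rewrite_src_single(&block[i], old, new_idx);
}
```

The finer-grained helper matters when only one use should change, for example when a single use is redirected to a freshly loaded copy while other uses keep the original index.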
* pan/midgard: Ignore inline_constant in liveness | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+3

  It doesn't make any sense to look at it.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* panfrost/midgard: Implement load/store scratch opcodes | Alyssa Rosenzweig | 2019-07-22 | 4 | -2/+52

  These are used to load/store from Thread Local Storage, which is
  memory allocated per-thread (corresponding to ctx->scratchpad in the
  command stream) and used for register spilling.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* pan/midg/disasm: Check for int varying ops | Alyssa Rosenzweig | 2019-07-22 | 1 | -0/+4

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* pan/midgard: Remove "aliasing" | Alyssa Rosenzweig | 2019-07-22 | 2 | -96/+0

  It was a crazy idea that didn't pan out. We're better served by a good
  copyprop pass. It's also unused now.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* panfrost: Promote uniform registers late | Alyssa Rosenzweig | 2019-07-22 | 6 | -82/+174

  Rather than creating either a load or a uniform register read with a
  fixed beginning offset, we always create a load and then promote to a
  uniform register later. This will allow us to promote in a register
  pressure aware manner.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* pan/midgard: Call scheduler/RA in a loop | Alyssa Rosenzweig | 2019-07-22 | 3 | -13/+27

  This will allow us to insert instructions as a result of register
  allocation, permitting spilling to be implemented.

  As a side effect, with the assert commented out this would fix a bunch
  of glamor crashes (due to RA failures) so MATE becomes usable. Ideally
  we'll have scheduling or RA actually sorted out before the branch point,
  but if not this gives us a one-line out to get X working...

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* pan/midgard: Remove custom register selection callback | Alyssa Rosenzweig | 2019-07-22 | 1 | -19/+0

  What we have is equivalent to the default callback; let's use that.

  Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
* radv: fix crash in vkCmdClearAttachments with unused attachment | Samuel Pitoiset | 2019-07-22 | 1 | -1/+1

  depth_stencil_attachment and/or ds_resolve attachment can be NULL.

  This fixes crashes with
  dEQP-VK.renderpass.suballocation.unused_clear_attachments.*

  Cc: 19.1 <mesa-stable@lists.freedesktop.org>
  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* i965: free object labels when deleting | Sergii Romantsov | 2019-07-22 | 3 | -0/+3

  Some leaks detected with GL_KHR_debug on i965.

  CC: Timothy Arceri <t_arceri@yahoo.com.au>
  Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
  Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
* radv/gfx10: update descriptors for inline uniform blocks | Samuel Pitoiset | 2019-07-22 | 1 | -3/+10

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radv/gfx10: emit the GS NGG prologue before the nested barrier | Samuel Pitoiset | 2019-07-22 | 1 | -6/+1

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radv/gfx10: do not allocate space for the ZPASS_DONE bug | Samuel Pitoiset | 2019-07-22 | 1 | -6/+8

  GFX10 isn't affected.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radv/gfx10: do not set ELEMENT_SIZE for buffer descriptors | Samuel Pitoiset | 2019-07-22 | 1 | -4/+4

  This field doesn't exist.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radv: clean up fill_geom_tess_rings() | Samuel Pitoiset | 2019-07-22 | 1 | -25/+9

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radv: change a bunch of >= GFX9 to == GFX9 | Samuel Pitoiset | 2019-07-22 | 4 | -15/+15

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* ac/nir: do not clamp shadow reference on GFX10 | Samuel Pitoiset | 2019-07-22 | 1 | -2/+6

  RadeonSI only uses Z32_FLOAT_CLAMP for upgraded depth textures on
  GFX10 and RADV doesn't promote Z16 or Z24.

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radv: move nir_opt_conditional_discard out of optimization loop | Daniel Schürmann | 2019-07-22 | 1 | -1/+1

  This late optimization pass is only affected by nir_opt_if() and
  handles all cases in a single pass. It's enough to call it once after
  the optimization loop.

  No changes on vkpipeline-db.

  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* v3d: fill logicop_func in the fragment shader key when precompiling shaders | Iago Toral Quiroga | 2019-07-22 | 1 | -0/+2

  Since logicop_func 0 is PIPE_LOGICOP_CLEAR, we were triggering lowering
  of logic ops on precompiled shaders, which we don't want to do.

  Also, this had the side effect of making shader-db crash: during this
  lowering we would try to read the color format swizzle information from
  the fragment shader key, which we don't populate for precompiled shaders
  because right now we only need it when logic operations are enabled.

  Reviewed-by: Eric Anholt <eric@anholt.net>
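The pitfall above is that a zero-initialised key silently selects CLEAR. A minimal sketch of the failure and the fix, assuming that lowering is skipped only for a plain COPY and that the precompile key should be filled with COPY (the commit does not say which value it uses). The `toy_*` names are hypothetical; the enum mirrors Gallium's values, with CLEAR = 0 as the commit states and COPY = 12 as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum toy_logicop {
    TOY_LOGICOP_CLEAR = 0,  /* same value as PIPE_LOGICOP_CLEAR */
    TOY_LOGICOP_COPY = 12,  /* assumed to match PIPE_LOGICOP_COPY */
};

/* Hypothetical, heavily reduced fragment shader key. */
struct toy_fs_key {
    enum toy_logicop logicop_func;
    /* other fields, e.g. color swizzles, omitted */
};

/* Assumption: logic-op lowering is needed for anything but a plain copy. */
static bool toy_needs_logicop_lowering(const struct toy_fs_key *key)
{
    assert(key != NULL);
    return key->logicop_func != TOY_LOGICOP_COPY;
}

/* Precompile key: zeroing alone would leave logicop_func == CLEAR, so
 * the neutral COPY operation is set explicitly. */
static void toy_init_precompile_key(struct toy_fs_key *key)
{
    memset(key, 0, sizeof(*key));
    key->logicop_func = TOY_LOGICOP_COPY;
}
```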
* v3d: Avoid scheduling an instruction that stalls waiting for SFU retval | Jose Maria Casanova Crespo | 2019-07-22 | 1 | -4/+23

  If we detect that a scheduling candidate will stall because it has a
  register source that is written by the SFU unit in the previous
  instruction, we reduce its priority so that any non-stalling operation
  is chosen instead.

  The latency of SFU operations is defined as 2, so they are scheduled
  earlier when other candidates have the same priority.

  Finally, we won't merge an instruction that stalls into a previously
  chosen one, as it would still be waiting an extra cycle for the result
  of the previous one.

  Although shader-db results show that instructions are hurt, with an
  increase of 0.35%, the sum of instructions + stalls is reduced by 0.52%,
  and the total of sfu-stalls is reduced by 63.51%. It also implies a small
  increase in the max-temps metric because SFU operations are scheduled
  earlier.

    total instructions in shared programs: 9102719 -> 9117851 (0.17%)
    instructions in affected programs: 4324628 -> 4339760 (0.35%)
    helped: 4162
    HURT: 12128
    helped stats (abs) min: 1 max: 10 x̄: 1.28 x̃: 1
    helped stats (rel) min: 0.09% max: 4.76% x̄: 0.66% x̃: 0.51%
    HURT stats (abs) min: 1 max: 27 x̄: 1.69 x̃: 1
    HURT stats (rel) min: 0.05% max: 7.69% x̄: 0.87% x̃: 0.68%
    95% mean confidence interval for instructions value: 0.90 0.96
    95% mean confidence interval for instructions %-change: 0.47% 0.50%
    Instructions are HURT.

    total max-temps in shared programs: 1327728 -> 1327812 (<.01%)
    max-temps in affected programs: 4730 -> 4814 (1.78%)
    helped: 61
    HURT: 134
    helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1
    helped stats (rel) min: 2.70% max: 13.33% x̄: 4.89% x̃: 4.17%
    HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1
    HURT stats (rel) min: 1.54% max: 20.00% x̄: 6.10% x̃: 5.26%
    95% mean confidence interval for max-temps value: 0.28 0.58
    95% mean confidence interval for max-temps %-change: 1.80% 3.52%
    Max-temps are HURT.

    total sfu-stalls in shared programs: 99551 -> 36324 (-63.51%)
    sfu-stalls in affected programs: 95029 -> 31802 (-66.53%)
    helped: 25882
    HURT: 0
    helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 2
    helped stats (rel) min: 5.26% max: 100.00% x̄: 79.86% x̃: 100.00%
    95% mean confidence interval for sfu-stalls value: -2.47 -2.42
    95% mean confidence interval for sfu-stalls %-change: -80.18% -79.54%
    Sfu-stalls are helped.

    total inst-and-stalls in shared programs: 9202270 -> 9154175 (-0.52%)
    inst-and-stalls in affected programs: 5618516 -> 5570421 (-0.86%)
    helped: 22728
    HURT: 855
    helped stats (abs) min: 1 max: 31 x̄: 2.16 x̃: 1
    helped stats (rel) min: 0.07% max: 16.67% x̄: 1.14% x̃: 0.92%
    HURT stats (abs) min: 1 max: 5 x̄: 1.25 x̃: 1
    HURT stats (rel) min: 0.12% max: 5.26% x̄: 1.24% x̃: 0.86%
    95% mean confidence interval for inst-and-stalls value: -2.07 -2.01
    95% mean confidence interval for inst-and-stalls %-change: -1.07% -1.05%
    Inst-and-stalls are helped.

  v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)

  Reviewed-by: Eric Anholt <eric@anholt.net>
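The candidate-selection rule above ("prefer any non-stalling operation") can be sketched as a two-level comparison: first whether the candidate would stall, then its priority. This is only an illustration of that idea, not the v3d scheduler: `toy_candidate`, `toy_prev` and `toy_choose` are hypothetical, each candidate reads at most one register, and the "don't merge a stalling instruction" rule is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical candidate: scheduler priority and the register it reads
 * (-1 for none). */
typedef struct {
    int priority;
    int src_reg;
} toy_candidate;

/* Hypothetical summary of the previously emitted instruction. */
typedef struct {
    bool is_sfu;
    int dest_reg;
} toy_prev;

/* True if the candidate reads the result of an SFU operation issued by
 * the immediately preceding instruction, and would therefore stall. */
static bool toy_would_stall(const toy_candidate *c, const toy_prev *prev)
{
    return prev->is_sfu && c->src_reg >= 0 && c->src_reg == prev->dest_reg;
}

/* Prefer any non-stalling candidate; among equals, pick the higher
 * priority. Returns the chosen index, or -1 if there are no candidates. */
static long toy_choose(const toy_candidate *c, size_t n, const toy_prev *prev)
{
    assert(prev != NULL);
    long best = -1;
    bool best_stalls = false;
    for (size_t i = 0; i < n; i++) {
        bool stalls = toy_would_stall(&c[i], prev);
        if (best < 0 || (best_stalls && !stalls) ||
            (stalls == best_stalls && c[i].priority > c[best].priority)) {
            best = (long)i;
            best_stalls = stalls;
        }
    }
    return best;
}
```

A stalling candidate is still chosen when nothing else is available, so the demotion never blocks progress.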
* v3d: add shader-db stat to count SFU stalls | Jose Maria Casanova Crespo | 2019-07-22 | 5 | -14/+74

  SFU operations have a latency of 2 cycles, so if their result is used
  by the instruction immediately following the SFU instruction, the GPU
  stalls for an extra cycle until the result is available.

  This adds the number of stalls to the shader-db debug mode, along with
  the sum of instructions + stalls, to evaluate optimizations that
  schedule instructions so as to avoid generating sfu-stalls.

  v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)

  Reviewed-by: Eric Anholt <eric@anholt.net>
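The stall rule above (an SFU result read by the very next instruction costs one extra cycle) and the derived inst-and-stalls metric can be sketched over a straight-line instruction list. This is a simplified model, not the v3d shader-db code: `toy_qpu_instr` and the helper functions are hypothetical, and other QPU hazards are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical straight-line instruction: whether it is an SFU operation,
 * the register it writes (-1 for none) and up to two registers it reads. */
typedef struct {
    bool is_sfu;
    int dest;
    int src[2];
} toy_qpu_instr;

static bool toy_reads(const toy_qpu_instr *in, int reg)
{
    return reg >= 0 && (in->src[0] == reg || in->src[1] == reg);
}

/* Counts one stall for each instruction that reads the result of an SFU
 * operation issued by the instruction right before it. */
static int toy_count_sfu_stalls(const toy_qpu_instr *prog, size_t n)
{
    assert(prog != NULL || n == 0);
    int stalls = 0;
    for (size_t i = 1; i < n; i++)
        if (prog[i - 1].is_sfu && toy_reads(&prog[i], prog[i - 1].dest))
            stalls++;
    return stalls;
}

/* The combined metric: instruction count plus stall cycles. */
static int toy_inst_and_stalls(const toy_qpu_instr *prog, size_t n)
{
    return (int)n + toy_count_sfu_stalls(prog, n);
}
```

For example, an SFU write to r1 followed immediately by a read of r1 costs one stall, while a second SFU followed by an independent instruction costs none: four instructions, one stall, so inst-and-stalls is 5.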
* radv: replace memset()+strcpy() with snprintf() | Eric Engestrom | 2019-07-21 | 1 | -3/+1

  Just like the next line :)

  Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
* radv: drop unnecessary memset() before snprintf() | Eric Engestrom | 2019-07-21 | 1 | -1/+0

  snprintf() always terminates the string.

  Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
  Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
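The two snprintf() commits above rely on standard C behaviour: for a nonzero buffer size, snprintf() writes at most size - 1 characters followed by a terminating NUL, so clearing the buffer first is redundant, and a single call replaces memset()+strcpy(). A minimal sketch; `copy_name` is a hypothetical helper, not radv code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies src into dst with a single call. snprintf writes at most
 * size - 1 characters and then a terminating NUL (when size > 0), so no
 * memset beforehand is needed. Returns the length src would need, which
 * lets callers detect truncation (result >= size). */
static int copy_name(char *dst, size_t size, const char *src)
{
    assert(src != NULL);
    return snprintf(dst, size, "%s", src);
}
```

Unlike strcpy(), this also cannot overflow `dst` when the source is longer than the buffer; the output is truncated instead, and the return value reports the untruncated length.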
* radv: Fix uninitialized warning. | Bas Nieuwenhuizen | 2019-07-20 | 1 | -1/+2

  For es_vgpr_comp_cnt.

  Fixes: 795adbbadd4 "radv/gfx10: Add pipeline state support for tess."
  Reviewed-by: Dave Airlie <airlied@redhat.com>
  Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* virgl: fix a sync issue in virgl_buffer_transfer_extend | Chia-I Wu | 2019-07-20 | 1 | -62/+15

  In virgl_buffer_transfer_extend, when no flush is needed, it tries to
  extend a previously queued transfer instead if it can find one.
  Compared to virgl_resource_transfer_prepare, it fails to check if the
  resource is busy.

  The existence of a previously queued transfer normally implies that
  the resource is not busy, except perhaps when the transfer is
  PIPE_TRANSFER_UNSYNCHRONIZED. Rather than burdening us with a lengthy
  comment, and potential concerns over breaking it as the transfer code
  evolves, this commit makes the valid_buffer_range check the only
  condition to take the fast path.

  In the real world, we hit the fast path almost only because of the
  valid_buffer_range check. In micro benchmarks, the condition should
  always be true, otherwise the benchmarks are not very representative
  of meaningful workloads. I think this fix is justified.

  The recent change to PIPE_TRANSFER_MAP_DIRECTLY usage disables the
  fast path. This commit re-enables it as well.

  Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
  Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
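A valid-range fast path of the kind described above usually works like this: the buffer tracks which bytes have ever held valid data, and a write that touches none of them cannot conflict with pending GPU work, so it may skip synchronization. The sketch below assumes that reading of valid_buffer_range; it is self-contained and does not reproduce virgl's or Mesa's util_range code (all `toy_*` names are hypothetical).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Stand-in for a buffer's valid range: bytes [start, end) have been
 * written. Empty is encoded as start > end, so min/max merging works. */
typedef struct {
    unsigned start;
    unsigned end;
} toy_range;

static const toy_range TOY_RANGE_EMPTY = {UINT_MAX, 0};

static bool toy_ranges_intersect(const toy_range *r, unsigned start,
                                 unsigned end)
{
    unsigned lo = start > r->start ? start : r->start;
    unsigned hi = end < r->end ? end : r->end;
    return lo < hi;
}

/* Grow the valid range to cover a completed write. */
static void toy_range_add(toy_range *r, unsigned start, unsigned end)
{
    if (start < r->start)
        r->start = start;
    if (end > r->end)
        r->end = end;
}

/* Fast-path condition: a write may skip synchronization only if it does
 * not overlap any bytes that already hold valid data. */
static bool toy_can_take_fast_path(const toy_range *valid, unsigned offset,
                                   unsigned size)
{
    assert(offset <= UINT_MAX - size); /* offset + size must not wrap */
    return !toy_ranges_intersect(valid, offset, offset + size);
}
```

Because the range only grows, a buffer that is repeatedly appended to (a common streaming pattern) keeps hitting the fast path, while any write over previously valid data falls back to the synchronized path.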
* virgl: rework virgl_transfer_queue_extend | Chia-I Wu | 2019-07-20 | 3 | -25/+24

  Do not take a transfer and do the memcpy. Add a _buffer suffix to the
  function name to make it clear that it is only for buffers.

  Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
  Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
* virgl: fix virgl_buffer_transfer_extend | Chia-I Wu | 2019-07-20 | 1 | -0/+1

  Without setting hw_res, virgl_transfer_queue_extend never finds a match
  and always returns NULL.

  Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
  Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
* radeonsi: initialize scissor registers etc. without clear state | Marek Olšák | 2019-07-20 | 1 | -1/+1

  Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
  Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* radeonsi: return success from vi_dcc_clear_level to simplify callers | Marek Olšák | 2019-07-20 | 3 | -28/+26

  Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
  Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* radeonsi: fix compute-based culling regression in 1ce52c1e373 | Marek Olšák | 2019-07-20 | 1 | -1/+1

  Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
  Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>