path: root/src/intel/compiler
author    Jason Ekstrand <jason@jlekstrand.net>    2020-09-24 16:28:56 -0500
committer Marge Bot <eric+marge@anholt.net>    2020-10-08 03:56:01 +0000
commit    fd04f858b0aa9f688f5dfb041ccb706da96f862a (patch)
tree      48314ae9838ebae61783a2fe2c7a8e682dd5d897 /src/intel/compiler
parent    0a172dca264fe32bc0bb05d7383656762aa00cec (diff)
intel/nir: Don't try to emit vector load_scratch instructions
In 53bfcdeecf4c9, we added load/store_scratch instructions, which deviate a little bit from most memory load/store instructions in that we can't use the normal untyped read/write instructions, which can read and write up to a vec4 at a time. Instead, we have to use the DWORD scattered read/write instructions, which are scalar. To handle this, we added code to brw_nir_lower_mem_access_bit_sizes to cause them to be scalarized.

However, one case was missing: the load-as-larger-vector case. In this case, we take a small bit-sized constant-offset load, replace it with a 32-bit load, and shuffle the result around as needed.

For scratch, this case is much trickier to get right because it often emits a vec2 or wider load, which we would then have to lower again. We did this for other load and store ops because, for lower bit sizes, we have to scalarize anyway thanks to the byte scattered read/write instructions being scalar. However, for scratch we're not losing as much because we can't vectorize 32-bit loads and stores either, so it's easier to just disallow the wider load whenever we have to scalarize.

Fixes: 53bfcdeecf4c9 "intel/fs: Implement the new load/store_scratch..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6872>
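To make the guard concrete, the check this patch enforces can be sketched as a small predicate. This is a simplified, hypothetical illustration rather than the pass's actual structure: the helper name can_widen_small_load and its parameters are invented for the sketch, while the nir_intrinsic_* values are the real NIR intrinsics involved.

#include "nir.h"   /* NIR types and intrinsic enums from Mesa's compiler/nir */

/* Hypothetical sketch: scratch access is serviced by the scalar DWORD
 * scattered read/write messages, so the "load as a larger vector" trick
 * (one 32-bit, possibly vec2-or-wider load that later gets shuffled apart)
 * must be skipped for load_scratch/store_scratch; those ops stay scalar. */
static bool
can_widen_small_load(const nir_intrinsic_instr *intrin, unsigned bit_size,
                     bool offset_is_const)
{
   const bool needs_scalar =
      intrin->intrinsic == nir_intrinsic_load_scratch ||
      intrin->intrinsic == nir_intrinsic_store_scratch;

   /* Mirrors the condition in lower_mem_load_bit_size in the hunk below. */
   return bit_size < 32 && !needs_scalar && offset_is_const;
}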
Diffstat (limited to 'src/intel/compiler')
-rw-r--r--   src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c   5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c b/src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c
index 4ea20fe5b18..aabf24150a6 100644
--- a/src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c
+++ b/src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c
@@ -53,6 +53,9 @@ dup_mem_intrinsic(nir_builder *b, nir_intrinsic_instr *intrin,
    }

    dup->num_components = num_components;
+   if (intrin->intrinsic == nir_intrinsic_load_scratch ||
+       intrin->intrinsic == nir_intrinsic_store_scratch)
+      assert(num_components == 1);

    for (unsigned i = 0; i < info->num_indices; i++)
       dup->const_index[i] = intrin->const_index[i];
@@ -92,7 +95,7 @@ lower_mem_load_bit_size(nir_builder *b, nir_intrinsic_instr *intrin,
    nir_ssa_def *result;
    nir_src *offset_src = nir_get_io_offset_src(intrin);
-   if (bit_size < 32 && nir_src_is_const(*offset_src)) {
+   if (bit_size < 32 && !needs_scalar && nir_src_is_const(*offset_src)) {
       /* The offset is constant so we can use a 32-bit load and just shift it
        * around as needed.
        */