Created attachment 144082 [details] Minimal SDL2 program that reproduces the problem I filed this for i965, but the issue is in the shader compiler, so it also affects the Vulkan driver and, presumably, Iris. The problem only happens when INTEL_DEBUG=do32 is used, in which case SDL2 generates a shader that hits the following assertion: ../src/intel/compiler/brw_fs.cpp:1745: void fs_visitor::assign_urb_setup(): Assertion `inst->src[i].offset < REG_SIZE / 2' failed. I bisected the regression to this commit: commit c0504569eac5e5c305e9f0c240e248aca9d8891f (HEAD -> master) Author: Rafael Antognolli <rafael.antognolli@intel.com> Date: Fri Oct 19 15:44:15 2018 -0700 intel/fs: Move the scalar-region conversion to the generator. Move the scalar-region conversion from the IR to the generator, so it doesn't affect the Gen11 path. We need the non-scalar regioning for a later lowering pass that we are adding. v2: Better commit message (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com> I am attaching a minimal SDL2 sample program that reproduces the problem.
*** Bug 110905 has been marked as a duplicate of this bug. ***
(In reply to Iago Toral from comment #0) > I am attaching a minimal SDL2 sample program that reproduces the problem. This is enough to reproduce: INTEL_DEBUG=do32 glxgears
Valgrind doesn't give any additional info, just abort backtrace (same as I got with systemd-coredump from Weston when enabling SIMD32 heuristics): ==5950== Process terminating with default action of signal 6 (SIGABRT): dumping core ==5950== at 0x48CE6A5: raise (in /usr/lib64/libc-2.29.so) ==5950== by 0x48AF860: abort (in /usr/lib64/libc-2.29.so) ==5950== by 0x48AF736: ??? (in /usr/lib64/libc-2.29.so) ==5950== by 0x48BF545: __assert_fail (in /usr/lib64/libc-2.29.so) ==5950== by 0x6BEBC7E: fs_visitor::assign_urb_setup() (brw_fs.cpp:1745) ==5950== by 0x6BFB847: fs_visitor::run_fs(bool, bool) (brw_fs.cpp:7664) ==5950== by 0x6BFD659: brw_compile_fs (brw_fs.cpp:8039) ==5950== by 0x66ED3FB: brw_codegen_wm_prog (brw_wm.c:123) ==5950== by 0x66EE44F: brw_fs_precompile (brw_wm.c:590) ==5950== by 0x66DADD3: brw_shader_precompile (brw_link.cpp:56) ==5950== by 0x66DADD3: brw_link_shader (brw_link.cpp:374) ==5950== by 0x68E9798: _mesa_glsl_link_shader (ir_to_mesa.cpp:3170) ==5950== by 0x6910023: create_new_program (ff_fragment_shader.cpp:1131) ==5950== by 0x6910023: _mesa_get_fixed_func_fragment_program (ff_fragment_shader.cpp:1161)
Curro and/or Jason had a fix for this.
Typed this up and tested it w/ glxgears: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1839
Hi guys Just tested the issue with attached test and glxgears - both tests looks good, no core dumps with applied patch.
(In reply to Jason Ekstrand from comment #5) > Typed this up and tested it w/ glxgears: > > https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1839 Yes, I had a more complete branch with SIMD32 fixes, but go ahead and duplicate the rest of my work if you don't have anything else to do.
This will not affect users unless they are overriding the simd32 configuration for fragment shaders. During our engineering meeting, we decided that this issue should not block the release.
(In reply to Mark Janes from comment #8) > This will not affect users unless they are overriding the simd32 > configuration for fragment shaders. During our engineering meeting, we > decided that this issue should not block the release. Should there be e.g. mesa-19.3 tracker where regressions that don't block current release would get moved? (to make sure they don't fall between the cracks later on)
Fixed by the following commit in master: commit d15fe8ca8262d502435c4f83985ac414f950bc5f Author: Jason Ekstrand <jason@jlekstrand.net> Date: Sun Sep 1 22:12:07 2019 -0500 Revert "intel/fs: Move the scalar-region conversion to the generator." This reverts commit c0504569eac5e5c305e9f0c240e248aca9d8891f. Now that we're doing interpolation lowering in NIR, we can continue to stride the FS input registers directly in the brw_fs_nir code like we did before. This fixes SIMD32 fragment shaders which broke because lower_simd_width depended on the 0 stride to split PLN instructions correctly. Reviewed-by: Francisco Jerez <currojerez@riseup.net> I didn't CC stable because nothing currently requires SIMD32 FS there.
Tested with Iris. Works fine without any asserts -> VERIFIED Btw. With a simple patch to enable SIMD32 in most cases where it could make sense: ------------------------------------------------ - /* Currently, the compiler only supports SIMD32 on SNB+ */ - if (v8.max_dispatch_width >= 32 && !use_rep_send && + /* Currently, the compiler only supports SIMD32 on SNB+ + * + * Compile SIMD32 only if: + * - SIMD16 didn't fail (there are enough regs) + * - GEN6+ & 32-wide dispatch supported + * - there's only single RT (with MRT, perf is lower due to RCC write trashing) + * - or it's force enabled + */ + if (!simd16_failed && + v8.max_dispatch_width >= 32 && compiler->devinfo->gen >= 6 && - unlikely(INTEL_DEBUG & DEBUG_DO32)) { + !use_rep_send && + (shader->info.outputs_written <= BITFIELD64_BIT(FRAG_RESULT_DATA0) || + unlikely(INTEL_DEBUG & DEBUG_DO32))) { ------------------------------------------------ SIMD32 has approximately following perf impact with Iris... On BXT J4205 (with 2-channel memory): * +30-35% GfxBench ALU2 * +25% SynMark PSPom * +5-10% GfxBench Manhattan 3.0, SynMark Deferred & ShMapVsm, GpuTest Julia FP32 * +5% GfxBench CarChase & T-Rex * +4% SynMark PSPhong * +2-3% GfxBench Manhattan 3.1, GpuTest FurMark ... * -5-10% SynMark TexFilterTri [1] * -20% SynMark DrvShComp On SKL GT2: * +20% GfxBench ALU2 * +7-8% SynMark PSPom * +3-4% GfxBench CarChase & Manhattan 3.0 * +2% GfxBench T-Rex, SynMark PSPhong & Deferred ... * -3% SynMark PSBump2 * -5% SynMark DrvRes * -10-15% SynMark TexFilterTri [1], DrvShComp [1] These TexFilterTri drops are clear regressions compared to earlier (1-2 years ago) measured impact of SIMD32 vs. SIMD16. (And I'm a bit surprised that shader compilation speed test didn't regress more from compiling additional SIMD32 variant.)
Creating a 19.3 tracker before the branch point would generate a lot of bug spam. With that mechanism, every new regression would be tracked, even when the vast majority would soon get fixed. We use the following mechanisms to fill out the tracker after the branch point: - unfixed bugs with "bisected" or "regression" keywords - automated tests that changed status since the last release - developers and end users reviewing bugs
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.