R9 270X, since -

commit f98a7d89be5d307c7a80fbde028a610f4377c3b9
Author: Marek Olšák <marek.olsak@amd.com>
Date: Wed May 7 13:15:41 2014 +0200

radeonsi: enable ARB_sample_shading

Unigine Valley run like -

vblank_mode=0 MESA_GLSL_VERSION_OVERRIDE=330 MESA_GL_VERSION_OVERRIDE=3.3 ./valley

will GPU lock, then hard lock if I don't sysrq sub quickly enough.

Jun 4 21:59:31 ph4 kernel: radeon 0000:01:00.0: ring 3 stalled for more than 10003msec
Jun 4 21:59:31 ph4 kernel: radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000000d8a9 last fence id 0x000000000000d8a7 on ring 3)
Jun 4 21:59:31 ph4 kernel: radeon 0000:01:00.0: failed to get a new IB (-35)
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: Saved 1677 dwords of commands on ring 0.
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: GPU softreset: 0x0000004D
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS = 0xF7D20028
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0xEC400000
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0xEDC00000
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS = 0x200400C0
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x40000000
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00008006
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: R_008680_CP_STAT = 0x80228647
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44483106
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Jun 4 21:59:32 ph4 kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00100100
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS = 0x00003028
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS = 0x200400C0
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jun 4 21:59:33 ph4 kernel: [drm] probing gen 2 caps for device 1022:9603 = 300d02/0
Jun 4 21:59:33 ph4 kernel: [drm] PCIE gen 2 link speeds already enabled
Jun 4 21:59:33 ph4 kernel: [drm] PCIE GART of 1024M enabled (table at 0x0000000000276000).
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: WB enabled
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff8800cc194c00
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff8800cc194c04
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff8800cc194c08
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff8800cc194c0c
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff8800cc194c10
Jun 4 21:59:33 ph4 kernel: radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc900105b5a18
Jun 4 21:59:33 ph4 kernel: [drm] ring test on 0 succeeded in 3 usecs
Jun 4 21:59:33 ph4 kernel: [drm] ring test on 1 succeeded in 1 usecs
Jun 4 21:59:33 ph4 kernel: [drm] ring test on 2 succeeded in 1 usecs
Jun 4 21:59:33 ph4 kernel: [drm] ring test on 3 succeeded in 2 usecs
Jun 4 21:59:33 ph4 kernel: [drm] ring test on 4 succeeded in 1 usecs
Jun 4 21:59:33 ph4 kernel: [drm] ring test on 5 succeeded in 2 usecs
Jun 4 21:59:33 ph4 kernel: [drm] UVD initialized successfully.
Jun 4 21:59:43 ph4 kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10000msec
Jun 4 21:59:43 ph4 kernel: radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000002dc5a last fence id 0x000000000002dc3f on ring 0)
Jun 4 21:59:43 ph4 kernel: [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
Jun 4 21:59:43 ph4 kernel: [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
Jun 4 21:59:43 ph4 kernel: radeon 0000:01:00.0: ib ring test failed (-35).
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GPU softreset: 0x00000048
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS = 0xA0003028
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS = 0x200400C0
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00010000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00400002
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_008680_CP_STAT = 0x84010243
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
Jun 4 21:59:44 ph4 kernel: SysRq : Emergency Sync
Jun 4 21:59:44 ph4 kernel: Emergency Sync complete
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS = 0x00003028
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS = 0x200400C0
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57
Jun 4 21:59:44 ph4 kernel: radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Looks like it's failing to compile some (all?) fragment shaders:

GLShader::loadFragment(): error in "core/shaders/default/sky/fragment_volume_ambient.shader" file
defines: UNKNOWN,QUALITY_LOW,QUALITY_MEDIUM,QUALITY_HIGH,MULTISAMPLE_0,USE_INSTANCING,USE_GEOMETRY_SHADER,USE_TEXTURE_3D,USE_TEXTURE_ARRAY,USE_ALPHA_FADE,USE_REFLECTION,USE_OCCLUSION,HAS_DEFERRED_COLOR,HAS_DEFERRED_NORMAL,USE_RGB10A2,USE_ENVIRONMENT,USE_NORMALIZATION,USE_DIRECTIONAL_LIGHTMAPS,USE_SHADOW_KERNEL,OPENGL,HAS_ARB_DRAW_INSTANCED,HAS_ARB_TEXTURE_SNORM,SHADING_LANGUAGE=330,USE_ARB_BLEND_FUNC_EXTENDED,USE_ARB_SHADER_BIT_ENCODING,USE_ARB_SAMPLE_SHADING,,TURBULENCE
0:170(1): error: syntax error, unexpected EXTENSION, expecting $end

... and so on.
I have been experiencing the same problem with both Unigine Heaven and Unigine Valley since the 2 June git version. I have not been able to identify the commit that causes the problem, but given that mesa 10.3 git of 28 May works OK, while git of 2 June (and the git versions of the following days) does not, I presume the problem must be caused by the radeonsi-related commits applied on 2 June. In particular, the 'radeonsi: enable ARB_sample_shading' commit was applied on 2 June, so I presume Andy Furniss' guess is correct (not sure if he had bisected).

I'm also getting the "UNKNOWN,QUALITY_LOW,QUALITY_MEDIUM,QUALITY_HIGH,MULTISAMPLE_0,USE_INSTANCING,USE_GEOMETRY_SHADER,USE_TEXTURE_3D,USE_TEXTURE_ARRAY,USE_ALPHA_FADE,USE_REFLECTION,USE_OCCLUSION,HAS_DEFERRED_COLOR,HAS_DEFERRED_NORMAL,USE_RGB10A2,USE_ENVIRONMENT,USE_NORMALIZATION,USE_DIRECTIONAL_LIGHTMAPS,USE_SHADOW_KERNEL,OPENGL,HAS_ARB_DRAW_INSTANCED,HAS_ARB_TEXTURE_SNORM,SHADING_LANGUAGE=330,USE_ARB_BLEND_FUNC_EXTENDED,USE_ARB_SHADER_BIT_ENCODING,USE_ARB_SAMPLE_SHADING,,TURBULENCE 0:170(1): error: syntax error, unexpected EXTENSION, expecting $end" kind of log on the console from which I launch heaven or valley.

I'm using a Radeon HD 7870.
What happens if you set this environment variable? force_glsl_extensions_warn=true
(In reply to comment #3)
> force_glsl_extensions_warn=true

That's enabled by default for Heaven in /etc/drirc, but I just tried setting it explicitly just in case. Doesn't help.
The problem is that Unigine doesn't know how to use GLSL, again.

There is "#extension GL_ARB_sample_shading : enable" in the middle of (all?) shaders. This is not allowed by any GLSL specification. All #extension directives must occur before any non-preprocessor tokens, which pretty much means "at the beginning of shader code".

What I see: Valley is loading. Then there is a hang and it recovers successfully. After that, Valley seems to have exited. That's all.
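To make the rule concrete, here is a minimal Python sketch of a checker that flags #extension directives appearing after any non-preprocessor token. It is only an illustration, not what Mesa's GLSL compiler does: the function name is made up, and it assumes no backslash line continuations and ignores #if/#endif conditional blocks.

```python
import re
from typing import List


def misplaced_extension_directives(source: str) -> List[int]:
    """Return 1-based line numbers of #extension directives that come
    after a non-preprocessor token (hypothetical helper, not Mesa code)."""

    # Blank out comments but keep newlines so line numbers stay correct.
    def blank(match):
        return re.sub(r"[^\n]", " ", match.group(0))

    stripped = re.sub(r"/\*.*?\*/|//[^\n]*", blank, source, flags=re.S)

    seen_code = False
    bad_lines = []
    for lineno, line in enumerate(stripped.splitlines(), start=1):
        text = line.strip()
        if not text:
            continue
        if text.startswith("#"):
            # A preprocessor line; only #extension after real code is an error.
            if seen_code and re.match(r"#\s*extension\b", text):
                bad_lines.append(lineno)
        else:
            seen_code = True
    return bad_lines


GOOD = """#version 330
#extension GL_ARB_sample_shading : enable
void main() {}
"""

BAD = """#version 330
uniform float x;
#extension GL_ARB_sample_shading : enable
void main() {}
"""
```

With these two sample shaders, the first yields no hits and the second reports line 3, which is the same kind of placement the "unexpected EXTENSION" syntax error above complains about.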
If you only want to run the application and don't care about a fix, you can run with

MESA_EXTENSION_OVERRIDE=-GL_ARB_sample_shading

We should implement a driconf workaround for this.
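For anyone who launches the benchmarks from a script, the override above could be wrapped roughly like this. This is a sketch only: the helper name is made up, and it assumes MESA_EXTENSION_OVERRIDE takes a space-separated list of +/- prefixed extension names, so that an existing value is extended rather than replaced.

```python
import os
import subprocess


def env_without_extension(extension, base_env=None):
    """Return a copy of the environment with `extension` disabled through
    MESA_EXTENSION_OVERRIDE (hypothetical helper name)."""
    env = dict(os.environ if base_env is None else base_env)
    entries = env.get("MESA_EXTENSION_OVERRIDE", "").split()
    entry = "-" + extension
    if entry not in entries:
        entries.append(entry)
    env["MESA_EXTENSION_OVERRIDE"] = " ".join(entries)
    return env


def launch(command, extension="GL_ARB_sample_shading"):
    """Run `command` (e.g. ["./valley"]) with the extension hidden."""
    return subprocess.run(command, env=env_without_extension(extension))
```

Calling launch(["./valley"]) from the Valley directory should then be equivalent to prefixing the command with the variable by hand.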
(In reply to comment #6)
> If you only want to run the application and don't care about a fix, you can
> run with
>
> MESA_EXTENSION_OVERRIDE=-GL_ARB_sample_shading
>
> We should implement a driconf workaround for this.

Thanks, that works and is also needed for heaven 4.0
(In reply to comment #5)
> The problem is Unigine don't know how to use GLSL, again.
>
> There is "#extension GL_ARB_sample_shading : enable" in the middle of (all?)
> shaders. This is not allowed by any GLSL specification. All #extension
> directives must occur before any non-preprocessor tokens, which pretty much
> means "at the beginning of shader code".
>
> What I see: Valley is loading. Then there is hang and it recovers
> successfully. After that, Valley seems to have exited. That's all.

It's repeatably more serious than that for me - maybe because I am fullscreen? But anyway, if I don't sysrq quickly enough when the monitor goes off, I am in "ext4 bitching about disk errors" territory after I hard reset, so no waiting to see if the GPU reset works for me (which it never seems to do on SI - but then I haven't had this card for long).

Heaven 4.0 is also affected, but I don't lock with that - it renders junk but I can quit OK; after that there is a 90% chance my display is mostly trash. fbcon is OK when I quit X, but restarting X will still result in a trashed display.
The hangs are gone if I apply my workaround which fixes the compile failures.
(In reply to comment #9)
> The hangs are gone if I apply my workaround which fixes the compile failures.

If you mean -

st/mesa, gallium: add a workaround for Unigine Heaven 4.0 and Valley 1.0

I hadn't tried, I assumed they would go in, and now it looks like the stuff in common has moved up a level.

Checking patch src/gallium/state_trackers/dri/common/dri_context.c...
error: src/gallium/state_trackers/dri/common/dri_context.c: No such file or directory
On an R7 260X this bug leads to a dead system and a reboot. From my POV it's fine if the demo fails, but it's NOT fine if it brings down my box...
(In reply to comment #9)
> The hangs are gone if I apply my workaround which fixes the compile failures.

Working for me now that the workaround is in.

One nit WRT drirc: I don't know what the expected behavior is, but Mesa doesn't use the configured/installed location.

So for me, configuring with --prefix=/usr and so getting the default --sysconfdir=PREFIX/etc, drirc ends up in /usr/etc/ but doesn't get read from there by the same mesa - is that expected?
(In reply to comment #12)
> (In reply to comment #9)
> > The hangs are gone if I apply my workaround which fixes the compile failures.
>
> Working for me now the workaround is in.
>
> One nit WRT drirc, I don't know what the expected behavior is, but Mesa
> doesn't use the configured/installed location.
>
> So for me who configures --prefix=/usr and so gets the default
> --sysconfdir=PREFIX/etc drirc ends up in /usr/etc/ but doesn't get read from
> there by the same mesa - is that expected?

This is weird. It should have been installed in /etc.
Interesting.. My Bonaire XTX (R7 260X) is not affected by this bug. How is this possible? Cape Verde PRO (HD 7750) is affected and workaround from comment 6 fixes the problem. On both systems I have mesa-10.3 which contains the commit mentioned in comment 0.
Ah.. "st/mesa,gallium: add a workaround for Unigine Heaven 4.0 and Valley 1.0" is also included in mesa-10.3. So both Heaven 4.0 and Valley 1.0 should just work. The question is why ARB_sample_shading is causing GPU lockup on VERDE. Should I open a separate bug for this issue?
(In reply to Alexander Tsoy from comment #15)
> Ah.. "st/mesa,gallium: add a workaround for Unigine Heaven 4.0 and Valley
> 1.0" is also included in mesa-10.3. So both Heaven 4.0 and Valley 1.0 should
> just work. The question is why ARB_sample_shading is causing GPU lockup on
> VERDE. Should I open a separate bug for this issue?

Maybe check that your /etc/drirc has the workaround, and, if you have a .drirc in your home dir, that it has it as well. I haven't tested whether a .drirc under $HOME without the workaround overrides one in /etc that has it.
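One way to do that check without reading the files by eye is a short Python sketch like the one below. Everything in it is an assumption for illustration: the helper names, the executable name "heaven_x64", and the sample XML, which only uses the force_glsl_extensions_warn option mentioned earlier in this thread. It lists what each file sets for a given executable and makes no claim about which file takes precedence.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical sample in the usual drirc layout
# (driconf > device > application > option).
SAMPLE_DRIRC = """<driconf>
  <device screen="0" driver="radeonsi">
    <application name="Unigine Heaven" executable="heaven_x64">
      <option name="force_glsl_extensions_warn" value="true" />
    </application>
  </device>
</driconf>
"""


def drirc_options(xml_text, executable):
    """Return {option name: value} set for `executable` in drirc XML text."""
    root = ET.fromstring(xml_text)
    options = {}
    for app in root.iter("application"):
        if app.get("executable") == executable:
            for opt in app.iter("option"):
                options[opt.get("name")] = opt.get("value")
    return options


def report(executable, paths=("/etc/drirc", "~/.drirc")):
    """Print the options each existing drirc file sets for `executable`."""
    for path in paths:
        full = os.path.expanduser(path)
        if not os.path.exists(full):
            print(f"{path}: not present")
            continue
        with open(full, encoding="utf-8") as handle:
            print(f"{path}: {drirc_options(handle.read(), executable)}")
```

Running report() with the benchmark's real executable name shows at a glance whether either file carries an entry for it.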
(In reply to Andy Furniss from comment #16) drirc was the first thing I checked. I filed a new bug 84836.
(In reply to Alexander Tsoy from comment #15)
> Ah.. "st/mesa,gallium: add a workaround for Unigine Heaven 4.0 and Valley
> 1.0" is also included in mesa-10.3. So both Heaven 4.0 and Valley 1.0 should
> just work. The question is why ARB_sample_shading is causing GPU lockup on
> VERDE. Should I open a separate bug for this issue?

I can re-test VERDE when I get home.

I haven't investigated why the hw hangs, because it only happens if shader compilation fails and so none of the sample_shading shader stuff makes it to the driver. I think the likely cause is that Unigine attempted to do rendering with a shader that hasn't actually been compiled and things went wrong after that.
Resolving per comment #12.