Summary: | [bisected] [SI Scheduler] Graphical corruption in Dota 2 | ||
---|---|---|---|
Product: | Mesa | Reporter: | Nick Sarnie <sarnex> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | normal | ||
Priority: | medium | CC: | daniel, tstellar |
Version: | git | ||
Hardware: | All | ||
OS: | All | ||
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=88561 | ||
Attachments: |
patch to disable the machine scheduler for SI
R600_DEBUG=ps,vs,gs output for the Talos trace with r227460 (no lockups)
R600_DEBUG=ps,vs,gs output for the Talos trace with r227461 (lockups)
dmesg log
sdiff output for suspected bad shader |
Description
Nick Sarnie
2015-02-05 03:34:17 UTC
Which card is used? I already saw those artifacts in bug 88758 and tried to reproduce them on Kabini with the two apitraces I have, but I can't. Those are from bug 67887 and bug 88301. Can you also reproduce the artifacts with either of those two? Or make a new one; it might be that only a recent game update shows the issue, or that it arises only with particular settings.

Hi smoki. I am on a Radeon HD 7950 (TAHITI). If I try the trace from https://bugs.freedesktop.org/show_bug.cgi?id=67887, I DO see the same graphical glitches that I get.
Here is an apitrace I just took: https://idontevenlift.no-ip.org/sarnex_dota_linux.trace.xz
The other apitrace is too large, I'll try it tomorrow.
Thanks,
sarnex

OK, thanks for the trace, but I can't reproduce it with your trace on Kabini either. Since it happens on the R7 265 and the HD 7950, I guess this is likely a GCN 1.0 only bug.

(In reply to sarnex from comment #0)
> Hi guys. If I use LLVM git, I get these graphical glitches in Dota 2 native.
>
> https://i.imgur.com/I4vyWFt.jpg
>
> The bug has been bisected to LLVM: 51a3c27d6e0c66cc8d2d1da8e9205fec7b74ca5c
> R600/SI: Define a schedule model and enable the generic machine scheduler
>
> I'm using Mesa git, Kernel 3.18.5 and Linux Mint.
>
> Thanks a lot,
>
> sarnex

Can you run the game with the environment variable R600_DEBUG=ps,vs,gs and post the output?

(In reply to Tom Stellard from comment #4)
> Can you run the game with the environment variable:
> R600_DEBUG=ps,vs,gs and post the output.

Hi Tom, thanks for replying.
The log is here, since it's too big to be an attachment. Skip to near the end to see the part where the glitches occur in game; the beginning is mostly the menu.
Log: http://paste.ubuntu.com/10075950/

(In reply to sarnex from comment #5)
> Hi Tom, thanks for replying.
>
> The log is here, since it's too big to be an attachment. Skip to near the
> end to see the in-game bugged time, the beginning is mostly the menu.
>
> Log: http://paste.ubuntu.com/10075950/

Thanks, would you also be able to get a dump using the last good commit?

(In reply to Tom Stellard from comment #6)
> Thanks, would you also be able to get a dump using the last good commit.

Hi Tom,
Here is the log from the commit directly before "R600/SI: Define a schedule model and enable the generic machine scheduler"; it has no graphical issues.
Log: http://paste.ubuntu.com/10143733/
Thanks again,
sarnex

The Mesa patch from bug 88561 comment 6 fixes this for me - at least the glitches with the posted apitrace.

(In reply to Daniel Scharrer from comment #8)
> The Mesa patch from bug 88561 comment 6 fixes this for me - at least the
> glitches with the posted apitrace.

Hi Daniel,
Thanks for the information. The patch from Marek significantly reduces the number of artifacts in Dota 2, but it does not completely fix the issue and I still see a few artifacts per second. It seems that this bug and the Portal bug are related, but there is still an underlying bug somewhere.
Thanks,
sarnex

This issue is still present on LLVM git and Mesa git, although the frequency of the corruption is significantly lower with Marek's patch from https://bugs.freedesktop.org/show_bug.cgi?id=88561#c6

Created attachment 115995 [details] [review]
patch to disable the machine scheduler for SI

I can confirm that these glitches are still present on current LLVM + Mesa git with a 7950 (TAHITI). Glitches happen in various games with different engines (Source, Unity, …).
Here is a trace of The Talos Principle (first posted in bug #88561 comment 9) that still produces more than just occasional glitches (even with Marek's patch): http://constexpr.org/tmp/Talos-radeonsi.3.trace.xz (147 MiB)
Like sarnex, I have bisected this to LLVM 51a3c27d6e0c66cc8d2d1da8e9205fec7b74ca5c (r227461). I had to revert b8797a7 and a99a16a in current Mesa git for it to build against that LLVM revision.
Some Source engine games (L4D2, Nuclear Dawn, maybe others) don't just produce graphical glitches but also frequently lock up the GPU since a later change to the machine scheduler (r233366) - see bug #90378.
Disabling the machine scheduler for SI on current LLVM (see attached patch) also fixes both the lockups and the graphical glitches. Additionally, using R600_DEBUG=switch_on_eop with unpatched LLVM also works around both the graphical glitches and the GPU lockups.

Created attachment 115996 [details]
R600_DEBUG=ps,vs,gs output for the Talos trace with r227460 (no lockups)
Created attachment 115997 [details]
R600_DEBUG=ps,vs,gs output for the Talos trace with r227461 (lockups)
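For anyone who wants to produce comparable dumps, here is a minimal sketch of how a run could be captured with R600_DEBUG set. It is not the exact procedure used for the attachments above. The helper name `capture_shader_dump` and the log path are made up for illustration, and the sketch assumes the driver prints the shader dump to stderr (stdout is merged into the same file to be safe).

```python
import os
import subprocess


def capture_shader_dump(command, log_path, flags="ps,vs,gs"):
    """Run `command` with R600_DEBUG=flags and save its output to log_path.

    `command` is an argument list, e.g. a trace replay or a game launcher.
    Returns the exit code of the process.
    """
    # Copy the current environment and add the debug variable on top.
    env = dict(os.environ, R600_DEBUG=flags)
    with open(log_path, "wb") as log:
        # Merge stderr into stdout so that everything lands in one file.
        result = subprocess.run(
            command, env=env, stdout=log, stderr=subprocess.STDOUT
        )
    return result.returncode
```

A call such as `capture_shader_dump(["apitrace", "replay", "Talos-radeonsi.3.trace"], "bad.log")` would then give one log per LLVM revision (the trace and log file names here are only examples). Passing `flags="switch_on_eop"` instead would apply the workaround mentioned in the comment above rather than dumping shaders.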
Can you post your dmesg log too?

Created attachment 116176 [details]
dmesg log

Here is the dmesg log with Linux 4.0.4-gentoo and LLVM patched to disable the machine scheduler for SI, after replaying both sarnex' and my trace. I don't have an unpatched LLVM build right now, but I don't remember the dmesg output being different.
The log is compressed because there are lots of GPU faults at the end (bug #87278) which pushed the uncompressed log over the attachment size limit - not sure if you wanted those or just the startup part.
Bug #90378 has the dmesg output for L4D2 including GPU lockups with an unpatched (but older revision of) LLVM and 4.0.1-gentoo in attachment 115653 [details].

(In reply to Daniel Scharrer from comment #15)
> Created attachment 116176 [details]
> dmesg log
>
> Here is the dmesg log with Linux 4.0.4-gentoo and LLVM patched to disable
> the machine scheduler for SI, after replaying both sarnex' and my trace. I
> don't have an unpatched LLVM build right now, but don't remember the dmesg
> output being different.
>
> The log is compressed because there are lots of GPU faults at the end (bug
> #87278) which pushed the uncompressed log over the attachment size limit -
> not sure if you wanted those or just the startup part.
>
> Bug #90378 has the dmesg output for L4D2 including GPU lockups with an
> unpatched (but older revision on) LLVM and 4.0.1-gentoo in attachment 115653 [details].

Would you be able to post an API trace of one of the games that is locking up?

(In reply to Tom Stellard from comment #16)
> Would you be able to post an API trace of one of the games that is locking
> up?

Never mind, I think the one you posted already should be enough.

Created attachment 116227 [details]
sdiff output for suspected bad shader
Here is a dump from sdiff of a good shader with no GPU protection faults (left side) and a bad shader that causes GPU protection faults (right side). Search for the pipe character '|' to find the only difference between the two shaders.
I'm not sure yet why this difference would lead to GPU protection faults.
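If scanning the sdiff columns by eye is awkward, the same comparison can be sketched in Python. Note that this swaps sdiff for the standard `difflib` module, so it is an alternative way to find the differing lines, not a parser for the attached sdiff output. The function name `changed_spans` is made up for illustration.

```python
import difflib


def changed_spans(good_lines, bad_lines):
    """Return the spans where two shader dumps differ.

    Each entry is (tag, first_line_in_good, good_span, bad_span), where tag
    is 'replace', 'delete' or 'insert' as reported by difflib, and
    first_line_in_good is a 1-based line number in the good dump.
    """
    matcher = difflib.SequenceMatcher(None, good_lines, bad_lines, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical stretch, nothing to report
        spans.append((tag, i1 + 1, good_lines[i1:i2], bad_lines[j1:j2]))
    return spans
```

Feeding it the two dumps as lists of lines (for example via `open(path).read().splitlines()`) should report a single 'replace' span if the shaders differ in only one place, as described above. Instructions that the scheduler merely reordered would show up as separate 'delete' and 'insert' spans instead.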
I cannot reproduce this on Mesa master; it must have been fixed by the commit "radeonsi: completely rework updating descriptors without CP DMA".

Resolving per comment 19, thanks for the update.