Bug 105371

Summary: r600_shader_from_tgsi - GPR limit exceeded - shader requires 360 registers
Product: Mesa Reporter: Gert Wollny <gw.fossdev>
Component: Drivers/Gallium/r600 Assignee: mesa-dev
Status: RESOLVED MOVED QA Contact: mesa-dev
Severity: major    
Priority: high CC: mirh
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
URL: https://www.shadertoy.com/view/Xs2fWD

Description Gert Wollny 2018-03-06 17:49:04 UTC
The shader fails because it uses an excessive number of arrays and temporary registers, so that even the spill code that landed recently can't handle it. 

Applying 

 https://patchwork.freedesktop.org/series/37991/ and 
 https://patchwork.freedesktop.org/series/39471/

fixes the bug.
Comment 1 mirh 2018-03-28 21:24:34 UTC
Can confirm it fixes shaders 2 and 5 of the GraphicsFuzz demo
http://www.graphicsfuzz.com/benchmark/android-v1.html

Should I wait for this (or, I dunno, sw fp64 some day) to land before reporting the other "gcm_sched_late_pass: unscheduled ops" errors?
Comment 2 Guenther Sohler 2018-08-06 19:18:37 UTC
Hi Gert Wollny,

I see exactly the same bug in a program called "curv".

So I tried to apply the two patches you provided here.
I did the following:

* git clone of the Mesa repository
* git fetch --all
* git am TGSI-split-merge-and-interleave-arrays.mbox
  <this is where the problems started; I tried to patch the source code manually and then ran git am --continue>
* git am mesa-st-glsl_to_tgsi-Properly-resolve-life-times-for-simple-if-else-use-constructs.mbox
  <went through without a problem>


However, when I then compile Mesa and use it with curv, the error is not fixed.
Also, looking at the patched Mesa code, it appears that only a seemingly random subset of the patches was applied. So I am having quite some difficulty using the patch.

Has this patch already been committed to the Mesa repo?
Or is it possible to get a copy of your patched source code?

Thank you and best regards Guenther
Comment 3 mirh 2018-08-06 22:13:27 UTC
This is the newest one
https://patchwork.freedesktop.org/series/44315/

curl 'https://patchwork.freedesktop.org/project/mesa/pwclientrc/' | sed '7,8s/^#//g' > ~/.pwclientrc && pwclient list -w "Gert Wollny" -s New v4 -f %{id} | egrep '227' | xargs pwclient git-am

works just fine for me.
If it still doesn't work for you, check https://github.com/gerddie/mesa/tree/allfixes
Comment 4 Gert Wollny 2018-08-07 08:32:23 UTC
The patch is not yet committed, I'll try a bit harder for the 18.3 release cycle. 

Actually, the current Mesa version should already have the spilling code in place. If you still see this error, then either the shader uses many registers that are not organized as arrays (these are not spilled, so my patch will not help), or you have hit a case where the shader uses a number of registers that falls into the gap between failing and forcing spilling. I encountered something like this with the latest Unreal editor and a certain level design. I plan to send out a patch to deal with this too, but haven't gotten around to it.

Essentially in r600_shader.c:3550 (more or less) you have 

   if (regno > 124) {
      choose_spill_arrays(&ctx, &regno, &pipeshader->scratch_space_needed);
      shader->indirect_files = ctx.info.indirect_files;
   }

but the shader needs around 10-20 registers more than regno, so the limit should be more like 100. That said, the spilling has its own problems; I think there is still something going wrong with the synchronization.
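
To illustrate the gap between failing and forcing spilling, here is a minimal sketch. Only the trigger value 124 comes from the snippet above, and 100 is the alternative suggested here. The hard limit of 124 registers and the overhead of 15 registers are assumptions picked for illustration, and the helper names are hypothetical, not Mesa functions.

```c
#include <stdbool.h>

/* Spill trigger from the r600_shader.c snippet quoted in this comment. */
#define SPILL_TRIGGER_CURRENT 124
/* Lower trigger suggested in this comment ("more like 100"). */
#define SPILL_TRIGGER_PROPOSED 100
/* ASSUMPTION: hard GPR limit, chosen only for illustration. */
#define ASSUMED_GPR_LIMIT 124
/* ASSUMPTION: registers needed on top of regno (the comment says 10-20). */
#define ASSUMED_EXTRA_GPRS 15

/* Hypothetical helper: spilling is only attempted above the trigger. */
static bool spill_attempted(int regno, int trigger)
{
    return regno > trigger;
}

/*
 * Hypothetical helper: true when no spilling is attempted, yet the final
 * register count (regno plus the assumed overhead) exceeds the assumed limit.
 */
static bool falls_in_gap(int regno, int trigger)
{
    return !spill_attempted(regno, trigger) &&
           regno + ASSUMED_EXTRA_GPRS > ASSUMED_GPR_LIMIT;
}
```

Under these assumptions, falls_in_gap(115, SPILL_TRIGGER_CURRENT) is true: no spilling is attempted, but 115 + 15 registers exceed the limit. With SPILL_TRIGGER_PROPOSED the same shader would go through the spilling path instead.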

Apart from that: 
Note that when you apply the mbox, the first "patch" is the cover letter, which you have to skip with "git am --skip". After that, the latest version of the series should apply cleanly (thanks @mirh for pointing out how to do this with pwclient and where the GitHub tree is).

hope that helps,
Comment 5 mirh 2018-08-07 15:59:38 UTC
(In reply to mirh from comment #1)
> Can confirm it fixes shader 2 and 5 of GraphicsFuzz demo 
> http://www.graphicsfuzz.com/benchmark/android-v1.html
> 
> Should I wait for this (or, I dunno, some day sw fp64) to land before
> reporting of the others "gcm_sched_late_pass: unscheduled ops" errors?

Well, colour me shocked, but after building mesa-git with the last patch series all the tests now pass. 

Which is quite remarkable considering not even the latest GCN closed drivers are compliant.
Comment 6 Gert Wollny 2018-08-11 11:36:39 UTC
I've pushed the series, so this might be fixed (although I've seen new piglits that fail with the same error even with this array split patch series applied).
Comment 7 amonpaike 2018-08-12 20:53:29 UTC
(In reply to Gert Wollny from comment #6)
> I've pushed the series, so this might be fixed (although I've seen new
> piglits that fail with the same error also with this array split patch
> series applied.

Hi, I wanted to tell you that with these latest patches many of the problems in the new Blender 2.8 that I had reported here https://bugs.freedesktop.org/show_bug.cgi?id=107454
have been solved, but some remain, in particular with subsurface scattering, translucency, and the use of multiple mix shaders...

Blender 2.8 is in the alpha phase, but the advanced shader management is stable enough that you could try it yourself. Scroll down the page at https://www.blender.org/2-8/ to find both the download link for the latest builds and the demo files.
The wasp_bot demo, for example, is one of those that shows problems on the surface.

I also believe that these demos can give you some indication of the performance.

The GPU I tested was a Radeon HD 7670M.

thanks for the great work
Comment 8 amonpaike 2018-08-12 21:22:42 UTC
I also forgot... the ESM shadows (exponential shadow mapping) do not work, while the VSM shadows (variance shadow mapping) do.
I hope you have a little knowledge of Blender; they can be set in the render panel (camera icon).
Comment 9 Gert Wollny 2018-08-13 07:25:16 UTC
@amonpaike thanks for pointing out the problems with Blender. Yes, I did a bit of playing around with it, so I hope I'll be able to get some insight into what might be going wrong. However, there are some limitations in how the r600 drivers were written that might make it difficult to fix all issues.
Comment 10 Gert Wollny 2018-08-13 08:02:20 UTC
I had a look at VSM versus ESM, and I see a difference that would indicate an error in the ESM shader (be it in Blender or the drivers), but there is no specific output (like the one given in this bug) that would indicate where the problem lies.

If you have an example that produces output like the one given above, then please add the steps to reproduce this view in Blender.

For all other failures please add such comments to the other bug report: https://bugs.freedesktop.org/show_bug.cgi?id=107454

You can also try to create a rendering trace by using apitrace: 

  https://github.com/apitrace/apitrace

Run "apitrace trace blender", get to the point where things go wrong in Blender, and once the problem has become visible you can just close Blender and attach the resulting blender.trace file to the bug.

It is best to open a new bug for each issue. 

Best, 
Gert
Comment 11 MWATTT 2018-08-15 16:45:34 UTC
Hello,

Many thanks for this series. It fixes a lot of bugs. Animated leaders in Civ5 and Civ6 now work fine. It also solves a lot of issues with Minecraft shader packs. Dolphin-emu's ubershaders, however, still cause problems (160 registers). I will create a bug report with an apitrace for that.
Comment 12 amonpaike 2018-08-16 16:05:29 UTC
 
> run "apitrace trace blender"), get to the point where things go wrong in
> blender, and after it has made visible you can just close blender and then
> try to attach the resulting blender.trace file to the bug. 
> 
> It is best to open a new bug for each issue. 
> 
> Best, 
> Gert


I created a video and apitraces of two of the scenes where bugs appear (no ESM shadows, and transmission on the Principled BSDF shader).

According to glxinfo, my video card is AMD TURKS (Radeon HD 7670M, 2 GB video RAM).
(For information, the two scenes work perfectly on the other GPU, Mesa DRI Intel(R) Ivybridge Mobile, Intel HD 4000.)

The compressed file contains the two apitraces and the two Blender scenes.

https://youtu.be/8p_mU_EPNoo
https://drive.google.com/open?id=1vBJphv68fpdZhdxNu-OvicdTRb6rBiIy


The version of Mesa I tried is not the latest one, but the one from when I reported progress on the bug (the padoka PPA has not been updated).
Blender is last night's build.
Comment 13 amonpaike 2018-08-17 10:37:30 UTC
Today the padoka PPA updated the drivers, so I could test Blender 2.8... all the problems are now solved! (Except the ESM shadows, which probably depend on Blender; I will report the problem to the Blender developers.)
Great job, thank you!!
Comment 14 Gert Wollny 2018-08-17 11:00:57 UTC
If by ESM error you refer to the overbright light blue artifact in the SPACE RACESHIP scene, then it is a driver problem. If you run

    R600_DEBUG=nosb blender 

it should go away. Unfortunately, things might run a bit slower like this, but this optimizer (sb) is not completely reliable, so disabling it might be the way to go. 

Using apitrace I've even been able to pinpoint one of the shaders where the optimizer creates wrong code. Unfortunately, it is a shader that amounts to 15k machine code instructions, so finding the bug there will be very difficult.
Comment 15 amonpaike 2018-08-20 15:58:00 UTC
(In reply to Gert Wollny from comment #14)
> If by ESM error you refer to the overgright light blue artifact in the SPACE
> RACESHIP scene, then is is a driver problem. If you run 
> 
>     R600_DEBUG=nosb blender 
> 

I've tested your suggestion, and the ESM shadows do work now. Other, less noticeable artifacts have also disappeared.

Note that with this shader optimizer disabled the performance drops; I hope you can find this bug.

thank you very much for your work
Comment 16 mirh 2018-08-25 22:37:25 UTC
(In reply to mirh from comment #5)
> (In reply to mirh from comment #1)
> > Can confirm it fixes shader 2 and 5 of GraphicsFuzz demo 
> > http://www.graphicsfuzz.com/benchmark/android-v1.html
> > 
> > Should I wait for this (or, I dunno, some day sw fp64) to land before
> > reporting of the others "gcm_sched_late_pass: unscheduled ops" errors?
> 
> Well, colour me shocked, but after building mesa-git with the last patch
> series all the tests now pass. 
> 
> Which is quite remarkable considering not even latest GCN closed drivers are
> compliant.

This is with Firefox, though.
I just noticed that Chromium 68 is still getting those errors for SETGT_DX10 and MULADD_IEEE, and indeed fails six tests.

Nosb fixes it.
Comment 17 Emil Velikov 2018-12-07 16:24:06 UTC
Gert, you've tagged this issue in the following commit. Is the problem fully resolved, or is more work needed?

commit d8c2119f9b0b257a23ceb398f6d0d78da916417e
Author: Gert Wollny <gw.fossdev@gmail.com>
Date:   Tue Jun 5 22:26:47 2018 +0200

    mesa/st/glsl_to_tgsi: Expose array live range tracking and merging
Comment 18 amonpaike 2018-12-31 06:00:22 UTC
Radeon HD 7600, MESA 19-devel, Linux Gallium r600, Light and Shadows Broken

Blender 2.8 beta https://builder.blender.org/download/ 
Demo files are at the bottom of the page https://www.blender.org/2-8/

Could someone who has the skills take a look at what happens here?
(new bug reported at developer.blender.org)

https://developer.blender.org/T60001

Thanks!
Comment 19 amonpaike 2019-01-11 21:38:19 UTC
Isn't there anyone who can take a look at what I reported in the previous comment?

Are these GPUs really old enough to be abandoned?
On Windows, Blender runs beautifully on these GPUs...
It's a pity that on Linux they lag so far behind, both in performance and in rendering quality...
The Intel GPU in the same PC, which is much weaker, performs and renders better with the Mesa Intel drivers.
I'm forced to use Windows for this reason alone...
:(((
Comment 20 mirh 2019-01-11 21:51:11 UTC
A couple of devs are working on reinventing the wheel so that r600 cards could basically work and be supported almost as if they had been released in 2018 (well, sans Vulkan).

And you have been already provided with an explanation of why this is kinda difficult to debug, and with a workaround. 
So please, just wait.
Comment 21 amonpaike 2019-01-22 16:43:08 UTC
(In reply to mirh from comment #20)
> A couple of devs are working into reinventing the wheel so that you could
> basically have r600 cards work and be supported almost like they had been
> released in 2018 (well, sans vulkan)
> 
> And you have been already provided with an explanation of why this is kinda
> difficult to debug, and with a workaround. 
> So please, just wait.

Sorry for pressing; it was because nobody answered, and I did not know about this.
So is someone rewriting these drivers?

Is there a new branch or something?
Comment 22 amonpaike 2019-02-01 15:52:20 UTC
Good and bad news...

I've just tested the latest Blender build with the latest mesa-devel driver (February 1, 2019).

With the default settings the situation remains largely the same.

The good news is that if you launch Blender like this:
"DRI_PRIME=1 R600_DEBUG=nosb ./blender"
with the shader backend optimizer off, Blender with Eevee works perfectly again.

The bad news is that performance tanks.

So the problem is in the Gallium drivers, in particular in the shader compiler, with the lights and shadows of Blender's realtime Eevee rendering engine...
:((
Comment 23 GitLab Migration User 2019-09-18 19:25:25 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/633.
