Summary: | R9270X pyrit benchmark perf regressions with latest kernel/llvm | ||
---|---|---|---|
Product: | Mesa | Reporter: | Andy Furniss <adf.lists> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | b747xx, haagch |
Version: | git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
Attachments: |
good
bad
Flush HDP cache via the ring on SI
Only flush HDP cache for indirect buffers from userspace
drm/ttm: move fpfn and lpfn into each placement
drm/radeon: Add RADEON_GEM_CPU_ACCESS BO creation flag
r600g,radeonsi: Inform the kernel if a BO will likely be accessed by the CPU
valley worse pausing with stream buffer change
valley better with stream buffer change reverted
valley vanilla mesa bad num bytes moved
valley better with revert num bytes moved
Elemental screen showing vram usage |
Description
Andy Furniss
2014-08-02 12:04:50 UTC
Can you bisect?

There was a recent change to LLVM which increased conformance with OpenCL floating point semantics at some performance cost. That might explain at least some of the difference.

(In reply to comment #2)
> There was a recent change to LLVM which increased conformance with OpenCL
> floating point semantics at some performance cost. That might explain at
> least some of the difference.

Pyrit doesn't use any floating-point operations, so this shouldn't be an issue.

I bisected LLVM and it came up with -

ph4[llvm]$ git bisect good
ee17bf3fd4189d1981a6e908b4519e600ec7b002 is the first bad commit
commit ee17bf3fd4189d1981a6e908b4519e600ec7b002
Author: Matt Arsenault <Matthew.Arsenault@amd.com>
Date: Fri Jul 25 23:02:42 2014 +0000

    R600/SI: Allow partial unrolling and increase thresholds.

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213985 91177308-0d34-0410-b5e6-96231b3b80d8

I don't know when I'll get to do kernel yet.

Can you post the output of R600_DEBUG=cs from both the "good" and "bad" commits?

Created attachment 104082 [details]
good
Created attachment 104083 [details]
bad
kernel -

fb240a2534802a86742db51b7334138675bc435e is the first bad commit
commit fb240a2534802a86742db51b7334138675bc435e
Author: Michel Dänzer <michel.daenzer@amd.com>
Date: Thu Jul 31 18:43:49 2014 +0900

    drm/radeon: Always flush the HDP cache before submitting a CS to the GPU

    This ensures the GPU sees all previous CPU writes to VRAM, which makes it
    safe:

    * For userspace to stream data from CPU to GPU via VRAM instead of GTT
    * For IBs to be stored in VRAM instead of GTT
    * For ring buffers to be stored in VRAM instead of GTT, if the HPD flush
      is performed via MMIO

    Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Created attachment 104475 [details] [review]
Flush HDP cache via the ring on SI

Does this patch help for the kernel regression?

Though this seems to make some piglit test results unstable...

(In reply to comment #9)
> Created attachment 104475 [details] [review]
> Flush HDP cache via the ring on SI
>
> Does this patch help for the kernel regression?
>
> Though this seems to make some piglit test results unstable...

There is no difference with this.

FWIW I found another regression with this kernel that is caused by the same commit. Maybe regression is the wrong word, as there was already an issue, just it's worse now. I will file a separate bug in time (was planning to do new xorg and retest first) but in summary -

Unigine Valley always did have some 1/2 to 1 sec pauses ever since I could run it, first on HD4890 and now radeonsi R9 270X. Since this kernel commit they are 2 to 4 times longer - also unchanged by patch. Strangely, if I use ffmpegs x11 grab to make a recording @ 30fps they become short again.

Created attachment 104549 [details]
Only flush HDP cache for indirect buffers from userspace
Does this patch help?
(In reply to comment #11)
> Created attachment 104549 [details]
> Only flush HDP cache for indirect buffers from userspace
>
> Does this patch help?

No, I'm afraid that doesn't help either.

Valley is the same - pyrit only slightly different, probably within random variation. I am testing with "bad" llvm so the numbers are all low. As I recorded them here's a paste of pyrit good (kernel), head, patch 1 and patch 2

On good -

Running benchmark (57982.3 PMKs/s)... /
Computed 58917.21 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 55101.4 PMKs/s (RTT 1.1)
#2: 'CPU-Core (SSE2)': 757.2 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 756.0 PMKs/s (RTT 3.0)
#4: 'CPU-Core (SSE2)': 755.5 PMKs/s (RTT 2.8)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

On head

Running benchmark (50267.7 PMKs/s)... \
Computed 50096.30 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 48501.1 PMKs/s (RTT 1.2)
#2: 'CPU-Core (SSE2)': 757.5 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 757.1 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 757.0 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

Head + patch one

Running benchmark (50883.7 PMKs/s)... -
Computed 51220.59 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 48583.5 PMKs/s (RTT 1.2)
#2: 'CPU-Core (SSE2)': 756.5 PMKs/s (RTT 3.0)
#3: 'CPU-Core (SSE2)': 756.0 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 754.2 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

Head + patch two

Running benchmark (51348.9 PMKs/s)... |
Computed 50781.53 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 48676.9 PMKs/s (RTT 1.2)
#2: 'CPU-Core (SSE2)': 752.4 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 755.4 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 752.8 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

Does reverting Mesa commit 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 help for Valley or pyrit with the latest kernel?

(In reply to comment #13)
> Does reverting Mesa commit 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 help for
> Valley or pyrit with the latest kernel?

Yes, with that reverted perf is roughly back to "good" kernel for both. Does pyrit transfer much data from the GPU to the CPU? If so, my patch "gallium/radeon: Do not use u_upload_mgr for buffer downloads" that I have just sent to the mesa-dev mailing list might help... (In reply to comment #15) > Does pyrit transfer much data from the GPU to the CPU? If so, my patch > "gallium/radeon: Do not use u_upload_mgr for buffer downloads" that I have > just sent to the mesa-dev mailing list might help... It does help pyrit, but as expected I guess, not valley. (In reply to comment #14) > Yes, with that reverted perf is roughly back to "good" kernel for both. Can you try restoring the old behaviour for only PIPE_USAGE_DYNAMIC or PIPE_USAGE_STREAM respectively, to see if one of them alone fixes the problem in Valley? Does the pyrit benchmark include compile time when calculating PMKs/s ? The patch you've bisected unrolls a loop that makes the pyrit kernel really big, so it will take longer to compile. Is it possible to run the benchmark for longer? If so, does the gap between good and bad shrink? (In reply to comment #17) > (In reply to comment #14) > > Yes, with that reverted perf is roughly back to "good" kernel for both. > > Can you try restoring the old behaviour for only PIPE_USAGE_DYNAMIC or > PIPE_USAGE_STREAM respectively, to see if one of them alone fixes the > problem in Valley? Stream as below gets the old behavior, doing below with dynamic makes no difference AFAICT. Testing this is subjective as the pauses stop the clock in benchmark mode, so don't show and of course "working" is still somewhat broken :-). 
diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
index 22bc97e..9262823 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -110,11 +110,12 @@ bool r600_init_resource(struct r600_common_screen *rscreen,
 	enum radeon_bo_flag flags = 0;
 
 	switch (res->b.b.usage) {
+	case PIPE_USAGE_STREAM:
+		flags = RADEON_FLAG_GTT_WC;
 	case PIPE_USAGE_STAGING:
 		/* Transfers are likely to occur more often with these resources. */
 		res->domains = RADEON_DOMAIN_GTT;
 		break;
-	case PIPE_USAGE_STREAM:
 	case PIPE_USAGE_DYNAMIC:
 		/* Older kernels didn't always flush the HDP cache before
 		 * CS execution

(In reply to comment #18)
> Does the pyrit benchmark include compile time when calculating PMKs/s ? The
> patch you've bisected unrolls a loop that makes the pyrit kernel really big,
> so it will take longer to compile.
>
> Is it possible to run the benchmark for longer? If so, does the gap between
> good and bad shrink?

It runs for 1min 16s as is, TBH I don't use/know pyrit - the only reason I have it is when I got my radeonsi and was reading how to get opencl there was a link to it as an example of a working app in the wiki.

It does seem to build up speed as time progresses, but now it's slower it seems to plateau slightly longer before the end than it used to.

Maybe it's just not a representative test - hence my query about glxgears, I haven't found any "real" opencl use to benchmark yet - x264 would be nice, but it seems it needs things not yet implemented.
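The compile-time question from comment #18 can be illustrated with a toy model. This is only a sketch under assumed numbers: `measured_rate`, `gap_percent`, the 60000 PMKs/s steady-state rate and both compile times are hypothetical and are not taken from pyrit or from any measurement in this report. It shows why a fixed one-off compile cost would matter less the longer the benchmark runs, which is what the "does the gap shrink?" test is looking for.

```python
def measured_rate(true_rate, compile_s, run_s):
    """Average rate over a run of run_s seconds when the first compile_s
    seconds are spent compiling and produce no work (hypothetical model)."""
    if run_s <= 0:
        raise ValueError("run_s must be positive")
    busy_s = max(run_s - compile_s, 0.0)
    return true_rate * busy_s / run_s


def gap_percent(true_rate, fast_compile_s, slow_compile_s, run_s):
    """Relative slowdown, in percent, caused only by the longer compile."""
    fast = measured_rate(true_rate, fast_compile_s, run_s)
    slow = measured_rate(true_rate, slow_compile_s, run_s)
    if fast == 0:
        raise ValueError("run is too short to measure anything")
    return 100.0 * (fast - slow) / fast


if __name__ == "__main__":
    # 76 s corresponds to the "1min 16s" run length mentioned above;
    # the rate and the compile times are made-up illustration values.
    for run_s in (76, 300, 1200):
        print(run_s, round(gap_percent(60000.0, 2.0, 10.0, run_s), 2))
```

If the real gap between "good" and "bad" LLVM stayed roughly constant as the run length grew, that would suggest the slowdown is in the generated code rather than in compile time.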
Created attachment 105316 [details] [review] drm/ttm: move fpfn and lpfn into each placement Created attachment 105317 [details] [review] drm/radeon: Add RADEON_GEM_CPU_ACCESS BO creation flag Created attachment 105318 [details] [review] r600g,radeonsi: Inform the kernel if a BO will likely be accessed by the CPU Does this Mesa patch (instead of the PIPE_USAGE_STREAM change) together with the previous two kernel patches I attached help Valley? (In reply to comment #22) > Created attachment 105317 [details] [review] [review] > drm/radeon: Add RADEON_GEM_CPU_ACCESS BO creation flag Just a general note: We need to define that flag negated for compatibility reasons. E.g. RADEON_GEM_NO_CPU_ACCESS because code must assume with an old client that the buffer is always CPU accessed. (In reply to comment #24) > Just a general note: We need to define that flag negated for compatibility > reasons. E.g. RADEON_GEM_NO_CPU_ACCESS because code must assume with an old > client that the buffer is always CPU accessed. No, CPU access works fine even with old clients which don't set the flag. The flag is just an optimization, preventing BOs which are expected to be accessed by the CPU from being stored in the CPU-inaccessible part of VRAM. (In reply to comment #23) > Created attachment 105318 [details] [review] [review] > r600g,radeonsi: Inform the kernel if a BO will likely be accessed by the CPU > > Does this Mesa patch (instead of the PIPE_USAGE_STREAM change) together with > the previous two kernel patches I attached help Valley? No difference with those. The big kernel patch didn't apply on drm-fixes-3.17-wip, but it only failed in noveau so I deleted that from it. (In reply to comment #26) > No difference with those. Bummer, thanks for testing anyway. I submitted the change reverting the behaviour of PIPE_USAGE_STREAM for review, but it's strange: I couldn't notice any significant difference in stutter in Valley regardless of any of these changes. BTW, what CPU are you using? 
(In reply to comment #27)
> (In reply to comment #26)
> > No difference with those.
>
> Bummer, thanks for testing anyway.
>
> I submitted the change reverting the behaviour of PIPE_USAGE_STREAM for
> review, but it's strange: I couldn't notice any significant difference in
> stutter in Valley regardless of any of these changes.
>
> BTW, what CPU are you using?

It's an AMD Phenom II x4 965be. I always set cpufreq ondemand to perf when testing so it's forced @ 3.4GHz.

When I noticed that ffmpeg x11grab made the pauses "normal" length I did try a different test with cpus loaded by compiling, but this didn't do it.

I also reported the Valley stutter on Kabini on IRC, but now I am somewhat against reverting, because performance suffers in other games with the revert.

Another reason is simply that I tested it on Windows for the first time today, and there I get stutter even worse than in any case we have here :D. I didn't know that :D DX11/DX9/OpenGL, in any mode, all stutter a lot. Our worst combination is a lot smoother than Catalyst on Windows :).

So the question is: are there other examples of stutter besides Unigine Valley?

Yes.

Minecraft is unplayable with the latest kernel + latest Mesa.

In the beginning it's smooth. After 30 seconds or so it starts to stutter a little. By the five-minute mark, if you move in the game, it pauses for 5 to 7 seconds, moves for 2 seconds, pauses for 5 seconds...

By pause I mean the whole system pauses: mouse, other terminals, everything.

I think it's related to this bug; it started with kernel 3.17 (like the OP, I did update mesa, llvm, libdrm, glamor...).

Going back to 3.16 did not fix it; I had to downgrade Mesa and LLVM too (to the first release of the first of this month to be sure), and after that I played for about 4 hours without any issue.

I tried to revert 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 but I got massive gfx corruption.

(In reply to comment #30)
> Yes.
>
> Minecraft is unplayable with latest Kernel+latest Mesa.
> > In the beginning, it's smooth.. after 30 sec or so it start to stuter a > little... By the five minutes mark. if you move in the game, it pause for > like 5 to 7 seconde, move for 2 secondes, pause for 5 secondes... > > By pause I mean the whole system pause, mouse, other terminals.. Everything. > > I think it's related to this bug, it started with the kernel 3.17 (Like the > OP, I did update mesa, llvm, libdrm, glamor...) > > going back to 3.16 did not fix it, I got to downgrade Mesa and LLVM too (to > the first relase of the first of this month to be sure) and after that, I > played for like 4 hours without any issue. > > I tried to revert 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 but I got massive > gfx corruptions. I only meant about XYZ game which stutter with, and where reverting 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 helps, if game stutter even with/without reverting than that is i think another issue :). (In reply to comment #30) Mathieu, it sounds like your problem isn't related to this report. Please file your own report, and it would be great if you could bisect Mesa or the kernel. (In reply to comment #28) > > I submitted the change reverting the behaviour of PIPE_USAGE_STREAM for > > review, but it's strange: I couldn't notice any significant difference in > > stutter in Valley regardless of any of these changes. Also, according to GALLIUM_HUD=requested-VRAM+VRAM-usage,requested-GTT+GTT-usage, Valley only seems to allocate about 10-20 MB for streaming BOs, so I'm not sure why putting them in VRAM or not makes such a big difference for you. > > BTW, what CPU are you using? > > It's an AMD Phenom II x4 965be. I assume the chipset for that doesn't support PCIe 3.0, does it? I wonder if maybe streaming BOs should be in VRAM with PCIe 3.0 but not with PCIe 2.0. (In reply to comment #29) > I also reported on irc Valley stutter on Kabini, but now i am somhow > against reverting because performance suffer with reverting in other games. 
> > One other reason simply because i tested it first time on Windows today and > there i have stutter even worse then then any case we have here :D. And i > did't know that :D DX11/DX9/OpenGL any mode all stutter a lot. Our worst > combination is a lot smoother than with Catalyst on Windows :). I tried on Windows with the same settings and you are right that there are stutters. For me they are about 10x shorter than my best Linux case, which means that some effectively don't exist and the ones that do are more like a frame or two. It does I guess illustrate that Valley may be doing something stupid - some are in the same places I see on Linux. > So question is, is there other stuter examples than Unigine Valley? Will have to test more - there are some with Initially with Unreal Reflections. I am using a pure 64bit setup, which means I don't get to test steam or etqw - you may have a point that if only Valley is really bad and other things gain Valley may be an exception that could be sacrificed. (In reply to comment #33) > (In reply to comment #28) > > > I submitted the change reverting the behaviour of PIPE_USAGE_STREAM for > > > review, but it's strange: I couldn't notice any significant difference in > > > stutter in Valley regardless of any of these changes. > > Also, according to > GALLIUM_HUD=requested-VRAM+VRAM-usage,requested-GTT+GTT-usage, Valley only > seems to allocate about 10-20 MB for streaming BOs, so I'm not sure why > putting them in VRAM or not makes such a big difference for you. > > > > > BTW, what CPU are you using? > > > > It's an AMD Phenom II x4 965be. > > I assume the chipset for that doesn't support PCIe 3.0, does it? I wonder if > maybe streaming BOs should be in VRAM with PCIe 3.0 but not with PCIe 2.0. Yea I am PCIE 2.0. Other settings which may or may not be relavent - vblank_mode=0, swapbufferswait off, 1920x1080 fullscreen, quality high, antialiasing off. 
I tried with hud and see 10-20MB requested with the stream change reverted and 8kb with it. The fps counter on hud does show the pauses - though even the good case looks bad on that - but the biggest pauses it shows are between scenes when the screen has faded to black, I guess you kind of expect something to be loading then. I'll upload a couple of screens. Created attachment 105543 [details]
valley worse pausing with stream buffer change
Created attachment 105544 [details]
valley better with stream buffer change reverted
(In reply to comment #36) > Created attachment 105543 [details] > valley worse pausing with stream buffer change I notice that I seem to be pegged more to a single core on this one. (In reply to comment #34) > I tried on Windows with the same settings and you are right that there are > stutters. For me they are about 10x shorter than my best Linux case, which > means that some effectively don't exist and the ones that do are more like a > frame or two. It does I guess illustrate that Valley may be doing something > stupid - some are in the same places I see on Linux. Actually there is workaround on Windows by not using Aero, but some Basic theme So driver has problems with Aero or Aero with the driver and this app i don't know much about Windows i don't use it much of the time. If you let it run with Basic theme and few rounds you will spot i guess that only first round there is unusual maybe 2-3 times 1-2 sec sttuters, then second time and later it is stutter free. All in all people must not use Aero when play Valley, so app even on Windows is not 100% trouble free :) (In reply to comment #39) > If you let it run with Basic theme and few rounds you will spot i guess > that only first round there is unusual maybe 2-3 times 1-2 sec stutters, > then second time and later it is stutter free. Andy, you have behavior like that (more or less those seconds for stuter) if PIPE_USAGE_STREAM is reverted, right? Then maybe that is the right way to go, global performance will suffer a little but if nothing better can't be done then revert of PIPE_USAGE_STREAM is OK :) (In reply to comment #40) > (In reply to comment #39) > > If you let it run with Basic theme and few rounds you will spot i guess > > that only first round there is unusual maybe 2-3 times 1-2 sec stutters, > > then second time and later it is stutter free. > > Andy, you have behavior like that (more or less those seconds for stuter) > if PIPE_USAGE_STREAM is reverted, right? 
The Bad was with vanilla mesa (couple of days old) The good was that + the patch in Comment 19 I looked again at Unreal Reflections - there is a difference but it's only right at the start, both have a couple of stutters and they are longer with vanilla then the rest is OK in both cases. Playing more with hud I can see that there is a 1 to 1 correlation between the pauses and spikes in num-bytes-moved. The scale on the graphs did get squashed a bit by outliers, which seemed a bit random sometimes - I saw 330 MB on one run - but anyway here's a couple of screens - I got these using sleep 120 && xwd -root ... the first one landed on a scene change so is black. Created attachment 105563 [details]
valley vanilla mesa bad num bytes moved
Created attachment 105564 [details]
valley better with revert num bytes moved
(In reply to comment #41) > (In reply to comment #40) > > (In reply to comment #39) > > > If you let it run with Basic theme and few rounds you will spot i guess > > > that only first round there is unusual maybe 2-3 times 1-2 sec stutters, > > > then second time and later it is stutter free. > > > > Andy, you have behavior like that (more or less those seconds for stuter) > > if PIPE_USAGE_STREAM is reverted, right? > > The Bad was with vanilla mesa (couple of days old) > > The good was that + the patch in Comment 19 I asked is Valley play the same with your good case here and with using Basic theme in Windows :). That is the case for me, and average fps is around 80% in comparasion. > I looked again at Unreal Reflections - there is a difference but it's only > right at the start, both have a couple of stutters and they are longer with > vanilla then the rest is OK in both cases. Those Unreal 4 Engine linux demos are slide show fest on Kabini, so i can't recognize if there is stutter between two frames :D. Iguess those simply needs at least 10X+ more powerfull GPU then i have. I've seen some stutters without any corresponding buffer moves though. Still not sure why it's stuttering so bad sometimes. BTW, Andy, does the stuttering also seem to get better for you if you run Valley repeatedly? (In reply to comment #45) > (In reply to comment #41) > > (In reply to comment #40) > > > (In reply to comment #39) > > > > If you let it run with Basic theme and few rounds you will spot i guess > > > > that only first round there is unusual maybe 2-3 times 1-2 sec stutters, > > > > then second time and later it is stutter free. > > > > > > Andy, you have behavior like that (more or less those seconds for stuter) > > > if PIPE_USAGE_STREAM is reverted, right? > > > > The Bad was with vanilla mesa (couple of days old) > > > > The good was that + the patch in Comment 19 > > I asked is Valley play the same with your good case here and with using > Basic theme in Windows :). 
That is the case for me, and average fps is > around 80% in comparasion. Next time I'm in Windows I'll try changing desktop - but as I said, with default desktop valley is 10x better than my best Linux case and that was the one and only run I did. (In reply to comment #46) > I've seen some stutters without any corresponding buffer moves though. Still > not sure why it's stuttering so bad sometimes. > > BTW, Andy, does the stuttering also seem to get better for you if you run > Valley repeatedly? No, it's quite consistent if I quit and re-run. The amount moved doesn't seem to correlate with the length of pause - and sometimes there are small moves without stutter, so maybe it's not totally this. Looking at Heaven 4.0 there are no moves at all after load, but there are a few very brief stutters on the night scenes - these are the same with or without patch though. What does num-bytes-moved measure - from where to where? (In reply to comment #48) > (In reply to comment #46) > > I've seen some stutters without any corresponding buffer moves though. Still > > not sure why it's stuttering so bad sometimes. > > > > BTW, Andy, does the stuttering also seem to get better for you if you run > > Valley repeatedly? > > No, it's quite consistent if I quit and re-run. > > The amount moved doesn't seem to correlate with the length of pause - and > sometimes there are small moves without stutter, so maybe it's not totally > this. > > Looking at Heaven 4.0 there are no moves at all after load, but there are a > few very brief stutters on the night scenes - these are the same with or > without patch though. > > What does num-bytes-moved measure - from where to where? The HUD always displays an average value per frame. It's the average of all values between the current and the last update of the HUD. (In reply to comment #49) > (In reply to comment #48) > > (In reply to comment #46) > > What does num-bytes-moved measure - from where to where? 
>
> The HUD always displays an average value per frame. It's the average of all
> values between the current and the last update of the HUD.

Ahh, so the fact that HUD stops rendering during the pauses means that spikes are likely anyway.

Though my question wasn't really about the HUD as such, I was wondering where they were moving to/from - I guess the answer may be too obvious, but just to confirm. I assume it's across PCIE to the card (or maybe from/both) - is it DMA or CPU transfer? Is it dependent on app behavior or driver - eg. running Unigine Reflections I saw a blip in the graph first run, but not again.

num-bytes-moved comes from TTM. It's the size of all buffer moves done by TTM. This usually happens during command submission if VRAM is full.

Just updated llvm and my perf on pyrit is back to normal -

Computed 77586.36 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 73865.3 PMKs/s (RTT 0.8)
#2: 'CPU-Core (SSE2)': 744.3 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 746.4 PMKs/s (RTT 3.0)
#4: 'CPU-Core (SSE2)': 745.7 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

(In reply to comment #52)
> Just updated llvm and my perf on pyrit is back to normal -
>
> Computed 77586.36 PMKs/s total.
> #1: 'OpenCL-Device 'AMD PITCAIRN'': 73865.3 PMKs/s (RTT 0.8)
> #2: 'CPU-Core (SSE2)': 744.3 PMKs/s (RTT 2.9)
> #3: 'CPU-Core (SSE2)': 746.4 PMKs/s (RTT 3.0)
> #4: 'CPU-Core (SSE2)': 745.7 PMKs/s (RTT 2.9)
> #5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

Not llvm it's mesa -

radeonsi: Compile dummy pixel shader on demand

(In reply to comment #53)
> > Just updated llvm and my perf on pyrit is back to normal -
[...]
> Not llvm it's mesa -
>
> radeonsi: Compile dummy pixel shader on demand

Sounds like pyrit ends up creating a lot of Gallium contexts. You might get even better performance with the LLVM regression fixed.

BTW, this could also mean that the pyrit performance regression was simply due to LLVM now taking slightly longer to compile a shader.
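
For reference, the GPU-only pyrit figures pasted in this report can be compared with a few lines of arithmetic. This is a minimal sketch: `percent_change` and the constant names are made up for illustration, and the three runs were taken on different kernel, Mesa and LLVM combinations, so the percentages are only a rough guide and not a controlled comparison.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return 100.0 * (after - before) / before


# 'OpenCL-Device' PMKs/s values as pasted in the comments above.
GOOD_KERNEL = 55101.4      # "On good" run
HEAD_KERNEL = 48501.1      # "On head" run
AFTER_MESA_FIX = 73865.3   # run after "radeonsi: Compile dummy pixel shader on demand"

if __name__ == "__main__":
    print(f"good kernel -> head: {percent_change(GOOD_KERNEL, HEAD_KERNEL):+.1f}%")
    print(f"head -> after Mesa change: {percent_change(HEAD_KERNEL, AFTER_MESA_FIX):+.1f}%")
```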
(In reply to comment #55) > BTW, this could also mean that the pyrit performance regression was simply > due to LLVM now taking slightly longer to compile a shader. The llvm commit still reverts cleanly, so I tested and didn't gain anything significant. So almost a month has gone by... I'm trying drm-next-3.18 and mesa git and many unreal engine demos are still broken like this: https://www.youtube.com/watch?v=NvgA9_B0dMo (ignore the excessive jumpy frames that come from dri3 offloading) R600_DEBUG=nodma does not help by the way. Has there been any progress? (In reply to comment #57) > I'm trying drm-next-3.18 and mesa git and many unreal engine demos are still > broken like this: > https://www.youtube.com/watch?v=NvgA9_B0dMo Are you sure that's directly related to the Unigine Heaven stuttering discussed in this report? E.g., does reverting the Mesa commit in question help, or do you see similar symptoms in the Gallium HUD? (In reply to comment #58) > (In reply to comment #57) > > I'm trying drm-next-3.18 and mesa git and many unreal engine demos are still > > broken like this: > > https://www.youtube.com/watch?v=NvgA9_B0dMo > > Are you sure that's directly related to the Unigine Heaven stuttering > discussed in this report? E.g., does reverting the Mesa commit in question > help, or do you see similar symptoms in the Gallium HUD? It does look like the same symptoms. Only rare and short hangs in unigine heaven, but frequent hangs of ~1-2 sconds in unigine valley. The HUD shows that these hangs mostly correlate with jumps in vram/gtt usage. Is the mesa commit in question 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938? If so, it doesn't revert cleanly anymore, but I can have a look if I can manually see how to do it. 
(In reply to comment #59) > (In reply to comment #58) > > (In reply to comment #57) > > > I'm trying drm-next-3.18 and mesa git and many unreal engine demos are still > > > broken like this: > > > https://www.youtube.com/watch?v=NvgA9_B0dMo > > > > Are you sure that's directly related to the Unigine Heaven stuttering > > discussed in this report? E.g., does reverting the Mesa commit in question > > help, or do you see similar symptoms in the Gallium HUD? > > It does look like the same symptoms. Only rare and short hangs in unigine > heaven, but frequent hangs of ~1-2 sconds in unigine valley. The HUD shows > that these hangs mostly correlate with jumps in vram/gtt usage. > > Is the mesa commit in question 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938? If > so, it doesn't revert cleanly anymore, but I can have a look if I can > manually see how to do it. Some Unreal are OK for me after a glitchy start. I can reproduce what you see with Scifi hallway and Elemental - the latter is very bad, the former did come good after a while. I hadn't "seen" these two before today, as in the past they just bailed with an llvm error. I'll try, when I have time, to see if they are better with the revert. Note that some of the Unreal Engine 4 demos want to use more than 1G of graphics memory (as shown by the GALLIUM_HUD queries requested-VRAM and requested-GTT), so if the GPU has 'only' 1G of VRAM or less, that's a very difficult situation for the graphics memory management code. (In reply to comment #61) > Note that some of the Unreal Engine 4 demos want to use more than 1G of > graphics memory (as shown by the GALLIUM_HUD queries requested-VRAM and > requested-GTT), so if the GPU has 'only' 1G of VRAM or less, that's a very > difficult situation for the graphics memory management code. I do have 2 gig, but looking at the screenshot of elemantal to be attached I see that used and requested differ. 
This shot doesn't really show how long the pauses are - they are really bad, it takes about 2 minutes to render the first few frames with pauses of many seconds after that. It's so bad it's hard to tell whether the revert helps - probably not, I guess it's something different. Have you tried Elemental? Created attachment 107183 [details]
Elemental screen showing vram usage
(In reply to comment #61) > Note that some of the Unreal Engine 4 demos want to use more than 1G of > graphics memory (as shown by the GALLIUM_HUD queries requested-VRAM and > requested-GTT), so if the GPU has 'only' 1G of VRAM or less, that's a very > difficult situation for the graphics memory management code. I too have 2 Gigabyte VRAM. Here is a short clip with some HUD graphs: https://www.youtube.com/watch?v=vvqbAFV06pA It's pretty clear that the stutters correlate with activity in "num bytes moved"... I have also tried the native borderlands 2 for a few minutes today and I'm seeing similar stuttering. It doesn't happen quite so often, but it's still often enough to be an issue. So is that revert helps http://lists.freedesktop.org/archives/mesa-dev/2014-August/066746.html Keep in mind that revert broke 32bit complitely, lot of corruption :) About performance for UE4 demos i can't say a lot, this is on Kabini :) Or if not reverting just try how it goes with kernel 3.16 that should not hit this stuttering. And watch requested-VRAM at begening of these demos, for me that is actually higher then with kernel 3.17 or 3.18-next for the same app :). I am not trying other demos , but seems like newer kernels requests more VRAM from the apps :) I mean kernel 3.16 requested-VRAM is lower, then with 3.17+ kernels :D (In reply to comment #64) > It's pretty clear that the stutters correlate with activity in "num bytes > moved"... I brought this up earlier and as was explained way the graphing/counting works may mean this is not related. In summary AIUI the fact there is a pause causes a spike because the count is from the last frame rendered - which is way longer than normal due to the pause. Offtopic... but if someone has sound crackling in those UE4 demos (at least Elemental and Vehicle, demos i tried) that is probably because openal 1.15 they shipped, 1.14 an 1.16 works fine for me... 
Sorry for offtopic, but there are bugs all over the place and they might be related, one never knows :)

@Andy

Oops didn't notice... Elemental demo makes GPU faults for me, is it the same for you or if you have assertions enabled in llvm... there is a bug 82544 Michel filed.

(In reply to comment #69)
> @Andy
>
> Oops didn't notice... Elemental demo makes GPU faults for me, is it the
> same for you or if you have assertions enabled in llvm... there is a bug
> 82544 Michel filed.

No gpu faults for me, llvm used to assert but not now, sound seems OK using alsa (I don't have pulse).

(In reply to comment #70)
> (In reply to comment #69)
> > @Andy
> >
> > Oops didn't notice... Elemental demo makes GPU faults for me, is it the
> > same for you or if you have assertions enabled in llvm... there is a bug
> > 82544 Michel filed.
>
> No gpu faults for me, llvm used to assert but not now, sound seems OK using
> alsa (I don't have pulse).

Yeah you are right, I tested that with llvm 3.5, normally running 3.6svn... Michel should close that one I guess.

About openal, yes I also use plain alsa, no pulse, but I have sound crackling with openal 1.15 with any game which ships it, and also with the one which is in Debian sid... it does not happen with 1.14 or 1.16, so I basically replace it with my 1.16... but OK, that does not matter, maybe that is only for me :)

(In reply to comment #62)
> I do have 2 gig, but looking at the screenshot of Elemental to be attached I
> see that used and requested differ.

That's probably because of VRAM fragmentation. (BTW, I find it easier to keep track of this with requested-VRAM+VRAM-usage,requested-GTT+GTT-usage instead of requested-VRAM+requested-GTT,VRAM-usage+GTT-usage)

> This shot doesn't really show how long the pauses are - they are really bad,
> it takes about 2 minutes to render the first few frames with pauses of many
> seconds after that.
>
> It's so bad it's hard to tell whether the revert helps - probably not, I
> guess it's something different.
> Have you tried Elemental?

Yes, but even on Kaveri with only 1G of VRAM, it doesn't take two minutes for it to get going, and I don't notice such long pauses either. So I think it's better if we track the UE4 issues in a separate report, and it would be great if you guys could bisect the kernel or Mesa for that.

(In reply to comment #65)
> Keep in mind that revert broke 32bit completely, lot of corruption :)

I haven't been able to reproduce that. If you still can, please file a bug for it, as there's nothing preventing the kernel from using GTT instead of VRAM when the latter is full.

> I am not trying other demos, but it seems like newer kernels request more
> VRAM from the apps :)

The Mesa commit in question makes the r600g and radeonsi drivers try to use VRAM for more things, but only with newer kernels, because older kernels didn't guarantee reliability when using VRAM for those things.

(In reply to Andy Furniss from comment #67)
> In summary AIUI the fact there is a pause causes a spike because the count
> is from the last frame rendered - which is way longer than normal due to the
> pause.

Still, it means that *some* BOs were moved during the pause, so it's not impossible that the pause is somehow related to the BO moves.

BTW, make sure CONFIG_CMA isn't enabled in your kernels, in particular those using Ubuntu.

(In reply to smoki from comment #68)
> Offtopic...

Please don't clutter up bug reports with off-topic comments.

Well.

I have said that I used drm-next-3.18 and had these hangs. When I applied http://lists.freedesktop.org/archives/mesa-dev/2014-August/066746.html it did not help.

Now I am using 3.17-rc7 with that mesa patch and I do not see these hangs anymore. Or maybe they are these very short stutters.

Sorry if drm-next-3.18 behavior is not relevant here.

As for the num bytes moved: Does the HUD graph only accumulate everything that happened in the hang?
If so, then the hundreds of megabytes still seem more than normal and the used graphs definitely show change before and after the hangs. Whatever you make of that...

CONFIG_CMA is not enabled on either kernel.

Indeed, there's less moving of data with the rc kernel I think. For comparison: https://www.youtube.com/watch?v=mFaqHGle9Hg

(In reply to Christoph Haag from comment #74)
> Well.
>
> I have said that I used drm-next-3.18 and had these hangs.
> When I applied
> http://lists.freedesktop.org/archives/mesa-dev/2014-August/066746.html it
> did not help.
>
> Now I am using 3.17-rc7 with that mesa patch and I do not see these hangs
> anymore. Or maybe they are these very short stutters.
>
> Sorry if drm-next-3.18 behavior is not relevant here.
>
> As for the num bytes moved: Does the HUD graph only accumulate everything
> that happened in the hang? If so, then the hundreds of megabytes still seem
> more than normal and the used graphs definitely show change before and after
> the hangs. Whatever you make of that...
>
>
> CONFIG_CMA is not enabled on either kernel.
>
> Indeed, there's less moving of data with the rc kernel I think.
> For comparison: https://www.youtube.com/watch?v=mFaqHGle9Hg

It would be useful to know if Elemental also worked with 3.17-rc7.

(In reply to Andy Furniss from comment #75)
> It would be useful to know if Elemental also worked with 3.17-rc7.

It's stuttering quite severely, but it feels more like "normal" performance drops and I don't think it completely hangs like in the videos I made before.

I actually tried it for the first time in months because in the past it hung the gpu and the operating system completely with gpu faults I think. Today I ran it for the first time without any severe problems, so radeonsi is definitely making good progress!

(In reply to Christoph Haag from comment #76)
> (In reply to Andy Furniss from comment #75)
>
> > It would be useful to know if Elemental also worked with 3.17-rc7.
>
> It's stuttering quite severely, but it feels more like "normal" performance
> drops and I don't think it completely hangs like in the videos I made before.
>
> I actually tried it for the first time in months because in the past it hung
> the gpu and the operating system completely with gpu faults I think. Today I
> ran it for the first time without any severe problems, so radeonsi is
> definitely making good progress!

Ok, I'm going to open a new bug for this one when I have time to test more.

I can get the behavior you see, but only on the last kernel with the old firmware I have installed, anything more recent including current agd5f 3.17 fixes gets long pauses for me.

What is your card?

(In reply to Andy Furniss from comment #77)
> Ok, I'm going to open a new bug for this one when I have time to test more.

Bisected to the same kernel commit as this one, but did a new bug - https://bugs.freedesktop.org/show_bug.cgi?id=84662

(In reply to Andy Furniss from comment #78)
> https://bugs.freedesktop.org/show_bug.cgi?id=84662

I think that should cover Unigine as well.

Is there still an issue with pyrit?

(In reply to Michel Dänzer from comment #79)
> (In reply to Andy Furniss from comment #78)
> > https://bugs.freedesktop.org/show_bug.cgi?id=84662
>
> I think that should cover Unigine as well.

Yea.

> Is there still an issue with pyrit?

Pyrit is OK, so closing this one.
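
Editorial note on the "num bytes moved" discussion in comments #64, #67 and #73: the point that a pause alone can produce a spike in a per-frame graph can be sketched in a few lines. This is only an illustration under stated assumptions, not the Gallium HUD code. The function name, the millisecond frame times and the constant move rate are all hypothetical; the only thing taken from the thread is that the graphed value is counted per rendered frame.

```python
# Hypothetical illustration, not Mesa's HUD implementation: a cumulative
# "bytes moved" counter is sampled once per rendered frame and the graph
# plots the difference since the previous sample.

def per_frame_deltas(frame_durations_ms, move_rate_bytes_per_ms):
    """Return the bytes-moved value a per-frame graph would show.

    Assumes, purely for illustration, that buffers are moved at a
    constant rate, so any spike comes from the frame's duration alone.
    """
    counter = 0        # cumulative bytes moved so far
    last_sample = 0    # counter value at the previous frame
    deltas = []
    for duration_ms in frame_durations_ms:
        counter += duration_ms * move_rate_bytes_per_ms  # moves during this frame
        deltas.append(counter - last_sample)             # value plotted for this frame
        last_sample = counter
    return deltas


# Three normal 20 ms frames, one 5 s stall, then two normal frames.
frames_ms = [20, 20, 20, 5000, 20, 20]
deltas = per_frame_deltas(frames_ms, move_rate_bytes_per_ms=10_000)
print(deltas)
```

With a constant move rate the stalled frame still shows a value 250 times larger than its neighbours, which is the effect described in comment #67. As comment #73 notes, a spike still means some BOs were moved during the pause, so the graph by itself neither proves nor rules out a connection.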