Bug 104271

Summary: i965: Timeout in dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.5
Product: Mesa Reporter: Chad Versace <chadversary>
Component: Drivers/DRI/i965Assignee: Francisco Jerez <currojerez>
Status: RESOLVED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: enhancement    
Priority: high CC: chadversary, eero.t.tamminen, mattst88
Version: git   
Hardware: Other   
OS: Linux (All)   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=103322
Whiteboard:
i915 platform: i915 features:

Description Chad Versace 2017-12-14 23:48:41 UTC
When running dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.5 on Brasswell Chrome OS devices, using the Android CTS runner, this test fails due to a timeout. Ilja says the test needs to complete in less than 30 seconds.

This is the last remaining dEQP failure, to my knowledger, that blocks GLES 3.1 on Brasswell Chromebooks. It passed on Skylake and Apollo Lake Chromebooks.

Armchair diagnosis: The failures seems unrelated to Brasswell itself. I think the problem is that CPU speed on Brasswell Chromebooks is slower than other Chromebooks.
Comment 1 Mark Janes 2017-12-15 18:10:55 UTC
FWIW, this tests completes in 3m3s on the BSW systems that Intel runs in CI.  It takes 52 seconds on KBL/SKL, so I'm not sure how BSW could possibly meet a 30s deadline.

Are you sure that there is a 30s requirement?
Comment 2 Mark Janes 2017-12-15 18:27:29 UTC
After reading the bug that Chad linked to this, I re-ran with binaries compiled with release flags.  My i7 KBL system completes the test in 34s, not the 52s reported by CI.
Comment 3 Mark Janes 2017-12-15 19:14:28 UTC
bisected to:
9355116bdad6ee9914554de8e48ba271bd36a8eb is the first bad commit
commit 9355116bdad6ee9914554de8e48ba271bd36a8eb
Author: Francisco Jerez <currojerez@riseup.net>
Date:   Mon Oct 23 13:47:10 2017 -0700
    intel/fs: Don't let undefined values prevent copy propagation.


Test time increases from 10s to 35s on KBL i7.
Comment 4 Kenneth Graunke 2017-12-15 21:15:34 UTC
Oh, interesting.  Basically the entire time is spent in fs_copy_prop_dataflow::run().  There are 2600 basic blocks and 5000 values.  About 40% of them are constants, which could probably be handled separately to avoid the dataflow costs (who cares what block it came from, it's an immediate, you can just reconstruct it).  Not sure about undefined values.
Comment 5 Francisco Jerez 2017-12-15 21:23:07 UTC
IMHO the ultimate reason for the regression is that the dataflow propagation logic of the copy propagation pass has a serious run-time complexity problem in acyclic CFGs.  The patch this was bisected to is merely increasing the amount of work done in each iteration which substantially exacerbates the problem.  A proper fix would involve reworking the copy propagation dataflow logic to reduce the polynomial degree of its run-time complexity in the acyclic case.  Working on it -- It will likely give you better performance than before my undef handling patch.

Updating importance to high-priority enhancement since this is purely a (largely pre-existing) compile-time performance issue, spec conformance is unaffected by it.
Comment 6 Mark Janes 2017-12-15 21:38:53 UTC
On BSW, the run-time of this test goes from 39s to 175s, on the bisected commit.  We need to improve performance by >25% beyond just fixing Curro's regression.

As per Chad's comment, this performance is bad enough that it will prevent us from passing the android CTS.

It may not be a *Khronos* conformance issue, but it is a conformance issue nonetheless, for all of our platforms.
Comment 7 Francisco Jerez 2017-12-15 21:48:33 UTC
(In reply to Mark Janes from comment #6)
> On BSW, the run-time of this test goes from 39s to 175s, on the bisected
> commit.  We need to improve performance by >25% beyond just fixing Curro's
> regression.
> 

There's a good chance that the fix I'm working on will bring us below the 30s bar, but it's hard to predict with certainty since there may be other passes behaving inefficiently in this test-case.

> As per Chad's comment, this performance is bad enough that it will prevent
> us from passing the android CTS.
> 
> It may not be a *Khronos* conformance issue, but it is a conformance issue
> nonetheless, for all of our platforms.

Except that to the extent that the CTS doesn't match the specification we are more likely to be talking about a conformance problem of the CTS rather than of our implementation...
Comment 8 Francisco Jerez 2017-12-20 02:27:41 UTC
Patch sent to the mailing list [1]. It reduces the time spent in dataflow propagation by 30x on my CHV system.  Test run-time is now around 13s well below the time-out even on CHV.

[1] https://lists.freedesktop.org/archives/mesa-dev/2017-December/180456.html
Comment 9 Francisco Jerez 2018-01-17 23:43:10 UTC
Shouldn't timeout anymore with current master.  Closing.
Comment 10 Chad Versace 2018-01-18 00:02:33 UTC
Thanks Curro. FYI, I've cherry-picked your fix, plus what I believed to be prerequisite commits too, to the ARC++ repo.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.