Bug 106780

Summary: intel_powerclamp: Start idle injection to reduce power with [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B
Product: DRI
Reporter: Chris Murphy <bugzilla>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: unspecified
Hardware: Other
OS: All
Whiteboard: Triaged
i915 platform: SKL
i915 features:
Bug Depends on:
Bug Blocks: 105981
Attachments:
Description    Flags
lspci vvnn     none
dmesg          none

Description Chris Murphy 2018-06-02 04:36:10 UTC
Summary:
While Firefox is under heavy load, the system becomes unresponsive: the mouse arrow is jerky and switching applications is delayed. Top shows four kinject processes, each consuming just under 50% CPU. This is a recent problem and could be one of:
a. kernel 4.16, I didn't experience it with 4.15
b. use of external display, which approximately coincides with a.

Report is for kernel 4.16.13-300.fc28.x86_64, but has happened with all the 4.17 rc's as well.

HP Spectre Notebook
SKU Number: W2K28UA#ABA

thermald is running, thermald-1.7.1-2.fc28.x86_64

lspci
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:1904] (rev 08)

00:02.0 VGA compatible controller [0300]: Intel Corporation Skylake GT2 [HD Graphics 520] [8086:1916] (rev 07) (prog-if 00 [VGA controller])

/proc/cpuinfo
model name	: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
microcode	: 0xc2


Example from dmesg:

[15262.909362] intel_powerclamp: Start idle injection to reduce power
[15267.716816] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=671808 end=671809) time 169 us, min 1192, max 1199, scanline start 1189, end 1201
[15344.030304] intel_powerclamp: Stop forced idle injection
[15432.139779] intel_powerclamp: Start idle injection to reduce power
[15460.194230] intel_powerclamp: Stop forced idle injection
[16329.281047] intel_powerclamp: Start idle injection to reduce power
[16406.504159] intel_powerclamp: Stop forced idle injection
[16410.510943] intel_powerclamp: Start idle injection to reduce power
[16502.669255] intel_powerclamp: Stop forced idle injection
[16558.736709] intel_powerclamp: Start idle injection to reduce power
[16568.533524] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=749792 end=749793) time 263 us, min 1192, max 1199, scanline start 1187, end 1206
[16769.567613] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=761844 end=761845) time 261 us, min 1192, max 1199, scanline start 1186, end 1206
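
One way to check how these errors line up with idle injection is to walk the log and record whether powerclamp was injecting idle time when each failure was logged. The following is a minimal sketch, not part of the original report: the function name `correlate`, the field names, and the regular expressions are assumptions written against the excerpt above. A Start line with no later Stop line is treated as a window that is still open.

```python
import re

# Shortened copy of the dmesg excerpt above.
DMESG = """\
[15262.909362] intel_powerclamp: Start idle injection to reduce power
[15267.716816] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=671808 end=671809) time 169 us, min 1192, max 1199, scanline start 1189, end 1201
[15344.030304] intel_powerclamp: Stop forced idle injection
[16558.736709] intel_powerclamp: Start idle injection to reduce power
[16568.533524] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=749792 end=749793) time 263 us, min 1192, max 1199, scanline start 1187, end 1206
"""

# "[seconds.micros] message"
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# Fields taken from the i915 error message as it appears in this report.
FAIL = re.compile(
    r"Atomic update failure on pipe (\w+) "
    r"\(start=(\d+) end=(\d+)\) time (\d+) us"
)


def correlate(log):
    """Return one dict per atomic update failure, flagged with the
    powerclamp state at the time the failure was logged."""
    injecting = False
    results = []
    for raw in log.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if "intel_powerclamp: Start idle injection" in msg:
            injecting = True
        elif "intel_powerclamp: Stop forced idle injection" in msg:
            injecting = False
        else:
            f = FAIL.search(msg)
            if f:
                results.append({
                    "timestamp": ts,
                    "pipe": f.group(1),
                    # Difference between the start= and end= counters
                    # printed in the message.
                    "counter_delta": int(f.group(3)) - int(f.group(2)),
                    "duration_us": int(f.group(4)),
                    "during_injection": injecting,
                })
    return results


if __name__ == "__main__":
    for failure in correlate(DMESG):
        print(failure)
```

Running the same function over the full attached dmesg would show whether any failure was logged while idle injection was off.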


Attaching dmesg, lspci output. Will enable drm.debug=0xe and reproduce and report back.
Comment 1 Chris Murphy 2018-06-02 04:36:38 UTC
Created attachment 139961 [details]
lspci vvnn
Comment 2 Chris Murphy 2018-06-02 04:36:55 UTC
Created attachment 139962 [details]
dmesg
Comment 3 Francesco Balestrieri 2018-06-05 06:00:11 UTC
Could you also try with drm-tip?
Comment 4 Jani Nikula 2018-06-05 08:32:26 UTC
(In reply to bugzilla from comment #0)
> This is a recent problem and
> could be one of:
> a. kernel 4.16, I didn't experience it with 4.15
> b. use of external display, which approximately coincides with a.

Please rule out b. After that, please bisect between v4.15 and v4.16.
Comment 5 Jani Saarinen 2018-06-25 10:07:30 UTC
Reporter, is this issue still occurring?
Could you try https://cgit.freedesktop.org/drm-tip and send dmesg with drm.debug=0x1e log_buf_len=4M?
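
For reference, `drm.debug` is a bitmask, so 0x1e requests one more category than the 0xe mentioned in the description. The sketch below decodes both values. The bit names are an assumption based on the `drm.debug` parameter description in kernels of this era; check `modinfo drm` on the running kernel before relying on them. The helper name `decode_drm_debug` is hypothetical.

```python
# Assumed bit assignments for the drm.debug module parameter
# (verify with `modinfo drm` on the kernel under test).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    for value in (0xE, 0x1E):
        print(hex(value), decode_drm_debug(value))
```

Under these assumptions, 0xe enables DRIVER, KMS and PRIME messages, and 0x1e adds ATOMIC, which is the category most relevant to an atomic update failure.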
Comment 6 Jani Saarinen 2018-06-26 06:09:14 UTC
(In reply to Jani Saarinen from comment #5)
> Reporter, is this issue still occurring?
> Could you try https://cgit.freedesktop.org/drm-tip and send dmesg with
> drm.debug=0x1e log_buf_len=4M?

And try to bisect as suggested by Jani Nikula
Comment 7 Chris Murphy 2018-06-26 15:59:55 UTC
I haven't been able to reproduce the intel_pipe_update_end error since the original report. But I can fairly easily reproduce kinject slowing the whole system down to a crawl with any kernel version, with Firefox and an external display. It happens nowhere near as easily without an external display connected.

Since the original report I've only been using normal Fedora kernels (without extra debug options) and only stable versions of 4.17.x. I'll retest soon with 4.18rc+debug kernels, and include the additional debug options you've recommended.
Comment 8 Jani Saarinen 2018-06-27 16:50:55 UTC
ok, thanks.
Comment 9 Maarten Lankhorst 2018-06-29 09:23:41 UTC
I would say the kinject is the real root cause here. It injects idle time because your system is overheating, and atomic updates are really time sensitive. If we throttle during atomic updates then we won't finish in time and you get the failure.

Marking not our bug, because it probably isn't. :)
Comment 10 Chris Murphy 2018-06-29 18:32:46 UTC
I can reliably induce 4x kinject threads soaking up 200% CPU, just by pointing Firefox or Chrome to Comedy Central's web site (oh the irony) with an external display attached. The problem never happens if an external display is not attached, and always happens when the external display is attached.

Also, the system basically becomes unusable. Video playback is jerky, audio stutters, the whole UI and even the mouse arrow becomes unresponsive.

So I think there's something wrong with the video driver when it comes to external display support, that then induces the kinject to calm things down, which then causes the atomic update failure. For sure this is not happening when I reboot and use Windows 10 with the external display attached.

Is it even remotely plausible the problem is induced by Wayland or Mutter (this is GNOME)? I could try to reproduce with X. And then try to reproduce with KDE. I'll reopen it for now so it doesn't get lost.
Comment 11 Maarten Lankhorst 2018-07-03 08:20:45 UTC
The system is getting hotter; why, I don't know. That heat is the real issue, and it's likely that something is using all the CPU before kinject kicks in.

Knowing what that is would help you narrow down what is going on. So is there anything using a lot of CPU beforehand?
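
To see whether temperature climbs before powerclamp engages, the thermal sysfs tree can be polled alongside top. The sketch below is an illustration under stated assumptions, not something used in this report: it assumes the standard `/sys/class/thermal` layout (`thermal_zone*/type`, `thermal_zone*/temp` in millidegrees Celsius, and a `cooling_device*` whose `type` is `intel_powerclamp`), and the function name `read_thermal` is hypothetical.

```python
from pathlib import Path


def read_thermal(root="/sys/class/thermal"):
    """Return (zones, clamp_state).

    zones maps "thermal_zoneN:type" to a temperature in degrees Celsius.
    clamp_state is the raw cur_state of the intel_powerclamp cooling
    device, or None if no such device is found.
    """
    base = Path(root)
    zones = {}
    for zone in sorted(base.glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            milli_c = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone is unreadable or reports no valid temperature
        zones[f"{zone.name}:{name}"] = milli_c / 1000.0

    clamp_state = None
    for dev in sorted(base.glob("cooling_device*")):
        try:
            if (dev / "type").read_text().strip() == "intel_powerclamp":
                clamp_state = int((dev / "cur_state").read_text().strip())
                break
        except (OSError, ValueError):
            continue
    return zones, clamp_state


if __name__ == "__main__":
    zones, clamp_state = read_thermal()
    for name, temp in zones.items():
        print(f"{name}: {temp:.1f} C")
    print("intel_powerclamp cur_state:", clamp_state)
```

Logging this once per second while starting video playback, with and without the external display, would show which zone heats up and how soon idle injection starts.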
Comment 12 Chris Murphy 2018-07-03 15:30:24 UTC
The workload is the same on the built-in and external displays. The only difference is that the video is rendered on the external display, and that becomes unworkable almost immediately (too hot, fans, kinject, stuttering video and audio).

built-in is 1920x1080
external is 1920x1200

So it's not radically more pixels being rendered on the external.
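
The pixel arithmetic behind that statement can be checked directly. This is a small sketch using only the two resolutions given above. Whether the built-in panel stays active while the external display is attached is not stated here, so the combined figure is an assumption about one possible configuration.

```python
def pixel_count(width, height):
    """Number of pixels in a width x height mode."""
    return width * height


builtin = pixel_count(1920, 1080)    # built-in panel
external = pixel_count(1920, 1200)   # external display

# Relative size of the external mode compared with the built-in one.
ratio = external / builtin

# Assumption: both outputs driven at once (not confirmed in this report).
combined = builtin + external

if __name__ == "__main__":
    print(f"built-in: {builtin} px")
    print(f"external: {external} px ({(ratio - 1) * 100:.1f}% more)")
    print(f"both active: {combined} px")
```

The external mode has about 11% more pixels than the built-in one, which supports the point that pixel count alone is unlikely to explain the difference.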
Comment 13 Jani Saarinen 2018-08-13 09:41:20 UTC
Reporter, can you try the latest https://cgit.freedesktop.org/drm-tip and send dmesg with drm.debug=0x1e log_buf_len=4M?
Comment 14 Chris Murphy 2018-08-17 01:48:32 UTC
Is git f91e654474d4 from linux-next close enough, or is there newer stuff in drm-tip?
Comment 15 Maarten Lankhorst 2018-08-17 07:58:51 UTC
drm-tip is always newest
Comment 16 Lakshmi 2018-08-27 11:41:41 UTC
Reporter, were you able to reproduce this issue with the latest drm-tip?
Comment 17 Chris Murphy 2018-08-29 17:09:55 UTC
I haven't seen the reported problem "Atomic update failure" with any kernel-4.18.x. And with 4.16 and 4.17 kernels, it was a transient problem that wasn't readily reproducible.

So I think this can be closed.
Comment 18 Lakshmi 2018-08-29 20:41:52 UTC
Thank you, Chris. Closing the bug as this issue doesn't occur with the latest kernel.
