Bug 97344

Summary: intel_dp WARN_ON(!msg->buffer != !msg->size)
Product: DRI
Reporter: mwa <matthew.auld>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: highest
CC: aleksey, andrey.vihrov, dex+fdobugzilla, intel-gfx-bugs, leho, mihai.dontu, peter.ujfalusi, reddy.harshak, yex.tian, ziegler
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: BDW
i915 features: display/DP
Attachments (description, flags):
dmesg.txt (flags: none)
dmesg with drm.debug=0x1e log_buf_len=1M (flags: none)
dmesg with drm.debug=0x1e log_buf_len=1M (flags: none)

Description mwa 2016-08-14 13:23:24 UTC
No visible issues, just lots of the following dmesg-warns: 

[    2.939715] WARNING: CPU: 2 PID: 6 at drivers/gpu/drm/i915/intel_dp.c:1037 intel_dp_aux_transfer+0x1f1/0x260 [i915]
[    2.939716] WARN_ON(!msg->buffer != !msg->size)
[    2.939720] Modules linked in: i915 i2c_algo_bit drm_kms_helper rtsx_pci_sdmmc mmc_core drm e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ptp serio_raw rtsx_pci pps_core video fjes
[    2.939721] CPU: 2 PID: 6 Comm: kworker/u16:0 Tainted: G        W       4.8.0-rc1-drm-intel+ #61
[    2.939722] Hardware name: LENOVO 20BW000FUK/20BW000FUK, BIOS JBET54WW (1.19 ) 11/06/2015
[    2.939753] Workqueue: i915-dp i915_digport_work_func [i915]
[    2.939755]  0000000000000286 00000000da108d1d ffff88023465fb80 ffffffff813dc74d
[    2.939756]  ffff88023465fbd0 0000000000000000 ffff88023465fbc0 ffffffff810a750b
[    2.939758]  0000040dda108d1d ffff88023465fca0 ffff88022b796160 ffff88022b7960e8
[    2.939758] Call Trace:
[    2.939760]  [<ffffffff813dc74d>] dump_stack+0x63/0x86
[    2.939762]  [<ffffffff810a750b>] __warn+0xcb/0xf0
[    2.939764]  [<ffffffff810a758f>] warn_slowpath_fmt+0x5f/0x80
[    2.939766]  [<ffffffff817e0919>] ? schedule_hrtimeout_range_clock+0xb9/0x1b0
[    2.939798]  [<ffffffffa0233c21>] intel_dp_aux_transfer+0x1f1/0x260 [i915]
[    2.939802]  [<ffffffffa0152e62>] drm_dp_dpcd_access+0x72/0x120 [drm_kms_helper]
[    2.939806]  [<ffffffffa0152f2b>] drm_dp_dpcd_write+0x1b/0x20 [drm_kms_helper]
[    2.939837]  [<ffffffffa022e7b8>] intel_dp_start_link_train+0x178/0x280 [i915]
[    2.939866]  [<ffffffffa022feda>] intel_dp_check_link_status+0xba/0x110 [i915]
[    2.939896]  [<ffffffffa023670e>] intel_dp_hpd_pulse+0x1ee/0x350 [i915]
[    2.939927]  [<ffffffffa021c023>] i915_digport_work_func+0x93/0x110 [i915]
[    2.939928]  [<ffffffff810c0824>] process_one_work+0x184/0x410
[    2.939929]  [<ffffffff810c0afe>] worker_thread+0x4e/0x480
[    2.939931]  [<ffffffff810c0ab0>] ? process_one_work+0x410/0x410
[    2.939932]  [<ffffffff810c6618>] kthread+0xd8/0xf0
[    2.939934]  [<ffffffff817e18bf>] ret_from_fork+0x1f/0x40
[    2.939936]  [<ffffffff810c6540>] ? kthread_worker_fn+0x180/0x180
[    2.939937] ---[ end trace 5ae7a222f3deeff8 ]---

And a little bit more info:

msg->buffer != null, msg->size = 0, intel_dp->lane_count = 0
Comment 1 Jani Nikula 2016-08-15 14:19:38 UTC
So this will bisect to

commit dd788090822300a66ff469ae9e50f6d28d124eb8
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Thu Jul 28 17:55:04 2016 +0300

    drm/i915: Warn about aux msg buffer vs. size mismatch

but that commit just uncovers a pre-existing bug elsewhere.

The bug is lane count being 0 when we end up in link training. How that happens, I don't know.
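
For readers unfamiliar with the check, the warning condition reads as "exactly one of buffer and size is set". The following is a minimal userspace sketch of that predicate, not the i915 code itself: `struct aux_msg`, `aux_msg_mismatch()` and `replay_reported_case()` are hypothetical names that only mirror the two fields named in the warning. It assumes, consistent with the values reported in the description (`msg->size = 0`, `lane_count = 0`), that the link-training write uses the lane count as its length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the AUX message; only the two fields
 * named in the warning are modelled. */
struct aux_msg {
    void *buffer;
    size_t size;
};

/* Same expression as the WARN_ON: true when exactly one of
 * "buffer is NULL" and "size is zero" holds. */
bool aux_msg_mismatch(const struct aux_msg *msg)
{
    return !msg->buffer != !msg->size;
}

/* Replays the reported values: a valid buffer paired with a length
 * taken from a lane count of 0 (an assumption, see the text above). */
void replay_reported_case(void)
{
    unsigned char train_set[4] = { 0 };
    size_t lane_count = 0;
    struct aux_msg msg = { .buffer = train_set, .size = lane_count };

    /* Non-NULL buffer with size 0 is a mismatch, so the warning fires. */
    assert(aux_msg_mismatch(&msg));
}
```

A message with neither buffer nor size, or with both, passes the check. Only the mixed cases warn, which is why a lane count of 0 reaching link training would surface here and not at the point where the lane count was left unset.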
Comment 2 yann 2016-08-23 14:23:03 UTC
Matthew, can you attach the kernel log and update the i915 platform field?
Comment 3 yann 2016-08-29 14:17:46 UTC
Fix (from Matthew) landing here: https://patchwork.freedesktop.org/series/11667/
Comment 4 yann 2016-10-18 15:54:38 UTC
*** Bug 98304 has been marked as a duplicate of this bug. ***
Comment 5 yann 2016-10-18 16:05:08 UTC
*** Bug 98288 has been marked as a duplicate of this bug. ***
Comment 6 Martin Ziegler 2016-11-15 14:16:41 UTC
Bug https://bugs.freedesktop.org/show_bug.cgi?id=98287 "gpu hangs after hibernation", which hit me in 4.9-rc1, is still there in 4.9.0-rc5.
Comment 7 Leho Kraav (:macmaN :lkraav) 2016-11-16 19:15:02 UTC
Created attachment 128018 [details]
dmesg.txt

Jumping here from https://bugzilla.kernel.org/show_bug.cgi?id=187571

I'm on HSW, and 4.9-rc5 is flooding dmesg with the subject matter. This did not occur on 4.8-rc4 that I somehow ended up running without a reboot for 70 days straight (it set a new `uptimed` record).

4.9 also surprised me by no longer recognizing HDMI connector unplugging. This hasn't occurred for a long time, and I've been on bleeding-edge kernels here since mid-3.x.

When I unplug the monitor, no display reconfiguration happens. A suspend-resume cycle helps restore connector state sanity: after wakeup the extra display is gone (verified in Gnome Display Settings).

...
nov   14 00:52:46 papaya kernel: [drm] Memory usable by graphics device = 2048M
nov   14 00:52:46 papaya kernel: [drm] VT-d active for gfx access
nov   14 00:52:46 papaya kernel: [drm] Replacing VGA console driver
nov   14 00:52:46 papaya kernel: [drm] DMAR active, disabling use of stolen memory
nov   14 00:52:46 papaya kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
nov   14 00:52:46 papaya kernel: [drm] Driver supports precise vblank timestamp query.
nov   14 00:52:46 papaya kernel: vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
nov   14 00:52:46 papaya kernel: ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
nov   14 00:52:46 papaya kernel: input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input6
nov   14 00:52:46 papaya kernel: [drm] Initialized i915 1.6.0 20160919 for 0000:00:02.0 on minor 0
nov   14 00:52:46 papaya kernel: ------------[ cut here ]------------
nov   14 00:52:46 papaya kernel: WARNING: CPU: 0 PID: 38 at drivers/gpu/drm/i915/intel_dp.c:1062 intel_dp_aux_transfer+0x1dc/0x220 [i915]
nov   14 00:52:46 papaya kernel: WARN_ON(!msg->buffer != !msg->size)
nov   14 00:52:46 papaya kernel: Modules linked in:
nov   14 00:52:46 papaya kernel:  crc32_pclmul i915 fbcon bitblit softcursor font intel_gtt sdhci_pci sdhci mmc_core drm_kms_helper cfbfillrect syscopyarea cfbimgblt ehci_pci sysfillrect sysimgblt ehci_hcd fb_sys_
nov   14 00:52:46 papaya kernel: CPU: 0 PID: 38 Comm: kworker/0:1 Tainted: G     U          4.9.0-rc5-bfq-gentoo+ #20
nov   14 00:52:46 papaya kernel: Hardware name: Dell Inc. Latitude E7440/0PC4X0, BIOS A18 04/28/2016
nov   14 00:52:46 papaya kernel: Workqueue: events i915_hotplug_work_func [i915]
nov   14 00:52:46 papaya kernel:  ffffc90000167b98 ffffffff8137adfb ffffc90000167be8 0000000000000000
nov   14 00:52:46 papaya kernel:  ffffc90000167bd8 ffffffff810523cc 00000426607500e1 ffffc90000167ca8
nov   14 00:52:46 papaya kernel:  ffff880407a800e0 0000000000000003 0000000000000000 ffff880407a80158
nov   14 00:52:46 papaya kernel: Call Trace:
nov   14 00:52:46 papaya kernel:  [<ffffffff8137adfb>] dump_stack+0x4d/0x72
nov   14 00:52:46 papaya kernel:  [<ffffffff810523cc>] __warn+0xcc/0xf0
nov   14 00:52:46 papaya kernel:  [<ffffffff8105243a>] warn_slowpath_fmt+0x4a/0x50
nov   14 00:52:46 papaya kernel:  [<ffffffffa021a31c>] ? intel_dp_aux_transfer+0xcc/0x220 [i915]
nov   14 00:52:46 papaya kernel:  [<ffffffffa021a42c>] intel_dp_aux_transfer+0x1dc/0x220 [i915]
nov   14 00:52:46 papaya kernel:  [<ffffffffa00f0bc8>] drm_dp_dpcd_access+0x58/0xf0 [drm_kms_helper]
nov   14 00:52:46 papaya kernel:  [<ffffffffa00f0c76>] drm_dp_dpcd_write+0x16/0x20 [drm_kms_helper]
nov   14 00:52:46 papaya kernel:  [<ffffffffa0215cc8>] intel_dp_start_link_train+0x2a8/0x4c0 [i915]
nov   14 00:52:46 papaya kernel:  [<ffffffffa0217106>] intel_dp_check_link_status+0xb6/0xf0 [i915]
nov   14 00:52:46 papaya kernel:  [<ffffffffa021ba0b>] intel_dp_detect+0x72b/0xbb0 [i915]
nov   14 00:52:46 papaya kernel:  [<ffffffffa02049ff>] i915_hotplug_work_func+0x1df/0x2b0 [i915]
nov   14 00:52:46 papaya kernel:  [<ffffffff8106a3a0>] process_one_work+0x140/0x3e0
nov   14 00:52:46 papaya kernel:  [<ffffffff8106a689>] worker_thread+0x49/0x480
nov   14 00:52:46 papaya kernel:  [<ffffffff8106a640>] ? process_one_work+0x3e0/0x3e0
nov   14 00:52:46 papaya kernel:  [<ffffffff8106a640>] ? process_one_work+0x3e0/0x3e0
nov   14 00:52:46 papaya kernel:  [<ffffffff8106f9a5>] kthread+0xc5/0xe0
nov   14 00:52:46 papaya kernel:  [<ffffffff8106f8e0>] ? kthread_park+0x60/0x60
nov   14 00:52:46 papaya kernel:  [<ffffffff81632fd2>] ret_from_fork+0x22/0x30
nov   14 00:52:46 papaya kernel: ---[ end trace 60f064180c1be639 ]---
nov   14 00:52:46 papaya kernel: ------------[ cut here ]------------
Comment 8 Mihai Dontu 2016-11-27 22:10:17 UTC
I can still see the warning with 4.9-rc7, but other than that no other visible issues (suspend/resume works OK on my HSW).
Comment 9 Martin Ziegler 2016-11-27 22:28:15 UTC
With 4.9-rc7 I still get around 49 warnings at boot and a crash of the X server after hibernation:

Nov 27 23:20:04 kernel: [drm] GPU HANG: ecode 8:0:0x5d1a7470, in Xorg [2162], reason: Hang on render ring, action: reset
Nov 27 23:20:04 kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Nov 27 23:20:04 kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Nov 27 23:20:04 kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Nov 27 23:20:04 kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Nov 27 23:20:04 kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
Nov 27 23:20:04 kernel: drm/i915: Resetting chip after gpu hang
Comment 10 Jani Nikula 2016-11-28 09:37:48 UTC
I think this is a symptom of something worse, and our CI should catch this.

The GPU hang seen in comment #9 is unrelated, though.
Comment 11 Peter Ujfalusi 2016-12-20 11:14:45 UTC
Created attachment 128581 [details]
dmesg with drm.debug=0x1e log_buf_len=1M

It is still happening with 4.9 kernel: flood of WARN_ON(!msg->buffer != !msg->size) on Toshiba Satellite Z30-A during boot.
Note that I need i915.enable_psr=0 in order to boot since 4.6.
Comment 12 Peter Ujfalusi 2016-12-20 11:16:03 UTC
Created attachment 128582 [details]
dmesg with drm.debug=0x1e log_buf_len=1M

Different laptop with 4.9 kernel: flood of WARN_ON(!msg->buffer != !msg->size) on Dell Latitude E7440 during boot.
Comment 13 Martin Ziegler 2016-12-20 11:59:47 UTC
The patch
 
  http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-3-chris@chris-wilson.co.uk

applied to 4.9 solved my problem.
Comment 14 Jani Nikula 2016-12-20 14:17:38 UTC
(In reply to Martin Ziegler from comment #13)
> The patch
>  
>  
> http://patchwork.freedesktop.org/patch/msgid/20160921135108.29574-3-chris@chris-wilson.co.uk
> 
> applied to 4.9 solved my problem.

That should be totally unrelated.
Comment 15 Peter Ujfalusi 2016-12-20 14:20:58 UTC
If I disable CONFIG_DRM_FBDEV_EMULATION, CONFIG_FB and CONFIG_FRAMEBUFFER_CONSOLE, the WARN_ON flood is gone, but obviously I no longer see the boot messages.
Comment 16 Andrey Vihrov 2017-01-14 08:55:19 UTC
With stable kernel 4.9.3 I can confirm that the warning is gone and the "failed to update link training" error is gone too on Intel HD Graphics 5500 (no external monitor).

4.9.3 includes this commit: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=d4cb3fd9b548b8bfe2a712ec920b9ebabd3547ab
Comment 17 Mihai Dontu 2017-01-14 14:43:44 UTC
I can confirm 4.9.3 fixes the issue for me too.
Comment 18 Peter Ujfalusi 2017-01-14 20:07:14 UTC
It is gone for me also with 4.9.3.

Regards,
Péter
Comment 19 yann 2017-01-16 08:49:37 UTC
So based on the latest comments, it looks like this is fixed upstream: resolving as fixed.
mwa, please confirm that we can close it.
