| Summary: | drm/i915 GPU Hang in Artful Aardvark 17.10 | | |
|---|---|---|---|
| Product: | DRI | Reporter: | Luka Paunovic <internetfazoni> |
| Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Severity: | major | | |
| Priority: | high | CC: | elizabethx.de.la.torre.mena, intel-gfx-bugs, internetfazoni, jeparre, marc, omega |
| Version: | XOrg git | | |
| Hardware: | x86-64 (AMD64) | | |
| OS: | Linux (All) | | |
| Whiteboard: | ReadyForDev | | |
| i915 platform: | I965GM | i915 features: | GPU hang |
Also, I have to add: after I log out and log in again, or restart the lightdm service (which also requires me to log in again), everything works perfectly, and then after 10-15 minutes everything starts happening again :/ So weird, and so annoying, and it's preventing me from using my laptop. I hope this is fixable. I also forgot to mention that I have disabled my internal monitor from GRUB with video=LVDS-1:d and I am using my external one, VGA1.

That error state is from -modesetting... Might be worth attaching the Xorg.log to confirm.

Here you are:
Xorg.0.log https://pastebin.com/P9UuWhxe
Xorg.0.log.old https://pastebin.com/uC1zTZCw

I just caught this in the Xorg log when I started having issues again:

    (EE) intel(0): Failed to submit rendering commands (No such file or directory), disabling acceleration.

What does this mean? How do I fix it?

Hello, just in case, what Mesa version do you have? And if it is reproducible, a dmesg with debug info (drm.debug=0xe on the GRUB kernel command line) may be helpful.

    scorpius@scorpius-Vostro-A860:~$ glxinfo | grep "OpenGL version"
    OpenGL version string: 2.1 Mesa 17.2.2

I have enabled debug in GRUB as you told me. I am now waiting for the issue to come up (if it does :() and I will send the dmesg output.

Created attachment 135201
DMESG LOG
I succeeded in making the bug appear again with Geeks3D GpuTest - GPU monitoring.
Here is the dmesg log.
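The drm.debug=0xe value suggested above (and the drm.debug=0x1e requested later in the thread) is a bitmask in which each bit enables one category of DRM debug messages. The sketch below decodes such a value. The bit-to-category table is an assumption based on the drm module's `debug` parameter in 4.x-era kernels, and `decode_drm_debug` is a hypothetical helper, not part of any kernel tooling; check the table against `modinfo drm` on the kernel actually in use.

```python
# Assumed bit assignments for drm.debug in 4.x-era kernels; newer kernels
# define additional bits, so verify against `modinfo drm`.
DRM_DEBUG_CATEGORIES = {
    0x01: "CORE",    # drm core code
    0x02: "DRIVER",  # drm controller code (the i915 driver itself)
    0x04: "KMS",     # modesetting code
    0x08: "PRIME",   # prime (buffer sharing) code
    0x10: "ATOMIC",  # atomic modesetting code
    0x20: "VBL",     # vblank code
    0x40: "STATE",   # verbose atomic state dumps
    0x80: "LEASE",   # DRM leasing code
}


def decode_drm_debug(mask: int) -> list[str]:
    """Hypothetical helper: list the debug categories a drm.debug value enables."""
    names = [name for bit, name in sorted(DRM_DEBUG_CATEGORIES.items()) if mask & bit]
    unknown = mask & ~sum(DRM_DEBUG_CATEGORIES)  # bits outside the table above
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:x})")
    return names


for value in (0xE, 0x1E):
    print(f"drm.debug=0x{value:x}: {', '.join(decode_drm_debug(value))}")
# prints:
# drm.debug=0xe: DRIVER, KMS, PRIME
# drm.debug=0x1e: DRIVER, KMS, PRIME, ATOMIC
```

In other words, 0x1e is 0xe plus the ATOMIC category; neither value enables the very chatty CORE or VBL messages.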
@Elizabeth this is a severe issue. Is there going to be a fix for this?

(In reply to Luka Paunovic from comment #8)
> @Elizabeth this is a severe issue. Is there going to be a fix for this?

Good afternoon Luka,

Please retest with the tip branch https://cgit.freedesktop.org/drm-tip, which contains the latest commits under development, particularly these (see https://bugs.freedesktop.org/show_bug.cgi?id=103502#c3):

commit 1d033beb20d6d5885587a02a393b6598d766a382
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Oct 31 10:36:07 2017 +0000

    drm/i915: Check incoming alignment for unfenced buffers (on i915gm)

May help with:

    Active (rcs0) [59]:
    00000000_020ef000 8294400 7e 00 [ 1d404 00 00 00 00 ] 00 X dirty uncached (fence: 7)
    00000000_00c4d000 524288 7e 00 [ 1d405 00 00 00 00 ] 00 X dirty uncached (fence: 8)
    00000000_00fe6000 327680 7e 00 [ 1d405 00 00 00 00 ] 00 X dirty uncached (fence: 12)

and

commit e5330ac1f50b897d245753828e8887f297f69dd0
author Chris Wilson <chris@chris-wilson.co.uk> 2017-10-31 12:22:35 (GMT)
committer Chris Wilson <chris@chris-wilson.co.uk> 2017-11-01 13:43:14 (GMT)

    drm/i915: Check that the breadcrumb wasn't disarmed automatically before parking

with:

    [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x60/0x80 [i915], irq posted? yes, current seqno=185cf, last=185d3

Also, from the error state, this is the last instruction before the GPU hang:

    0x00070c98: 0x54f08806: XY_SRC_COPY_BLT (rgb enabled, alpha enabled, src tile 1, dst tile 1)
    0x00070c9c: 0x03cc1400: format 8888, pitch 5120, rop 0xcc, clipping disabled,
    0x00070ca0: 0x00000044: dst (68,0)
    0x00070ca4: 0x00010045: dst (69,1)
    0x00070ca8: 0x1da93000: dst offset 0x1da93000
    0x00070cac: 0x00000000: src (0,0)
    0x00070cb0: 0x00000080: src pitch 128
    0x00070cb4: 0x0001c000: src offset 0x0001c000

BR,
Elizabeth.

> Please retest with tip branch
I do not know how to do that.
Can you please tell me when the fix will be available for Ubuntu Artful Aardvark from the official repositories?
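For readers trying to follow the error-state excerpt Elizabeth quoted, the decoded fields of the XY_SRC_COPY_BLT packet can be reproduced from the raw dwords. This is a minimal sketch, assuming the Gen4 blitter bit layout (client in bits 31:29, opcode in 28:22, write-alpha/write-RGB in bits 21/20, source and destination tiling in bits 15 and 11; in BR13, clipping in bit 30, colour depth in 25:24, ROP in 23:16, pitch in 15:0). Those positions are an assumption that happens to reproduce the decoded text in the dump; `decode_xy_src_copy_blt` and `decode_xy` are hypothetical names, not functions from intel_error_decode or the i915 driver.

```python
from dataclasses import dataclass

# Assumed BR13 colour-depth encoding (bits 25:24).
COLOR_DEPTH = {0: "8", 1: "565", 2: "1555", 3: "8888"}


@dataclass
class SrcCopyBlt:
    client: int        # 2 = 2D (blitter) client
    opcode: int        # 0x53 = XY_SRC_COPY_BLT
    write_alpha: bool
    write_rgb: bool
    src_tiled: bool
    dst_tiled: bool
    length: int        # total dwords in the packet, header included
    fmt: str
    rop: int
    dst_pitch: int     # raw field as printed in the dump; units may depend on tiling
    clipping: bool


def decode_xy_src_copy_blt(dw0: int, dw1: int) -> SrcCopyBlt:
    """Hypothetical decoder for the header dword and BR13 of XY_SRC_COPY_BLT."""
    return SrcCopyBlt(
        client=(dw0 >> 29) & 0x7,
        opcode=(dw0 >> 22) & 0x7F,
        write_alpha=bool(dw0 & (1 << 21)),
        write_rgb=bool(dw0 & (1 << 20)),
        src_tiled=bool(dw0 & (1 << 15)),
        dst_tiled=bool(dw0 & (1 << 11)),
        length=(dw0 & 0xFF) + 2,  # the DWord Length field stores "total minus 2"
        fmt=COLOR_DEPTH[(dw1 >> 24) & 0x3],
        rop=(dw1 >> 16) & 0xFF,
        dst_pitch=dw1 & 0xFFFF,
        clipping=bool(dw1 & (1 << 30)),
    )


def decode_xy(dw: int) -> tuple[int, int]:
    """Coordinate dwords are assumed to pack x in bits 15:0 and y in bits 31:16."""
    return dw & 0xFFFF, dw >> 16


blt = decode_xy_src_copy_blt(0x54F08806, 0x03CC1400)
print(blt)
print(decode_xy(0x00000044), decode_xy(0x00010045))  # (68, 0) (69, 1)
```

The computed length of 8 dwords matches the eight lines (0x00070c98 to 0x00070cb4) shown in the dump, which is a useful sanity check that the packet boundaries were read correctly.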
Harry your distribution along; they should make sure that the fix is pushed out in a timely manner. To fix it yourself, either grab a PPA that follows drm-tip (now is not the greatest moment, since 4.15-rc1 is proving to be a rough ride), or roll back to the previous kernel.

(Gah, wrong bug. This is not the 915gm bug whose reporter I thought was asking where they could find the fixed kernel.)

(In reply to Chris Wilson from comment #12)
> (Gah, wrong bug. This is not the 915gm bug who I thought was asking where
> they could find the fixed kernel.)

Can you please give me ANY ETA for when those fixes will be available in "INTEL GRAPHICS UPDATE TOOL FOR LINUX* OS V2.0.6"? Will that be when a new version of the tool is released, or is it possible that the fixes come even with the current version?

Chris, can you please tell me which kernel version I can downgrade to in order to get my PC working normally again? It's terrible: I am a programmer and Linux sysadmin and I have to work, and this happens CONSTANTLY. I CONSTANTLY have to stop my work and restart lightdm. I have lost the will to work because of this. This taught me to NEVER upgrade the kernel on my desktop PC again unless I'm upgrading to a newer version of the distro! I can't believe a bug like this even got into the repo.

And this is the latest version I upgraded to, and the issue is still present:

    Linux scorpius-Vostro-A860 4.13.0-19-generic #22-Ubuntu SMP Mon Dec 4 11:58:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

@Luka:

    Linux sauron 4.8.0-46-generic #49-Ubuntu SMP Fri Mar 31 13:57:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

works for me. The latest kernel does not.

@Elisabeth: where do I get the package for testing the fix you provided?
(In reply to omega from comment #17)
> @Luka: Linux sauron 4.8.0-46-generic #49-Ubuntu SMP Fri Mar 31 13:57:14 UTC
> 2017 x86_64 x86_64 x86_64 GNU/Linux works for me. Latest kernel does not.
>
> @Elisabeth: where do I get the package for testing the fix you provided?

Thanks for the information, Omega. Luka, could you try it? You can download the latest stable or mainline kernel from https://www.kernel.org and build it. That release has the latest fixes merged upstream, and you can build the package using this guide, section "building kernel", steps 2 to 5. It may take a while to compile, though. FWIW, https://cgit.freedesktop.org/drm-tip has all the latest changes under development, even the ones that aren't upstream yet; you can also try this branch if you have the time.

Installed the Debian kernel package from here: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14.7/

The kernel is the latest stable:

    Linux version 4.14.7-041407-generic (kernel@gloin) (gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3)) #201712171031 SMP Sun Dec 17 15:33:35 UTC 2017

X does not fire up. The kernel log says:

    Dec 18 18:06:33 sauron kernel: [ 25.790562] [drm] GPU HANG: ecode 7:0:0x85dffffc, in Xorg [1181], reason: Hang on rcs0, action: reset
    Dec 18 18:06:33 sauron kernel: [ 25.790563] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
    Dec 18 18:06:33 sauron kernel: [ 25.790563] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
    Dec 18 18:06:33 sauron kernel: [ 25.790563] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
    Dec 18 18:06:33 sauron kernel: [ 25.790564] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
    Dec 18 18:06:33 sauron kernel: [ 25.790564] [drm] GPU crash dump saved to /sys/class/drm/card0/error
    Dec 18 18:06:33 sauron kernel: [ 25.790614] i915 0000:00:02.0: Resetting chip after gpu hang
    Dec 18 18:06:34 sauron kernel: [ 26.204998] random: crng init done
    Dec 18 18:06:41 sauron kernel: [ 33.756066] i915 0000:00:02.0: Resetting chip after gpu hang
    Dec 18 18:06:49 sauron kernel: [ 41.756094] i915 0000:00:02.0: Resetting chip after gpu hang
    Dec 18 18:06:57 sauron kernel: [ 49.754415] i915 0000:00:02.0: Resetting chip after gpu hang
    Dec 18 18:07:05 sauron kernel: [ 57.753701] i915 0000:00:02.0: Resetting chip after gpu hang
    Dec 18 18:07:13 sauron kernel: [ 65.785453] i915 0000:00:02.0: Resetting chip after gpu hang
    Dec 18 18:07:27 sauron kernel: [ 79.801912] i915 0000:00:02.0: Resetting chip after gpu hang

(In reply to omega from comment #19)
> Installed Debian kernel package from here:
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14.7/
>
> Kernel is latest stable:
>
> Linux version 4.14.7-041407-generic (kernel@gloin) (gcc version 7.2.0
> (Ubuntu 7.2.0-8ubuntu3)) #201712171031 SMP Sun Dec 17 15:33:35 UTC 2017
>
> X does not fire up. Kernel log says:
>
> Dec 18 18:06:33 sauron kernel: [ 25.790562] [drm] GPU HANG: ecode
> 7:0:0x85dffffc, in Xorg [1181], reason: Hang on rcs0, action: reset...

That looks different. Could you please try the latest Mesa 17.3 release? Is your Xorg 1.9?

Updated X to 1.19.5 and Mesa to 17.2.4 (the latest available for Ubuntu). Still no joy.

Created attachment 136252
/sys/class/drm/card0/error with X 1.19 and Mesa 17.2.4
Added /sys/class/drm/card0/error with X 1.19 and Mesa 17.2.4
I am no kernel developer, but is this so hard to fix? What's the catch here? I mean, is it a lack of funds or what?

It looks like this has been fixed in

    Linux scorpius-Vostro-A860 4.13.0-21-generic #24-Ubuntu SMP Mon Dec 18 17:29:16 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Has it? Or am I just hallucinating?! I tried to trigger the bug with the Geeks3D GpuTest (stress test) and I couldn't. So it's probably fixed? Anyone?

It's not fixed :( :((((((((((((

Still not fixed in 4.13.0-25-generic. Damn Intel..........

Even though this is not fixed, I found a workaround. I was playing with the settings and realized that UXA is more stable than SNA; the bug didn't occur with UXA:

    $ cat /etc/X11/xorg.conf.d/20-intel.conf
    Section "Device"
        Identifier "Intel Graphics"
        Driver "intel"
        #Option "TearFree" "true"
        #Option "AccelMethod" "sna"
        Option "AccelMethod" "uxa"
        Option "DRI" "3"
    EndSection

The proposed change of AccelMethod does not work for me.

    Intel(R) HD Graphics 4600 in Intel(R) Core(TM) i7-4790K CPU
    Linux kernel: 4.15.2-041502-generic #201802072230 SMP

First of all, sorry about the spam. This is a mass update for our bugs. Sorry if you find this annoying, but with it we are trying to understand whether the bug is still valid or not. If the bug investigation is still in progress, please ignore this, and I apologize! If you think this is no longer valid, please comment on the bug so that it can be closed. If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.

The issue is still present in a fully updated Ubuntu 17.10 with the newest mainline kernel 4.16rc7 from http://kernel.ubuntu.com/~kernel-ppa/mainline/

Yes, this is still a bug. I tried switching back to SNA cuz UXA sucks, but with UXA I do not have the bug. It's still present with SNA.

OK, thanks for the feedback. Chris, any advice here?

Chris?

Sorry for the delay... Luka, do you still have the issue?
Please try to reproduce the issue using drm-tip (https://cgit.freedesktop.org/drm-tip) with the kernel parameters drm.debug=0x1e log_buf_len=4M, and if the problem persists, attach the full dmesg from boot.

Will this be merged into the mainline kernel? Ubuntu does not have the latest commit yet: https://git.launchpad.net/~ubuntu-kernel-test/ubuntu/+source/linux/+git/mainline-crack/log/ It would be much, much less effort for me to use a prebuilt kernel from the Ubuntu mainline repo instead of patching and building a kernel myself.

Created attachment 141685
gzipped dmesg from boot
I tried kernel 4.19-rc4 from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.19-rc4/ with the kernel parameters drm.debug=0x1e log_buf_len=4M. The boot process gets past the console and the X server fires up, but it starts to hang and then crashes. Please find the dmesg attached, as requested.

I installed kernel 4.19.0-041900rc6-generic #201809301631. This seems to fix the issue.

    # dmesg|grep "\(i915\|drm\)"
    [ 2.324905] fb: switching to inteldrmfb from VESA VGA
    [ 2.324969] [drm] Replacing VGA console driver
    [ 2.325456] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
    [ 2.325456] [drm] Driver supports precise vblank timestamp query.
    [ 2.325620] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
    [ 2.328116] [drm] Initialized i915 1.6.0 20180719 for 0000:00:02.0 on minor 0
    [ 2.352467] fbcon: inteldrmfb (fb0) is primary device
    [ 2.402622] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device

(In reply to omega from comment #38)
> I installed kernel 4.19.0-041900rc6-generic #201809301631. This seems to fix
> the issue.

Thanks for the feedback. Closing this bug as Fixed.
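The thread ends with 4.19.0-rc6 as the first Ubuntu mainline build reported to work, while 4.19-rc4 still hung. A quick check of a `uname -r` string against that single data point can be sketched as follows. This is an assumption-laden helper: `parse_release` and `at_least_reported_good` are hypothetical names, the code encodes only omega's report above, treats a final release as newer than its release candidates, ignores distribution backports, and says nothing about older kernels such as the 4.8.0-46 build omega reported as unaffected.

```python
import re

def parse_release(release: str) -> tuple[int, int, int, float]:
    """Turn strings like '4.19.0-041900rc6-generic' into a sortable tuple."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = int(m[1]), int(m[2]), int(m[3] or 0)
    rc = re.search(r"rc(\d+)", release[m.end():])
    # A final release sorts after all of its release candidates.
    return (major, minor, patch, int(rc[1]) if rc else float("inf"))


# Hypothetical threshold: the first build reported as working in this bug.
FIRST_REPORTED_GOOD = parse_release("4.19.0-041900rc6-generic")


def at_least_reported_good(release: str) -> bool:
    return parse_release(release) >= FIRST_REPORTED_GOOD


for r in ("4.13.0-25-generic", "4.19-rc4", "4.19.0-041900rc6-generic"):
    print(r, at_least_reported_good(r))
```

Passing the output of `uname -r` is the intended use; a `True` result only means the kernel is at least as new as the build that seemed to fix the issue here, not that the fix is guaranteed to be present.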
Created attachment 135158
/sys/class/drm/card0/error

I have installed a fresh Ubuntu MATE 17.10 (Artful Aardvark). This is my Intel configuration file:

    $ cat /etc/X11/xorg.conf.d/20-intel.conf
    Section "Device"
        Identifier "Intel Graphics"
        Driver "intel"
        Option "TearFree" "true"
        Option "AccelMethod" "sna"
        Option "DRI" "3"
    EndSection

I have installed the latest available drivers using the Intel Graphics Update Tool for Linux. Because the tool wasn't able to run on 17.10, I temporarily changed /etc/lsb-release to correspond to 17.04 Zesty Zapus and then successfully installed the drivers using the tool.

After all this trouble I still often have issues. My screen randomly goes black for 10 seconds because of a GPU HANG. Also, after some time, elements rapidly flicker or disappear in programs that use HW acceleration (mostly Chrome; disabling hardware acceleration is not an option because Chrome then works terribly).

VGA adapter info:

    $ lspci | grep VGA
    00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (primary) (rev 0c)

Dmesg:

    [ 1964.877703] [drm] GPU HANG: ecode 4:0:0x54f4e8fb, in Xorg [874], reason: Hang on rcs0, action: reset
    [ 1964.877707] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
    [ 1964.877708] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
    [ 1964.877709] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
    [ 1964.877710] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
    [ 1964.877711] [drm] GPU crash dump saved to /sys/class/drm/card0/error
    [ 1964.919361] drm/i915: Resetting chip after gpu hang
    [ 1972.939781] drm/i915: Resetting chip after gpu hang
    [ 2004.879875] drm/i915: Resetting chip after gpu hang
    [ 2258.924142] drm/i915: Resetting chip after gpu hang
    [ 2376.394596] perf: interrupt took too long (7689 > 7688), lowering kernel.perf_event_max_sample_rate to 26000
    [ 2417.923699] drm/i915: Resetting chip after gpu hang
    [ 2708.941780] drm/i915: Resetting chip after gpu hang
    [ 2738.869020] drm/i915: Resetting chip after gpu hang
    [ 2760.862012] drm/i915: Resetting chip after gpu hang
    [ 2770.846041] drm/i915: Resetting chip after gpu hang
    [ 2780.862186] drm/i915: Resetting chip after gpu hang

CRASH DUMP IS IN ATTACHMENT
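The hang reports in these logs follow a regular shape (a `GPU HANG: ecode <ecode>, in <process> [<pid>], reason: <reason>, action: <action>` line, then one `Resetting chip after gpu hang` line per reset), so they are easy to tally when comparing kernels or acceleration methods. A rough sketch, assuming exactly the message wording quoted in this report; `summarise_hangs` is a hypothetical helper, and the regular expressions would need adjusting if the kernel's message text differs.

```python
import re
from collections import Counter

# Assumed message formats, copied from the logs quoted in this report.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[\w:]+), in (?P<proc>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)
RESET_RE = re.compile(r"Resetting chip after gpu hang")


def summarise_hangs(log: str) -> dict:
    """Hypothetical helper: collect GPU HANG reports and count chip resets."""
    hangs = [m.groupdict() for m in HANG_RE.finditer(log)]
    return {
        "hangs": hangs,                        # one dict per GPU HANG line
        "resets": len(RESET_RE.findall(log)),  # every reset line, old or new wording
        "by_process": Counter(h["proc"] for h in hangs),
    }


sample = """\
[ 1964.877703] [drm] GPU HANG: ecode 4:0:0x54f4e8fb, in Xorg [874], reason: Hang on rcs0, action: reset
[ 1964.919361] drm/i915: Resetting chip after gpu hang
[ 1972.939781] drm/i915: Resetting chip after gpu hang
"""
summary = summarise_hangs(sample)
print(summary["resets"], summary["hangs"][0]["ecode"])  # prints: 2 4:0:0x54f4e8fb
```

Note that only the first hang in a burst is reported in full, so the `resets` count is usually a better measure of how often the GPU actually wedged than the number of `GPU HANG` lines.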