Summary: | i915 - GPU hung in i915_gem.c | | |
---|---|---|---|
Product: | Mesa | Reporter: | freedesktop |
Component: | Drivers/DRI/i965 | Assignee: | Ian Romanick <idr> |
Status: | RESOLVED WORKSFORME | QA Contact: | |
Severity: | normal | | |
Priority: | medium | CC: | astrand, ben, chris, daniel, jbarnes |
Version: | unspecified | | |
Hardware: | x86-64 (AMD64) | | |
OS: | Linux (All) | | |
Whiteboard: | | | |
i915 platform: | | i915 features: | |
Description
freedesktop
2012-07-09 22:32:05 UTC
We need the i915_error_state file from debugfs (it only contains useful data if you haven't rebooted since the last hang) to diagnose this issue. The OOPS is a bit worrying - can you please try to reproduce this on a recent 3.5-rc kernel? Things are changing quite a bit in the relevant code, so only grabbing the OOPS from some random stable kernel with unknown amounts of backported patches isn't too useful. Also, does the OOPS always happen together with the gpu hang?

Have not rebooted. Here it is: http://links.flashdance.cx/i915_error_state.txt

Well, the problem is that for many reasons I need to have this server running, so I have limited time to break it. When I tried, I couldn't even get my own 3.3.1 kernel to boot. I guess things have changed in the last few years when it comes to building kernels. But this GPU problem has existed at least since kernel 3.3.1.

No, the OOPS doesn't always happen. Sometimes it's just GPU hung and X freezes for about 2 sec. The screen goes black. Then everything is back to normal. Examples:

Jul 9 10:53:58 flashdance kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
<nothing more happens!>
Jul 9 23:04:14 flashdance kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
<nothing more happens!>
Jul 3 13:50:44 flashdance kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 3 13:50:44 flashdance kernel: [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Jul 5 23:36:17 flashdance kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 5 23:36:17 flashdance kernel: [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

Notice that it didn't even say GPU hung here!
Jul 6 00:07:05 flashdance kernel: ------------[ cut here ]------------
Jul 6 00:07:05 flashdance kernel: WARNING: at drivers/gpu/drm/i915/i915_gem.c:2410 i915_gem_object_put_fence+0xab/0xd0 [i915]()
Jul 6 00:07:05 flashdance kernel: Hardware name: System Product Name
Jul 6 00:07:05 flashdance kernel: Modules linked in: ext3 jbd uas usb_storage fuse tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs cpufreq_ondemand acpi_cpufreq freq_table mperf bridge stp llc ipv6 vhost_net $
Jul 6 00:07:05 flashdance kernel: mod [last unloaded: scsi_wait_scan]
Jul 6 00:07:05 flashdance kernel: Pid: 9657, comm: Xorg Not tainted 3.4.4-1.el6.elrepo.x86_64 #1
Jul 6 00:07:05 flashdance kernel: Call Trace:
Jul 6 00:07:05 flashdance kernel: [<ffffffff81050b8f>] warn_slowpath_common+0x7f/0xc0
Jul 6 00:07:05 flashdance kernel: [<ffffffff81050bea>] warn_slowpath_null+0x1a/0x20
Jul 6 00:07:05 flashdance kernel: [<ffffffffa0060b5b>] i915_gem_object_put_fence+0xab/0xd0 [i915]
Jul 6 00:07:05 flashdance kernel: [<ffffffffa0062231>] i915_gem_object_unbind+0x91/0x240 [i915]
Jul 6 00:07:05 flashdance kernel: [<ffffffffa00640a8>] i915_gem_evict_something+0x228/0x4a0 [i915]
Jul 6 00:07:05 flashdance kernel: [<ffffffffa0061c52>] i915_gem_object_bind_to_gtt+0x1e2/0x480 [i915]
Jul 6 00:07:05 flashdance kernel: [<ffffffffa0062e2f>] i915_gem_fault+0x23f/0x2c0 [i915]
Jul 6 00:07:05 flashdance kernel: [<ffffffff8113b922>] __do_fault+0x72/0x5a0
Jul 6 00:07:05 flashdance kernel: [<ffffffff8113bf37>] handle_pte_fault+0xe7/0x210
Jul 6 00:07:05 flashdance kernel: [<ffffffff81047bb7>] ? pte_alloc_one+0x37/0x50
Jul 6 00:07:05 flashdance kernel: [<ffffffff811373e5>] ? __pte_alloc+0xa5/0x170
Jul 6 00:07:05 flashdance kernel: [<ffffffff8113c235>] handle_mm_fault+0x1d5/0x330
Jul 6 00:07:05 flashdance kernel: [<ffffffffa00633d0>] ? i915_gem_pread_ioctl+0x320/0x320 [i915]
Jul 6 00:07:05 flashdance kernel: [<ffffffff8157145e>] do_page_fault+0x13e/0x460
Jul 6 00:07:05 flashdance kernel: [<ffffffff811860dc>] ? do_vfs_ioctl+0x8c/0x340
Jul 6 00:07:05 flashdance kernel: [<ffffffff8156dea5>] page_fault+0x25/0x30
Jul 6 00:07:05 flashdance kernel: ---[ end trace 3aa0b5e48548b2b9 ]---

Now X crashed so hard that I was logged out.

Jul 11 20:15:48 flashdance kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 11 20:15:49 flashdance console-kit-daemon[3531]: WARNING: Couldn't read /proc/9170/environ: Failed to open file '/proc/9170/environ': No such file or directory
Jul 11 20:15:49 flashdance NetworkManager[2386]: <warn> error requesting auth for org.freedesktop.NetworkManager.network-control: (6) Remote Exception invoking org.freedesktop.PolicyKit1.Authority.CheckAuthorization() on /org/freedesktop/PolicyKit1/Authority at name org.freedesktop.PolicyKit1: org.freedesktop.DBus.Error.NameHasNoOwner: Remote Exception invoking org.freedesktop.DBus.GetConnectionUnixUser() on / at name org.freedesktop.DBus: org.freedesktop.DBus.Error.NameHasNoOwner: Could not get UID of name ':1.649': no such name
Jul 11 20:15:49 flashdance NetworkManager[2386]: <warn> error requesting auth for org.freedesktop.network-manager-settings.system.wifi.share.open: (6) Remote Exception invoking org.freedesktop.PolicyKit1.Authority.CheckAuthorization() on /org/freedesktop/PolicyKit1/Authority at name org.freedesktop.PolicyKit1: org.freedesktop.DBus.Error.NameHasNoOwner: Remote Exception invoking org.freedesktop.DBus.GetConnectionUnixUser() on / at name org.freedesktop.DBus: org.freedesktop.DBus.Error.NameHasNoOwner: Could not get UID of name ':1.649': no such name
Jul 11 20:15:49 flashdance NetworkManager[2386]: <warn> error requesting auth for org.freedesktop.network-manager-settings.system.wifi.share.protected: (6) Remote Exception invoking org.freedesktop.PolicyKit1.Authority.CheckAuthorization() on /org/freedesktop/PolicyKit1/Authority at name org.freedesktop.PolicyKit1: org.freedesktop.DBus.Error.NameHasNoOwner: Remote Exception invoking org.freedesktop.DBus.GetConnectionUnixUser() on / at name org.freedesktop.DBus: org.freedesktop.DBus.Error.NameHasNoOwner: Could not get UID of name ':1.649': no such name
Jul 11 20:15:49 flashdance NetworkManager[2386]: <warn> User connections unavailable: (6) Remote Exception invoking org.freedesktop.PolicyKit1.Authority.CheckAuthorization() on /org/freedesktop/PolicyKit1/Authority at name org.freedesktop.PolicyKit1: org.freedesktop.DBus.Error.NameHasNoOwner: Remote Exception invoking org.freedesktop.DBus.GetConnectionUnixUser() on / at name org.freedesktop.DBus: org.freedesktop.DBus.Error.NameHasNoOwner: Could not get UID of name ':1.649': no such name
Jul 11 20:15:50 flashdance NetworkManager[2386]: <error> [1342030550.97638] [nm-manager.c:1360] user_proxy_init(): could not init user settings proxy: (3) Could not get owner of name 'org.freedesktop.NetworkManagerUserSettings': no such name
Jul 11 20:15:50 flashdance dbus-daemon: [system] Rejected send message, 1 matched rules; type="method_call", sender=":1.690" (uid=42 pid=6525 comm="gnome-power-manager) interface="org.freedesktop.Hal.Device.LaptopPanel" member="GetBrightness" error name="(unset)" requested_reply=0 destination=":1.6" (uid=0 pid=2470 comm="hald))
Jul 11 20:15:50 flashdance dbus-daemon: [system] Rejected send message, 1 matched rules; type="method_call", sender=":1.690" (uid=42 pid=6525 comm="gnome-power-manager) interface="org.freedesktop.Hal.Device.LaptopPanel" member="SetBrightness" error name="(unset)" requested_reply=0 destination=":1.6" (uid=0 pid=2470 comm="hald))
Jul 11 20:15:50 flashdance gdm-simple-greeter[6521]: Gtk-WARNING: gtkwidget.c:5460: widget not within a GtkWindow
Jul 11 20:15:50 flashdance gdm-simple-greeter[6521]: WARNING: Unable to parse history: (null) 3#012

When I logged in again, desktop effects were disabled; it's called "compositing", I believe. That's bad but something I could live with; what I can't live with is that I can't play any movies anymore when this happens. Also, gdm is started on a new tty. Before I had it on tty7 (moved it from tty1, which is the default in CentOS nowadays), and now it's started on tty8 instead. The only way I know to fix this is to reboot :( And that's really tragic.

Is this GPU bug related to my compositing problem or is it something else? I think you know this better than me... Also, is it possible to reset this compositing WITHOUT rebooting?

Here is the i915_error_state file from what just happened: http://links.flashdance.cx/i915_error_state2.txt However, it doesn't differ from the http://links.flashdance.cx/i915_error_state.txt file... very strange, shouldn't there be new error state in it when the GPU bug happens a second time?

Apparent hang in mesa, please make sure you have the latest drivers and see if you can identify the likely culprit.

Ok. Yes, I have the latest drivers. CentOS 6.2 was updated to CentOS 6.3 recently. Do you mean that the GPU hung problem is a mesa problem, or that these are two separate problems and only what I reported last is a mesa problem that has nothing to do with the GPU hung problem?

The kernel is reporting that somebody hung the GPU and delivering the error state. That error state implicates mesa/i965 as the guilty party. If you can reproduce on the latest drivers you have available, we will be interested in knowing the details.

I see. It hasn't happened for almost 6 days now, but according to my logs it has gone 6 days without a problem before. Jun 26 -> Jul 3 is 6 days. I'll wait a few days and see if it happens again.

Now it happened again.

Jul 18 07:35:25 flashdance kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jul 18 07:35:25 flashdance kernel: [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state

http://links.flashdance.cx/i915_error_state3.txt

Are there specific steps to reproduce the problem?

Sadly no. It's random.

Now it hasn't happened for 16 days, but I don't think it's solved. It's just luck that it hasn't happened.

Happened again. Interesting, now I can't even cat /sys/kernel/debug/dri/0/i915_error_state; it just says this:

cat: /sys/kernel/debug/dri/0/i915_error_state: Cannot allocate memory

This is what the kernel says:

Aug 26 20:42:19 flashdance kernel: cat: page allocation failure: order:9, mode:0xd0
Aug 26 20:42:19 flashdance kernel: Pid: 16096, comm: cat Tainted: G W 3.4.4-1.el6.elrepo.x86_64 #1
Aug 26 20:42:19 flashdance kernel: Call Trace:
Aug 26 20:42:19 flashdance kernel: [<ffffffff81118f53>] warn_alloc_failed+0xf3/0x160
Aug 26 20:42:19 flashdance kernel: [<ffffffff8156ca80>] ? _cond_resched+0x30/0x40
Aug 26 20:42:19 flashdance kernel: [<ffffffff8111b319>] ? __alloc_pages_direct_compact+0x1e9/0x1f0
Aug 26 20:42:19 flashdance kernel: [<ffffffff8111b74a>] __alloc_pages_slowpath+0x42a/0x710
Aug 26 20:42:19 flashdance kernel: [<ffffffff8111bc5a>] __alloc_pages_nodemask+0x22a/0x240
Aug 26 20:42:19 flashdance kernel: [<ffffffff8111bc5a>] ? __alloc_pages_nodemask+0x22a/0x240
Aug 26 20:42:19 flashdance kernel: [<ffffffff8115dbad>] kmem_getpages+0x6d/0x1a0
Aug 26 20:42:19 flashdance kernel: [<ffffffff8115e4f9>] fallback_alloc+0x199/0x270
Aug 26 20:42:19 flashdance kernel: [<ffffffff8115e296>] ____cache_alloc_node+0x96/0x160
Aug 26 20:42:19 flashdance kernel: [<ffffffff8115ed43>] __kmalloc+0x173/0x1f0
Aug 26 20:42:19 flashdance kernel: [<ffffffff81195b58>] ? seq_read+0x148/0x430
Aug 26 20:42:19 flashdance kernel: [<ffffffff81195b58>] seq_read+0x148/0x430
Aug 26 20:42:19 flashdance kernel: [<ffffffff81174605>] vfs_read+0xc5/0x190
Aug 26 20:42:19 flashdance kernel: [<ffffffff811747d1>] sys_read+0x51/0x90
Aug 26 20:42:19 flashdance kernel: [<ffffffff81575d29>] system_call_fastpath+0x16/0x1b
Aug 26 20:42:19 flashdance kernel: Mem-Info:
Aug 26 20:42:19 flashdance kernel: Node 0 DMA per-cpu:
Aug 26 20:42:19 flashdance kernel: CPU 0: hi: 0, btch: 1 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 1: hi: 0, btch: 1 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 2: hi: 0, btch: 1 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 3: hi: 0, btch: 1 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 4: hi: 0, btch: 1 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 5: hi: 0, btch: 1 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 6: hi: 0, btch: 1 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 7: hi: 0, btch: 1 usd: 0
Aug 26 20:42:19 flashdance kernel: Node 0 DMA32 per-cpu:
Aug 26 20:42:19 flashdance kernel: CPU 0: hi: 186, btch: 31 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 1: hi: 186, btch: 31 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 2: hi: 186, btch: 31 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 3: hi: 186, btch: 31 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 4: hi: 186, btch: 31 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 5: hi: 186, btch: 31 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 6: hi: 186, btch: 31 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 7: hi: 186, btch: 31 usd: 0
Aug 26 20:42:19 flashdance kernel: Node 0 Normal per-cpu:
Aug 26 20:42:19 flashdance kernel: CPU 0: hi: 186, btch: 31 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 1: hi: 186, btch: 31 usd: 177
Aug 26 20:42:19 flashdance kernel: CPU 2: hi: 186, btch: 31 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 3: hi: 186, btch: 31 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 4: hi: 186, btch: 31 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 5: hi: 186, btch: 31 usd: 31
Aug 26 20:42:19 flashdance kernel: CPU 6: hi: 186, btch: 31 usd: 0
Aug 26 20:42:19 flashdance kernel: CPU 7: hi: 186, btch: 31 usd: 166
Aug 26 20:42:19 flashdance kernel: active_anon:2537617 inactive_anon:399908 isolated_anon:6
Aug 26 20:42:19 flashdance kernel: active_file:242613 inactive_file:4677477 isolated_file:0
Aug 26 20:42:19 flashdance kernel: unevictable:0 dirty:96 writeback:0 unstable:0
Aug 26 20:42:19 flashdance kernel: free:128866 slab_reclaimable:124585 slab_unreclaimable:36887
Aug 26 20:42:19 flashdance kernel: mapped:37887 shmem:94738 pagetables:23039 bounce:0
Aug 26 20:42:19 flashdance kernel: Node 0 DMA free:15904kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15680kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Aug 26 20:42:19 flashdance kernel: lowmem_reserve[]: 0 3195 32231 32231
Aug 26 20:42:19 flashdance kernel: Node 0 DMA32 free:196584kB min:6696kB low:8368kB high:10044kB active_anon:428360kB inactive_anon:521036kB active_file:61224kB inactive_file:1966068kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3272368kB mlocked:0kB dirty:0kB writeback:0kB mapped:3760kB shmem:1364kB slab_reclaimable:72540kB slab_unreclaimable:3160kB kernel_stack:392kB pagetables:1348kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Aug 26 20:42:19 flashdance kernel: lowmem_reserve[]: 0 0 29035 29035
Aug 26 20:42:19 flashdance kernel: Node 0 Normal free:302976kB min:60852kB low:76064kB high:91276kB active_anon:9722108kB inactive_anon:1078596kB active_file:909228kB inactive_file:16743840kB unevictable:0kB isolated(anon):24kB isolated(file):0kB present:29732380kB mlocked:0kB dirty:384kB writeback:0kB mapped:147788kB shmem:377588kB slab_reclaimable:425800kB slab_unreclaimable:144388kB kernel_stack:5392kB pagetables:90808kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Aug 26 20:42:19 flashdance kernel: lowmem_reserve[]: 0 0 0 0
Aug 26 20:42:19 flashdance kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15904kB
Aug 26 20:42:19 flashdance kernel: Node 0 DMA32: 8569*4kB 4062*8kB 1695*16kB 844*32kB 443*64kB 206*128kB 66*256kB 8*512kB 0*1024kB 0*2048kB 0*4096kB = 196612kB
Aug 26 20:42:19 flashdance kernel: Node 0 Normal: 36632*4kB 2919*8kB 1819*16kB 1123*32kB 534*64kB 184*128kB 33*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 302632kB
Aug 26 20:42:19 flashdance kernel: 5014841 total pagecache pages
Aug 26 20:42:19 flashdance kernel: 0 pages in swap cache
Aug 26 20:42:19 flashdance kernel: Swap cache stats: add 0, delete 0, find 0/0
Aug 26 20:42:19 flashdance kernel: Free swap = 2095100kB
Aug 26 20:42:19 flashdance kernel: Total swap = 2095100kB
Aug 26 20:42:19 flashdance kernel: 8388080 pages RAM
Aug 26 20:42:19 flashdance kernel: 150614 pages reserved
Aug 26 20:42:19 flashdance kernel: 1102279 pages shared
Aug 26 20:42:19 flashdance kernel: 7172629 pages non-shared
Aug 26 20:42:19 flashdance kernel: SLAB: Unable to allocate memory on node 0 (gfp=0xd0)
Aug 26 20:42:19 flashdance kernel: cache: size-2097152, object size: 2097152, order: 9
Aug 26 20:42:19 flashdance kernel: node 0: slabs: 0/0, objs: 0/0, free: 0
Aug 26 20:42:26 flashdance kernel: cat: page allocation failure: order:9, mode:0xd0

[iocc@flashdance log]$ grep GPU messages*
messages:Aug 26 18:07:16 flashdance kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
messages-20120826:Aug 25 12:43:15 flashdance kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
messages-20120826:Aug 25 20:28:47 flashdance kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung

Can't use a newer kernel, as the keyboard doesn't work at boot so I can't enter the encryption pass.

I believe I have the same problem, reported here: http://bugs.centos.org/view.php?id=5911 I have since tried tweaking my xorg.conf:

Section "Module"
    Load "record"
#   Load "dri2"
#   Load "dri"
    Load "glx"
    Load "extmod"
    Load "dbe"
EndSection
...
Section "Device"
    ### Available Driver options are:-
    ### Values: <i>: integer, <f>: float, <bool>: "True"/"False",
    ### <string>: "String", <freq>: "<f> Hz/kHz/MHz",
    ### <percent>: "<f>%"
    ### [arg]: arg optional
    Option "DRI" "false"
    #Option "ColorKey" # <i>
    #Option "VideoKey" # <i>
    #Option "FallbackDebug" # [<bool>]
    #Option "Tiling" # [<bool>]
    #Option "LinearFramebuffer" # [<bool>]
    #Option "Shadow" # [<bool>]
    #Option "SwapbuffersWait" # [<bool>]
    #Option "TripleBuffer" # [<bool>]
    #Option "XvMC" # [<bool>]
    #Option "XvPreferOverlay" # [<bool>]
    #Option "DebugFlushBatches" # [<bool>]
    #Option "DebugFlushCaches" # [<bool>]
    #Option "DebugWait" # [<bool>]
    #Option "HotPlug" # [<bool>]
    #Option "RelaxedFencing" # [<bool>]
    Identifier "Card0"
    Driver "intel"
    BusID "PCI:0:2:0"
EndSection

Does not help. Any other options I can try?

Peter Åstrand, please file a separate bug report for your issue - I don't see any evidence that your gpu hang has the same cause as this one here. And mixing things up in this fashion generally leads to decent confusion. Thanks.

If anyone can reproduce, please file a new bug.
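A note on the "Cannot allocate memory" error from reading i915_error_state, quoted in the Aug 26 log: order:9 means a request for 2^9 physically contiguous pages, which is 2 MiB with 4 KiB pages and matches the size-2097152 cache named in the SLAB message. The per-zone free lists in the same log show no free 2048kB or 4096kB blocks in the DMA32 and Normal zones, so the read most likely failed because free memory was fragmented, not because the machine was out of memory. The DMA zone does list larger blocks, but that zone is normally held back by lowmem_reserve; that part is a reading of the log and was not confirmed by anyone in this report. The sketch below redoes the check in Python under the 4 KiB page assumption. The function names are made up for illustration, and the two strings are copied from the log.

```python
import re

PAGE_KB = 4  # assumption: 4 KiB pages, as is usual on x86-64


def largest_free_order(buddy_line):
    """Return the highest order that has at least one free block, or None."""
    best = None
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", buddy_line):
        if int(count) > 0:
            # A block of size_kb holds size_kb / PAGE_KB pages = 2**order pages.
            order = (int(size_kb) // PAGE_KB).bit_length() - 1
            best = order if best is None else max(best, order)
    return best


def can_satisfy(buddy_line, order):
    """True if some free block is big enough; larger blocks can be split."""
    largest = largest_free_order(buddy_line)
    return largest is not None and largest >= order


# Free lists copied from the Aug 26 log in this report.
ZONES = {
    "DMA32": "8569*4kB 4062*8kB 1695*16kB 844*32kB 443*64kB 206*128kB "
             "66*256kB 8*512kB 0*1024kB 0*2048kB 0*4096kB",
    "Normal": "36632*4kB 2919*8kB 1819*16kB 1123*32kB 534*64kB 184*128kB "
              "33*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB",
}

if __name__ == "__main__":
    wanted = 9
    print(f"order {wanted} = {PAGE_KB << wanted} kB contiguous")
    for zone, line in ZONES.items():
        print(zone, "largest free order:", largest_free_order(line),
              "can satisfy:", can_satisfy(line, wanted))
```

For these two lists the largest free orders are 7 (DMA32) and 8 (Normal), both below the 9 that was requested.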