Summary: | Xorg intermittently freezes in drm_intel_gem_bo_start_gtt_access | | |
---|---|---|---|
Product: | xorg | Reporter: | Peter <pva> |
Component: | Driver/intel | Assignee: | Eric Anholt <eric> |
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> |
Severity: | critical | | |
Priority: | medium | CC: | bugzilla, Magnus.Kessler, n-roeser, yohan.bataille |
Version: | 7.4 (2008.09) | | |
Hardware: | Other | | |
OS: | All | | |
Whiteboard: | | | |
i915 platform: | | i915 features: | |
Bug Depends on: | 22374 | | |
Bug Blocks: | 19675 | | |
Attachments: |
|
Description
Peter
2009-03-09 14:26:46 UTC
Now this happened another time. This time I got bt:

0x00007f8c64f463a7 in ioctl () from /lib/libc.so.6
(gdb) bt
#0 0x00007f8c64f463a7 in ioctl () from /lib/libc.so.6
#1 0x00007f8c632604fd in drm_intel_gem_bo_start_gtt_access (bo=0x43be630, write_enable=<value optimized out>) at intel_bufmgr_gem.c:835
#2 0x00007f8c52438c81 in intelFinish (ctx=<value optimized out>) at intel_context.c:535
#3 0x00007f8c63d59f96 in __glXDisp_SwapBuffers (cl=0x40ef008, pc=<value optimized out>) at glxcmds.c:1431
#4 0x00007f8c63d5d282 in __glXDispatch (client=0x414eb10) at glxext.c:512
#5 0x000000000044c2f4 in Dispatch () at dispatch.c:454
#6 0x0000000000432b3d in main (argc=9, argv=0x7fff6ef411f8, envp=<value optimized out>) at main.c:438
(gdb) bt full
#0 0x00007f8c64f463a7 in ioctl () from /lib/libc.so.6
No symbol table info available.
#1 0x00007f8c632604fd in drm_intel_gem_bo_start_gtt_access (bo=0x43be630, write_enable=<value optimized out>) at intel_bufmgr_gem.c:835
bufmgr_gem = (drm_intel_bufmgr_gem *) 0x3fbd810
set_domain = {handle = 3, read_domains = 64, write_domain = 0}
ret = <value optimized out>
#2 0x00007f8c52438c81 in intelFinish (ctx=<value optimized out>) at intel_context.c:535
irb = (struct intel_renderbuffer *) 0xfffffffffffffe00
fb = (struct gl_framebuffer *) 0x4363060
i = 0
#3 0x00007f8c63d59f96 in __glXDisp_SwapBuffers (cl=0x40ef008, pc=<value optimized out>) at glxcmds.c:1431
client = (ClientPtr) 0x414eb10
tag = 1
drawId = 155
glxc = (__GLXcontext *) 0x413a820
pGlxDraw = <value optimized out>
error = <value optimized out>
#4 0x00007f8c63d5d282 in __glXDispatch (client=0x414eb10) at glxext.c:512
stuff = (xGLXSingleReq *) 0x6ccc578
opcode = <value optimized out>
cl = (__GLXclientState *) 0x40ef008
retval = 1
#5 0x000000000044c2f4 in Dispatch () at dispatch.c:454
result = 0
client = (ClientPtr) 0x414eb10
nready = 0
start_tick = 5320
#6 0x0000000000432b3d in main (argc=9, argv=0x7fff6ef411f8, envp=<value optimized out>) at main.c:438
i = 1
error = 0
xauthfile = <value optimized out>
alwaysCheckForInput = {0, 1}

It's a bit different from the one in the Ubuntu bugzilla but still very similar.

I get something similar with a 945GM running KWin's OpenGL compositor:

#0 0x00a33584 in ioctl () from /lib/libc.so.6
#1 0x00f91fdf in drm_intel_gem_bo_start_gtt_access ()
#2 0x00f92095 in drm_intel_gem_bo_wait_rendering ()
#3 0x00f8f272 in drm_intel_bo_wait_rendering ()
#4 0x006a2bd9 in intelFinish () from /usr/lib/dri/i915_dri.so
#5 0x0100fb71 in _mesa_Finish () from /usr/lib/dri/libdricore.so
#6 0x00529d80 in __glXDisp_WaitGL ()
#7 0x0052e650 in __glXDispatch ()
#8 0x08086857 in Dispatch ()
#9 0x0806bb5d in main ()

This was on xorg-server 1.6.0-2 on fc11.

By the way, killing X in this case with "kill -9" crashes my system completely. I see the terminal, the mouse pointer atop it and cannot reach the system even over ssh.

Adjusting severity: crashes & hangs should be marked critical.

For GPU hangs like this (mouse moves but X doesn't respond to input), please attach the output of intel_gpu_dump so we can take a look at what the card is stuck on.

Created attachment 25838 [details]
after_boot.dump.gz
To get intel_gpu_dump working I had to update the kernel to the drm-intel-next branch from here: git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel.git
This attachment contains the output of intel_gpu_dump taken just after boot (Xorg was not even started).
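The `set_domain` locals shown in the `bt full` output earlier in this report (`read_domains = 64`, `write_domain = 0`) are bitmasks of GEM memory domains. As a minimal sketch, assuming the flag values from the kernel's `i915_drm.h` header, they can be decoded as follows; the helper name `decode_domains` is made up for illustration and is not part of libdrm.

```python
# GEM memory-domain bits; values assumed from the kernel's i915_drm.h header.
I915_GEM_DOMAINS = {
    0x01: "CPU",
    0x02: "RENDER",
    0x04: "SAMPLER",
    0x08: "COMMAND",
    0x10: "INSTRUCTION",
    0x20: "VERTEX",
    0x40: "GTT",
}


def decode_domains(mask):
    """Return the names of the domain bits set in a set_domain field.

    Hypothetical helper for reading gdb output; not a libdrm function.
    """
    names = [name for bit, name in sorted(I915_GEM_DOMAINS.items()) if mask & bit]
    # Report any bits that are not in the table instead of dropping them.
    unknown = mask & ~sum(I915_GEM_DOMAINS)
    if unknown:
        names.append(f"unknown(0x{unknown:x})")
    return names


if __name__ == "__main__":
    # Values copied from the set_domain struct in the backtrace.
    print("read_domains:", decode_domains(64))
    print("write_domain:", decode_domains(0))
```

Read this way, the stuck ioctl is a request to move the buffer into the GTT domain for reading only, which is consistent with the server blocking until the GPU finishes with that buffer.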
Created attachment 25839 [details]
intel-freeze.dump.gz
And this attachment is the dump taken after the freeze.
*** Bug 21639 has been marked as a duplicate of this bug. ***

Could you try with kernel 2.6.30? Wondering if the gtt-mapping-versus-swap fixes helped.

(Oh, and thanks for the dump with the freeze -- it definitely confirmed that you don't have one of various popular hangs that we've fixed recently.)

Thank you Eric. I'll try as soon as I manage...

adding dependency on bug 22374.

(In reply to comment #10)
> (Oh, and thanks for the dump with the freeze -- it definitely confirmed that
> you don't have one of various popular hangs that we've fixed recently.)
>
Is the fullscreen hang on bug #22225 the same one? Looks like it might be a dupe.

The freeze is still here. But note, the backtrace is a bit different now:

(gdb) bt
#0 0x00007fe1a5fb1227 in ioctl () from /lib/libc.so.6
#1 0x00007fe1a42cd6fd in drm_intel_gem_bo_start_gtt_access (bo=0x4587d80, write_enable=<value optimized out>) at intel_bufmgr_gem.c:892
#2 0x00007fe1a3bef051 in intelFinish (ctx=<value optimized out>) at intel_context.c:582
#3 0x00007fe1a49bbf15 in __glXDisp_CopySubBufferMESA (cl=0x4b92ec0, pc=<value optimized out>) at glxcmds.c:1628
#4 0x00007fe1a49bb0e2 in __glXDisp_VendorPrivate (cl=0x4b92ec0, pc=0x55a2208 "\230\020\b") at glxcmds.c:2268
#5 0x00007fe1a49bf552 in __glXDispatch (client=0x4758650) at glxext.c:541
#6 0x000000000044d454 in Dispatch () at dispatch.c:437
#7 0x000000000043344d in main (argc=9, argv=0x7fff401e6108, envp=<value optimized out>) at main.c:397
(gdb) bt full
#0 0x00007fe1a5fb1227 in ioctl () from /lib/libc.so.6
No symbol table info available.
#1 0x00007fe1a42cd6fd in drm_intel_gem_bo_start_gtt_access (bo=0x4587d80, write_enable=<value optimized out>) at intel_bufmgr_gem.c:892
bufmgr_gem = (drm_intel_bufmgr_gem *) 0x450a360
set_domain = {handle = 955, read_domains = 64, write_domain = 0}
ret = <value optimized out>
#2 0x00007fe1a3bef051 in intelFinish (ctx=<value optimized out>) at intel_context.c:582
irb = (struct intel_renderbuffer *) 0xfffffffffffffe00
fb = (struct gl_framebuffer *) 0x46c9f90
i = 0
#3 0x00007fe1a49bbf15 in __glXDisp_CopySubBufferMESA (cl=0x4b92ec0, pc=<value optimized out>) at glxcmds.c:1628
tag = 1
glxc = (__GLXcontext *) 0x46d9e80
pGlxDraw = <value optimized out>
client = (ClientPtr) 0x4758650
drawId = 107
error = <value optimized out>
x = 0
y = 0
width = 1400
height = 1016
#4 0x00007fe1a49bb0e2 in __glXDisp_VendorPrivate (cl=0x4b92ec0, pc=0x55a2208 "\230\020\b") at glxcmds.c:2268
No locals.
#5 0x00007fe1a49bf552 in __glXDispatch (client=0x4758650) at glxext.c:541
stuff = (xGLXSingleReq *) 0x55a2208
opcode = <value optimized out>
cl = (__GLXclientState *) 0x4b92ec0
retval = 1
#6 0x000000000044d454 in Dispatch () at dispatch.c:437
result = 0
client = (ClientPtr) 0x4758650
nready = 0
start_tick = 7420
#7 0x000000000043344d in main (argc=9, argv=0x7fff401e6108, envp=<value optimized out>) at main.c:397
i = 1
alwaysCheckForInput = {0, 1}

Created attachment 27048 [details]
intel-gpu-dump.dump.gz
Updated gpu dump.
I just noticed that you're using DRI1. You'll need to reproduce this with UXA and DRI2, as that fixed many GPU hangs along with fixing correctness of compositing.

That Xorg.0.log is outdated, which means the bug exists in both cases (DRI1 and DRI2). I'll attach a recent Xorg.0.log.

Hm, well I'll update the bug this evening with the current git versions of everything. Hiding from your search until then.

Ok, here is the update. Still the same story with updated git mesa (7.4 branch)/libdrm/xf86-video-intel (2.7 branch)/xorg-server (1.6 branch). Xorg hangs with the following backtrace:

(gdb) bt
#0 0x00007fad03559227 in ioctl () from /lib/libc.so.6
#1 0x00007fad018716fd in drm_intel_gem_bo_start_gtt_access (bo=0x5076e20, write_enable=<value optimized out>) at intel_bufmgr_gem.c:873
#2 0x00007fad01191051 in intelFinish (ctx=<value optimized out>) at intel_context.c:582
#3 0x00007fad0257af15 in __glXDisp_CopySubBufferMESA (cl=0x5187f50, pc=<value optimized out>) at glxcmds.c:1633
#4 0x00007fad0257a0e2 in __glXDisp_VendorPrivate (cl=0x5187f50, pc=0x58b7bd4 "\230\020\b") at glxcmds.c:2273
#5 0x00007fad0257e572 in __glXDispatch (client=0x50fe8c0) at glxext.c:541
#6 0x000000000044d464 in Dispatch () at dispatch.c:437
#7 0x000000000043344d in main (argc=9, argv=0x7fff4d6ccb48, envp=<value optimized out>) at main.c:397

Created attachment 27492 [details]
Xorg.0.log
Created attachment 27493 [details]
gpu_dump.out.gz
updated Xorg.0.log and gpu_dump.
(In reply to comment #18)
> Ok, here is update. Still the same story with updated git mesa(7.4
> branch)/libdrm/xf86-video-intel(2.7 branch)/xorg-server(1.6 branch).

The 2.7 branch is outdated. Could you try the master branch? Preferably together with mesa_7_5_branch. drm-intel-next is still the right choice for the kernel.

Created attachment 27521 [details]
Xorg.0.log
Ok, I've updated mesa, libdrm, xf86-video-intel, xorg-server, inputproto, xineramaproto, pixman, xf86-input-evdev, xf86-input-synaptics to the recent git (mesa mesa_7_5_branch, all others use master). But it still fails. Here is Xorg.0.log.
Created attachment 27522 [details]
gpu_dump.out.gz
And gpu_dump for updated configuration.
That's an interesting dump -- the head pointer is off in the weeds (not in any known batchbuffers), which is pretty strange. I've been running compiz all day hoping to reproduce the problem myself with no luck. Can you correlate the failure to any particular activity you're doing? Are any of your compiz settings non-default?

(oops, note to self: the HEAD pointer is in the ringbuffer, not some mystery batchbuffer)

(In reply to comment #24)
Currently I'm unable to check whether I have any non-default settings, but if you need me to check that, just ask and I'll do it next week. Anyway, I don't remember changing anything. X hangs practically immediately: I just start fusion-icon (I probably forgot to mention fusion-icon in this bug: I start compiz with it) and sometimes it hangs even before the decorator starts; sometimes I need to move windows a bit, or maximize a window to the full desktop and restore it to its initial size. On the other hand, I've tried to play nexuiz, and while it was impossible to play (everything was too slow), I was able to start the game and close it without any hangs... I can also tar the full filesystem so you could try to reproduce the problem with the system I have. But again, this will not happen earlier than next week.

Created attachment 27733 [details]
gpu dump
Hello:
I don't know if I'm in time to provide further information, but just in case, this is my backtrace:
#0 0x00007f49c484f7d7 in ioctl () from /lib/libc.so.6
#1 0x00007f49c25c860d in drm_intel_gem_bo_start_gtt_access (bo=0x4f7d560, write_enable=<value optimized out>)
at ../../../libdrm/intel/intel_bufmgr_gem.c:892
#2 0x00007f49b193eef1 in intelFinish () from /usr/lib/dri/i965_dri.so
#3 0x00007f49c32c51f7 in __glXDisp_WaitGL (cl=<value optimized out>, pc=<value optimized out>) at ../../glx/glxcmds.c:756
#4 0x00007f49c32c9452 in __glXDispatch (client=0x4f82720) at ../../glx/glxext.c:541
#5 0x000000000044d0a4 in Dispatch () at ../../dix/dispatch.c:437
#6 0x000000000043320a in main (argc=8, argv=0x7fff25930e98, envp=<value optimized out>) at ../../dix/main.c:397
I get this when I go away from the keyboard for a while; when I'm back I get a black screen. The system still works but the screen is unusable. I tried switching to a text console and running vbetool post remotely... both failed.
Debian sid with self-built 2.6.30.1, no KMS, mesa 7.4.4, xserver 1.6.1.901, intel driver 2.7.99.901.
Let me know if I can provide further information.
HTH,
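The backtraces pasted in this report enter through different GLX requests (SwapBuffers, WaitGL, CopySubBufferMESA) but appear to share the same innermost frames. As a rough sketch for comparing them, the function names can be pulled out of gdb `#N ... in func (` frames with a regular expression; `frame_functions` and `common_inner_frames` are hypothetical helper names, and the pattern is an assumption about gdb's usual frame layout rather than a full parser.

```python
import re

# Matches "#3 0x00007f... in func (" and also "#3 func (" (no address).
FRAME_RE = re.compile(r"#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\(")


def frame_functions(backtrace):
    """Return function names in frame order, innermost frame first.

    Works on flattened text too, since it does not rely on line breaks.
    """
    return FRAME_RE.findall(backtrace)


def common_inner_frames(first, second):
    """Return the leading (innermost) frames that two backtraces share."""
    shared = []
    for a, b in zip(frame_functions(first), frame_functions(second)):
        if a != b:
            break
        shared.append(a)
    return shared


if __name__ == "__main__":
    # Shortened frames copied from two of the reports in this bug.
    swap = ("#0 0x00007f8c64f463a7 in ioctl () from /lib/libc.so.6 "
            "#1 0x00007f8c632604fd in drm_intel_gem_bo_start_gtt_access (bo=0x43be630) "
            "#2 0x00007f8c52438c81 in intelFinish (ctx=<value optimized out>) "
            "#3 0x00007f8c63d59f96 in __glXDisp_SwapBuffers (cl=0x40ef008)")
    wait = ("#0 0x00007f49c484f7d7 in ioctl () from /lib/libc.so.6\n"
            "#1 0x00007f49c25c860d in drm_intel_gem_bo_start_gtt_access (bo=0x4f7d560)\n"
            "#2 0x00007f49b193eef1 in intelFinish () from /usr/lib/dri/i965_dri.so\n"
            "#3 0x00007f49c32c51f7 in __glXDisp_WaitGL (cl=<value optimized out>)")
    print(common_inner_frames(swap, wait))
```

A shared innermost prefix only shows where the server is waiting, not why the GPU stopped, so it is a way to group reports rather than a diagnosis.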
OK, I've pushed a commit series that should fix GPU hangs with DRI2 in use. The symptoms in the GPU dumps are unreliable, as sometimes the corruption would occur in a state buffer not dumped in the intel_gpu_dump output, but I've definitely seen the symptoms of this in a number of the reports. Please retest with this set of changes, and reopen if the problem continues. (In your case, your dump shows a depth buffer setup that would have drawn about 30 pages outside of its allocation!)

xf86-video-intel:

commit e8f0763d405a8152c74c28792c52fe12c1d41dd5
Author: Eric Anholt <eric@anholt.net>
Date: Fri Aug 7 18:24:44 2009 -0700

Fix math in the tiling alignment fix.

commit 222b52ef16895823fbf3a0fc0be4eb23b930ed1b
Author: Eric Anholt <eric@anholt.net>
Date: Fri Aug 7 18:05:29 2009 -0700

Align tiled pixmap height so we don't address beyond the end of our buffers.

Mesa:

commit ceb8afcca5b0a52b005a782ea54b301beaee1a15
Author: Eric Anholt <eric@anholt.net>
Date: Fri Aug 7 18:09:31 2009 -0700

intel: Align region height as required for tiled regions.

Otherwise, we would address beyond the end of our buffers. Fixes reliable GPU segfault with texture_tiling=true and oglconform shadow.c.

Bug #22406.

Regrettably, the bug is still here. Updated:

kernel: 2.6.31-rc2, drm-intel-next branch
=media-libs/mesa-9999
=x11-libs/libdrm-9999
=x11-drivers/xf86-video-intel-9999
=x11-base/xorg-server-9999
=x11-proto/inputproto-9999
=x11-proto/xineramaproto-9999
=x11-drivers/xf86-input-evdev-9999
=x11-proto/fixesproto-4*
=x11-proto/renderproto-0.11*

All 9999 versions are git master branches, so the fixes should be in, but I can still reproduce the problem. I'll attach the dump and log next.

Created attachment 28540 [details]
Xorg.0.log.gz
Created attachment 28541 [details]
intel-gpu-dump.gz
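The tiling alignment commits quoted above are about buffers whose height is not a multiple of the tile height: the hardware addresses whole tiles, so the last tile row can reach past an allocation sized from the unaligned height. The following is only a sketch of that arithmetic, not the driver's code. It assumes 4 KiB tiles with the i965-class geometry (X tiles 512 bytes by 8 rows, Y tiles 128 bytes by 32 rows), and the function names are invented for illustration.

```python
PAGE_SIZE = 4096

# Assumed tile geometry: (width in bytes, height in rows), 4 KiB per tile.
TILE_GEOMETRY = {"X": (512, 8), "Y": (128, 32)}


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def tiled_size(pitch_bytes, height_rows, tiling):
    """Bytes the hardware can touch for a tiled surface of this shape."""
    tile_width, tile_height = TILE_GEOMETRY[tiling]
    if pitch_bytes % tile_width:
        raise ValueError("pitch must be a multiple of the tile width")
    return pitch_bytes * align_up(height_rows, tile_height)


def overrun_pages(pitch_bytes, height_rows, tiling):
    """Pages addressed beyond an allocation sized from the unaligned height."""
    allocated = align_up(pitch_bytes * height_rows, PAGE_SIZE)
    touched = tiled_size(pitch_bytes, height_rows, tiling)
    return max(0, touched - allocated) // PAGE_SIZE


if __name__ == "__main__":
    # Made-up surface: 4096-byte pitch, 100 rows, Y tiling.
    print(overrun_pages(4096, 100, "Y"))
```

The numbers in the usage line are invented to show the effect and are not taken from any dump attached to this bug; sizing the allocation with `tiled_size` instead of the raw height is what removes the overrun in this model.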
Created attachment 28592 [details]
intel_gpu_dump output
I had another occurrence of this bug when playing with multi-screen setup on a Lenovo T500. The server froze when starting KDE's display settings tool after previously using xrandr. intel-gpu-dump file attached.
commit 5604b27b9326ac542069a49ed9650c4b0d3e939a
Author: Eric Anholt <eric@anholt.net>
Date: Wed Sep 9 12:35:30 2009 -0700

i965: Fix relocation delta for WM surfaces.

This was a regression in 0f328c90dbc893e15005f2ab441d309c1c176245.

Bug #23688
Bug #23254

New fix that gets some hangs in compiz, for a regression that was present at the last time you said you were testing. Could you retest?

With the current git mesa, libdrm and xf86-video-intel it still hangs :(

Created attachment 29368 [details]
Xorg.0.log.gz
Created attachment 29369 [details]
intel-gpu-dump.gz
Eric, if some more debugging is needed or possible on my side, let me know. And thanks for your help anyway.

unblocking Q3 release, but still a high priority bug.

Hello, I think I hit this bug with an X4500HD running KWin's OpenGL compositor. This bug makes it impossible to use desktop effects for more than a couple of minutes.

cat /var/log/Xorg.0.log :

[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x49e758]
1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x49e124]
2: /usr/bin/Xorg (xf86PostMotionEventP+0xce) [0x478ede]
3: /usr/bin/Xorg (xf86PostMotionEvent+0xa9) [0x479099]
4: /usr/lib64/xorg/modules/input/wacom_drv.so (0x7f87a0f29000+0xc5b9) [0x7f87a0f355b9]
5: /usr/lib64/xorg/modules/input/wacom_drv.so (0x7f87a0f29000+0xd258) [0x7f87a0f36258]
6: /usr/lib64/xorg/modules/input/wacom_drv.so (0x7f87a0f29000+0x74c9) [0x7f87a0f304c9]
7: /usr/lib64/xorg/modules/input/wacom_drv.so (0x7f87a0f29000+0x451f) [0x7f87a0f2d51f]
8: /usr/lib64/xorg/modules/input/wacom_drv.so (0x7f87a0f29000+0x357e) [0x7f87a0f2c57e]
9: /usr/bin/Xorg (0x400000+0x6bdf7) [0x46bdf7]
10: /usr/bin/Xorg (0x400000+0x116993) [0x516993]
11: /lib64/libpthread.so.0 (0x3cb3200000+0xf320) [0x3cb320f320]
12: /lib64/libc.so.6 (ioctl+0x7) [0x3cb26d9c07]
13: /usr/lib64/libdrm_intel.so.1 (drm_intel_gem_bo_start_gtt_access+0x4d) [0x7f87a24375ed]
14: /usr/lib64/dri/i965_dri.so (intelFinish+0x36) [0x7f87a177e1ac]
15: /usr/lib64/xorg/modules/extensions/libglx.so (0x7f87a311c000+0x325e4) [0x7f87a314e5e4]
16: /usr/lib64/xorg/modules/extensions/libglx.so (0x7f87a311c000+0x313d2) [0x7f87a314d3d2]
17: /usr/lib64/xorg/modules/extensions/libglx.so (0x7f87a311c000+0x3592e) [0x7f87a315192e]
18: /usr/bin/Xorg (0x400000+0x2c60c) [0x42c60c]
19: /usr/bin/Xorg (0x400000+0x21c9a) [0x421c9a]
20: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x3cb261eb4d]
21: /usr/bin/Xorg (0x400000+0x21849) [0x421849]

Name : xorg-x11-server-Xorg
Architecture : x86_64
Version : 1.7.0
Release : 1.fc12

Name : xorg-x11-drv-intel
Architecture : x86_64
Version : 2.9.1
Release : 1.fc12

$ uname -r
2.6.31.5-97.fc12.x86_64
$ cat /etc/fedora-release
Fedora release 11.92 (Rawhide)

fixing up severity/priority: until we have some clues to work on for what's going wrong, I can't call it high priority.

Eric, any hints on what we can do to give you those clues? The bug is perfectly reproducible, so if you could provide a patch to gain some debugging output or whatever, it should not be hard to do. And I think Severity reflects the issue, not our abilities, so I'm not sure why you've changed it.

Ha, and after I said that, I decided to reproduce this problem another time, and for 5 minutes I failed! Probably the bug is gone. I'll keep compiz running and will reopen the bug if the hang reproduces. Current software:

x11-base/xorg-server-1.7.2
media-libs/mesa-7.5.2
vanilla-sources-2.6.32-rc8

BUT now compiz works with interruptions. :( Previously, before the session hung, it always worked very smoothly (that could last for minutes back when the bug was opened). Now, after I click to maximize or resize a window, it takes about a second before something happens, and what does happen is not smooth at all. Well, as I saw, it's possible to have compiz run smoothly on this hardware, so bugs are still here and compiz is still unusable :( Anyway, thanks for your work.