Summary: | [GM965] Random X freezes | |
---|---|---|---
Product: | xorg | Reporter: | zOOm_ER <zOOmER.gm>
Component: | Driver/intel | Assignee: | Carl Worth <cworth>
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team>
Severity: | critical | |
Priority: | high | CC: | freedesktop-bugs, kan.liang, pete, quanxian.wang
Version: | 7.4 (2008.09) | |
Hardware: | All | |
OS: | Linux (All) | |

Description
zOOm_ER
2009-06-25 12:59:55 UTC
I have a question about when I should run the intel_gpu_dump tool. I mean, how much time do I have after the hang? As far as I understand, the tool dumps the most recent calls to the GPU. So can the dump get bloated with useless data after some time, which would push the actual data from the moment of the hang out of the dump?

Just run intel_gpu_dump after the hang, though you don't need to be super quick, like within 1 second.

A "random freeze" is normally hard to reproduce and fix. So please provide more info according to http://intellinuxgraphics.org/how_to_report_bug.html.

Created attachment 27156 [details]
/var/log/Xorg.0.log
Created attachment 27157 [details]
/etc/X11/xorg.conf
Here is my system config:

u1@linux-giq8:~> uname -a
Linux linux-giq8 2.6.30-5-default #1 SMP Fri Jun 26 01:49:07 MSD 2009 i686 i686 i386 GNU/Linux
u1@linux-giq8:~> pkg-config --modversion libdrm
2.4.11
u1@linux-giq8:~> glxinfo | grep vers
server glx version string: 1.2
client glx version string: 1.4
OpenGL version string: 2.0 Mesa 7.4.2
glu version: 1.3

Now trying to catch a hang...

OK, I got it. Xorg log and intel-gpu-dump attached.

Created attachment 27181 [details]
Xorg.0.log.old - when hang happened
The intel-gpu-dump log is 1.5 MB. It cannot be uploaded here or to pastebin due to its size. How can I provide it so that it is easily accessible?

Created attachment 27187 [details]
here is compressed version of intel-gpu-dump
We just fixed a random hang issue on OpenSolaris/GEM. Since OpenSolaris and Linux use similar X code, I hope these two patches can help with your problem.

Thanks!
Liang Kan

Created attachment 27327 [details] [review]
2d patch

Created attachment 27328 [details] [review]
mesa patch

(In reply to comment #10)
> We just fixed a random hang issue on OpenSolaris/GEM. Since OpenSolaris and
> Linux use similar X code, I hope these two patches can help with your problem.
>
> Thanks!
> Liang Kan

I have never tried to build X/Mesa myself. I would be very thankful if you could give some hints on how to apply those patches.

(In reply to comment #10)
> We just fixed a random hang issue on OpenSolaris/GEM. Since OpenSolaris and
> Linux use similar X code, I hope these two patches can help with your problem.
>
> Thanks!
> Liang Kan

Unless I'm misreading things, your patches only change the driver's behavior for 865, but this bug report concerns the 965 hardware, so the patches won't be relevant.

Meanwhile, have you submitted the patches to the intel-gfx mailing list? That's a good place for them (or else in an independent bug report).

Thanks, and let me know if you have any questions.

-Carl

(In reply to comment #14)
> (In reply to comment #10)
> > We just fixed a random hang issue on OpenSolaris/GEM. Since OpenSolaris and
> > Linux use similar X code, I hope these two patches can help with your problem.
> >
> > Thanks!
> > Liang Kan
>
> Unless I'm misreading things, your patches only change the driver's behavior
> for 865, but this bug report concerns the 965 hardware, so the patches
> won't be relevant.
>
> Meanwhile, have you submitted the patches to the intel-gfx mailing list? That's
> a good place for them (or else in an independent bug report).
>
> Thanks, and let me know if you have any questions.
>
> -Carl

From what I understand, the patches change the behavior of all chipsets except 865.
(In reply to comment #15)
> > Unless I'm misreading things, your patches only change the driver's behavior
> > for 865, but this bug report concerns the 965 hardware, so the patches
> > won't be relevant.
>
> From what I understand, the patches change the behavior of all chipsets except 865.

Yes, I did misread this. Thank you.

Eric Anholt just reviewed the patch and suspects that for OpenSolaris the patch works around a hardware issue that was worked around in Linux (in version 2.6.30) with the commit below. So we would not expect the patch to help here.

Of course, if the patch included a description of what it was changing and why, then that would help too.

-Carl

commit 13f4c435ebf2a7c150ffa714f3b23b8e4e8cb42f
Author: Eric Anholt <eric@anholt.net>
Date: Tue May 12 15:27:36 2009 -0700

drm/i915: Don't allow binding objects into the last page of the aperture.

This should avoid a class of bugs where the hardware prefetches past the end of the object, and walks into unallocated memory when the object is bound to the last page of the aperture.

fd.o bug #21488

Signed-off-by: Eric Anholt <eric@anholt.net>

(In reply to comment #16)
> (In reply to comment #15)
> > > Unless I'm misreading things, your patches only change the driver's behavior
> > > for 865, but this bug report concerns the 965 hardware, so the patches
> > > won't be relevant.
> >
> > From what I understand, the patches change the behavior of all chipsets except 865.
> Yes, I did misread this. Thank you.
> Eric Anholt just reviewed the patch and suspects that for OpenSolaris the patch
> works around a hardware issue that was worked around in Linux (in version
> 2.6.30) with the commit below.
> So we would not expect the patch to help here.
> Of course, if the patch included a description of what it was changing and why,

Sorry for providing so little info about this patch. Quanxian is preparing a patch with the description. I think he is also testing it now before sending it out.

> then that would help too.
> -Carl
> commit 13f4c435ebf2a7c150ffa714f3b23b8e4e8cb42f
> Author: Eric Anholt <eric@anholt.net>
> Date: Tue May 12 15:27:36 2009 -0700
> drm/i915: Don't allow binding objects into the last page of the aperture.
> This should avoid a class of bugs where the hardware prefetches past the
> end of the object, and walks into unallocated memory when the object is
> bound to the last page of the aperture.
> fd.o bug #21488
> Signed-off-by: Eric Anholt <eric@anholt.net>

Yes, I once tried Eric's patch on Solaris. It helps with some hang issues but not all. When I was debugging the random hang issue on Solaris, I found that the debug register DMA_FADD_P always oversteps by 0x80 at the time of the hang. E.g., if the hanging batchbuffer object's GTT offset is 0x8a31000 and its length is 0xff0, DMA_FADD_P is 0x8a32080, i.e. 0x80 past the 4KB page boundary at 0x8a32000. So I guess this may be another prefetch-related issue. So I don't use the last page of every batchbuffer object. That works on Solaris.

(In reply to comment #13)
> (In reply to comment #10)
> > We just fixed a random hang issue on OpenSolaris/GEM. Since OpenSolaris and
> > Linux use similar X code, I hope these two patches can help with your problem.
> >
> > Thanks!
> > Liang Kan
> >
> I have never tried to build X/Mesa myself.
> I would be very thankful if you could give some hints on how to apply those
> patches.

Please refer to the Intel Linux Graphics Driver Installation Guide:
http://intellinuxgraphics.org/install.html

This is the description for the patch of comment 11 and comment 2.

===
For the random hang of bug 22482, we suspect it is related to a DMA prefetch issue. For 865G, we keep 16 as the prefetch size. For others, we define 4096 as the reserved size in case DMA prefetches outside of its range.
===

As for why we provide this patch: we derived it from a bunch of data collected for the hang issue. DMA prefetches data from the batch buffer, and the prefetched content is then analyzed. Once it comes across the BATCH BUFFER END symbol, it will stop.
However, the size of the prefetch is fixed, and it may cause DMA to prefetch outside of the page we allocated. It should be a logic analysis error. The command parser should stop once it comes across the END, and should not need to analyze more even if it gets erroneous content from outside (it gets a mess of zeros). For example, if we allocate 4KB and the DMA prefetch size is 1.5KB, then on the last fetch it will prefetch 0.5KB from outside of the batch buffer. That will cause something unexpected if the hardware logic has some fault.

After changing from 16 to 4096 in OpenSolaris, the hang disappears. (We also tried changing it to 0x100 and other values; those did not work.) This change loses 25% of the space. (We generally allocate 4*4KB, with 4KB reserved.)

It should be a hardware issue; however, I cannot work out the logic behind it. Just a guess.

zOOm_ER, I suggest you give it a try; after all, there is no other reasonable patch for this.

OK, I've done some testing with these patches:
The Mesa patch by itself does not seem to change anything (i.e. I am still catching hangs).
The xf86-video-intel patch works (no hangs, and the clobbered text-input fields disappeared along with some other visible window defects. [I didn't mention them before, because I didn't know they were related to the hangs.])
But The Bad News(tm) is that the 2D driver patch breaks OpenGL.
All OpenGL screensavers give me a blank screen or just a frozen screen (luckily not a complete hang).
And glxgears does not start; it prints this on the console:

#glxgears
X Error of failed request: BadRequest (invalid request code or no such operation)
Major opcode of failed request: 136 (DRI2)
Minor opcode of failed request: 7 ()
Serial number of failed request: 30
Current serial number in output stream: 30

(In reply to comment #20)
> OK, I've done some testing with these patches:
> The Mesa patch by itself does not seem to change anything (i.e. I am still
> catching hangs).
> The xf86-video-intel patch works (no hangs, and the clobbered text-input fields
> disappeared along with some other visible window defects.
> [I didn't mention them before, because I didn't know they were related to the hangs.])
> But The Bad News(tm) is that the 2D driver patch breaks OpenGL.
> All OpenGL screensavers give me a blank screen or just a frozen screen (luckily
> not a complete hang).
> And glxgears does not start; it prints this on the console:
> #glxgears
> X Error of failed request: BadRequest (invalid request code or no such
> operation)
> Major opcode of failed request: 136 (DRI2)
> Minor opcode of failed request: 7 ()
> Serial number of failed request: 30
> Current serial number in output stream: 30

Could you please provide the versions of xf86-video-intel/xserver/mesa/libdrm that you built?

> Could you please provide the versions of xf86-video-intel/xserver/mesa/libdrm
> that you built?

xf86-video-intel is the git version with this last commit:

###
commit 74227141923a2f5049592219ab80e8733062a5d9
Author: Barry Scott <barry.scott@onelan.co.uk>
Date: Tue Jun 23 14:14:50 2009 +0100

Fix segv for clipped movie window
###

Mesa, which I built and which, as I said, does not change anything after the patch, is the git version with this last commit:

###
commit 862488075c5537b0613753b0d14c267527fc6199
Merge: 060c7f2... 94e1117...
Author: Jakob Bornecrantz <jakob@vmware.com>
Date: Fri Jul 3 18:53:58 2009 +0200

Merge branch 'mesa_7_5_branch'
###

I didn't touch drm and xserver, so, as I said before:

X.Org X Server 1.6.1
Release Date: 2009-4-14
u1@linux-giq8:~> uname -a
Linux linux-giq8 2.6.30-5-default #1 SMP Fri Jun 26 01:49:07 MSD 2009 i686 i686 i386 GNU/Linux
u1@linux-giq8:~> pkg-config --modversion libdrm
2.4.11

(In reply to comment #20)
> OK, I've done some testing with these patches:
> The Mesa patch by itself does not seem to change anything (i.e. I am still
> catching hangs).
> The xf86-video-intel patch works (no hangs, and the clobbered text-input fields
> disappeared along with some other visible window defects. [I didn't mention them
> before, because I didn't know they were related to the hangs.])

I'm glad you got the hangs to go away.
Are you sure it's related to the patch, though? Can you try the driver from git without the patch to see if the hangs and clobbered text fields are also fixed? I suspect that the patch here has nothing to do with the behavior you're seeing.

> But The Bad News(tm) is that the 2D driver patch breaks OpenGL.

And that doesn't make any sense at all. The 2D driver patch can't change OpenGL. But again, if you were updating lots of components (from packaged versions to git, for example) in order to be able to apply the patch, then one of those updates could cause an issue like this. And if so, then that would need to be a separate bug report.

For here, let's focus on the original hang issue. I'll look forward to hearing whether the current driver from git exhibits the hangs or not.

-Carl

> I'm glad you got the hangs to go away. Are you sure it's related to the patch,
> though? Can you try the driver from git without the patch to see if the hangs
> and clobbered text fields are also fixed? I suspect that the patch here
> has nothing to do with the behavior you're seeing.

I tried the unchanged git 2D driver, and I can confirm that it fixes the clobbered text and the hangs (I will test a little bit more, but I have tortured it pretty thoroughly).

> > But The Bad News(tm) is that the 2D driver patch breaks OpenGL.
>
> And that doesn't make any sense at all. The 2D driver patch can't change
> OpenGL. But again, if you were updating lots of components (from packaged
> versions to git, for example) in order to be able to apply the patch, then one
> of those updates could cause an issue like this. And if so, then that would
> need to be a separate bug report.
>
> For here, let's focus on the original hang issue. I'll look forward to hearing
> whether the current driver from git exhibits the hangs or not.
>
> -Carl

So... I ran "make" and "make install" in the source directory, just after pulling the 2D driver from git. That is enough to fix the hangs and break glxgears...
If I roll back to my distribution's 2D driver, I get hangs again and a working glxgears... I didn't touch Mesa or drm. Is that enough to create a new bug report?

(In reply to comment #24)
> I tried the unchanged git 2D driver, and I can confirm that it fixes the clobbered
> text and the hangs (I will test a little bit more, but I have tortured it pretty thoroughly).

Great news. I'll mark this bug report as fixed now.

> So... I ran "make" and "make install" in the source directory, just after
> pulling the 2D driver from git. That is enough to fix the hangs and break glxgears... If
> I roll back to my distribution's 2D driver, I get hangs again and a working
> glxgears... I didn't touch Mesa or drm. Is that enough to create a new bug report?

It's definitely a separate bug report, yes. So if you wouldn't mind opening that, that would be appreciated. The behavior does seem very odd to me still, but I'll let the new assignee comment in more detail. :-)

Thanks again for your report and testing,

-Carl

(In reply to comment #25)
> > So... I ran "make" and "make install" in the source directory, just after
> > pulling the 2D driver from git. That is enough to fix the hangs and break glxgears... If
> > I roll back to my distribution's 2D driver, I get hangs again and a working
> > glxgears... I didn't touch Mesa or drm. Is that enough to create a new bug report?
>
> It's definitely a separate bug report, yes.

glxgears might be broken because the DRI is disabled, which could also explain why the hangs no longer occur.

(In reply to comment #26)
> glxgears might be broken because the DRI is disabled, which could also explain
> why the hangs no longer occur.

How can I check for this?

(In reply to comment #27)
> (In reply to comment #26)
> > glxgears might be broken because the DRI is disabled, which could also explain
> > why the hangs no longer occur.
> >
> How can I check for this?

You can check the Xorg.0.log.
Or you can run /usr/X11/bin/glxinfo and check the "direct rendering" line.

I tried to run glxinfo, but it throws almost the same error as glxgears:

# glxinfo
name of display: :0.0
X Error of failed request: BadRequest (invalid request code or no such operation)
Major opcode of failed request: 136 (DRI2)
Minor opcode of failed request: 7 ()
Serial number of failed request: 25
Current serial number in output stream: 25

I will attach Xorg.0.log.

Created attachment 27497 [details]
xorg.0.log of starting X with 2d-driver from git
There are many warnings, but it seems that DRI is enabled.
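
The check suggested in the last comments (look in Xorg.0.log, or read the "direct rendering" line from glxinfo) could be scripted roughly as follows. This is only a minimal sketch: the helper names `dri_status_lines` and `glxinfo_direct_rendering` are hypothetical, and the sample log line is an assumed example, since the exact wording in a real Xorg.0.log depends on the driver version. When glxinfo itself fails with an X error, as it does in this report, the second helper returns None and the log is the only place left to look.

```python
import re

# Hypothetical helpers for illustration; not part of any X.Org or Mesa tool.


def dri_status_lines(log_text):
    """Return the lines of an Xorg log that mention direct rendering or DRI/DRI2."""
    pattern = re.compile(r"direct rendering|\bDRI2?\b", re.IGNORECASE)
    return [line for line in log_text.splitlines() if pattern.search(line)]


def glxinfo_direct_rendering(glxinfo_output):
    """Return True or False from glxinfo's "direct rendering:" line, or None if absent."""
    for line in glxinfo_output.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("direct rendering:"):
            value = stripped.split(":", 1)[1].strip()
            return value.lower().startswith("yes")
    return None


if __name__ == "__main__":
    # Assumed sample log text; the wording in a real Xorg.0.log may differ.
    sample_log = (
        "(II) intel(0): direct rendering: DRI2 Enabled\n"
        "(II) intel(0): Set up textured video\n"
    )
    for line in dri_status_lines(sample_log):
        print(line)

    # Output of a failing glxinfo run, as quoted in this report.
    failed = "X Error of failed request: BadRequest (invalid request code or no such operation)\n"
    print(glxinfo_direct_rendering(failed))
```

To use it on a real system, one would read /var/log/Xorg.0.log into a string and pass it to `dri_status_lines`, then review the matching lines by hand rather than trusting a single keyword.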