Summary: | Random freezes on i965 and i945
---|---
Product: | xorg
Reporter: | Björn Ruberg <bjoern>
Component: | Driver/intel
Assignee: | Carl Worth <cworth>
Status: | RESOLVED FIXED
QA Contact: | Xorg Project Team <xorg-team>
Severity: | critical
Priority: | high
CC: | bjoern, dmitry.a.durnev, freedesktop-bugs, hez, jjardon, kai.kasurinen, marcus, mike.lifeguard, mishu, pete, sonne, sveinung84, svrmarty, vmarko, zack.evans, zOOmER.gm, zwaldowski
Version: | 7.4 (2008.09)
Hardware: | x86 (IA32)
OS: | Linux (All)
Bug Depends on: | 20152, 20560
Description
Björn Ruberg
2009-03-26 14:36:53 UTC
If anybody needs it, I can confirm this bug on another Intel GM965/X3100. It happens very often in the Ubuntu 9.04 dailies, the Fedora 11 Beta, and current Arch Linux. It also occurs for me on AMD64, with both EXA and UXA. All sorts of people on Ubuntu's Launchpad are reporting similar or identical bugs on other Intel video chipsets such as the 915, 945, and 4500HD.

Are you sure it happens on the Fedora 11 Beta? I tried to trigger this bug with the Fedora Beta on a GM965 one week ago, with kwin running composite effects. It survived two hours of desktop activity of the kind I described in my bug report.

Yes, I'm positive. In fact, it has happened to me on every Linux distro with the most recent Intel driver, including up-to-date Arch, Ubuntu Jaunty, and Fedora 11. Because the bug is random, I have sometimes gone up to 4 hours without it, with or without Compiz and with either EXA or UXA.

Hi Björn,

I know it's not much fun for you, but I'm delighted that you have such a repeatable way to make your GPU hang. For this case, we've developed a new tool that will give us very useful information for debugging the hang. The tool is called intel_gpu_dump and is contained in the intel-gpu-tools repository, which you can obtain as follows:

git clone git://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools

(Eric has been threatening to make a tarball release of that, so if getting it via git is a problem, let me know and I'll pester him to do that.)

The one trick to getting intel_gpu_dump to work is that it requires some code that was only very recently added to the i915 kernel driver. The easiest way to get it is with the recently released 2.6.30-rc2 version of Linux. If you can run that, then running intel_gpu_dump should give a nice dump of the commands most recently submitted to the GPU. (If the GPU is not hung, the output is often almost empty, but when it is hung you should definitely get some output.)

If you could run that tool and send us the output, that would be very helpful for identifying and fixing the bug. Please let me know if you need any help with that.

Thanks,

-Carl

Well, I should be able to do that, although of course it is not much fun. There is a report in the Fedora Bugzilla that the hang disappeared with a 2.6.29.1 kernel. I'll try that, and if it does not solve the problem, I'll be motivated to try getting this dump.

Well, the bug is definitely NOT gone with kernel 2.6.29. I tried to produce a dump but failed. Not because of the kernel; I can compile one myself. But I couldn't get your dump tool built. You should really provide a README listing the requirements. I got some strange errors until I finally installed libtool. The configure script wants libdrm_intel 2.4.6 installed (it relies on pkg-config, but Fedora does not install the needed files). For Fedora 10 only 2.4.0 is available. I ran make anyway, but that results in these errors (translated from German):

Making all in lib
CC intel_batchbuffer.o
In file included from intel_batchbuffer.c:34:
intel_batchbuffer.h:13: error: expected specifier-qualifier-list before 'drm_intel_bufmgr'
intel_batchbuffer.h:31: error: expected ')' before '*' token
intel_batchbuffer.h:44: error: expected declaration specifiers or '...' before 'drm_intel_bo'
intel_batchbuffer.h: In function 'intel_batchbuffer_space':
intel_batchbuffer.h:57: error: 'struct intel_batchbuffer' has no member named 'size'
intel_batchbuffer.h:57: error: 'struct intel_batchbuffer' has no member named 'ptr'
intel_batchbuffer.h:57: error: 'struct intel_batchbuffer' has no member named 'map'
intel_batchbuffer.h: In function 'intel_batchbuffer_emit_dword':
and so on...

I am also very interested in helping fix this bug, but I have zero experience with git. If someone could give me clear step-by-step instructions on what to do, I'll do it. Currently I run stock Fedora 10, which (as I see from previous comments) is not recent enough, but nevertheless I'd like to help as much as I can. I usually have another machine on the LAN and can read logs via ssh whenever the lockup happens. Besides, I can confirm that quite a few people running Intel hardware are seeing this bug. Just take a look at the Fedora bug report above and the mailing list... :-)

Created attachment 24939 [details]
pkg-config file for libdrm_intel
Well, install git and libtool.
Then run the git command above.
Before building, you need pkg-config files that Fedora does not install. Put the attached file, and the one I'll attach in the next post, into /usr/share/pkgconfig.
Now go into the intel-... directory that the git command created and run ./autogen.sh.
After that, run make.
Well, that's how I did it. But as I reported, it does not compile. If the tool really needs libdrm 2.4.6, that means trouble for me.
libdrm 2.4.6 is in Fedora 11, so Fedora 11 users might have more success than I did.
Created attachment 24940 [details]
pkg-config file for pciaccess
Fedora users will have to copy this into /usr/share/pkgconfig/ too.
Oh, and don't forget to install libdrm-devel and pciaccess-devel.
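The steps from the last few comments can be collected into a small shell sketch. It is not an official build script: the function names have_pkg and build_intel_gpu_tools are made up for illustration, and it assumes the prerequisites listed above are installed and the two attached .pc files have been saved in the current directory.

```shell
#!/bin/sh
# Sketch of the Fedora 10 build steps above, wrapped in shell functions.
# Assumptions: git, libtool, libdrm-devel and the pciaccess development
# package are installed, and the two attached pkg-config files have been
# saved as *.pc in the current directory. Function names are hypothetical.

# have_pkg MODULE MIN_VERSION
# Succeeds if pkg-config reports MODULE at MIN_VERSION or newer.
# Note: pkg-config only reads the Version: line of the .pc file, so a
# hand-written .pc file can pass even if the installed headers are older.
have_pkg() {
    if pkg-config --atleast-version="$2" "$1"; then
        echo "ok: $1 $(pkg-config --modversion "$1")"
    else
        echo "missing or older than $2: $1" >&2
        return 1
    fi
}

build_intel_gpu_tools() {
    # Fedora 10 does not ship these .pc files, so install them by hand.
    sudo cp ./*.pc /usr/share/pkgconfig/ || return 1

    # configure asks for libdrm_intel >= 2.4.6 (Fedora 10 has 2.4.0).
    have_pkg libdrm_intel 2.4.6 || return 1
    pkg-config --exists pciaccess || { echo "pciaccess.pc not found" >&2; return 1; }

    # git clone names the checkout directory after the repository.
    git clone git://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools || return 1
    cd intel-gpu-tools && ./autogen.sh && make
}
```

Source the file and run build_intel_gpu_tools. If have_pkg reports libdrm_intel as too old, the build will most likely stop with errors like the ones quoted earlier.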
Well, I would really like to help, but I need support on this issue.

Currently probably only brave Fedora users are affected by this bug because of its bleeding-edge nature. But tomorrow Ubuntu 9.04 comes out, and it uses the same faulty drivers. That means we will soon have many, many users with freezing desktops. Worse, even deactivating Compiz does not prevent this bug. So many people using Intel hardware will experience an unstable Linux desktop because of this. I call this disastrous.

I can only add that driver version 2.5.1 is reported to work in the Fedora Bugzilla, so the bug was probably introduced after that. We already knew that it was not present in 2.4.

Created attachment 25069 [details]
Debug stuff, all put together
All of the debug information obtained with intel-gpu-tools, plus my dmesg and i915 debug files.
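A hypothetical helper for gathering the same kind of material (intel_gpu_dump output, dmesg, and i915 debugfs files) in one step, for example over ssh from a second machine while the screen is frozen. The debugfs location /sys/kernel/debug/dri/0 and the function name collect_hang_info are assumptions, not something this report specifies; as Carl notes, intel_gpu_dump relies on i915 kernel code that was new around 2.6.30-rc2.

```shell
#!/bin/sh
# Hypothetical helper: collect the hang diagnostics discussed in this report
# (intel_gpu_dump output, dmesg, i915 debugfs files) into one directory.
# Assumptions: debugfs is mounted at /sys/kernel/debug, the GPU is DRM
# minor 0, and intel_gpu_dump from intel-gpu-tools is on $PATH.

# collect_hang_info [OUT_DIR] [DRI_DEBUGFS_DIR]
collect_hang_info() {
    out_dir=${1:-hang-$(date +%Y%m%d-%H%M%S)}
    dri_dir=${2:-/sys/kernel/debug/dri/0}
    mkdir -p "$out_dir" || return 1

    # Commands most recently submitted to the GPU; per this report the
    # output is often almost empty unless the GPU is actually hung.
    if command -v intel_gpu_dump >/dev/null 2>&1; then
        intel_gpu_dump > "$out_dir/intel_gpu_dump.txt" 2>&1
    else
        echo "intel_gpu_dump not found, skipping" >&2
    fi

    # Kernel version and kernel log (dmesg may need root).
    uname -r > "$out_dir/kernel.txt"
    dmesg > "$out_dir/dmesg.txt" 2>&1 || true

    # Copy every readable i915_* debugfs entry, if any exist.
    for f in "$dri_dir"/i915_*; do
        [ -r "$f" ] || continue
        cat "$f" > "$out_dir/$(basename "$f").txt" 2>/dev/null || true
    done

    # Report where everything went.
    echo "$out_dir"
}
```

For example, log in as root with ssh, source the script, and run collect_hang_info; it prints the name of the directory it filled, which can then be packed up and attached to the bug.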
Link to the Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/359392

At least I can confirm that the situation improved after switching to Fedora 11 and the 2.7 driver. I still get freezes, but they are rarer: I haven't had one for three days now, whereas with the 2.6 series I had three per day.

Adjusting severity: crashes and hangs should be marked critical.

Created attachment 25846 [details]
Dumps and logs from xf86-video-intel 2.7.1 on Ubuntu 9.04 x86_64
Adding debug information from the new 2.7.1 driver. I'm upset by the fact that it still doesn't fix anything.
Björn: could you get intel_gpu_dump output for one of your hangs with current packages (Fedora)?

I can confirm this issue on 2.6.30-rc8+git and xf86-intel git. The system is Debian sid with self-compiled libdrm2, xf86-intel, and kernel. The GPU is an i945, all on a Samsung NC10 netbook (which used to be rock-stable before the xf86-intel 2.6.x issues). Do you still need GPU dumps?

Confirmed on 2.6.28-gentoo-r5 with xf86-video-intel 2.7.1 (also 2.7.0), Gentoo x86 (32-bit) [GMA X3100 on an Intel DG33FB mainboard]. The Xorg server is 1.6.1. One freeze happens about every 3-5 days. Killing X after a freeze with the Magic SysRq key leads to screen corruption, so I have to reboot...

Confirmed on openSUSE-factory, 2.6.29.1, X.org 7.4, build 07 Jun 2009.

Any news on this?

Created attachment 27109 [details]
intel_gpu_dump
00:02.0 VGA compatible controller: Intel Corporation 82865G Integrated Graphics Controller (rev 02)
linux: 2.6.30-10-generic
xserver: 2:1.6.1.901+git20090622+server1.6-branch.dbac41b6-0ubuntu0sarvatt~jaunty
intel: 2:2.7.99.901+git20090619.534e73ad-0ubuntu0sarvatt2~jaunty
libdrm: 2.4.11+git20090519.f355ad89-0ubuntu0sarvatt~jaunty
mesa: 7.5.0~git20090622+mesa-7-5-branch.abfd56c2-0ubuntu0sarvatt
I'm not sure whether this is the same bug or a related one. I am using an 865G and get about three freezes a day, mostly when using some OpenGL app, but also after leaving the PC idle for some minutes (I don't know exactly how many).
After the freeze, there is no useful log, but I attached the output from intel_gpu_dump.
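The version list in the comment above can be produced with a small sketch like the following, which may help anyone filing a separate report. It assumes a Debian/Ubuntu system with dpkg-query; the package names (xserver-xorg-core, xserver-xorg-video-intel, libdrm2, libgl1-mesa-dri) and the function names are assumptions chosen for illustration, not taken from this report.

```shell
#!/bin/sh
# Hypothetical helper that prints component versions in the same layout as
# the comment above. Assumes Debian/Ubuntu (dpkg-query); the package names
# below are typical Ubuntu names and are assumptions, not from this report.

# pkg_version PACKAGE: installed version of PACKAGE, or "unknown".
pkg_version() {
    v=$(dpkg-query -W -f='${Version}' "$1" 2>/dev/null) || v=""
    echo "${v:-unknown}"
}

report_versions() {
    # PCI line for the graphics controller, if lspci is available.
    lspci 2>/dev/null | grep 'VGA compatible controller' || true
    echo "linux: $(uname -r)"
    echo "xserver: $(pkg_version xserver-xorg-core)"
    echo "intel: $(pkg_version xserver-xorg-video-intel)"
    echo "libdrm: $(pkg_version libdrm2)"
    echo "mesa: $(pkg_version libgl1-mesa-dri)"
}
```

Running report_versions and pasting its output into a new report gives the same kind of summary as attachment 27109's comment.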
(In reply to comment #21)

Hi Götz,

This is almost 100% guaranteed to be a different bug (865 and 965 have almost no driver code in common, and although "random freeze" is a fairly common symptom, the underlying causes are almost always different). So, if you would be kind enough to open a new bug report for your issue, so that we can track it and determine when things are fixed for *you*, that would be greatly appreciated.

In fact, Zachary, if you could also open a separate bug report for your crashes and your GPU dumps, that would also be greatly appreciated (and I apologize that Bugzilla doesn't make it easier to "fork" bug reports like this).

It seems that the *original* bug as reported by Björn is largely fixed. He reported that things did improve, and he didn't reply when asked for further information. So I'm closing this bug as fixed, by which I mean only the original issue encountered by Björn.

Obviously, other people who have commented on this bug report also have similar issues that are still unresolved. Please open individual bug reports for each, so that we can give each person and each issue the attention we would like to.

We would *greatly* prefer to make one commit to the driver and have several bug reporters each respond "that fixed my bug", so that we can close several bug reports. This is much better than making one commit that fixes an issue for one person, only to have several other people reopen the bug report because they were actually dealing with an independent issue that happened to occur on similar systems.

So, thanks for your patience as people like me are just coming to understand the constraints we're working with (many different bugs that manifest in very similar ways), and for your patience with tools that aren't always easy to use (Bugzilla makes it a fair amount easier to comment on an existing report than to open a new one).

And do note that my request for *one person, one bug report* is distinct from what several other software projects do, where users have to be taught not to open dozens of duplicate bug reports for a single software defect. The real difference here is that we're dealing with many different software defects, but with hardware that responds to all of them with identical behavior (just locking up).

Thanks again. We really do appreciate your reports, and we want to do everything we can to address the problems you've encountered.

-Carl

Here is a related bug: [GM965] Random X freezes https://bugs.freedesktop.org/show_bug.cgi?id=22482