Bug 20893

Summary: Random freezes on i965 and i945
Product: xorg Reporter: Björn Ruberg <bjoern>
Component: Driver/intel Assignee: Carl Worth <cworth>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: high CC: bjoern, dmitry.a.durnev, freedesktop-bugs, hez, jjardon, kai.kasurinen, marcus, mike.lifeguard, mishu, pete, sonne, sveinung84, svrmarty, vmarko, zack.evans, zOOmER.gm, zwaldowski
Version: 7.4 (2008.09)   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Bug Depends on: 20152, 20560    
Bug Blocks:    
Attachments:
Description                                                         Flags
pkg-config file for libdrm_intel                                    none
pkg-config file for pciaccess                                       none
Debug stuff, all put together                                       none
Dumps and logs from xf86-video-intel 2.7.1 on Ubuntu 9.04 x86_64    none
intel_gpu_dump                                                      none

Description Björn Ruberg 2009-03-26 14:36:53 UTC
I'm experiencing random X freezes on my Latitude D630 with an Intel i965GM GPU. It happens when using composite effects with Compiz or KWin. It still happens with compositing DISABLED, but more rarely. It usually happens one to five times per day.
The mouse still moves, but no input is accepted. You cannot switch to a console or kill the X server. You can still log in via ssh. If that's not available, the only remedy is a hard reset of the system.

I suspect this is similar to bug 20560, but more Intel hardware than just the i945 suffers from it. It may even affect more chipsets than these two.

There is a bug report in the fedora bugzilla too:
https://bugzilla.redhat.com/show_bug.cgi?id=464866

I first experienced this bug after moving from Ubuntu 8.10 to Fedora 10 two months ago. It is still there in Fedora 11 Beta (intel 2.6.0). In the Fedora bug report, someone from Arch Linux mentioned that this bug first appeared in intel 2.5. My observations confirm that, as Ubuntu 8.10 uses intel 2.4.

I cannot reproduce this bug on Ubuntu 9.04 beta with an i915 chipset.

WORKAROUND:
The only workaround I found is to put Option "NoAccel" "yes" in xorg.conf. Using XAA or tiling does not help.
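For illustration, a minimal xorg.conf Device section applying this workaround might look like the sketch below. The Identifier value is a placeholder chosen here, and on a system that already has a Device section for the Intel card, only the Option line needs to be added to it. Note that NoAccel turns off 2D acceleration entirely, so rendering becomes noticeably slower.

```
Section "Device"
    # Placeholder name (assumption); reuse your existing identifier if you have one
    Identifier "Intel Graphics"
    # xf86-video-intel driver
    Driver     "intel"
    # Workaround from this report: disable 2D acceleration
    Option     "NoAccel" "yes"
EndSection
```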

Maybe important: it never happened when playing a 2D Windows game fullscreen in Wine at 1024x768. I played for many hours, so it is really unlikely that I just got lucky.

STEPS TO REPRODUCE:
This bug is triggered randomly. It happens during high activity, and it also happens when just moving the mouse. But I observed that the probability is higher when there is activity on the desktop. It happened most often for me when working with Eclipse.
To trigger this bug, I usually use Compiz with many effects enabled and a script that opens and closes applications every two seconds. When I started doing additional work on the desktop, the freeze always occurred within 15 minutes.
The freeze most often occurs during Compiz animations.
Comment 1 Zachary Waldowski 2009-04-12 07:53:53 UTC
If anybody needs it, I can confirm this bug on another Intel GM965/X3100.  This happens very often in the Ubuntu 9.04 Dailies, Fedora 11 Beta, and current Arch Linux.  This also occurs for me on AMD64, with EXA and UXA.

All sorts of people on the Ubuntu Launchpad are reporting similar or exact same bugs on other Intel video chipsets such as the 915, 945, and 4500HD.
Comment 2 Björn Ruberg 2009-04-12 07:59:39 UTC
Are you sure it happens on Fedora 11 Beta? I tried to trigger this bug with the Fedora beta on a GM965 one week ago, with KWin running composite effects. It survived two hours of desktop activity as described in my bug report.
Comment 3 Zachary Waldowski 2009-04-12 09:47:28 UTC
Yes, I'm positive. In fact, it's happened to me on every Linux distro with the most recent Intel driver - up-to-date Arch, Ubuntu Jaunty, and Fedora 11 included.  Due to the randomness of the bug, I've gone for up to 4 hours without it before, even with Compiz/no Compiz and EXA/UXA.
Comment 4 Carl Worth 2009-04-16 13:39:02 UTC
Hi Björn,

I know it's not much fun for you, but I'm delighted that you have such a repeatable way to cause your GPU to hang.

For this case, we've developed a new tool that will provide us very useful information for debugging the hang. The tool is called intel_gpu_dump and is contained within the intel-gpu-tools repository, which you can obtain as follows:

git clone git://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools

(Eric has been threatening to make a tar-file release of that, so if getting it via git is a problem, let me know and I'll pester him to do that.)

The one trick to getting intel_gpu_dump to work is that it requires some code only very recently added to the i915 kernel driver. The easiest way to get this is with the recently released 2.6.30-rc2 version of Linux. If you can run that, then running intel_gpu_dump should give a nice dump of the commands most-recently submitted to the GPU. (If the GPU is not hung, the output is often almost empty, but when it's hung, then you should definitely get some output.)

If you could run that tool and send us the output, then that will be very helpful for us to identify and fix the bug.

Please let me know if you need any help with that,

Thanks,

-Carl
Comment 5 Björn Ruberg 2009-04-16 13:51:47 UTC
Well, I should be able to do that, although it is of course not much fun.
There is a report in the Fedora Bugzilla that the hang disappeared with a 2.6.29.1 kernel. I'll try that, and if it does not solve the problem, I'll be motivated to try getting this dump.
Comment 6 Björn Ruberg 2009-04-19 04:10:16 UTC
Well, the bug is definitely NOT gone with kernel 2.6.29.
I tried to produce a dump but failed. Not because of the kernel; I can compile one myself. But I couldn't get your dump tool to build.

You should really provide a README with the requirements. I got some strange errors until I finally installed libtool.
The configure script wants libdrm_intel 2.4.6 installed (it relies on pkg-config, but Fedora does not install the needed files). For Fedora 10, only 2.4.0 is available.

Running make results in these errors:
Making all in lib
  CC    intel_batchbuffer.o
In file included from intel_batchbuffer.c:34:
intel_batchbuffer.h:13: error: expected specifier-qualifier-list before ‘drm_intel_bufmgr’
intel_batchbuffer.h:31: error: expected ‘)’ before ‘*’ token
intel_batchbuffer.h:44: error: expected declaration specifiers or ‘...’ before ‘drm_intel_bo’
intel_batchbuffer.h: In function ‘intel_batchbuffer_space’:
intel_batchbuffer.h:57: error: ‘struct intel_batchbuffer’ has no member named ‘size’
intel_batchbuffer.h:57: error: ‘struct intel_batchbuffer’ has no member named ‘ptr’
intel_batchbuffer.h:57: error: ‘struct intel_batchbuffer’ has no member named ‘map’
intel_batchbuffer.h: In function ‘intel_batchbuffer_emit_dword’:

and so on ...

Comment 7 Marko Vojinovic 2009-04-19 05:29:31 UTC
I am also very interested in helping fix this bug, but I have zero experience with git. If someone could give me clear step-by-step instructions, I'll follow them. Currently I run stock Fedora 10, which (as I see from previous comments) is not recent enough, but I'd still like to help as much as I can. I typically have another machine on the LAN and can read logs via ssh whenever the lockup happens.

Besides, I can confirm that this bug is seen by quite a few people running Intel hardware. Just take a look at the Fedora bug report linked above and the mailing list... :-)

Comment 8 Björn Ruberg 2009-04-19 05:43:07 UTC
Created attachment 24939 [details]
pkg-config file for libdrm_intel

Well, install git and libtool.
Then run the git command above.

Before building anything, you need pkg-config files that Fedora does not install. Put the attached file, and the one I'll attach in the next comment, into /usr/share/pkgconfig.

Now go into the intel-... directory that the git command created. Run ./autogen.sh.
After that, run make.

Well, that's the way I did it. But as I reported, it does not compile. If the tool really needs libdrm-2.4.6, that would mean trouble for me.

libdrm-2.4.6 is in Fedora 11. Fedora 11 users might have more success than I did.
Comment 9 Björn Ruberg 2009-04-19 05:45:13 UTC
Created attachment 24940 [details]
pkg-config file for pciaccess

Fedora users will have to copy this into /usr/share/pkgconfig/ too.
Oh, and don't forget to install libdrm-devel and pciaccess-devel.
Comment 10 Björn Ruberg 2009-04-22 10:00:38 UTC
Well, I would really like to help, but I need support on this issue.
Currently, probably only brave Fedora users are affected by this bug because of Fedora's bleeding-edge nature. But tomorrow Ubuntu 9.04 comes out, and it uses the same faulty drivers.
That means we will soon have many, many users with freezing desktops. Worse, even deactivating Compiz does not prevent this bug. So many people using Intel hardware will experience an unstable Linux desktop because of this.
I call this disastrous.

I can only add that driver version 2.5.1 is reported to work in the Fedora Bugzilla, so the bug was probably introduced after that. We already knew that it was not present in 2.4.
Comment 11 Zachary Waldowski 2009-04-23 10:44:20 UTC
Created attachment 25069 [details]
Debug stuff, all put together

All of the debug things obtained with the intel_gpu_tools.  Also includes my dmesg and i915 debug files.
Comment 12 Javier Jardón 2009-04-27 19:57:38 UTC
Link to ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/359392
Comment 13 Björn Ruberg 2009-05-05 16:04:36 UTC
At least I can confirm that the situation improved after switching to Fedora 11 and the 2.7 driver. I still get freezes, but they are rarer: I haven't had one for three days now. With the 2.6 series I had three per day.
Comment 14 Jesse Barnes 2009-05-11 11:21:17 UTC
Adjusting severity: crashes & hangs should be marked critical.
Comment 15 Zachary Waldowski 2009-05-13 17:53:09 UTC
Created attachment 25846 [details]
Dumps and logs from xf86-video-intel 2.7.1 on Ubuntu 9.04 x86_64

Adding debug information from the new 2.7.1 driver.  I'm upset by the fact that it still doesn't fix anything.
Comment 16 Eric Anholt 2009-05-20 11:38:08 UTC
Björn: could you get intel_gpu_dump output for one of your hangs with current stuff (Fedora)?

Comment 17 Soeren Sonnenburg 2009-06-03 23:04:46 UTC
I can confirm this issue on 2.6.30-rc8+git and xf86-intel git. 

System is debian-sid + self compiled libdrm2/xf86-intel/kernel. GPU is i945 all this on a Samsung NC10 netbook (which used to be rock stable before experiencing the xf86-intel 2.6.X issues).

Do you still need gpu dumps?
Comment 18 Dmitry Durnev 2009-06-09 05:54:56 UTC
Confirmed on 2.6.28-gentoo-r5, xf86-video-intel 2.7.1 (also 2.7.0), Gentoo x86 (32-bit) [GMA X3100 on an Intel DG33FB mainboard]. The Xorg server is 1.6.1. One freeze happens every 3-5 days. Killing X after a freeze with the Magic SysRq key leads to screen corruption, so I have to reboot...
Comment 19 zOOm_ER 2009-06-09 09:23:51 UTC
confirmed on openSUSE-factory
2.6.29.1
X.org 7.4 build 07 Jun 2009
Comment 20 svrmarty@gmx.net 2009-06-11 08:09:14 UTC
Any news on this?
Comment 21 Götz 2009-06-24 19:39:59 UTC
Created attachment 27109 [details]
intel_gpu_dump

00:02.0 VGA compatible controller: Intel Corporation 82865G Integrated Graphics Controller (rev 02)

linux: 2.6.30-10-generic 
xserver: 2:1.6.1.901+git20090622+server1.6-branch.dbac41b6-0ubuntu0sarvatt~jaunty
intel: 2:2.7.99.901+git20090619.534e73ad-0ubuntu0sarvatt2~jaunty
libdrm: 2.4.11+git20090519.f355ad89-0ubuntu0sarvatt~jaunty
mesa: 7.5.0~git20090622+mesa-7-5-branch.abfd56c2-0ubuntu0sarvatt

I'm not sure if this is the same bug or a related one. I am using an 865G and get about three freezes a day, mostly when using an OpenGL app, but also after leaving the PC idle for some minutes (I don't know exactly how many).

After the freeze, there is no useful log, but I attached the output from intel_gpu_dump.
Comment 22 Carl Worth 2009-06-25 12:32:54 UTC
(In reply to comment #21)
> Created an attachment (id=27109) [details]
> intel_gpu_dump
> 
> 00:02.0 VGA compatible controller: Intel Corporation 82865G Integrated Graphics
> Controller (rev 02)
> 
> linux: 2.6.30-10-generic 
> xserver:
> 2:1.6.1.901+git20090622+server1.6-branch.dbac41b6-0ubuntu0sarvatt~jaunty
> intel: 2:2.7.99.901+git20090619.534e73ad-0ubuntu0sarvatt2~jaunty
> libdrm: 2.4.11+git20090519.f355ad89-0ubuntu0sarvatt~jaunty
> mesa: 7.5.0~git20090622+mesa-7-5-branch.abfd56c2-0ubuntu0sarvatt
> 
> I'm not sure if this is the same or related bug, I am using 865G and get about
> three freezes in one day. Mostly when using some OpenGL app, but also after
> leaving the pc idle some minutes, I don't know exactly how many minutes.
> 
> After the freeze, there is no useful log, but I attached the output from
> intel_gpu_dump.

Hi Götz,

This is almost 100% guaranteed to be a different bug, (865 and 965 have almost no driver code in common, and although "random freeze" is a fairly common symptom, the underlying causes are almost always different).

So, if you would be kind enough to open a new bug report for your issue, so that we can track it and determine when things are fixed for *you* that would be greatly appreciated.

In fact, Zachary, if you could also open a separate bug report for your crashes and your GPU dumps, that would also be greatly appreciated, (and I apologize that bugzilla doesn't make it easier to "fork" bug reports like this).

It seems as if the *original* bug report as reported by Björn is largely fixed. He reported that things did improve, and also didn't reply when asked for further information.

So I'm closing this bug as fixed, by which I mean only the original issue encountered by Björn. Obviously, other people that have commented on this bug report also have similar issues still unresolved. Please open individual bug reports for each so that we can give each person and each issue the attention we would like to.

We would *greatly* prefer to make one commit to the driver and have several bug reporters each respond "that fixed my bug" so we can close several bug reports. This is much better than making one commit that fixes an issue for one person, only to have several other people re-open the bug report only because they were actually dealing with an independent issue that happened to have similar systems.

So, thanks for your patience as people like me are just coming to understand the constraints we're working with, (many different bugs that manifest in very similar ways), and for your patience with tools that aren't always easy to use, (bugzilla makes it a fair amount easier to comment on an existing report than to open a new bug).

And do note that my request for *one-person-one-bug-report* is distinct from several other software projects that have to teach users to not open dozens of duplicate bug reports for a single software defect. The real difference here is that we're dealing with many different software defects, but with hardware that responds to these many defects with identical behavior, (just locking up).

Thanks again. We really do appreciate your reports and we want to do everything we can to address the problems you've encountered.

-Carl
Comment 23 zOOm_ER 2009-06-27 05:42:05 UTC
here is related bug:

[GM965] Random X freezes
https://bugs.freedesktop.org/show_bug.cgi?id=22482
