Bug 26016

Summary: [945 pf interrupt] Freezes when compiz enabled
Product: xorg
Reporter: Geir Ove Myhr <gomyhr>
Component: Driver/intel
Assignee: Carl Worth <cworth>
Status: RESOLVED DUPLICATE
QA Contact: Xorg Project Team <xorg-team>
Severity: major
Priority: medium
CC: bryce, cfeck, tt.hogehoge
Version: unspecified
Hardware: x86 (IA32)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments (all with flags: none):
 - Batchbuffer dump. i915_regs excluded, since reading it causes hangs on this machine
 - BootDmesg.txt
 - CurrentDmesg.txt
 - PciDisplay.txt
 - XorgLog.txt
 - dri_debug tarball
 - dmesg output with Ubuntu kernel 2.6.35-8.13 (based on -rc5)

Description Geir Ove Myhr 2010-01-12 14:29:54 UTC
Forwarding a freeze bug from Ubuntu user takashi torigoe:
  https://bugs.launchpad.net/bugs/475429
There are several other reports of freezes on the 945G and 945GM in Ubuntu, but this one seems to contain more information.

[Problem]
The GPU hangs on the i945G when compiz is enabled. When compiz is not enabled, 3D screensavers (especially Euphoria) can make it hang. The bug was originally reported on Ubuntu 9.10, but it has been verified with the drm-intel-next kernel from 201001061342 and with the 10.04 development version using xorg-edgers; the logs were taken from that setup.


[Original report]

Since upgrading to the 9.10 release, my screen freezes when a visual effect runs (e.g. when moving a window).
With visual effects disabled, there is no problem.
When the freeze occurs, the keyboard and mouse are unusable, but logging in via ssh still works.
So I captured a batchbuffer dump following https://wiki.ubuntu.com/X/Troubleshooting/Freeze.
The dump is attached to this post (dri_debug-20091105.tgz).

Steps to reproduce
1. sudo INTEL_DEBUG=batch /etc/init.d/gdm restart
2. Set Visual Effects to Extra (compiz).
3. Move a window with the mouse.
4. The window swings (visual effect).
5. The freeze occurs.
6. Log in via ssh and collect the batchbuffer dump.

I found that when the "Fusion-icon->Compiz options->Indirect Rendering" checkbox is checked, 3D effects work.
This checkbox makes compiz disable direct rendering.
However, other applications (e.g. 3D screensavers, Blender, ...) still cause freezes as before.
It seems to be caused by a lock in the kernel driver, but the system resumes when the frozen application is killed.
I think the Intel driver for the i945 has an issue with direct rendering.

Kernel option that works:
 acpi=off (nomodeset has no effect)
 No freeze, but there are some issues:
 - Screen drawing is very slow.
 - When compiz is enabled, the screen goes blank white.
    (The Desktop Cube becomes a white cube.)
 - The rendering selection (direct/indirect) in the Compiz Fusion icon is disabled (fixed to indirect).

Testing older kernels, the freeze seems to have been introduced between 2.6.30 and 2.6.31-rc1, but 2.6.30 appears to turn off direct rendering.

Architecture: i386
DistroRelease: Ubuntu 10.04
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Alpha i386 (20091209)
LiveMediaBuild: Ubuntu 9.10 "Karmic Koala" - Release i386 (20091028.5)
MachineType: MICRO-STAR INTERNATIONAL CO.,LTD MS-7314
Package: xserver-xorg-video-intel 2:2.10.0+git20100108.4902f546-0ubuntu0sarvatt
PackageArchitecture: i386
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-7-generic root=UUID=e1d041b6-3b07-4d6f-aae9-5fbce8eee93c ro quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=ja_JP.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-7.10-generic
RelatedPackageVersions:
 xserver-xorg 1:7.5+1ubuntu1
 libgl1-mesa-glx 7.8.0~git20100107.d699b672-0ubuntu0sarvatt
 libdrm2 2.4.17+git20091230.c5c503b5-0ubuntu0sarvatt3
 xserver-xorg-video-intel 2:2.10.0+git20100108.4902f546-0ubuntu0sarvatt
Tags: lucid
Uname: Linux 2.6.32-7-generic i686
UnreportableReason: This is not an official Ubuntu package.
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
XorgConf:
 Section "Device"
  Identifier "my-945G"
  Driver "intel"
  Option "DebugFlushCaches" "1"
 EndSection
dmi.bios.date: 07/14/2008
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: V1.1
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: MS-7314
dmi.board.vendor: MICRO-STAR INTERNATIONAL CO.,LTD
dmi.board.version: 1.0
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: MICRO-STAR INTERNATIONAL CO.,LTD
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrV1.1:bd07/14/2008:svnMICRO-STARINTERNATIONALCO.,LTD:pnMS-7314:pvr1.0:rvnMICRO-STARINTERNATIONALCO.,LTD:rnMS-7314:rvr1.0:cvnMICRO-STARINTERNATIONALCO.,LTD:ct3:cvr1.0:
dmi.product.name: MS-7314
dmi.product.version: 1.0
dmi.sys.vendor: MICRO-STAR INTERNATIONAL CO.,LTD
fglrx: Not loaded
system:
 distro:             Ubuntu
 architecture:       i686
 kernel:             2.6.32-7-generic
Comment 1 Geir Ove Myhr 2010-01-12 14:33:08 UTC
Created attachment 32598 [details]
Batchbuffer dump. i915_regs excluded, since reading it causes hangs on this machine
Comment 2 Geir Ove Myhr 2010-01-12 14:35:45 UTC
Created attachment 32599 [details]
BootDmesg.txt
Comment 3 Geir Ove Myhr 2010-01-12 14:36:04 UTC
Created attachment 32600 [details]
CurrentDmesg.txt
Comment 4 Geir Ove Myhr 2010-01-12 14:36:32 UTC
Created attachment 32601 [details]
PciDisplay.txt
Comment 5 Geir Ove Myhr 2010-01-12 14:37:07 UTC
Created attachment 32602 [details]
XorgLog.txt
Comment 6 Geir Ove Myhr 2010-02-27 12:41:02 UTC
Created attachment 33630 [details]
dri_debug tarball

From downstream: Takashi has tested with a drm-intel-next kernel, which should detect a GPU hang and record information in i915_error_state, but no error state was collected. Possibly this is because he now uses Ubuntu 9.10 rather than 10.04. Not sure which packages are relevant.

-- from downstream --

I installed the latest drm-intel-next kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-next/2010-02-24/
and captured a batchbuffer dump:
 # date stamp for the directory name (format assumed; the earlier dump was dri_debug-20091105)
 datestr=$(date +%Y%m%d)
 sudo service apport start force_start=1
 mkdir dri_debug-$datestr
 # copy the i915 debugfs files, including i915_error_state
 sudo cp -r /sys/kernel/debug/dri/0/i915* dri_debug-$datestr
 sudo intel_gpu_dump > dri_debug-$datestr/intel_gpu_dump.txt
 dmesg > dri_debug-$datestr/dmesg.txt
 cp /var/log/Xorg.0.log dri_debug-$datestr/
 sudo cp /var/log/gdm/\:0.log dri_debug-$datestr/gdm.log
 sudo tar czf dri_debug-$datestr.tgz dri_debug-$datestr/

The batchbuffer dump is attached.
i915_error_state shows "no error state collected".
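i915_error_state is only filled in when the driver records a GPU hang, so "no error state collected" means no hang was detected at all. The Bash function below is a sketch for checking this quickly before filing logs; the name check_error_state is hypothetical, and it assumes debugfs is mounted at /sys/kernel/debug with the GPU as DRM minor 0, as in the commands above. Reading debugfs normally requires root.

```shell
#!/bin/bash
# Sketch: report whether the i915 driver captured a GPU error state.
# Assumes debugfs at /sys/kernel/debug and the GPU as DRM minor 0;
# pass another path as $1 to override. Usually needs root.
check_error_state() {
    local f="${1:-/sys/kernel/debug/dri/0/i915_error_state}"
    if [ ! -r "$f" ]; then
        echo "unreadable: $f"
        return 2
    fi
    if grep -q "no error state collected" "$f"; then
        echo "none"        # the driver did not record a GPU hang
        return 1
    fi
    echo "collected"       # file holds a captured error state; attach it to the bug
    return 0
}
```

Run it as root after a freeze; "none" corresponds to the result reported in this comment.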
Comment 7 Bryce Harrington 2010-03-04 18:27:19 UTC
*** Bug 26898 has been marked as a duplicate of this bug. ***
Comment 8 Carl Worth 2010-03-22 14:41:43 UTC
Bumping the priority down on this bug, only because we don't expect to have this fixed in time for the release that's coming together right now.

-Carl
Comment 9 Chris Wilson 2010-06-24 11:04:56 UTC
A page-flipping bug. The Q2 release should have most of these fixed, at least the known ones...
Comment 10 Chris Wilson 2010-07-11 07:13:06 UTC
The kernel patches are upstream as part of 2.6.35-rc4:


commit 1afe3e9d4335bf3bc5615e37243dc8fef65dac8f
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Fri Mar 26 10:35:20 2010 -0700

    drm/i915: gen3 page flipping fixes
    
    Gen3 chips have slightly different flip commands, and also contain a bit
    that indicates whether a "flip pending" interrupt means the flip has
    been queued or has been completed.
    
    So implement support for the gen3 flip command, and make sure we use the
    flip pending interrupt correctly depending on the value of ECOSKPD bit
    0.
    
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Eric Anholt <eric@anholt.net>

commit 83f7fd055eb3f1e843803cd906179d309553967b
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Mon Apr 5 14:03:51 2010 -0700

    drm/i915: don't queue flips during a flip pending event
    
    Hardware will set the flip pending ISR bit as soon as it receives the
    flip instruction, and (supposedly) clear it once the flip completes
    (e.g. at the next vblank).  If we try to send down a flip instruction
    while the ISR bit is set, the hardware can become very confused, and we
    may never receive the corresponding flip pending interrupt, effectively
    hanging the chip.
    
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Eric Anholt <eric@anholt.net>

I believe that this fixes the page-flipping issues on i945.
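Commit 83f7fd05 states an invariant: while the flip-pending ISR bit is set, no new flip instruction may be sent, or the chip can miss the flip-pending interrupt and hang. The toy Bash model below illustrates that invariant only. It does not touch hardware and is not the kernel's implementation; the names flip_pending, queued, submit_flip and flip_complete are hypothetical, and deferring the second flip is just one way to honour the rule, not necessarily the mechanism the patch uses.

```shell
#!/bin/bash
# Toy model (not kernel code) of "don't queue flips during a flip pending event".
flip_pending=0   # models the ISR flip-pending bit
queued=()        # flips held back until the current flip completes

submit_flip() {
    local fb="$1"
    if [ "$flip_pending" -eq 1 ]; then
        queued+=("$fb")    # sending now could confuse the hardware, so defer
        echo "deferred $fb"
    else
        flip_pending=1     # hardware sets the bit when it receives the flip
        echo "sent $fb"
    fi
}

flip_complete() {
    # Models the bit clearing once the flip completes (e.g. at the next vblank),
    # then issues at most one deferred flip.
    flip_pending=0
    if [ "${#queued[@]}" -gt 0 ]; then
        local next="${queued[0]}"
        queued=("${queued[@]:1}")
        submit_flip "$next"
    fi
}
```

Calling `submit_flip A; submit_flip B; flip_complete` prints `sent A`, `deferred B`, `sent B`: the second flip is never sent while the first is still pending. Call the functions directly rather than inside `$(...)`, since a subshell would discard the state changes.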
Comment 11 Geir Ove Myhr 2010-07-16 00:08:20 UTC
(In reply to comment #10)
> The kernel patches are upstream as part of 2.6.35-rc4:
[...]
> I believe that this fixes the page-flipping issues on i945.

It seems it didn't: the current Ubuntu 10.10 (Maverick) development kernel is based on 2.6.35-rc5, and the freeze still happens.

The original reporter says downstream:

It still freezes with Maverick (13th July), same as Tomas.

$ cat /sys/kernel/debug/dri/0/i915_error_state
no error state collected

The dmesg output is attached here.
dmesg shows that the freeze is caused by a mutex lock.

It may be caused by the following sequence:
1. A mutex is locked and never unlocked.
2. DRM_IOCTL_I915_GEM_PREAD waits for the lock.
3. dmesg shows an error (120 s after the freeze).
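The 120-second delay in step 3 most likely comes from the kernel's hung-task detector, whose timeout is read from /proc/sys/kernel/hung_task_timeout_secs (120 by default). The Bash helper below is a sketch for pulling those reports out of a dmesg log to see which process was stuck; the name hung_tasks is hypothetical, and it assumes the "INFO: task NAME:PID blocked for more than N seconds." wording used by kernels of this era.

```shell
#!/bin/bash
# Sketch: list tasks reported by the hung-task detector.
# Reads dmesg text from the file given as $1, or from stdin if none is given.
hung_tasks() {
    grep -o 'task [^ ]*:[0-9]* blocked for more than [0-9]* seconds' ${1:+"$1"} |
        sed -E 's/^task ([^ ]*):([0-9]*) blocked for more than ([0-9]*) seconds$/\1 pid=\2 after=\3s/'
}
```

Use it as `dmesg | hung_tasks` or `hung_tasks dmesg.txt`. For an illustrative line such as `INFO: task Xorg:1234 blocked for more than 120 seconds.` it prints `Xorg pid=1234 after=120s`.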
Comment 12 Geir Ove Myhr 2010-07-16 00:10:28 UTC
Created attachment 37104 [details]
dmesg output with Ubuntu kernel 2.6.35-8.13 (based on -rc5)
Comment 13 Chris Wilson 2010-08-08 12:32:12 UTC
(In reply to comment #11)
> The dmesg output is attached here.
> dmesg shows that the freeze is caused by a mutex lock.
> 
> It may be caused by the following sequence:
> 1. A mutex is locked and never unlocked.
> 2. DRM_IOCTL_I915_GEM_PREAD waits for the lock.
> 3. dmesg shows an error (120 s after the freeze).

Or it's a missed interrupt ;-)

Want to place a bet?
Comment 14 Takashi Torigoe 2010-08-08 22:35:13 UTC
(In reply to comment #13)

I think there were two problems:

 (1) the screen locks up at any time.
 (2) the screen locks up when 3D is used (Blender, compiz, ...).

Both of them show a similar dmesg error (lock time over 120 s).
But (1) recovers when the mouse is moved or a key is pressed.
And (1) might be fixed in 2.6.35-rc4.

> Or it's a missed interrupt ;-)

I'll bet on the following :-)

1. A mutex is locked.
2. An interrupt is missed, so the mutex is never unlocked.
3. DRM_IOCTL_I915_GEM_PREAD waits for the lock.
4. dmesg shows an error (120 s after the freeze).
Comment 15 Chris Wilson 2010-09-06 11:29:02 UTC
Beyond the usual fixes in 2.6.35, 2.6.36-rc2 contains a patch to fix up missed interrupts.

The drm-intel-fixes branch of

http://cgit.freedesktop.org/~ickle/drm-intel

contains a patch for one observed source of missed interrupts, and the drm-intel-next branch of the same repository contains an enhanced hangcheck.
Comment 16 Takashi Torigoe 2010-09-08 08:40:22 UTC
I tested 2.6.36-rc3.
The freeze may be fixed,
but I got the following dmesg output:

[  114.364014] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU idle, missed IRQ.
[  352.800012] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU idle, missed IRQ.
[  355.488019] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU idle, missed IRQ.
[  366.724024] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU idle, missed IRQ.

The message at [  114.364014] was caused by Blender.
The others were caused by Blender or compiz.
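Each of these lines means the hangcheck timer found the GPU idle while the driver was still waiting, i.e. the completion interrupt was apparently missed. To compare runs (for example Blender versus compiz), a sketch like the Bash function below counts such reports and prints the first and last timestamps; the name missed_irqs is hypothetical, and it assumes the message text and the `[ seconds.micro ]` timestamp prefix shown above.

```shell
#!/bin/bash
# Sketch: summarise i915 hangcheck "missed IRQ" reports in dmesg text.
# Reads from the file given as $1, or from stdin if none is given.
missed_irqs() {
    grep 'i915_hangcheck_elapsed.*missed IRQ' ${1:+"$1"} |
        awk '
            match($0, /\[ *[0-9]+\.[0-9]+]/) {
                # strip the surrounding brackets, keep the seconds value
                t = substr($0, RSTART + 1, RLENGTH - 2) + 0
                if (n == 0) first = t
                last = t
                n++
            }
            END {
                if (n > 0) printf "%d missed IRQ reports, %.1fs to %.1fs\n", n, first, last
                else print "no missed IRQ reports"
            }'
}
```

Fed the four lines quoted in this comment, it prints `4 missed IRQ reports, 114.4s to 366.7s`.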
Comment 17 Chris Wilson 2010-12-10 07:09:49 UTC
Still need to solve why the interrupt stops firing, but these two bugs have now been reduced to the same problem.

*** This bug has been marked as a duplicate of bug 25345 ***
