Bug 23418

Summary: [GM45] 3D apps w/ Wine cause X Crash in driDestroyContext()
Product: xorg
Reporter: Bryce Harrington <bryce>
Component: Driver/intel
Assignee: Ian Romanick <idr>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: major
Priority: high
CC: brian, eric225125, eric, hanno, jaimerave, jassmith, linuxhippy, manisandro
Version: 7.4 (2008.09)
Keywords: regression
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:

Attachments:
  Description                   Flags
  gdb-xorg.txt                  none
  Xorg.0.log                    none
  lshw                          none
  Wine output preceding hang    none
  Testcase                      none
  Testcase binary               none

Description Bryce Harrington 2009-08-19 14:40:50 UTC
Forwarding this bug from Ubuntu reporter Sandro Mani:
http://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/401067

[Problem]
Occasionally X crashes when running 3D apps in Wine with the xorg-edgers PPA. It does not crash with stock Karmic.

[Original Description]
Running an up-to-date Ubuntu Karmic x86_64, hardware details attached.

For background: Wine fails on every attempt to run a non-native 3D application, with one of the following outcomes (most likely first):
1) An immediate GLXBadDrawable error, after which Wine exits.
2) The usual sequence of "WINED3DFMT_A8R8G8B8" and similar error messages (see below), followed by a GLXBadDrawable error and exit.
3) The usual "WINED3DFMT_A8R8G8B8" messages, with Wine displaying an empty window frame and the expected 3D content missing; CTRL+C is needed to exit.

All of the above can happen with both xorg-edgers and stock Karmic. With xorg-edgers only, however, it can also crash the X server, and the X crash is what this bug report focuses on.

The wine output reads:

fixme:d3d_caps:IWineD3DImpl_FillGLCaps Received unrecognized GL_VENDOR Tungsten Graphics, Inc. Setting VENDOR_WINE.
fixme:d3d:check_fbo_compat Format WINED3DFMT_R8G8B8 with rendertarget flag is not supported as FBO color attachment, and no fallback specified.
fixme:d3d:check_fbo_compat Format WINED3DFMT_A8R8G8B8 with rendertarget flag is not supported as FBO color attachment, and no fallback specified.
fixme:d3d:check_fbo_compat Format WINED3DFMT_X8R8G8B8 with rendertarget flag is not supported as FBO color attachment, and no fallback specified.
fixme:d3d:check_fbo_compat Format WINED3DFMT_R5G6B5 rtInternal format is not supported as FBO color attachment.
fixme:d3d:check_fbo_compat Format WINED3DFMT_R16G16_UNORM rtInternal format is not supported as FBO color attachment.
fixme:d3d:check_fbo_compat Format WINED3DFMT_R16G16B16A16_UNORM with rendertarget flag is not supported as FBO color attachment, and no fallback specified.
fixme:win:EnumDisplayDevicesW ((null),0,0x32f7e0,0x00000000), stub!

It does not make any difference whether I run compiz or metacity.

Xorg.0.log is attached. I also did some further testing, and I forgot to mention that the attempts often end with messages like

X Error of failed request: GLXBadDrawable
  Major opcode of failed request: 153 (GLX)
  Minor opcode of failed request: 5 (X_GLXMakeCurrent)
  Serial number of failed request: 505
  Current serial number in output stream: 505

Here too, if I try repeatedly, an attempt occasionally does not end with GLXBadDrawable but instead either crashes X or terminates or hangs on its own shortly after this block:

fixme:d3d_caps:IWineD3DImpl_FillGLCaps Received unrecognized GL_VENDOR Tungsten Graphics, Inc. Setting VENDOR_WINE.
fixme:d3d:check_fbo_compat Format WINED3DFMT_R8G8B8 with rendertarget flag is not supported as FBO color attachment, and no fallback specified.
fixme:d3d:check_fbo_compat Format WINED3DFMT_A8R8G8B8 with rendertarget flag is not supported as FBO color attachment, and no fallback specified.
fixme:d3d:check_fbo_compat Format WINED3DFMT_X8R8G8B8 with rendertarget flag is not supported as FBO color attachment, and no fallback specified.
fixme:d3d:check_fbo_compat Format WINED3DFMT_R5G6B5 rtInternal format is not supported as FBO color attachment.
fixme:d3d:check_fbo_compat Format WINED3DFMT_R16G16_UNORM rtInternal format is not supported as FBO color attachment.
fixme:d3d:check_fbo_compat Format WINED3DFMT_R16G16B16A16_UNORM with rendertarget flag is not supported as FBO color attachment, and no fallback specified.

I have never found any (EE) entries in Xorg.0.log after a crash (or in any other log, as mentioned above).

Native Linux applications also seem to have problems: I compiled and ran VDrift, and X froze as soon as a track finished loading. Sound playback continued for a while, though, which suggests that only X froze and not the whole system. Again, I found nothing helpful in any of the logs.

I had a look at different setups and got the following results:
- Both Jaunty and Karmic, GM45 (the Lenovo T400 I am experiencing the problem on): crash as described
- Jaunty, proprietary ATI (HD3400) and proprietary NVIDIA: no problems of this kind (though I know NVIDIA has its own driver framework)
- Jaunty, open-source ATI (Lenovo T60, X1400, xserver-xorg-video-ati): often GLXBadDrawable errors, but once in a while the applications started without any subsequent crash.
- Jaunty, older intel graphics (Intel GMA900): no problems of this kind

The crash described above only appeared with the intel driver on the GM45 chipset, so I suspect the problem lies there. The GLXBadDrawable error, on the other hand, seems to be a more widespread problem, although I could not test the newest packages on the T60 because I do not own that laptop.

Anyway, for the X crash, here's the backtrace:

Backtrace:
0: /usr/bin/X(xorg_backtrace+0x26) [0x4eff56]
1: /usr/bin/X(xf86SigHandler+0x41) [0x480761]
2: /lib/libc.so.6 [0x7fdbd57f1cd0]
3: /usr/lib/dri/i965_dri.so(intelDestroyContext+0xeb) [0x7fdbd36c898b]
4: /usr/lib/dri/i965_dri.so [0x7fdbd36be820]
5: /usr/lib/xorg/modules/extensions//libglx.so [0x7fdbd4ada059]
6: /usr/lib/xorg/modules/extensions//libglx.so(__glXFreeContext+0x6c) [0x7fdbd4acf9fc]
7: /usr/lib/xorg/modules/extensions//libglx.so [0x7fdbd4acfa33]
8: /usr/bin/X(FreeResourceByType+0x11f) [0x435e4f]
9: /usr/lib/xorg/modules/extensions//libglx.so [0x7fdbd4acc2ee]
10: /usr/lib/xorg/modules/extensions//libglx.so [0x7fdbd4acfc99]
11: /usr/bin/X(Dispatch+0x384) [0x44dff4]
12: /usr/bin/X(main+0x3b5) [0x433fa5]
13: /lib/libc.so.6(__libc_start_main+0xe6) [0x7fdbd57dd606]
14: /usr/bin/X [0x433429]


(gdb) backtrace full
#0  intelDestroyContext (driContextPriv=0x53599a0) at intel_context.c:851
        driDrawPriv = 0x4a036e0
        intel_fb = 0x300000000
        irbDepth = <value optimized out>
        irbStencil = <value optimized out>
        intel = 0x472e070
        __PRETTY_FUNCTION__ = "intelDestroyContext"
#1  0x00007ff6d83878d0 in driDestroyContext (pcp=0x53599a0)
    at ../common/dri_util.c:545
No locals.
...
Comment 1 Bryce Harrington 2009-08-19 14:41:58 UTC
Created attachment 28794 [details]
gdb-xorg.txt
Comment 2 Bryce Harrington 2009-08-19 14:43:44 UTC
Created attachment 28795 [details]
Xorg.0.log

This is an older Xorg.0.log (from well prior to collecting the backtrace).  If a fresher Xorg.0.log would help, just ask.
Comment 3 Bryce Harrington 2009-08-19 14:45:31 UTC
Created attachment 28796 [details]
lshw
Comment 4 Ian Romanick 2009-09-10 11:03:54 UTC
I think this may be fixed in xserver master by this commit:

commit 120286aef59dabdb7c9fa762e08457e5cc8ec3a6
Author: Michel Dänzer <daenzer@vmware.com>
Date:   Thu Sep 3 08:05:59 2009 +0200

    glx: Add screen DestroyWindow wrapper to destroy the GLX drawable.
    
    Fixes crashes exitting MacSlow's rgba-glx demo.

Could you try that?
Comment 5 Sandro Mani 2009-09-10 16:06:43 UTC
Created attachment 29398 [details]
Wine output preceding hang

Tested with the latest development snapshot of Fedora 12, on the same hardware as in the attached lshw. Now Wine neither crashes X nor fails with a GLXBadDrawable error; instead it completely locks up the computer, to the point that even ssh hangs (I can connect but cannot do anything else), and a hard power-off is the only remaining option. The console output of Wine is attached.
Comment 6 Ian Romanick 2009-09-10 18:38:28 UTC
(In reply to comment #5)
> Created an attachment (id=29398) [details]
> Wine output preceeding hang
> 
> Tested with the latest development snapshot of fedora 12, same hardware as in
> the attached lshw. Not wine neither crashes X nor crashes with a
> glx_baddrawable error, but instead completely locks up the computer at a point
> that even ssh hangs (i.e. I am able to connect but not to do anything else) -
> hard power-off is the only remaining option. The console output of wine is
> attached.

I have no idea what versions those are, so it's not a very useful data point.  Do you know the SHA1 for the commits?

From the Wine log, it looks like they're blindly trying to use a shader that failed to compile.
Comment 7 Jaime Rave 2009-09-10 18:52:46 UTC
Hi Sandro, can you provide a sample app that can be tested, so we can get more information? You can post the download link here.
Comment 8 Sandro Mani 2009-09-11 02:37:19 UTC
Hello,
sorry, but I am not very familiar with commit SHAs. I tested on Fedora because it was the distribution shipping the most recent snapshot of Xorg 7.5 that came to mind, namely 7.6.99.900 (1.7.0 RC 0) (built on 07 September 2009 02:00:06AM).
I tested with TrackMania United/Nations Forever; Nations Forever can be downloaded for free at http://www.trackmania.com/index.php?rub=downloads. It usually runs fine under Wine, but note that you need to copy d3dx9_36.dll (http://www.dll-files.com/dllindex/pop.php?d3dx9_36) into system32.

Comment 9 Brian Rogers 2009-09-13 21:24:43 UTC
Google Earth can trigger this as well. But I also have a small Windows app, with source code, that can trigger this bug. I'm going to strip it down to a minimum testcase and attach the source and binary.

The problem appears to be memory corruption. A pointer is being overwritten by the time driDestroyContext() is called. It can be a different pointer each time, and sometimes the crash doesn't occur at all.
Comment 10 Brian Rogers 2009-09-13 23:56:20 UTC
Specifically, intel_fb (driDrawPriv->driverPrivate) is the corrupted pointer. It often winds up pointing into libc or what I believe is graphics memory (shows up as "/drm mm object (deleted)" in /proc/<pid>/maps).

Surprisingly this doesn't always lead to a crash because the targeted memory often contains nulls or pointers to valid memory locations in the right places.
Comment 11 Brian Rogers 2009-09-14 06:07:23 UTC
I was able to get this out of valgrind:

==31602== Invalid read of size 8
==31602==    at 0xC29C0F4: intelDestroyContext (intel_context.c:877)
==31602==    by 0xC28CB7A: driDestroyContext (dri_util.c:545)
==31602==    by 0x80FE505: __glXDRIcontextDestroy (glxdri2.c:192)
==31602==    by 0x80ED0A1: __glXFreeContext (glxext.c:211)
==31602==    by 0x80ECD9F: ContextGone (glxext.c:110)
==31602==    by 0x437D55: FreeResourceByType (resource.c:598)
==31602==    by 0x80E333F: __glXDisp_DestroyContext (glxcmds.c:370)
==31602==    by 0x80ED95E: __glXDispatch (glxext.c:578)
==31602==    by 0x439AEC: Dispatch (dispatch.c:445)
==31602==    by 0x42678A: main (main.c:285)
==31602==  Address 0x1bbdc508 is 8 bytes inside a block of size 144 free'd
==31602==    at 0x4C255FD: free (vg_replace_malloc.c:323)
==31602==    by 0xC3796CC: _mesa_free (imports.c:85)
==31602==    by 0xC28CB33: dri_put_drawable (dri_util.c:516)
==31602==    by 0xC28CB50: driDestroyDrawable (dri_util.c:523)
==31602==    by 0x80FE2B7: __glXDRIdrawableDestroy (glxdri2.c:105)
==31602==    by 0x80ECF57: DrawableGone (glxext.c:163)
==31602==    by 0x437C09: FreeResource (resource.c:562)
==31602==    by 0x45AED1: CrushTree (window.c:877)
==31602==    by 0x45AFF2: DeleteWindow (window.c:914)
==31602==    by 0x437C09: FreeResource (resource.c:562)
==31602==    by 0x43A78F: ProcDestroyWindow (dispatch.c:751)
==31602==    by 0x439AEC: Dispatch (dispatch.c:445)

There's a race: intelDestroyContext() and __glXDRIdrawableDestroy() can be called in either order when the program closes, but the Intel Mesa code doesn't refcount the drawable. So if intelDestroyContext() is called second, the drawable has already been destroyed and freed, and its memory may already have been overwritten. Crash.
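Comment 11 attributes the crash to the driver not refcounting the drawable. As a rough illustration of what refcounting would guard against, here is a minimal sketch using hypothetical types (`struct drawable`, `struct context`) and helper names that are not Mesa's: the bound context holds its own reference, so the drawable is freed exactly once, after both owners are gone, whichever teardown order happens. This only models the refcounting idea; the fix that was actually committed (comment 17) took a different approach.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; Mesa's real __DRIdrawable and
 * intel_context structures look different. */
struct drawable {
    int refcount;
    void *driver_private;           /* plays the role of intel_fb */
};

struct context {
    struct drawable *draw;          /* pointer the context keeps */
};

static int drawables_freed = 0;     /* counts frees, for illustration */

static struct drawable *drawable_create(void)
{
    struct drawable *d = calloc(1, sizeof *d);
    if (d)
        d->refcount = 1;            /* reference held by the GLX drawable */
    return d;
}

static void drawable_ref(struct drawable *d)
{
    d->refcount++;
}

static void drawable_unref(struct drawable *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        free(d);
        drawables_freed++;
    }
}

/* Binding takes a reference, so the drawable outlives the binding. */
static void context_bind(struct context *ctx, struct drawable *d)
{
    drawable_ref(d);
    ctx->draw = d;
}

/* Destroying the context only drops the context's own reference. */
static void context_destroy(struct context *ctx)
{
    if (ctx->draw) {
        drawable_unref(ctx->draw);
        ctx->draw = NULL;
    }
}

/* Tears down in the given order.  Returns 1 if the drawable was freed
 * exactly once and only after both owners were gone, -1 otherwise. */
int teardown_in_order(int drawable_first)
{
    struct context ctx = { NULL };
    struct drawable *d = drawable_create();
    int before = drawables_freed;

    if (!d)
        return -1;
    context_bind(&ctx, d);

    if (drawable_first) {
        drawable_unref(d);          /* e.g. the glXDestroyWindow path */
        if (drawables_freed != before)
            return -1;              /* freed too early */
        context_destroy(&ctx);      /* last reference: freed here */
    } else {
        context_destroy(&ctx);
        if (drawables_freed != before)
            return -1;              /* freed too early */
        drawable_unref(d);          /* last reference: freed here */
    }
    return drawables_freed - before;
}
```

Calling `teardown_in_order(1)` models the order from the valgrind trace (drawable destroyed first); with the extra reference, the context's later teardown never touches freed memory.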
Comment 12 Brian Rogers 2009-09-14 06:54:34 UTC
Created attachment 29520 [details]
Testcase

Install mingw32, then compile like so:
i586-mingw32msvc-gcc -Wall -o testcase.exe testcase.c -lopengl32 -lgdi32

Run it in Wine and close it, repeating until you see the crash. It may take five or six tries.
Comment 13 Brian Rogers 2009-09-14 06:58:39 UTC
Created attachment 29521 [details]
Testcase binary
Comment 14 Ian Romanick 2009-09-15 11:30:13 UTC
Adding Michel Dänzer to the CC list.  He has worked in this area, so this might ring a bell for him.
Comment 15 Ian Romanick 2009-09-15 13:15:54 UTC
I'm going to try to fix this bug. However, the ONLY way I have found to reach __glXDRIdrawableDestroy is by calling glXDestroyWindow. glXDestroyWindow is part of GLX 1.3, and the current X server does *NOT* support GLX 1.3. Calling glXDestroyWindow is a bug in the application. Crashing because of it is a bug in the driver. How can the application (Wine, in this case) expect anything good to happen when it calls unsupported GLX functions?
Comment 16 Michel Dänzer 2009-09-15 14:52:59 UTC
(In reply to comment #14)
> Adding Michel Dänzer to the CC list.  He has worked in this area, so this
> might ring a bell for him.

Thanks, but no need, I read the xorg-team list. I didn't have any ideas, but it looks like you have a lead anyway.
Comment 17 Ian Romanick 2009-09-16 08:07:04 UTC
commit 2921a2555d0a76fa649b23c31e3264bbc78b2ff5
Author: Ian Romanick <ian.d.romanick@intel.com>
Date:   Wed Sep 16 07:39:58 2009 -0700

    intel: Deassociated drawables from private context struct in intelUnbindContext
    
    The generic DRI infrastructure makes sure that __DRIcontextRec::driDrawablePriv
    and __DRIcontextRec::driReadablePriv are set to NULL after unbinding a
    context.  However, the intel_context structure keeps cached copies of
    these pointers.  If these cached pointers are not NULLed and the
    drawable is actually destroyed after unbinding the context (typically
    by way of glXDestroyWindow), freed memory will be dereferenced in
    intelDestroyContext.
    
    This should fix bug #23418.
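The commit message describes the fix as clearing the driver's cached drawable pointers when the context is unbound, matching what the generic DRI layer already does for its own pointers. A simplified model of that idea follows; the structs and the functions `bind_context`, `unbind_context` and `destroy_context` are hypothetical stand-ins, not the actual Mesa patch. After unbinding, the drawable can be freed and a later destroy no longer follows a stale pointer.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for __DRIdrawable, __DRIcontextRec and
 * intel_context; field names are illustrative, not Mesa's. */
struct dri_drawable {
    void *driver_private;               /* e.g. the driver's framebuffer */
};

struct dri_context {
    struct dri_drawable *draw;          /* cf. driDrawablePriv */
    struct dri_drawable *read;          /* cf. driReadablePriv */
};

struct driver_context {
    struct dri_context *dri;
    struct dri_drawable *cached_draw;   /* driver's cached copies */
    struct dri_drawable *cached_read;
};

static void bind_context(struct driver_context *ctx,
                         struct dri_drawable *draw,
                         struct dri_drawable *read)
{
    ctx->dri->draw = draw;
    ctx->dri->read = read;
    ctx->cached_draw = draw;
    ctx->cached_read = read;
}

/* The generic layer clears its pointers on unbind; the fix is to clear
 * the driver's cached copies as well. */
static void unbind_context(struct driver_context *ctx)
{
    ctx->dri->draw = NULL;
    ctx->dri->read = NULL;
    ctx->cached_draw = NULL;
    ctx->cached_read = NULL;
}

/* Counts the drawables destroy would touch; the count stands in for the
 * real teardown work done on each associated drawable. */
static int destroy_context(struct driver_context *ctx)
{
    int touched = 0;

    if (ctx->cached_draw != NULL)
        touched++;
    if (ctx->cached_read != NULL && ctx->cached_read != ctx->cached_draw)
        touched++;
    return touched;
}

/* Bind, unbind, free the drawable, then destroy the context.
 * Returns how many (already freed) drawables destroy would touch. */
int destroy_after_unbind(void)
{
    struct dri_context dri = { NULL, NULL };
    struct driver_context ctx = { &dri, NULL, NULL };
    struct dri_drawable *d = malloc(sizeof *d);

    if (d == NULL)
        return -1;
    d->driver_private = NULL;

    bind_context(&ctx, d, d);
    unbind_context(&ctx);
    free(d);                            /* e.g. glXDestroyWindow after unbind */
    return destroy_context(&ctx);       /* 0: no stale pointer followed */
}

/* Destroying a still-bound context does touch its (live) drawable. */
int destroy_while_bound(void)
{
    struct dri_context dri = { NULL, NULL };
    struct driver_context ctx = { &dri, NULL, NULL };
    struct dri_drawable d = { NULL };

    bind_context(&ctx, &d, &d);
    return destroy_context(&ctx);       /* 1: draw and read are the same */
}
```

Unlike the refcounting model, this approach does not extend the drawable's lifetime; it only guarantees that an unbound context holds no pointer to it.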
Comment 18 Ian Romanick 2009-09-16 23:56:34 UTC
*** Bug 22691 has been marked as a duplicate of this bug. ***
Comment 19 Ian Romanick 2009-09-17 00:00:08 UTC
*** Bug 22863 has been marked as a duplicate of this bug. ***
Comment 20 Ian Romanick 2009-09-17 00:11:40 UTC
*** Bug 23477 has been marked as a duplicate of this bug. ***
Comment 21 Ian Romanick 2009-09-17 00:22:21 UTC
*** Bug 22110 has been marked as a duplicate of this bug. ***
