Bug 21582

Summary: [radeon-rewrite] crashes server through radeonRefillCurrentDmaRegion
Product: Mesa Reporter: Tormod Volden <bugzi11.fdo.tormod>
Component: Drivers/DRI/r300 Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: fatih, lowell87, pavel, pedretti.fabio
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: full backtrace of Xorg
Xorg log with backtrace
gdm log with mismatch and assertion
full backtrace of Xorg
Additional backtrace
Torcs crashing when trying to start the race

Description Tormod Volden 2009-05-05 15:22:31 UTC
Created attachment 25523 [details]
full backtrace of Xorg

As soon as I press Alt-Tab to cycle windows (running compiz), X crashes.

This is with latest radeon-rewrite and -ati driver on Ubuntu 9.04 with 2.6.30-rc based kernel.

Seems to be the same backtrace as in https://bugzilla.redhat.com/show_bug.cgi?id=498478, but I can attach a full backtrace.
Comment 1 Tormod Volden 2009-05-05 15:23:50 UTC
Created attachment 25524 [details]
Xorg log with backtrace
Comment 2 Pavel Rojtberg 2009-05-07 06:06:28 UTC
Same trigger, same backtrace, same stack, except that the kernel is 2.6.28 (the Jaunty default).
Comment 3 Alex Deucher 2009-05-07 09:35:42 UTC
*** Bug 21618 has been marked as a duplicate of this bug. ***
Comment 4 Maciej Cencora 2009-05-14 05:31:04 UTC
The problem is in radeonRefillCurrentDmaRegion:
we call radeon_revalidate_bos, which calls radeonFlush, which (only if some conditions are met) frees rmesa->dma.current. The subsequent radeon_bo_map call then dereferences a null pointer.

There are two solutions:
- remove radeon_revalidate_bos from radeonRefillCurrentDmaRegion,
- check if rmesa->dma.current is null after calling radeon_revalidate_bos and create new bo if necessary

Both solutions proved to work; unfortunately I don't know which one is the correct one. If Jerome Glisse doesn't know either, we will probably have to wait for Dave Airlie to decide.
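To make the second option concrete, here is a minimal, self-contained sketch of the pattern. It does not use the real Mesa structures or functions: every `sketch_*` name is a hypothetical stand-in (`sketch_revalidate` for radeon_revalidate_bos, `sketch_flush` for radeonFlush, `dma_current` for rmesa->dma.current), and the flush condition is reduced to a single flag. It only illustrates re-checking the pointer after a call that may free it, and is not the fix that was eventually committed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real Mesa structures. */
struct sketch_bo {
    size_t size;
    void *ptr;                      /* CPU mapping, NULL until mapped */
};

struct sketch_ctx {
    struct sketch_bo *dma_current;  /* plays the role of rmesa->dma.current */
    int flush_on_revalidate;        /* models "only if some conditions are met" */
};

struct sketch_bo *sketch_bo_open(size_t size)
{
    struct sketch_bo *bo = calloc(1, sizeof(*bo));
    if (bo)
        bo->size = size;
    return bo;
}

void sketch_bo_unref(struct sketch_bo *bo)
{
    if (bo) {
        free(bo->ptr);
        free(bo);
    }
}

/* Models the flush: releases the current DMA buffer. */
void sketch_flush(struct sketch_ctx *ctx)
{
    sketch_bo_unref(ctx->dma_current);
    ctx->dma_current = NULL;
}

/* Models revalidation, which may flush as a side effect. */
void sketch_revalidate(struct sketch_ctx *ctx)
{
    if (ctx->flush_on_revalidate)
        sketch_flush(ctx);
}

/* Models mapping; a NULL buffer here corresponds to the reported crash. */
int sketch_bo_map(struct sketch_bo *bo)
{
    assert(bo != NULL);
    if (!bo->ptr)
        bo->ptr = malloc(bo->size);
    return bo->ptr ? 0 : -1;
}

/* Refill with the second option applied. Returns 0 on success, -1 on failure. */
int sketch_refill(struct sketch_ctx *ctx, size_t size)
{
    sketch_bo_unref(ctx->dma_current);
    ctx->dma_current = sketch_bo_open(size);
    if (!ctx->dma_current)
        return -1;

    sketch_revalidate(ctx);

    /* The buffer may have been freed by the flush; create a new one. */
    if (!ctx->dma_current) {
        ctx->dma_current = sketch_bo_open(size);
        if (!ctx->dma_current)
            return -1;
    }
    return sketch_bo_map(ctx->dma_current);
}
```

Without the re-check, `sketch_bo_map` would receive NULL whenever the flush path is taken, which mirrors the null dereference described above.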
Comment 5 Tormod Volden 2009-05-17 03:33:43 UTC
I naively commented out the radeon_revalidate_bos line, but then Xorg crashes at startup. Is the workaround more complicated? This bug keeps me from testing radeon-rewrite much, so a temporary workaround would be most welcome if the real fix has to wait.
Comment 6 Maciej Cencora 2009-05-17 05:17:17 UTC
(In reply to comment #5)
> I naively commented out the radeon_revalidate_bos line, but then Xorg crashes
> at startup. Is the workaround more complicated? This bug keeps me from testing
> radeon-rewrite much, so a temporary workaround would be most welcome if the
> real fix has to wait.
> 

Can you post a backtrace of where it is crashing with commented out radeon_revalidate_bos call?
Comment 7 Tormod Volden 2009-05-17 07:22:11 UTC
It did not crash now. Last time, I had tested it against 76a64958a4ca38ec27b63a909979c493c507b952 so it was probably compiz and bug 21776 that kicked in and got me confused.
Comment 8 Tormod Volden 2009-05-18 10:52:56 UTC
I am not 100% sure this is relevant, but when I do the window cycling with alt-tab in compiz, there is now some lag and it sometimes hangs for up to a few seconds on my M26 card. On my RV515 there is no lag.
Comment 9 Jerome Glisse 2009-05-20 13:24:08 UTC
Commit a13e96359baaa0331561f86ef6487feba6540464 should provide a definitive fix for this issue; please reopen if that is not the case.
Comment 10 Tormod Volden 2009-05-24 15:47:37 UTC
I am afraid I still see the original problem even after updating to 7dd184dc4da37233471875df6f40cce0560cb7bc.
Comment 11 Jerome Glisse 2009-05-25 03:01:35 UTC
This time 9b1efcb87c794ded9306f01336d48a80aaad3261 (the commit just after the one you last tested) fixes the issue :). Once again, please reopen if that is not the case.
Comment 12 Tormod Volden 2009-05-25 11:05:43 UTC
Created attachment 26207 [details]
gdm log with mismatch and assertion

With 9dee2f20... I don't get the same backtrace, but X dies and I can only find some errors and a failed assertion in the gdm log:

CS section size missmatch start at (r300_cmdbuf.c,emit_tex_offsets,182) 4 vs 2
CS section end at (r300_cmdbuf.c,emit_tex_offsets,202)
X: radeon_common.c:1008: radeon_validate_bo: Assertion `radeon->state.validated_bo_count < 24' failed.
Comment 13 Tormod Volden 2009-05-25 11:09:24 UTC
BTW, there was also a "failed to revalidate buffers" in between the mismatch errors in the log.
Comment 14 Tormod Volden 2009-05-27 10:14:46 UTC
Created attachment 26254 [details]
full backtrace of Xorg

The mismatch messages come all the time, they are not causing the crash.
Comment 15 Lowell Alleman 2009-05-27 15:11:09 UTC
Created attachment 26265 [details]
Additional backtrace

Not sure if additional backtraces will be helpful on this or not... mine looks slightly different.

Is there a way to track down what X command is being sent to cause this bug?  I have a specific button that causes this crash every time I click it...  Can I trace the X protocol between client/server to get additional info to help track this down?
Comment 16 Jerome Glisse 2009-05-28 01:43:52 UTC
This is a different bug. Tormod, are you using KMS? Does gdm crash with compiz enabled? Disabled? I can't reproduce the bug here, with KMS or without. Is it with RV515? Other cards?
Comment 17 Jerome Glisse 2009-05-28 02:43:15 UTC
I pushed a change to how we emit texture offsets in r300; please test with 2f9189d538ac56bd241ccc8f8f82bc4fdd779aa6 and report whether it helps with this new issue.
Comment 18 Tormod Volden 2009-05-28 04:19:28 UTC
> This is different bug, Tormod are you using KMS ? Does gdm crash with compiz
> enabled ? disabled ? I can't reproduce the bug here with kms or not. Is it with
> rv515 ? Others ?

It appears to be the same as initially reported; the backtrace has only changed to hit an assert instead of a SEGV:
"As soon as I press Alt-Tab to cycle windows (running compiz), X crashes.

This is with latest radeon-rewrite and -ati driver on Ubuntu 9.04 with
2.6.30-rc based kernel."

So I am not using KMS, and gdm does not crash. It happens on M26 and has never happened on RV515.

I have to correct what I said before on lag: I sometimes do see this lag on RV515 also. (Similar to the lag I saw on M26 when I worked around the crash by commenting out radeon_revalidate_bos.) So the lag is likely unrelated. Just that every time it lags, my heart jumps and I think it is the crash kicking in :)

I will try 2f9189d538ac56bd241ccc8f8f82bc4fdd779aa6 later today. Thanks!
Comment 19 Tormod Volden 2009-05-28 06:49:58 UTC
I could test 2f9189d5 on RV515 now, and alt-tab (with compiz) makes it crash:

X: radeon_common.c:1008: radeon_validate_bo: Assertion `radeon->state.validated_bo_count < 24' failed.

On the good side of things, the mismatch messages are gone now.
Comment 20 Tormod Volden 2009-05-28 10:17:57 UTC
Tested 5dcbcbfca4f3c00de1fdab28d1cc8d691f67edce on both RV515 and M26 and got the same assertion failure.
Comment 21 Jerome Glisse 2009-05-29 01:20:13 UTC
Did you restart Xorg after installing the latest rewrite lib? Also, does Xorg load the new driver? I have had no luck reproducing your bug. How many windows do you have open? Which software?

I tested with compiz + firefox + midori + several terminals all running at the same time, and cycling through windows with alt-tab works properly, with no crash.
Comment 22 Tormod Volden 2009-05-29 02:43:27 UTC
Yes, I always restart X after installing the new mesa. I make distribution packages so I am sure the new mesa overwrites the old and there is only one version installed on my machine at any time.

To reproduce my setup you can boot a Ubuntu 9.04 live CD and then install these packages on top of it: https://launchpad.net/~xorg-edgers/+archive/radeon

After logging in to the default Gnome session, I open two gnome-terminal windows and press alt-tab. I have noticed that I sometimes can swap windows without crashing if I press alt-tab for only a very short moment. But if I keep it down to get the window selector displayed, it crashes.

On what cards have you tried?
Comment 23 Lowell Alleman 2009-05-29 07:11:35 UTC
I have an M10 and I'm also using Tormod's radeon-rewrite packages. I'm able to reproduce the issue consistently using Amarok (KDE music player) package version 2.0.2mysql5.1.30-0ubuntu3. I can cause the crash by opening up Amarok and bringing up the "Collection" panel. As soon as I click the "Advanced" button (part of the search interface) at the top of that panel, Xorg crashes with the backtrace I previously provided.

01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10] [1002:4e50]
       Subsystem: IBM Device [1014:0550]

I don't remember if I was running the stock 2.28.12-generic kernel, or the 2.6.29-02062902-generic kernel at the time I got this backtrace.

If there is some way to get some additional state information, or somehow trace the X client/server communications to determine what command/request causes this, let me know.
Comment 24 Nicolai Hähnle 2009-05-31 06:15:44 UTC
I could not reproduce this bug, unfortunately.

System:
- Ubuntu 9.04 (Jaunty)
- Standard kernel
- Packages from:
 * deb http://ppa.launchpad.net/xorg-edgers/radeon/ubuntu jaunty main
 * deb http://ppa.launchpad.net/tormodvolden/ppa/ubuntu jaunty main

Compiz is running, glxinfo confirms that the system is running radeon-rewrite in DRI1/non-KMS mode. Neither Alt+Tab nor the Amarok steps mentioned in comment #23 crash anything.

Graphics card is a Radeon X1650 Pro, connected via AGP (PCI ID 1002:71c1, should be an RV530/RV535 if I recall correctly)
Comment 25 Nicolai Hähnle 2009-05-31 06:29:02 UTC
Following up on #24:

Replaced the graphics card with a Radeon 9700 Pro (R300), still no crashes with the same system setup.
Comment 26 Tormod Volden 2009-05-31 11:12:48 UTC
I wouldn't be surprised if these issues are very card specific, since I originally did not see crashes on RV515.
Comment 27 Tormod Volden 2009-06-06 03:32:52 UTC
I might add that I can not reproduce when using KMS and DRI2. The mesa version is the same, but the DDX is then glisse's latest and libdrm has a patch from zhasha for libdrm-radeon.
Comment 28 Tormod Volden 2009-06-12 00:19:14 UTC
Just confirming that this bug now is in latest git master. Any ideas how I can debug this or provide useful information?

X: radeon_common.c:1008: radeon_validate_bo: Assertion `radeon->state.validated_bo_count < 32' failed.
Comment 29 Fabio Pedretti 2009-06-12 08:13:43 UTC
Also the game sauerbraten has this problem, when using the aqueducts map. It crashes with:

sauer_client: radeon_common.c:1008: radeon_validate_bo: Assertion `radeon->state.validated_bo_count < 32' failed.
Aborted

As suggested in IRC I tried changing RADEON_MAX_BOS to some bigger value (in radeon_common_context.h) but I get the assertion also with 64.
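For readers unfamiliar with this kind of check, the sketch below shows the general shape of a fixed-size validation list guarded by an assertion, next to an alternative that empties the list when it is full. All `sketch_*` names are hypothetical and `SKETCH_MAX_BOS` merely plays the role of RADEON_MAX_BOS; this is not the driver's code and not the fix that later landed, which this thread does not describe. One assumption it illustrates: if nothing resets the count between submissions, raising the limit only delays the assertion.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_BOS 32            /* plays the role of RADEON_MAX_BOS */

/* Hypothetical state; NOT the real radeon->state layout. */
struct sketch_state {
    const void *validated[SKETCH_MAX_BOS];
    int validated_count;
    int flushes;                     /* how often the list was emptied */
};

/* Assert-style bookkeeping: aborts once the list is full. */
void sketch_validate_assert(struct sketch_state *s, const void *bo)
{
    assert(s->validated_count < SKETCH_MAX_BOS);
    s->validated[s->validated_count++] = bo;
}

/* Models submitting the pending work, which empties the list. */
void sketch_flush_list(struct sketch_state *s)
{
    s->validated_count = 0;
    s->flushes++;
}

/* Alternative: empty the list when it is full instead of asserting. */
void sketch_validate_or_flush(struct sketch_state *s, const void *bo)
{
    if (s->validated_count == SKETCH_MAX_BOS)
        sketch_flush_list(s);
    s->validated[s->validated_count++] = bo;
}
```

With the second variant, adding 100 buffers to an empty list triggers three flushes and leaves four entries pending, whereas the first variant would abort on the 33rd.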
Comment 30 Michel Dänzer 2009-06-15 06:16:25 UTC
I'm also seeing the assertion failure, I have to use a pre-radeon-rewrite master snapshot for compiz...

Do those who can't reproduce it build the driver with --enable-debug?
Comment 31 Tormod Volden 2009-06-15 07:28:03 UTC
> Do those who can't reproduce it build the driver with --enable-debug?

Nicolai can maybe comment on this himself, but in comment 24 he tested the same binaries as I did.
Comment 32 Pavel Rojtberg 2009-06-15 10:09:40 UTC
This probably only happens on the Mxx series. I experience this bug and have an M56GL, Tormod has an M26 and Lowell an M10.
Comment 33 Tormod Volden 2009-06-15 12:12:12 UTC
I could also reproduce it on my RV515.
Comment 34 Tormod Volden 2009-06-16 04:20:07 UTC
FYI, the compiz effect on alt-tab which crashes is the "Static Application Switcher". It has a "mipmap" option (in compizconfig-settings-manager) but disabling it does not help. OTOH the "Application Switcher" does not crash.

Sometimes I can switch windows successfully although with some lag. There is a "WARNING! Falling back to software for invalid buffers" message which can be correlated to this but I am not sure.
Comment 35 Pauli 2009-06-19 04:07:20 UTC
Created attachment 26956 [details]
Torcs crashing when trying to start the race

Torcs is causing the same assertion failure, but is it the same bug?

I can reproduce this every time using DRI2 and git master of mesa. (r280 hw)
Comment 36 Michel Dänzer 2009-06-19 08:12:16 UTC
(In reply to comment #35)
> Torcs is causing same assertion failure but is it same bug?

Apparently not, compiz works for me with Dave's latest fix from master.

Let's track the torcs problem or any other remaining issues in separate reports.
Comment 37 Tormod Volden 2009-06-20 03:38:52 UTC
I can confirm that everything now works perfectly with latest git. Thanks a lot!
Comment 38 Fabio Pedretti 2009-06-23 06:32:38 UTC
I am still having the "Assertion `radeon->state.validated_bo_count < 32' failed." problem with sauerbraten (which appears to be the same bug as the torcs one reported by Pauli).

Bug filed at https://bugs.freedesktop.org/show_bug.cgi?id=22438 .
