Bug 33277

Summary: test-privserver always crashes and nobody noticed
Product: dbus Reporter: Simon McVittie <smcv>
Component: coreAssignee: Simon McVittie <smcv>
Status: RESOLVED FIXED QA Contact: John (J5) Palmieri <johnp>
Severity: major    
Priority: high CC: hp
Version: 1.4.xKeywords: patch
Hardware: Other   
OS: All   
URL: http://git.collabora.co.uk/?p=user/smcv/dbus-smcv.git;a=shortlog;h=refs/heads/serverdata-33277
Whiteboard: 1.4
i915 platform: i915 features:
Attachments: [1/3] test_server_setup: on OOM while setting up watch/timeout, don't leak sd
[2/3] test_server_setup: allocate two server-data blobs, for watches and timeouts
[3/3] test_server_shutdown: disconnect the server before causing it to be freed
test-privserver-client: wait for server to die between iterations
[alternative patch 2] test_server_setup: allocate two server-data blobs, for watches and timeouts

Description Simon McVittie 2011-01-19 09:28:20 UTC
test-privserver always crashes with an assertion failure, because nobody called dbus_server_disconnect(). As far as I can see from the git history, it's always done this...

There are several things wrong here:

* On one of my branches, the tests fail as a result (I'm not sure why,
  but I'm refactoring the main loop, so it might be timing-related).

* Tests seem as though they ought to be fairly exemplary; this one clearly
  isn't!

* Our test framework doesn't normally notice that part of the test has
  crashed. This is lower-priority, but concerning.

Patches on the way.
Comment 1 Simon McVittie 2011-01-19 09:42:50 UTC
Created attachment 42204 [details] [review]
[1/3] test_server_setup: on OOM while setting up watch/timeout, don't leak sd

A related fix for part of Bug #33128 (see attachment 42058 [details] [review] for an earlier version, rejected in the 18th comment on Bug #33128).
Comment 2 Simon McVittie 2011-01-19 09:43:47 UTC
Created attachment 42205 [details] [review]
[2/3] test_server_setup: allocate two server-data blobs, for watches and timeouts

This is similar to how ConnectionData works. Without this change, we
deserve to segfault: when the first set of callbacks (either watches or
timeouts) is cleaned up, we unref the server and loop, and free sd;
when the second set of callbacks is cleaned up, we use-after-free sd,
the server and the loop, then double-free sd.

However, due to fd.o #33277 we don't even get that far, because we've
already died with an assertion failure.
Comment 3 Simon McVittie 2011-01-19 09:44:32 UTC
Created attachment 42206 [details] [review]
[3/3] test_server_shutdown: disconnect the server before causing it to be freed

Otherwise we die with an assertion failure.

(This is the actual fix for this bug.)
Comment 4 Simon McVittie 2011-01-19 09:55:13 UTC
Those patches don't address the failure to detect failure, but they're enough to drop the severity of this bug once applied.

To test manually:

./test/name-test/test-privserver (in one window)

./test/name-test/test-privserver-client (in another window)

The client test will fail, because test-privserver exits after one request, test-privserver-client wants to make four requests, and test-privserver probably isn't activatable on your normal session bus. However, at least neither actually crashes now.

The error I'm seeing on a work-in-progress branch seems to be unrelated, though; I might have broken dbus-daemon on that branch or something.
Comment 5 Simon McVittie 2011-01-20 05:46:32 UTC
Created attachment 42225 [details] [review]
test-privserver-client: wait for server to die between iterations

This fixes a race condition: the server exits while the client continues
to the next iteration. If the server wins, the test passes. If the client
wins, it sends a message to the dying service, never gets a reply, and the
test fails.

My branch to refactor the main loop for fd.o #23194 seems to make the
client more likely to win this race, resulting in intermittent test
failures.

This is an instance of the general problem described by Bug #11454.

> The error I'm seeing on a work-in-progress branch seems to be unrelated,
> though

It turned out to be this vaguely related race condition.
Comment 6 Colin Walters 2011-01-20 14:20:57 UTC
Comment on attachment 42204 [details] [review]
[1/3] test_server_setup: on OOM while setting up watch/timeout, don't leak sd

>From dae7a0c5eba6d05a2c595c86bcf85cc7839f9b40 Mon Sep 17 00:00:00 2001
>From: Simon McVittie <simon.mcvittie@collabora.co.uk>
>Date: Wed, 19 Jan 2011 17:28:58 +0000
>Subject: [PATCH 1/3] test_server_setup: on OOM while setting up watch/timeout, don't leak sd

Looks good.
Comment 7 Colin Walters 2011-01-20 14:22:52 UTC
Comment on attachment 42205 [details] [review]
[2/3] test_server_setup: allocate two server-data blobs, for watches and timeouts

>From 26d5655dfbc562cc7721826b5d7f4b4a553f356f Mon Sep 17 00:00:00 2001
>From: Simon McVittie <simon.mcvittie@collabora.co.uk>
>Date: Wed, 19 Jan 2011 17:33:31 +0000
>Subject: [PATCH 2/3] test_server_setup: allocate two server-data blobs, for watches and timeouts

This is pretty ugly; how hard is it to refcount serverdata instead?
Comment 8 Colin Walters 2011-01-20 14:23:54 UTC
Comment on attachment 42206 [details] [review]
[3/3] test_server_shutdown: disconnect the server before causing it to be freed

>From ebf31ed7fec8a5a10c621e2a051f918a03d3ed6b Mon Sep 17 00:00:00 2001
>From: Simon McVittie <simon.mcvittie@collabora.co.uk>
>Date: Wed, 19 Jan 2011 17:34:10 +0000
>Subject: [PATCH 3/3] test_server_shutdown: disconnect the server before causing it to be freed

Looks fine.
Comment 9 Simon McVittie 2011-01-21 06:25:23 UTC
Created attachment 42269 [details] [review]
[alternative patch 2] test_server_setup: allocate two server-data blobs, for watches and timeouts

(In reply to comment #6)
> >Subject: [PATCH 1/3] test_server_setup: on OOM while setting up watch/timeout, don't leak sd
> 
> Looks good.

Thanks, applied.

(In reply to comment #7)
> >Subject: [PATCH 2/3] test_server_setup: allocate two server-data blobs, for watches and timeouts
> 
> This is pretty ugly; how hard is it to refcount serverdata instead?

I started writing that first, in fact; it's easy, but seemed an unreasonable amount of boilerplate for a 2-word blob containing two refcounted objects, particularly for test code, so I threw it away.

4 insertions seem better for my insertion/deletion karma than 23 insertions/6 deletions, but if you'd strongly prefer the refcounted version, here it is (the branch is serverdata-33128-alt in my git repository).

I haven't applied Attachment #42206 [details] yet despite your positive review, since it isn't enough to make the test not crash (either Attachment #42205 [details] or this one is necessary first).

For reliable tests, Attachment #42225 [details] would also be good to have.
Comment 10 Simon McVittie 2011-01-24 08:22:29 UTC
I think this is unintrusive enough for 1.4.
Comment 11 Will Thompson 2011-02-02 06:53:37 UTC
Review of attachment 42225 [details] [review]:

Looks good.
Comment 12 Simon McVittie 2011-02-02 07:19:50 UTC
Reviewed by wjt on IRC and fixed in git for 1.4.4, 1.5.0. Thanks!

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.