Bug 38005

Summary: dbus_connection_(un)ref is not threadsafe
Product: dbus Reporter: Will Manley <freedesktop>
Component: coreAssignee: Simon McVittie <smcv>
Status: RESOLVED FIXED QA Contact: John (J5) Palmieri <johnp>
Severity: normal    
Priority: medium CC: cosimo.alfarano, hp, ralf.habacker, smcv, will
Version: 1.4.xKeywords: patch
Hardware: x86 (IA32)   
OS: Linux (All)   
URL: http://cgit.freedesktop.org/~smcv/dbus/log/?h=actually-atomic-38005
Whiteboard: NB#263436
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 36074, 37286, 39231    
Attachments: Consistently use atomic operations for all access to DBusConnection refcount
[PATCH 1/8] DBusPendingCall: exclusively use atomic operations on the refcount
[PATCH 2/8] DBusServer: exclusively use atomic operations on the refcount
[PATCH 1/9] DBusMessageFilter: exclusively use atomic accesses to refcount
[PATCH 4/8] DBusConnection: use atomic accesses to refcount, even for assertions
[PATCH 5/8] dbus-memory: use atomic accesses to block count, even for assertions
[PATCH 6/8] DBusMessage: always access refcount atomically
[PATCH 2/9] dbus_message_unref: make an assertion more strict
[PATCH 3/9] DBusObjectTree: always access refcount atomically
[PATCH 4/9] Add _dbus_atomic_get implemented in terms of inc, dec
[PATCH 5/9] _dbus_connection_close_if_only_one_ref: use _dbus_atomic_get
[PATCH 6/9] dbus_message_ref: avoid unused variable if not asserting
[PATCH 7/9] DBusConnection: use atomic accesses to refcount in assertions/initial ref
[PATCH 8/9] dbus-memory: use atomic accesses to block count, even for assertions
[PATCH 9/9] DBusMessage: always access refcount atomically, even for assertions/initial ref
[Extra patch] Implement _dbus_atomic_get directly, rather than via inc + dec
Implement _dbus_atomic_get directly, rather than via inc + dec (v2)

Description Will Manley 2011-06-06 12:48:19 UTC
Minimal test case attached fails with:

process 27240: arguments to dbus_connection_ref() were incorrect, assertion "connection->generation == _dbus_current_generation" failed in file dbus-connection.c line 2557.

I suspect it has something to do with _dbus_connection_(un)ref_unlocked not using _dbus_atomic_inc/dec while the corresponding _dbus_connection_(un)ref always use dbus atomics.

In the _unlocked case atomic instructions are used if DBUS_HAVE_ATOMIC_INT is defined (doesn't seem to be here).  In the other cases atomics are used unconditionally (there is actually a "#if 1" block "protecting" the atomics).

Further investigation pending....

Software version details:

libdbus version 1.4.8 (with commit 53db0b "Don't force use of -fPIE for the dbus-daemon if apparently supported" backported)

gcc (GCC) 4.4.4 20100726 (Red Hat 4.4.4-13)

Scientific Linux 6
Comment 1 Will Manley 2011-06-06 13:52:33 UTC
Hold on, I'm being an idiot.  There was an error in my minimal test case.  It should have read sizeof(t)/sizeof(*t) rather than sizeof(t) in the loop.  I will reconstruct the test to show my original problem.
Comment 2 Simon McVittie 2011-06-07 03:59:43 UTC
Providing a test case would be great - 

That said, this is a mess. Of the things that manipulate the refcount:

* ref_unlocked prefers atomic ops, but will do plain int ops if atomic ops
  are implemented in terms of a global lock; its justification for doing that
  is that the connection lock protects the refcount, but that isn't actually true

* unref_unlocked is the same as ref_unlocked

* close_if_only_one_ref relies on the connection lock regardless of whether
  we have atomic int ops, and even has a comment that says
  "If we didn't atomically check the refcount and close with the lock held
  though, we could screw this up" - guess what? :-P

* The public ref, unref always use atomic ops even if they're implemented in
  terms of a global lock, because:

  - the timeout and watch functions are called with the connection lock held
  - we allow reffing/unreffing connections from their own timeout/watch
    callbacks
  - we don't guarantee that locks are recursive

close_if_only_one_ref has really annoying semantics, particularly if you're trying to use lockless atomic ops.

It might be better if we consistently used the connection lock to protect the refcount, and just made sure our locks were always recursive (but see Bug #36204, Bug #36205).
Comment 3 Thiago Macieira 2011-06-07 04:45:43 UTC
> It might be better if we consistently used the connection lock to protect the
> refcount, and just made sure our locks were always recursive (but see Bug
> #36204, Bug #36205).

It needs to be the same method everywhere. Using atomic in some places and locked non-atomic in others introduces problems -- the most problematic one is that the locked decrease starts at 2, thinks it ended at 1, but is actually at 0.
Comment 4 Simon McVittie 2011-06-07 06:08:30 UTC
Created attachment 47660 [details] [review]
Consistently use atomic operations for all access to  DBusConnection refcount

Trying to mix atomic operations with locked non-atomic operations is
broken: the atomic ops aren't necessarily atomic with respect to the
locked non-atomic ops, and the non-atomic ops aren't protected by the
lock because the atomic ops can change the refcount behind their back.

In theory we could use the connection lock if atomic ops aren't supported
(making a per-connection lock cheaper than the global lock used to
implement atomic ops) *and* our mutexes are recursive (making it safe
against deadlocks)... but life's too short.
Comment 5 Simon McVittie 2011-06-07 06:10:15 UTC
Will, I'd be interested to know whether that patch fixes your usage.

If you can construct a minimal test case, would you mind contributing it to libdbus so we can use it as a regression test?
Comment 6 Will Manley 2011-06-07 10:04:37 UTC
Well, it certainly stops helgrind from moaning.  Trying to reproduce.  Unfortunately the error only exhibited in 0.1% of test runs.

Thanks
Comment 7 Will Manley 2011-06-08 03:10:45 UTC
Hmm, the patch doesn't seem to have had a statistically significant effect.  Previously my test failed 11/818 (1.3%) times and with the patch applied it has failed 120/13132 (0.9%).

This seems odd to me as yesterday I made a hacky attempt at patching the non-atomic stuff out yesterday and it seemed to fix it.  Maybe testing my attempt wasn't statistically significant.

I will attempt further testing and try to reproduce it more reliably with a smaller test case.
Comment 8 Simon McVittie 2011-06-08 03:47:05 UTC
(In reply to comment #7)
> Hmm, the patch doesn't seem to have had a statistically significant effect. 
> Previously my test failed 11/818 (1.3%) times and with the patch applied it has
> failed 120/13132 (0.9%).

Could you share your test case, please? If it's using libdbus correctly then it really ought to succeed every time (and we should fix libdbus until it does), and if not, all bets are off.
Comment 9 Will Manley 2011-06-08 04:29:06 UTC
Unfortunately my test case is not written against libdbus, but is abstracted from it.  I will try and pare it down until I have a minimal test case against libdbus that runs quickly and fails reliably
Comment 10 Will Manley 2011-06-10 07:28:44 UTC
(In reply to comment #5)
> Will, I'd be interested to know whether that patch fixes your usage.

On closer inspection the decrease from 1.3% to 0.8% probably was significant.

Hiding behind the bug caused by inconsistent application of atomics was a bug in my application code which I have now fixed.  With both patches applied I find no failures, with only one of the two patches applied I get some failures.

So, yes the patch fixes my usage.  Thank you.

Minimal test case still pending...
Comment 11 Simon McVittie 2011-06-10 07:37:25 UTC
(In reply to comment #10)
> Hiding behind the bug caused by inconsistent application of atomics was a bug
> in my application code which I have now fixed.  With both patches applied I
> find no failures, with only one of the two patches applied I get some failures.

By "both patches" you mean Attachment #47660 [details] (only) applied to dbus, and a second patch applied to your application, right?

If you can't construct a minimal test case, but you have a non-minimal test case that can be made public, that'd be better than nothing!

Review cabal: code review would be appreciated, so I can release the patch found here.
Comment 12 Will Thompson 2011-06-10 07:51:02 UTC
Review of attachment 47660 [details] [review]:

This looks reasonable. I imagine the number of platforms where we don't have atomic operations are … minimal?

::: dbus/dbus-connection.c
@@ +2114,3 @@
+  _dbus_atomic_inc (&connection->refcount);
+
+  tmp_refcount = _dbus_atomic_dec (&connection->refcount);

As discussed on IRC: this is a bit weird, but it's fair enough given that we don't already have a _dbus_atomic_get().
Comment 13 Simon McVittie 2011-06-10 10:09:10 UTC
(In reply to comment #12)
> ::: dbus/dbus-connection.c
> @@ +2114,3 @@
> +  _dbus_atomic_inc (&connection->refcount);
> +
> +  tmp_refcount = _dbus_atomic_dec (&connection->refcount);
> 
> As discussed on IRC: this is a bit weird, but it's fair enough given that we
> don't already have a _dbus_atomic_get().

I added a comment.

Fixed in git for 1.4.12, will be merged to master before 1.5.4.
Comment 14 Will Manley 2011-06-14 10:04:24 UTC
(In reply to comment #11)
> By "both patches" you mean Attachment #47660 [details] (only) applied to dbus, and a
> second patch applied to your application, right?

Exactly

> If you can't construct a minimal test case, but you have a non-minimal test
> case that can be made public, that'd be better than nothing!

I'm afraid that I can't (yet) release my non-minimal test case.
Comment 15 Simon McVittie 2011-06-23 05:57:48 UTC
Reopening, because DBusPendingCall and DBusServer have the same bug.

Also, various places make assertions about the value of an atomic variable without doing the atomic-access dance, which could lead to errors.
Comment 16 Simon McVittie 2011-06-23 06:46:20 UTC
Created attachment 48335 [details] [review]
[PATCH 1/8] DBusPendingCall: exclusively use atomic operations on  the refcount

Same reasoning as for fd.o #38005, commit 50732523a7.
Comment 17 Simon McVittie 2011-06-23 06:46:40 UTC
Created attachment 48336 [details] [review]
[PATCH 2/8] DBusServer: exclusively use atomic operations on the  refcount

Same reasoning as for fd.o #38005, commit 50732523a7.
Comment 18 Simon McVittie 2011-06-23 06:46:54 UTC
Created attachment 48337 [details] [review]
[PATCH 1/9] DBusMessageFilter: exclusively use atomic accesses to  refcount
Comment 19 Simon McVittie 2011-06-23 06:47:11 UTC
Created attachment 48338 [details] [review]
[PATCH 4/8] DBusConnection: use atomic accesses to refcount, even  for assertions
Comment 20 Simon McVittie 2011-06-23 06:47:42 UTC
Created attachment 48339 [details] [review]
[PATCH 5/8] dbus-memory: use atomic accesses to block count, even  for assertions
Comment 21 Simon McVittie 2011-06-23 06:48:04 UTC
Created attachment 48340 [details] [review]
[PATCH 6/8] DBusMessage: always access refcount atomically
Comment 22 Simon McVittie 2011-06-23 06:48:28 UTC
Created attachment 48341 [details] [review]
[PATCH 2/9] dbus_message_unref: make an assertion more strict

We've just decremented the refcount, so it should have been at least 1
before we did that.
Comment 23 Simon McVittie 2011-06-23 06:48:45 UTC
Created attachment 48342 [details] [review]
[PATCH 3/9] DBusObjectTree: always access refcount atomically
Comment 24 Simon McVittie 2011-06-23 06:53:08 UTC
Patches 1 and 2 make sense for dbus-1.4, IMO. I'm not sure whether the rest should be fixed in 1.4, or only in master. Reviewers?
Comment 25 Will Thompson 2011-07-06 04:04:56 UTC
Review of attachment 48335 [details] [review]:

looks fine!
Comment 26 Will Thompson 2011-07-06 04:22:59 UTC
Review of attachment 48336 [details] [review]:

This looks fine, too.
Comment 27 Will Thompson 2011-07-06 04:40:01 UTC
Review of attachment 48338 [details] [review]:

::: dbus/dbus-connection.c
@@ +2663,3 @@
 {
   DBusList *link;
+  dbus_int32_t old_refcount;

You're gonna get warnings about unused variables if DBUS_DISABLE_ASSERT is defined.
Comment 28 Will Thompson 2011-07-06 04:42:43 UTC
The many uses of _dbus_atomic_inc (&x); followed by _dbus_atomic_dec (&x); to get the current value atomically suggest adding _dbus_atomic_get (&x) which is just implemented in terms of those two. (Though I think in real life you gave a reason why not?)
Comment 29 Simon McVittie 2011-07-11 00:53:31 UTC
(In reply to comment #28)
> The many uses of _dbus_atomic_inc (&x); followed by _dbus_atomic_dec (&x); to
> get the current value atomically suggest adding _dbus_atomic_get (&x) which is
> just implemented in terms of those two. (Though I think in real life you gave a
> reason why not?)

Windows has InterlockedIncrement and InterlockedDecrement but not InterlockedGet. I could add a _dbus_atomic_get which is just implemented in terms of those the functions we already have, though...
Comment 30 Simon McVittie 2011-07-14 08:57:40 UTC
(In reply to comment #25)
> Review of [DBusPendingCall]
> 
> looks fine!

(In reply to comment #26)
> Review of [DBusServer]
> 
> This looks fine, too.

Fixed in git for 1.4.14, 1.5.6.

DBusMessageFilter wouldn't benefit from having _dbus_atomic_get and (afaics) never has an unused variable, so is that OK to merge? Likewise "dbus_message_unref: make an assertion more strict" and DBusObjectTree.

(In reply to comment #27)
> Review of [DBusConnection]
[...]
> You're gonna get warnings about unused variables if DBUS_DISABLE_ASSERT is
> defined.

True, and that would go away if I added _dbus_atomic_get. I'll fix that. Likewise DBusMessage.

dbus-memory could be improved by _dbus_atomic_get. It'll also have some "unused" warnings, which I'll fix.
Comment 31 Simon McVittie 2011-07-14 10:51:38 UTC
Created attachment 49096 [details] [review]
[PATCH 4/9] Add _dbus_atomic_get implemented in terms of inc, dec

As wjt suggested.
Comment 32 Simon McVittie 2011-07-14 10:51:57 UTC
Created attachment 49097 [details] [review]
[PATCH 5/9] _dbus_connection_close_if_only_one_ref: use  _dbus_atomic_get

New.
Comment 33 Simon McVittie 2011-07-14 10:52:36 UTC
Created attachment 49098 [details] [review]
[PATCH 6/9] dbus_message_ref: avoid unused variable if not asserting

New, avoids pre-existing unused variable.
Comment 34 Simon McVittie 2011-07-14 10:53:06 UTC
Created attachment 49100 [details] [review]
[PATCH 7/9] DBusConnection: use atomic accesses to refcount in  assertions/initial ref

Replaces Attachment #48338 [details].
Comment 35 Simon McVittie 2011-07-14 10:53:39 UTC
Created attachment 49102 [details] [review]
[PATCH 8/9] dbus-memory: use atomic accesses to block count, even  for assertions

Replaces Attachment #48339 [details].
Comment 36 Simon McVittie 2011-07-14 10:54:04 UTC
Created attachment 49103 [details] [review]
[PATCH 9/9] DBusMessage: always access refcount atomically, even for  assertions/initial ref

Replaces Attachment #48340 [details].
Comment 37 Cosimo Alfarano 2011-07-20 08:20:56 UTC
Review of attachment 49096 [details] [review]:

Should the atomic_get be atomic for _inc()+_dec() too?

Otherwise looks fine.
Comment 38 Cosimo Alfarano 2011-07-20 08:21:21 UTC
Review of attachment 49097 [details] [review]:

Looks fine
Comment 39 Cosimo Alfarano 2011-07-20 08:21:48 UTC
Review of attachment 49098 [details] [review]:

Looks fine.

::: dbus/dbus-message.c
@@ +1533,2 @@
   old_refcount = _dbus_atomic_inc (&message->refcount);
   _dbus_assert (old_refcount >= 1);

Using 

_dbus_assert (_dbus_atomic_get (&message->refcount));

would avoid a temporary variable and probably easier to read without caring of DBUS_DISABLE_ASSERT; not probably worth it, though, since it would call _inc()+_dec() in order to call _inc() again :)
Comment 40 Cosimo Alfarano 2011-07-20 08:22:01 UTC
Review of attachment 49100 [details] [review]:

Looks fine
Comment 41 Cosimo Alfarano 2011-07-20 08:26:48 UTC
(In reply to comment #37)
> Should the atomic_get be atomic for _inc()+_dec() too?

Answering myself: does't really matter since you return the _inc() value and eventually _dec() it :p
Comment 42 Simon McVittie 2011-07-20 08:49:42 UTC
(In reply to comment #37)
> Review of attachment 49096 [details] [review]:
> 
> Should the atomic_get be atomic for _inc()+_dec() too?

I'm not sure what you mean by that question? This is about the best we can do, given the primitives used by the Windows implementation.

The other possibility is to implement "get" fully. On gcc this would use __sync_fetch_and_add (&x, 0) or something, the implementation on Unix using locks is obvious, and on Windows, the equivalent in GLib is:

gint
(g_atomic_int_get) (volatile gint *atomic)
{
  MemoryBarrier ();
  return *atomic;
}

which apparently interoperates with InterlockedIncrement() and InterlockedDecrement().

I could commit an untested Windows implementation that uses MemoryBarrier() in the same way as GLib, if Ralf would be OK with that?
Comment 43 Simon McVittie 2011-07-20 08:51:30 UTC
(In reply to comment #39)
> Using 
> 
> _dbus_assert (_dbus_atomic_get (&message->refcount));
> 
> would avoid a temporary variable and probably easier to read without caring of
> DBUS_DISABLE_ASSERT; not probably worth it, though, since it would call
> _inc()+_dec() in order to call _inc() again :)

Yeah, that'd be three memory barriers for one refcount increase, which seems like a bad move.
Comment 44 Simon McVittie 2011-07-20 11:23:50 UTC
(In reply to comment #41)
> Answering myself: does't really matter since you return the _inc() value and
> eventually _dec() it :p

_dbus_atomic_inc and _dec confusingly have "++x" and "--x" semantics - they return the old value - even though the underlying functions on Windows have "x++" and "x--" semantics, and the underlying primitives on gcc exist in both versions (but for some reason we use the "x++"/"x--" versions and adjust afterwards). So _dbus_atomic_get() returns the value before/after the call, not the temporarily-incremented value.

I've been careful to use variable names like "old_value" to make this a bit more obvious. The functions should arguably be called _dbus_atomic_fetch_and_inc() etc. to make this more visible, but they're not.
Comment 45 Simon McVittie 2011-07-20 11:25:13 UTC
Created attachment 49349 [details] [review]
[Extra patch] Implement _dbus_atomic_get directly, rather than via inc +  dec

The Windows implementation is untested, but does at least (cross-)compile,
and matches what GLib does.
Comment 46 Cosimo Alfarano 2011-07-25 09:48:19 UTC
Review of attachment 48337 [details] [review]:

Seems fine.
Comment 47 Cosimo Alfarano 2011-07-25 09:52:04 UTC
Review of attachment 48341 [details] [review]:

Looks fine.

Not to be fixed but to be aware of: there is a mixed used of "old_ref > 0" and "old_ref >= 1" in various modules, which I find a bit confusing and probably led to the non-seen loose assertion.
Comment 48 Cosimo Alfarano 2011-07-25 09:52:11 UTC
Review of attachment 48342 [details] [review]:

Looks fine
Comment 49 Simon McVittie 2011-07-26 06:05:01 UTC
Patches mostly merged, based on review from Cosimo + no veto from reviewers.

Attachment #49102 [details], Attachment #49103 [details] and Attachment #49349 [details] remain to be reviewed. For 49349 I'd prefer to have some sort of "no objection" from Ralf or some other Windows developer, since I haven't tested the Windows code path (I'm mainly just trusting desrt, who implemented the corresponding atomic op in GLib).
Comment 50 Cosimo Alfarano 2011-07-26 08:27:31 UTC
Review of attachment 49102 [details] [review]:

seems fine
Comment 51 Cosimo Alfarano 2011-07-26 08:27:35 UTC
Review of attachment 49103 [details] [review]:

This too seems fine.

member for message have been already set by dbus_new0() now but I it's probably better to keep that portion of code untouched.
Comment 52 Simon McVittie 2011-07-28 09:17:45 UTC
Created attachment 49675 [details] [review]
Implement _dbus_atomic_get directly, rather than via inc + dec (v2)

> Review of attachment 49102 [details] [review]:
> 
> seems fine

Applied

> Review of attachment 49103 [details] [review]:
> 
> This too seems fine.

Applied

Lennart reviewed the extra patch on IRC and didn't like it. Lennart, Thiago, what do you think of this version?
Comment 53 Lennart Poettering 2011-07-28 11:21:01 UTC
(In reply to comment #52)

> Lennart reviewed the extra patch on IRC and didn't like it. Lennart, Thiago,
> what do you think of this version?

Looks good. please commit.
Comment 54 Lennart Poettering 2011-07-28 11:23:01 UTC
btw, the atomic stuff in dbus_message_new() is not really necessary, as at that point we know for sure nobody else accesses the message, and the first time someone else could access it things will necessarily be fenced of by the code that passed the reference to somebody else. So, you could remove the fencing in dbus_message_new() to micro-optimize things a bit. Doesn't really matter much though.
Comment 55 Simon McVittie 2011-08-05 06:03:34 UTC
(In reply to comment #53)
> Looks good. please commit.

In 1.4.14. Bug #39836 is a regression test, which wjt has just approved too.

(In reply to comment #54)
> btw, the atomic stuff in dbus_message_new() is not really necessary, as at that
> point we know for sure nobody else accesses the message, and the first time
> someone else could access it things will necessarily be fenced of by the code
> that passed the reference to somebody else.

In principle you're right, but I don't really want to trust this, because quite a lot of things in libdbus transfer refs implicitly ("steal"/"take" semantics).

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.