Description
Will Manley
2011-06-06 12:48:19 UTC
Hold on, I'm being an idiot. There was an error in my minimal test case. It should have read sizeof(t)/sizeof(*t) rather than sizeof(t) in the loop. I will reconstruct the test to show my original problem. Providing a test case would be great - That said, this is a mess. Of the things that manipulate the refcount: * ref_unlocked prefers atomic ops, but will do plain int ops if atomic ops are implemented in terms of a global lock; its justification for doing that is that the connection lock protects the refcount, but that isn't actually true * unref_unlocked is the same as ref_unlocked * close_if_only_one_ref relies on the connection lock regardless of whether we have atomic int ops, and even has a comment that says "If we didn't atomically check the refcount and close with the lock held though, we could screw this up" - guess what? :-P * The public ref, unref always use atomic ops even if they're implemented in terms of a global lock, because: - the timeout and watch functions are called with the connection lock held - we allow reffing/unreffing connections from their own timeout/watch callbacks - we don't guarantee that locks are recursive close_if_only_one_ref has really annoying semantics, particularly if you're trying to use lockless atomic ops. It might be better if we consistently used the connection lock to protect the refcount, and just made sure our locks were always recursive (but see Bug #36204, Bug #36205). > It might be better if we consistently used the connection lock to protect the
> refcount, and just made sure our locks were always recursive (but see Bug
> #36204, Bug #36205).
It needs to be the same method everywhere. Using atomic in some places and locked non-atomic in others introduces problems -- the most problematic one is that the locked decrease starts at 2, thinks it ended at 1, but is actually at 0.
Created attachment 47660 [details] [review] Consistently use atomic operations for all access to DBusConnection refcount Trying to mix atomic operations with locked non-atomic operations is broken: the atomic ops aren't necessarily atomic with respect to the locked non-atomic ops, and the non-atomic ops aren't protected by the lock because the atomic ops can change the refcount behind their back. In theory we could use the connection lock if atomic ops aren't supported (making a per-connection lock cheaper than the global lock used to implement atomic ops) *and* our mutexes are recursive (making it safe against deadlocks)... but life's too short. Will, I'd be interested to know whether that patch fixes your usage. If you can construct a minimal test case, would you mind contributing it to libdbus so we can use it as a regression test? Well, it certainly stops helgrind from moaning. Trying to reproduce. Unfortunately the error only exhibited in 0.1% of test runs. Thanks Hmm, the patch doesn't seem to have had a statistically significant effect. Previously my test failed 11/818 (1.3%) times and with the patch applied it has failed 120/13132 (0.9%). This seems odd to me as yesterday I made a hacky attempt at patching the non-atomic stuff out yesterday and it seemed to fix it. Maybe testing my attempt wasn't statistically significant. I will attempt further testing and try to reproduce it more reliably with a smaller test case. (In reply to comment #7) > Hmm, the patch doesn't seem to have had a statistically significant effect. > Previously my test failed 11/818 (1.3%) times and with the patch applied it has > failed 120/13132 (0.9%). Could you share your test case, please? If it's using libdbus correctly then it really ought to succeed every time (and we should fix libdbus until it does), and if not, all bets are off. Unfortunately my test case is not written against libdbus, but is abstracted from it. I will try and pare it down until I have a minimal test case against libdbus that runs quickly and fails reliably (In reply to comment #5) > Will, I'd be interested to know whether that patch fixes your usage. On closer inspection the decrease from 1.3% to 0.8% probably was significant. Hiding behind the bug caused by inconsistent application of atomics was a bug in my application code which I have now fixed. With both patches applied I find no failures, with only one of the two patches applied I get some failures. So, yes the patch fixes my usage. Thank you. Minimal test case still pending... (In reply to comment #10) > Hiding behind the bug caused by inconsistent application of atomics was a bug > in my application code which I have now fixed. With both patches applied I > find no failures, with only one of the two patches applied I get some failures. By "both patches" you mean Attachment #47660 [details] (only) applied to dbus, and a second patch applied to your application, right? If you can't construct a minimal test case, but you have a non-minimal test case that can be made public, that'd be better than nothing! Review cabal: code review would be appreciated, so I can release the patch found here. Review of attachment 47660 [details] [review]: This looks reasonable. I imagine the number of platforms where we don't have atomic operations are … minimal? ::: dbus/dbus-connection.c @@ +2114,3 @@ + _dbus_atomic_inc (&connection->refcount); + + tmp_refcount = _dbus_atomic_dec (&connection->refcount); As discussed on IRC: this is a bit weird, but it's fair enough given that we don't already have a _dbus_atomic_get(). (In reply to comment #12) > ::: dbus/dbus-connection.c > @@ +2114,3 @@ > + _dbus_atomic_inc (&connection->refcount); > + > + tmp_refcount = _dbus_atomic_dec (&connection->refcount); > > As discussed on IRC: this is a bit weird, but it's fair enough given that we > don't already have a _dbus_atomic_get(). I added a comment. Fixed in git for 1.4.12, will be merged to master before 1.5.4. (In reply to comment #11) > By "both patches" you mean Attachment #47660 [details] (only) applied to dbus, and a > second patch applied to your application, right? Exactly > If you can't construct a minimal test case, but you have a non-minimal test > case that can be made public, that'd be better than nothing! I'm afraid that I can't (yet) release my non-minimal test case. Reopening, because DBusPendingCall and DBusServer have the same bug. Also, various places make assertions about the value of an atomic variable without doing the atomic-access dance, which could lead to errors. Created attachment 48335 [details] [review] [PATCH 1/8] DBusPendingCall: exclusively use atomic operations on the refcount Same reasoning as for fd.o #38005, commit 50732523a7. Created attachment 48336 [details] [review] [PATCH 2/8] DBusServer: exclusively use atomic operations on the refcount Same reasoning as for fd.o #38005, commit 50732523a7. Created attachment 48337 [details] [review] [PATCH 1/9] DBusMessageFilter: exclusively use atomic accesses to refcount Created attachment 48338 [details] [review] [PATCH 4/8] DBusConnection: use atomic accesses to refcount, even for assertions Created attachment 48339 [details] [review] [PATCH 5/8] dbus-memory: use atomic accesses to block count, even for assertions Created attachment 48340 [details] [review] [PATCH 6/8] DBusMessage: always access refcount atomically Created attachment 48341 [details] [review] [PATCH 2/9] dbus_message_unref: make an assertion more strict We've just decremented the refcount, so it should have been at least 1 before we did that. Created attachment 48342 [details] [review] [PATCH 3/9] DBusObjectTree: always access refcount atomically Patches 1 and 2 make sense for dbus-1.4, IMO. I'm not sure whether the rest should be fixed in 1.4, or only in master. Reviewers? Review of attachment 48335 [details] [review]: looks fine! Review of attachment 48336 [details] [review]: This looks fine, too. Review of attachment 48338 [details] [review]: ::: dbus/dbus-connection.c @@ +2663,3 @@ { DBusList *link; + dbus_int32_t old_refcount; You're gonna get warnings about unused variables if DBUS_DISABLE_ASSERT is defined. The many uses of _dbus_atomic_inc (&x); followed by _dbus_atomic_dec (&x); to get the current value atomically suggest adding _dbus_atomic_get (&x) which is just implemented in terms of those two. (Though I think in real life you gave a reason why not?) (In reply to comment #28) > The many uses of _dbus_atomic_inc (&x); followed by _dbus_atomic_dec (&x); to > get the current value atomically suggest adding _dbus_atomic_get (&x) which is > just implemented in terms of those two. (Though I think in real life you gave a > reason why not?) Windows has InterlockedIncrement and InterlockedDecrement but not InterlockedGet. I could add a _dbus_atomic_get which is just implemented in terms of those the functions we already have, though... (In reply to comment #25) > Review of [DBusPendingCall] > > looks fine! (In reply to comment #26) > Review of [DBusServer] > > This looks fine, too. Fixed in git for 1.4.14, 1.5.6. DBusMessageFilter wouldn't benefit from having _dbus_atomic_get and (afaics) never has an unused variable, so is that OK to merge? Likewise "dbus_message_unref: make an assertion more strict" and DBusObjectTree. (In reply to comment #27) > Review of [DBusConnection] [...] > You're gonna get warnings about unused variables if DBUS_DISABLE_ASSERT is > defined. True, and that would go away if I added _dbus_atomic_get. I'll fix that. Likewise DBusMessage. dbus-memory could be improved by _dbus_atomic_get. It'll also have some "unused" warnings, which I'll fix. Created attachment 49096 [details] [review] [PATCH 4/9] Add _dbus_atomic_get implemented in terms of inc, dec As wjt suggested. Created attachment 49097 [details] [review] [PATCH 5/9] _dbus_connection_close_if_only_one_ref: use _dbus_atomic_get New. Created attachment 49098 [details] [review] [PATCH 6/9] dbus_message_ref: avoid unused variable if not asserting New, avoids pre-existing unused variable. Created attachment 49100 [details] [review] [PATCH 7/9] DBusConnection: use atomic accesses to refcount in assertions/initial ref Replaces Attachment #48338 [details]. Created attachment 49102 [details] [review] [PATCH 8/9] dbus-memory: use atomic accesses to block count, even for assertions Replaces Attachment #48339 [details]. Created attachment 49103 [details] [review] [PATCH 9/9] DBusMessage: always access refcount atomically, even for assertions/initial ref Replaces Attachment #48340 [details]. Review of attachment 49096 [details] [review]: Should the atomic_get be atomic for _inc()+_dec() too? Otherwise looks fine. Review of attachment 49097 [details] [review]: Looks fine Review of attachment 49098 [details] [review]: Looks fine. ::: dbus/dbus-message.c @@ +1533,2 @@ old_refcount = _dbus_atomic_inc (&message->refcount); _dbus_assert (old_refcount >= 1); Using _dbus_assert (_dbus_atomic_get (&message->refcount)); would avoid a temporary variable and probably easier to read without caring of DBUS_DISABLE_ASSERT; not probably worth it, though, since it would call _inc()+_dec() in order to call _inc() again :) Review of attachment 49100 [details] [review]: Looks fine (In reply to comment #37) > Should the atomic_get be atomic for _inc()+_dec() too? Answering myself: does't really matter since you return the _inc() value and eventually _dec() it :p (In reply to comment #37) > Review of attachment 49096 [details] [review]: > > Should the atomic_get be atomic for _inc()+_dec() too? I'm not sure what you mean by that question? This is about the best we can do, given the primitives used by the Windows implementation. The other possibility is to implement "get" fully. On gcc this would use __sync_fetch_and_add (&x, 0) or something, the implementation on Unix using locks is obvious, and on Windows, the equivalent in GLib is: gint (g_atomic_int_get) (volatile gint *atomic) { MemoryBarrier (); return *atomic; } which apparently interoperates with InterlockedIncrement() and InterlockedDecrement(). I could commit an untested Windows implementation that uses MemoryBarrier() in the same way as GLib, if Ralf would be OK with that? (In reply to comment #39) > Using > > _dbus_assert (_dbus_atomic_get (&message->refcount)); > > would avoid a temporary variable and probably easier to read without caring of > DBUS_DISABLE_ASSERT; not probably worth it, though, since it would call > _inc()+_dec() in order to call _inc() again :) Yeah, that'd be three memory barriers for one refcount increase, which seems like a bad move. (In reply to comment #41) > Answering myself: does't really matter since you return the _inc() value and > eventually _dec() it :p _dbus_atomic_inc and _dec confusingly have "++x" and "--x" semantics - they return the old value - even though the underlying functions on Windows have "x++" and "x--" semantics, and the underlying primitives on gcc exist in both versions (but for some reason we use the "x++"/"x--" versions and adjust afterwards). So _dbus_atomic_get() returns the value before/after the call, not the temporarily-incremented value. I've been careful to use variable names like "old_value" to make this a bit more obvious. The functions should arguably be called _dbus_atomic_fetch_and_inc() etc. to make this more visible, but they're not. Created attachment 49349 [details] [review] [Extra patch] Implement _dbus_atomic_get directly, rather than via inc + dec The Windows implementation is untested, but does at least (cross-)compile, and matches what GLib does. Review of attachment 48337 [details] [review]: Seems fine. Review of attachment 48341 [details] [review]: Looks fine. Not to be fixed but to be aware of: there is a mixed used of "old_ref > 0" and "old_ref >= 1" in various modules, which I find a bit confusing and probably led to the non-seen loose assertion. Review of attachment 48342 [details] [review]: Looks fine Patches mostly merged, based on review from Cosimo + no veto from reviewers. Attachment #49102 [details], Attachment #49103 [details] and Attachment #49349 [details] remain to be reviewed. For 49349 I'd prefer to have some sort of "no objection" from Ralf or some other Windows developer, since I haven't tested the Windows code path (I'm mainly just trusting desrt, who implemented the corresponding atomic op in GLib). Review of attachment 49102 [details] [review]: seems fine Review of attachment 49103 [details] [review]: This too seems fine. member for message have been already set by dbus_new0() now but I it's probably better to keep that portion of code untouched. Created attachment 49675 [details] [review] Implement _dbus_atomic_get directly, rather than via inc + dec (v2) > Review of attachment 49102 [details] [review]: > > seems fine Applied > Review of attachment 49103 [details] [review]: > > This too seems fine. Applied Lennart reviewed the extra patch on IRC and didn't like it. Lennart, Thiago, what do you think of this version? (In reply to comment #52) > Lennart reviewed the extra patch on IRC and didn't like it. Lennart, Thiago, > what do you think of this version? Looks good. please commit. btw, the atomic stuff in dbus_message_new() is not really necessary, as at that point we know for sure nobody else accesses the message, and the first time someone else could access it things will necessarily be fenced of by the code that passed the reference to somebody else. So, you could remove the fencing in dbus_message_new() to micro-optimize things a bit. Doesn't really matter much though. (In reply to comment #53) > Looks good. please commit. In 1.4.14. Bug #39836 is a regression test, which wjt has just approved too. (In reply to comment #54) > btw, the atomic stuff in dbus_message_new() is not really necessary, as at that > point we know for sure nobody else accesses the message, and the first time > someone else could access it things will necessarily be fenced of by the code > that passed the reference to somebody else. In principle you're right, but I don't really want to trust this, because quite a lot of things in libdbus transfer refs implicitly ("steal"/"take" semantics). |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.