Bug 30171

Summary: dbus-glib can call g_error("Out of memory") on unrelated programming errors
Product: dbus Reporter: E M <emezeske>
Component: GLibAssignee: Simon McVittie <smcv>
Status: RESOLVED FIXED QA Contact: John (J5) Palmieri <johnp>
Severity: major    
Priority: high CC: cosimo.alfarano, rob.taylor, smcv, walters, will
Version: unspecifiedKeywords: patch
Hardware: All   
OS: All   
URL: http://git.collabora.co.uk/?p=user/smcv/dbus-glib-smcv.git;a=shortlog;h=refs/heads/alleged-oom-30171
Whiteboard: NB#233873 review+
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 23616, 35766, 35767    
Attachments: Depend on dbus 1.2.16 for dbus_message_iter_abandon_container
When given an object path, use GVariant to check syntactic validity
dbus-gvalue: replace trivial cases of OOM with a call to a noreturn function
marshal_basic: if a boolean has a non-boolean value, diagnose the error
marshal_basic: if marshalling a string fails, critical and return FALSE
marshal_valuearray: abandon the container if we fail
marshal_valuearray: simplify teardown
marshal_proxy: check that it's really a proxy and its path is valid
marshal_object_path: check that it's really a valid object path
marshal_object: check that it's really an object and its path is valid
marshal_signature: check validity of the signature
marshal_map: on error, abandon the container instead of closing it
marshal_struct: simplify teardown, and abandon broken containers
marshal_variant: abandon broken containers rather than closing them
marshal_collection_ptrarray: assert that the signature is valid
marshal_collection_ptrarray: abandon broken containers, simplify exit path
marshal_collection_array: fail early if the array is NULL
marshal_collection_array: assert validity of signature
marshal_collection_array: if appending fails, don't assume OOM
marshal_collection_array: simplify exit path
dbus_g_proxy_marshal_args_to_message: diagnose inability to marshal args
dbus_g_proxy_begin_call_internal: don't assume NULL message means OOM
dbus_g_proxy_call_no_reply: don't assume NULL message means OOM
DBusGProxy: share OOM handler
Don't allow proxy timeouts to be set negative, except for -1
improve documentation of proxy timeouts to mention that -1 is special

Description E M 2010-09-13 13:03:06 UTC
There is a documentation bug in the dbus library that specifies that several
functions return FALSE to mean that an out-of-memory condition occurred, even
though in reality there are many other reasons why FALSE might be returned. 
The dbus-glib library faithfully respects this (erroneous) documentation, and
calls g_error("Out of memory") when certain dbus calls return FALSE.  It's
extremely troublesome to have a process abort with the message "Out of memory"
when really what happened was, e.g., dbus_connection_send_with_reply() was
given an invalid timeout parameter (which causes it to return FALSE).

I have already filed a dbus case to get the documentation fixed
(https://bugs.freedesktop.org/show_bug.cgi?id=30169).  I'm filing this case to
inform you guys of the fact that dbus-glib needs to be fixed in light of the
fact that the dbus documentation on which its implementation is based is
incorrect.

I consider this to be a "blocker" because dbus-glib calls abort() instead of recovering from errors, meaning that client apps end up mysteriously failing as "out of memory" for no good reason.
Comment 1 Colin Walters 2010-09-13 14:48:36 UTC
Yeah, dbus-glib should be checking all the same preconditions that dbus does.
Comment 2 Simon McVittie 2011-03-28 11:12:51 UTC
Downgrading this a bit; it's clearly not a release blocker (dbus-glib is well-known to be a minefield, prefer to use GDBus in new code) but is still quite important to fix.

For some programming errors (notably strings being non-UTF-8) it's problematic or inefficient for dbus-glib to duplicate libdbus' checks, but we can demote the failure mode from g_error to g_critical in these cases, and mention the possible cause other than OOM.

This branch ended up being fairly long due to the complexity of D-Bus marshalling. I've fixed these:

* use of invalid object paths is diagnosed as a programming error
* marshalling arguments to DBusMessage (at a low level) doesn't
  falsely report OOM
* passing bad (D-Bus-level) arguments to DBusGProxy methods is diagnosed
  as a programming error
* negative timeouts other than -1 are diagnosed as a programming error

I haven't yet fixed these:

* If you pass invalid arguments in DBusGObject (service-side) API,
  it might crash with "OOM" rather than diagnosing a programming error
* Syntactically invalid method/signal names, interfaces, error names,
  bus names aren't diagnosed as a programming error and might possibly
  be reported as "OOM"

For those, a policy question for Colin and the rest of the cabal: we depend on a new enough GLib (2.26) that it's guaranteed to have GIO and at least basic D-Bus functionality. Would it be considered OK to link libdbus-glib-1 against libgio-2.0, so it can use g_dbus_is_unique_name() (etc.) in assertion checks?

Otherwise, we need to either add API to libdbus (and wait for it to be widespread), or copypaste in Yet Another implementation of these syntactic checks (and having written one in telepathy-glib and one in dbus-python, I'm getting a bit tired of writing the same code).
Comment 3 Simon McVittie 2011-03-28 11:14:21 UTC
Created attachment 44943 [details] [review]
Depend on dbus 1.2.16 for dbus_message_iter_abandon_container
Comment 4 Simon McVittie 2011-03-28 11:14:50 UTC
Created attachment 44944 [details] [review]
When given an object path, use GVariant to check syntactic validity

We already depend on a new enough GLib that it has these functions.
Comment 5 Simon McVittie 2011-03-28 11:15:09 UTC
Created attachment 44945 [details] [review]
dbus-gvalue: replace trivial cases of OOM with a call to a noreturn function

In these cases there's no invalid value we could be supplying to libdbus,
so it's definitely OOM, not a programming error (e.g. non-UTF-8)
masquerading as OOM.
Comment 6 Simon McVittie 2011-03-28 11:15:24 UTC
Created attachment 44946 [details] [review]
marshal_basic: if a boolean has a non-boolean value, diagnose the error
Comment 7 Simon McVittie 2011-03-28 11:15:49 UTC
Created attachment 44947 [details] [review]
marshal_basic: if marshalling a string fails, critical and return FALSE

This could be OOM, but it could also be non-UTF-8.
Comment 8 Simon McVittie 2011-03-28 11:16:05 UTC
Created attachment 44948 [details] [review]
marshal_valuearray: abandon the container if we fail
Comment 9 Simon McVittie 2011-03-28 11:16:28 UTC
Created attachment 44949 [details] [review]
marshal_valuearray: simplify teardown

If close_container fails, it might be an assertion failure in libdbus,
so don't assume it's OOM. We have to handle programming errors somewhat
gracefully during marshalling anyway, so there's no point in asserting.
Comment 10 Simon McVittie 2011-03-28 11:16:47 UTC
Created attachment 44950 [details] [review]
marshal_proxy: check that it's really a proxy and its path is valid

This means that dbus_message_iter_append_basic can't fail other than by
OOM, so use the shared OOM handler.
Comment 11 Simon McVittie 2011-03-28 11:17:10 UTC
Created attachment 44951 [details] [review]
marshal_object_path: check that it's really a valid object path

Again, this means failure can only be OOM, so use the shared handler.
Also remove an unnecessary cast.
Comment 12 Simon McVittie 2011-03-28 11:17:31 UTC
Created attachment 44952 [details] [review]
marshal_object: check that it's really an object and its path is valid

Again, we can now assume that failure is OOM.
Comment 13 Simon McVittie 2011-03-28 11:18:00 UTC
Created attachment 44953 [details] [review]
marshal_signature: check validity of the signature

Also remove a spurious cast, and comment why we're *not* assuming that
failure here is necessarily OOM.
Comment 14 Simon McVittie 2011-03-28 11:18:26 UTC
Created attachment 44954 [details] [review]
marshal_map: on error, abandon the container instead of closing it

libdbus considers it to be a programming error if you close a container
gracefully when its contents are incomplete.
Comment 15 Simon McVittie 2011-03-28 11:18:42 UTC
Created attachment 44955 [details] [review]
marshal_struct: simplify teardown, and abandon broken containers
Comment 16 Simon McVittie 2011-03-28 11:19:00 UTC
Created attachment 44956 [details] [review]
marshal_variant: abandon broken containers rather than closing them
Comment 17 Simon McVittie 2011-03-28 11:19:22 UTC
Created attachment 44957 [details] [review]
marshal_collection_ptrarray: assert that the signature is valid

This means that dbus_message_iter_open_container can't fail except by OOM,
so use the shared handler.
Comment 18 Simon McVittie 2011-03-28 11:19:37 UTC
Created attachment 44958 [details] [review]
marshal_collection_ptrarray: abandon broken containers, simplify exit path
Comment 19 Simon McVittie 2011-03-28 11:19:53 UTC
Created attachment 44959 [details] [review]
marshal_collection_array: fail early if the array is NULL
Comment 20 Simon McVittie 2011-03-28 11:20:12 UTC
Created attachment 44960 [details] [review]
marshal_collection_array: assert validity of signature

This means we can recognise OOM reliably; do so.
Comment 21 Simon McVittie 2011-03-28 11:20:35 UTC
Created attachment 44961 [details] [review]
marshal_collection_array: if appending fails, don't assume OOM

Appending can also fail by appending ((gboolean) 23) or something;
admittedly, that's a programming error.
Comment 22 Simon McVittie 2011-03-28 11:20:57 UTC
Created attachment 44962 [details] [review]
marshal_collection_array: simplify exit path

Also don't assume that failure of dbus_message_iter_close_container is
always OOM.
Comment 23 Simon McVittie 2011-03-28 11:21:18 UTC
Created attachment 44963 [details] [review]
dbus_g_proxy_marshal_args_to_message: diagnose inability to marshal args

Failure to marshal arguments is entirely reachable, albeit only via
programming error.
Comment 24 Simon McVittie 2011-03-28 11:21:40 UTC
Created attachment 44964 [details] [review]
dbus_g_proxy_begin_call_internal: don't assume NULL message means OOM

It might either be OOM or bad arguments, probably the latter. That's
programming error, but we already raised a critical warning, so be silent.
Comment 25 Simon McVittie 2011-03-28 11:22:37 UTC
Created attachment 44965 [details] [review]
dbus_g_proxy_call_no_reply: don't assume NULL message means OOM

It might either be OOM or bad arguments, probably the latter. That's
programming error, but we already raised a critical warning, so be silent.
Comment 26 Simon McVittie 2011-03-28 11:22:57 UTC
Created attachment 44966 [details] [review]
DBusGProxy: share OOM handler
Comment 27 Simon McVittie 2011-03-28 11:23:22 UTC
Created attachment 44967 [details] [review]
Don't allow proxy timeouts to be set negative, except for -1

libdbus checks this as a precondition, but to avoid astonishing error
behaviour (interpreted as OOM), so should we.
Comment 28 Simon McVittie 2011-03-28 11:23:40 UTC
Created attachment 44968 [details] [review]
improve documentation of proxy timeouts to mention that -1 is special
Comment 29 Simon McVittie 2011-03-28 11:24:40 UTC
That's all for now...
Comment 30 Will Thompson 2011-03-29 03:44:03 UTC
Review of attachment 44957 [details] [review]:

::: dbus/dbus-gvalue.c
@@ +1909,3 @@
       return FALSE;
     }
+  g_assert (g_variant_is_signature (elt_sig));

Why an assertion and not a critical here?
Comment 31 Will Thompson 2011-03-29 03:47:43 UTC
Apart from a couple of places where you've used assertions rather than criticals (which may be defensible, I just couldn't see what was different in those cases), looks fine.

(In reply to comment #2)
> For those, a policy question for Colin and the rest of the cabal: we depend on
> a new enough GLib (2.26) that it's guaranteed to have GIO and at least basic
> D-Bus functionality. Would it be considered OK to link libdbus-glib-1 against
> libgio-2.0, so it can use g_dbus_is_unique_name() (etc.) in assertion checks?

I think this is fine too. Points for style!
Comment 32 Simon McVittie 2011-03-29 04:30:17 UTC
(In reply to comment #30)
> @@ +1909,3 @@
>        return FALSE;
>      }
> +  g_assert (g_variant_is_signature (elt_sig));
> 
> Why an assertion and not a critical here?

It was returned about 4 lines ago (sadly just outside the range of your diff context) by an internal dbus-glib function. If internal dbus-glib functions are returning non-NULL strings that aren't valid signatures, then we deserve SIGABRT :-)

(And the same for the other use of g_assert.)
Comment 33 Simon McVittie 2011-04-05 10:43:52 UTC
(In reply to comment #31)
> Apart from a couple of places where you've used assertions rather than
> criticals (which may be defensible, I just couldn't see what was different in
> those cases), looks fine.

Are you sufficiently convinced by Comment #32 to let me merge this? This is the first in a stack of branches which fix this bug in general.
Comment 34 Simon McVittie 2011-04-06 09:17:59 UTC
For your convenience, here is some additional context for the added assertions:

Attachment #44957 [details]:

http://git.collabora.co.uk/?p=user/smcv/dbus-glib-smcv.git;a=blob;f=dbus/dbus-gvalue.c;h=6493bc5a434bbea5d0625a4fb898288daaaadf38;hb=8de6c965adcd3a992a93ea09951f09421cd0b5fc#l1900

elt_sig is produced by _dbus_gtype_to_signature, which should never return a non-NULL string that isn't a signature.

Attachment #44960 [details]:

http://git.collabora.co.uk/?p=user/smcv/dbus-glib-smcv.git;a=blob;f=dbus/dbus-gvalue.c;h=ab5580b7b92aec4897bb8e0eb131010215bcc2b0;hb=f384985d4d8aa1e45ea03b8a479fb967fd83d9c9#l1951

Same explanation applies, except the string is called subsignature_str.
Comment 35 Will Thompson 2011-04-07 01:03:51 UTC
Sure, I'm convinced. :)
Comment 36 Simon McVittie 2011-04-13 06:06:08 UTC
Fixed in git for 0.94, thanks!

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.