Bug 30169

Summary:	Documentation lies about return values meaning "no memory"
Product:	dbus	Reporter:	E M <emezeske>
Component:	core	Assignee:	Rob Taylor <rob.taylor>
Status:	RESOLVED NOTABUG	QA Contact:
Severity:	normal
Priority:	medium
Version:	unspecified
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:		i915 features:

Description E M 2010-09-13 12:18:44 UTC

The documentation for the dbus library blatantly lies about what the return value means for at least one function (and probably more).  I ran into this with the function dbus_connection_send_with_reply().  The documentation for that function states that it returns "FALSE if no memory, TRUE otherwise."  This is completely wrong.  The very first thing the function does is use _dbus_return_val_if_fail() to validate its arguments; thus, FALSE is returned if, for instance, timeout_milliseconds is invalid (which is a far cry from "no memory").

This error has real consequences.  I'm using dbus-glib for an app I'm writing, and apparently dbus-glib was written to the dbus documentation: when FALSE is returned, dbus-glib takes it to mean "no memory" and aborts as such.  It is pretty nasty when your important daemon process aborts with an error message of "Out of memory" when it should have easily been able to recover from a simple error.  The poor documentation resulted in dbus-glib being crappy.

Comment 1 Havoc Pennington 2010-09-13 14:05:18 UTC

_dbus_return_val_if_fail() is intended as a debugging aid, it is not there to save you from bugs in your app.

Once a _dbus_return_val_if_fail() assertion fails, behavior is undefined and unpredictable. FALSE always means out of memory as long as you use the dbus API correctly. You need to fix the program not to trigger the return_val_if_fail.  Once the app is fixed, FALSE will always mean out of memory as documented.

Returning TRUE but having the function not work would not exactly be more helpful, in most cases, frequently triggering a cascade of other problems.

If you set DBUS_FATAL_WARNINGS=1 in the environment, then your app will just exit on bug detection, rather than trying to continue unpredictably. This is recommended, really.

Comment 2 Havoc Pennington 2010-09-13 14:09:24 UTC

If you don't find the _dbus_return_val_if_fail helpful, btw, you can build with --disable-checks and then dbus will just puke somehow unpredictably and not print any helpful error messages. But it won't return FALSE!

--disable-checks is possible because _dbus_return_val_if_fail() has nothing to do with the API contract or the docs. It is a debugging aid.

Comment 3 E M 2010-09-13 14:49:38 UTC

> Returning TRUE but having the function not work would not exactly be more
> helpful, in most cases, frequently triggering a cascade of other problems.

> If you don't find the _dbus_return_val_if_fail helpful, btw, you can build with
> --disable-checks and then dbus will just puke somehow unpredictably and not
> print any helpful error messages. But it won't return FALSE!

To be clear, I am in no way requesting a way to force the function to return TRUE if it fails.  I am requesting that either:

(a) the documentation is fixed to describe the _actual_ circumstances under which FALSE might be returned, or

(b) the code be fixed so that OOM conditions are really the only possible causes of failure.

Maybe I shouldn't have used the _dbus_return_val_if_fail() calls as an example of what might cause dbus_connection_send_with_reply() to return FALSE without the system being out of memory.  They just seemed like a likely culprit.

The thing is, I have seen this exact problem happen, when my app uses dbus-glib.  Maybe my mistake was trusting that dbus-glib calls dbus_connection_send_with_reply() with correct arguments.  The funny thing is, though, that my app works perfectly for days on end, and only rarely evinces this "out of memory" behavior.  I'm logging the memory use, and it doesn't even approach exhaustion.  My Linux kernel is configured to overcommit, so malloc() wouldn't fail anyways -- the OOM killer would be invoked.

If you're extremely confident that dbus_connection_send_with_reply() only returns FALSE when it receives invalid arguments or runs out of memory, then that would indicate that dbus-glib occasionally calls it with bad arguments.  I will start looking there.

As an aside, may I ask why _dbus_return_val_if_fail() is used at all, as opposed to a regular assert()?  You say that it has nothing to do with the API contracts, but it would seem that by default (when DBUS_FATAL_WARNINGS=0) it does in fact change the API contract.  The only time it doesn't violate the API contract is apparently when the optional DBUS_FATAL_WARNINGS=1 flag is set.

Comment 4 Havoc Pennington 2010-09-14 08:28:07 UTC

The default is not DBUS_FATAL_WARNINGS=0 ... the default is to abort() when these return_if_fail type checks fail, so it's the same as an assert. _dbus_assert() is used for bugs in libdbus itself, as opposed to apps.

There are two kinds of warning inside libdbus, one is "application bug" (aborts by default) and the other is "problem occurred, but not an app bug" (does not abort by default). DBUS_FATAL_WARNINGS=0 makes neither one abort, DBUS_FATAL_WARNINGS=1 makes both abort, and the default is that only app bugs abort.

I believe Ubuntu has a patch to make neither abort by default, which is stupid, but if you're seeing that behavior it may be due to Ubuntu (or some other patched version).

The docs should mention all the ways args can be invalid probably, but that's a big task to audit all docs for this. I would not want to document that FALSE (or TRUE) would be returned on invalid args, because it just is not guaranteed (you can build with --disable-checks, and the default is to abort); and results in an unusable API anyway (because you can't tell if your app is buggy or you ran out of memory). Correct apps will treat FALSE as meaning out of memory and then fix any bugs that result in warnings.

Anyhow as libdbus is shipped, if not patched, it does not return FALSE except for OOM, since it aborts on return_if_fail.

There certainly could be bugs where we return FALSE incorrectly, you would just have to debug the particular problem you are seeing, or at least provide a test case. If you have a warning, then FALSE is probably due to the warning. If you don't see a warning printed, then either you are actually OOM or there is some sort of bug.

Comment 5 E M 2010-09-14 10:57:41 UTC

> I believe Ubuntu has a patch to make neither abort by default, which is stupid,
but if you're seeing that behavior it may be due to Ubuntu (or some other
patched version).

Ugg...

Thank you for the detailed explanation.  I can see now that it is definitely return_if_fail that is causing FALSE to be returned.  Maybe I should complain downstream about the non aborting behavior!

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.