Summary: | dbus-python test suite failures | ||
---|---|---|---|
Product: | dbus | Reporter: | Barry Warsaw <barry> |
Component: | python | Assignee: | Simon McVittie <smcv> |
Status: | RESOLVED NOTOURBUG | QA Contact: | John (J5) Palmieri <johnp> |
Severity: | normal | ||
Priority: | medium | CC: | jason.conti |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: |
Description
Barry Warsaw
2011-11-28 06:20:25 UTC
Looks like the same failures occur on Debian Squeeze. It also doesn't appear to make any difference what I put on the Exec line in TestSuitePythonService.service. Everything, including a simple /bin/bash -c "echo 'hello world'" has the exact same SIGABRT failure. I have no clue how to continue to debug this. (In reply to comment #0) > Using current git head on Ubuntu 11.10 and Python 2.7, none of test-client.py > actually passes when you do a 'make check'. Weirdly, this works fine for me: Debian sid, Python "2.7.2+" (python2.7 2.7.2-8). On Dec 13, 2011, at 12:11 PM, bugzilla-daemon@freedesktop.org wrote: >Weirdly, this works fine for me: Debian sid, Python "2.7.2+" (python2.7 >2.7.2-8). I just tried it in a sid vm. I get some test failures, but nothing like the cascade of start-up problems I see on Ubuntu. :( Okay, I understand better what's happening here. I'm no pygi expert but the switch to pygi caused the code in gobject_service.py to fail due. Also, that code is not compatible with Python 3 due to different metaclass definition syntax. I've rewritten the code in gobject_service.py to use pygi and be compatible across Python versions. I've just committed the changes to my Py3 branch, but you might be able to tease out the differences if you want to apply this separately from that branch: https://github.com/warsaw/dbus-python3/commit/f8dab5af0bef5d26a51df41a564a5285c16a9cb5 Ah. That fix is incomplete. It works when running the tests manually, but something's still wrong with running it from `make check`. Barry, comment out the DBUS_FATAL_WARNINGS=1 line in test/run-test.sh. It seems like the tests complete successfully. I think the aborts on ubuntu may be outside the scope of dbus-python, because I can reproduce it using just the TestSuitePythonService.service (with the python code replaced with echo hello as above), tmp-session-bus.conf, the following GDBus code: #include <gtk/gtk.h> int main() { gtk_init(NULL, NULL); GError *error = NULL; GDBusProxy *proxy = g_dbus_proxy_new_for_bus_sync(G_BUS_TYPE_SESSION, G_DBUS_PROXY_FLAGS_NONE, NULL, "org.freedesktop.DBus.TestSuitePythonService", "/org/freedesktop/DBus/TestSuitePythonObject", "org.freedesktop.DBus.TestSuiteInterface", NULL, &error); if (error != NULL) { g_warning("Failed to get proxy: %s", error->message); g_error_free(error); } return 0; } and the following bash script: #!/bin/bash export DBUS_FATAL_WARNINGS=1 eval $(dbus-launch --sh-syntax --config-file=tmp-session-bus.conf) echo "Started session bus at $DBUS_SESSION_BUS_PID" ./test-dbus echo "Killing session bus at $DBUS_SESSION_BUS_PID" kill $DBUS_SESSION_BUS_PID Commenting out the export DBUS_FATAL_WARNINGS=1 line allows the code in test-dbus to run until it times out, with the line it aborts immediately just like in the tests. Hi Jason. Based on your suggestion of DBUS_FATAL_WARNINGS, and some more testing on Ubuntu 12.04, I'm starting to think this is a dbus-launch related bug. Start by instrumenting run-test.sh to print out $PYTHONPATH, then export that into your shell, and also export DBUS_TOP_BUILDDIR and DBUS_TOP_SRCDIR as required by test-service.py. Now, in that shell do this from $DBUS_TOP_SRCDIR: DBUS_FATAL_WARNINGS=1 dbus-daemon --config-file test/tmp-session-bus.conf --print-address In a different shell, start up Python with DBUS_SESSION_BUS_ADDRESS exported as appropriate, then in Python do this: >>> import dbus >>> sb = dbus.SessionBus() >>> p = sb.get_object('org.freedesktop.DBus.TestSuitePythonService', '/org/freedesktop/DBus/TestSuitePythonObject') you'll get a proxy object which you can poke at, and d-feet shows that your service is all up and running just fine. It does not matter whether DBUS_FATAL_WARNINGS is set for this process. IOW, with the service started with dbus-daemon directly, everything is fine. Now, kill dbus-daemon and start the service instead with dbus-launch: DBUS_FATAL_WARNINGS=1 dbus-launch --config-file test/tmp-session-bus.conf --sh-syntax In the other shell, fire up Python with the new DBUS_SESSION_BUS_ADDRESS and you'll see that as soon as you try to get the proxy object, you'll receive the SIGABRT: >>> p = sb.get_object('org.freedesktop.DBus.TestSuitePythonService', '/org/freedesktop/DBus/TestSuitePythonObject') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "dbus/bus.py", line 244, in get_object follow_name_owner_changes=follow_name_owner_changes) File "dbus/proxies.py", line 246, in __init__ self._named_service = conn.activate_name_owner(bus_name) File "dbus/bus.py", line 183, in activate_name_owner self.start_service_by_name(bus_name) File "dbus/bus.py", line 281, in start_service_by_name 'su', (bus_name, flags))) File "dbus/connection.py", line 630, in call_blocking message, timeout) dbus.exceptions.DBusException: org.freedesktop.DBus.Error.Spawn.ChildSignaled: Process /bin/bash received signal 6 The process described by DBUS_SESSION_BUS_PID is still running, so kill it. Unfortunately, once dbus-launch exits, I don't get any verbose output, so I have no idea why the service is getting aborted. It doesn't appear that DBUS_VERBOSE is getting exported to dbus-launch's child, or maybe it is and verbose output is getting thrown away. As you might guess, starting dbus-launch with DBUS_FATAL_WARNINGS=0 lets everything work. I need to try to coax more information out of dbus-launch to continue. On Dec 21, 2011, at 12:58 PM, bugzilla-daemon@freedesktop.org wrote: >https://bugs.freedesktop.org/show_bug.cgi?id=43303 > >--- Comment #6 from Jason Conti <jason.conti@gmail.com> 2011-12-21 12:58:54 UTC --- >Barry, comment out the DBUS_FATAL_WARNINGS=1 line in test/run-test.sh. Think I found the offending code in dbus/dbus-spawn.c:do_exec 953 #ifdef DBUS_BUILD_TESTS 954 max_open = sysconf (_SC_OPEN_MAX); 955 956 for (i = 3; i < max_open; i++) 957 { 958 int retval; 959 960 if (i == child_err_report_fd) 961 continue; 962 963 retval = fcntl (i, F_GETFD); 964 965 if (retval != -1 && !(retval & FD_CLOEXEC)) 966 _dbus_warn ("Fd %d did not have the close-on-exec flag set!\n", i); 967 } 968 #endif It doesn't seem like this code should get executed since it seems to be disabled in the non-debug package build, but commenting it out and rebuilding the package on precise and I can't reproduce the aborts. It also seems like quite a few processes are leaking file descriptors. Running your first dbus-daemon example in the last comment under gdb will also cause the abort unless I manually set FD_CLOEXEC on fd 3-6 before attempting to launch the service. Likewise, running dbus-daemon with --fork will trigger the aborts. (In reply to comment #8) > It doesn't seem like this code should get executed since it seems to be > disabled in the non-debug package build, but commenting it out and rebuilding > the package on precise and I can't reproduce the aborts. Does rebuilding the same source with your current toolchain also fix it? > It also seems like quite a few processes are leaking file descriptors. Running > your first dbus-daemon example in the last comment under gdb will also cause > the abort unless I manually set FD_CLOEXEC on fd 3-6 before attempting to > launch the service. Those fds probably come from gdb itself, similar to Bug #35173, which was fixed in 1.4.10. On Jan 06, 2012, at 04:49 AM, bugzilla-daemon@freedesktop.org wrote: >It doesn't seem like this code should get executed since it seems to be >disabled in the non-debug package build, but commenting it out and rebuilding >the package on precise and I can't reproduce the aborts. > >It also seems like quite a few processes are leaking file descriptors. Running >your first dbus-daemon example in the last comment under gdb will also cause >the abort unless I manually set FD_CLOEXEC on fd 3-6 before attempting to >launch the service. Likewise, running dbus-daemon with --fork will trigger the >aborts. I'm still confused. I think we've established that the child is SIGABRTing when DBUS_FATAL_WARNINGS=1 because some warning occurs when dbus-launch execs the child. But I'm still at a loss to know exactly *which* warning is occurring as there is no output that I ever see, even with DBUS_VERBOSE=1. Nothing on stdout/stderr and nothing in any log file. So perhaps the former are getting closed before the warnings are occurring? Have you actually seen any warnings during normal operation when reproducing this problem? I'm also trying to understand the difference between Ubuntu and Debian here, as I'm fairly sure Ubuntu doesn't carry any relevant dbus delta from Debian, so perhaps there's some other environmental interaction, that's only shown up since Ubuntu 11.04 (IIRC). (In reply to comment #9) > Does rebuilding the same source with your current toolchain also fix it? Do you mean rebuild dbus in oneiric with the passage above commented out, or just a no-change rebuild in precise without the passage commented to see if it was a problem with the build? I did the latter earlier in the day yesterday (but added nostrip and noopt) and was still getting the aborts. (In reply to comment #10) > I'm still confused. I think we've established that the child is SIGABRTing > when DBUS_FATAL_WARNINGS=1 because some warning occurs when dbus-launch execs > the child. Sorry, I kind of tacked it on as an afterthought at the end, what I meant was that it is not just with dbus-launch. I checked the code and the only major difference I saw was that dbus-launch execs dbus-daemon with --fork, and running: DBUS_FATAL_WARNINGS=1 dbus-daemon --config-file test/tmp-session-bus.conf --print-address --fork and then your python example will abort just the same as the dbus-launch one. I also get the aborts when I run: DBUS_FATAL_WARNINGS=1 gdb --args dbus-daemon --config-file test/tmp-session-bus.conf --print-address Which also prints the following: Activating service name='org.freedesktop.DBus.TestSuitePythonService' Fd 3 did not have the close-on-exec flag set! /bin/dbus-daemon(_dbus_print_backtrace+0x29) [0x8008f9c9] /bin/dbus-daemon(_dbus_abort+0x14) [0x800931a4] /bin/dbus-daemon(+0x75c82) [0x80075c82] /bin/dbus-daemon(_dbus_spawn_async_with_babysitter+0x490) [0x800995e0] /bin/dbus-daemon(bus_activation_activate_service+0x113e) [0x800180ce] /bin/dbus-daemon(+0x30dc0) [0x80030dc0] /bin/dbus-daemon(bus_driver_handle_message+0x6e8) [0x80033038] /bin/dbus-daemon(+0x2dbe9) [0x8002dbe9] /bin/dbus-daemon(dbus_connection_dispatch+0x8c3) [0x8004a7f3] /bin/dbus-daemon(_dbus_loop_dispatch+0x40) [0x80094a60] /bin/dbus-daemon(_dbus_loop_iterate+0x45f) [0x80094eff] /bin/dbus-daemon(_dbus_loop_run+0xa0) [0x80095230] /bin/dbus-daemon(main+0x740) [0x80013970] /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0xb7dfa113] /bin/dbus-daemon(+0x13dc9) [0x80013dc9] Activated service 'org.freedesktop.DBus.TestSuitePythonService' failed: Process /bin/bash received signal 6 I initially thought this was just gdb making debugging difficult by leaking the fd as Scott mentioned, and it is difficult to tell since I haven't been able to get any output in the dbus-launch or dbus-daemon --fork cases. I was curious if they were related though, so I commented out the above passage in dbus-spawn.c, rebuilt and wasn't able to reproduce the aborts with gdb, dbus-launch or dbus-daemon --fork, which is why I figured they were connected. As it happens, the build with that code commented also allows make check to run without issues even with DBUS_FATAL_WARNINGS=1 in the dbus-python code. Decided to build a debian sid system for testing, and sure enough no aborts as Simon mentions above (sorry for calling you Scott in the last comment). But looking at the dbus-daemon binary, it is half the size on debian as in ubuntu. So I installed dbus-1-dbg on debian, and sure enough, using the debug binary from that package (a binary which is not installed on ubuntu), and I can get the aborts in debian as well. It turns out in ubuntu we are (hopefully unintentionally) installing the debug binary of dbus-daemon instead of the production version. This can be seen in, for instance, the build log of the i386 version in precise: https://launchpadlibrarian.net/88969187/buildlog_ubuntu-precise-i386.dbus_1.4.16-1ubuntu2_BUILDING.txt.gz It installs the binary from build/bus to debian/tmp//bin and then proceeds a bit later to overwrite it with the binary from build-debug/bus. The ubuntu package adds --exec-prefix=/ to common_configure_flags in debian/rules so that the binary ends up in /bin and somehow this is having the side effect of causing the debug binary to overwrite it. I move the flag to the flags for the production build and added back the debug binaries to debian/dbus-1-dbg.install and everything seems to be working again. Hi Jason, thanks for looking into this in such great detail. On Jan 07, 2012, at 07:46 PM, bugzilla-daemon@freedesktop.org wrote: >It turns out in ubuntu we are (hopefully unintentionally) installing the >debug binary of dbus-daemon instead of the production version. This can be >seen in, for instance, the build log of the i386 version in precise: >https://launchpadlibrarian.net/88969187/buildlog_ubuntu-precise-i386.dbus_1.4.16-1ubuntu2_BUILDING.txt.gz I talked to Martin Pitt about this and he said there's no good reason why Ubuntu installs the debug version of dbus-daemon over top the production version. I think this was just a buggy solution for moving the binaries from /usr/bin into /bin. >It installs the binary from build/bus to debian/tmp//bin and then proceeds >a bit later to overwrite it with the binary from build-debug/bus. > >The ubuntu package adds --exec-prefix=/ to common_configure_flags in >debian/rules >so that the binary ends up in /bin and somehow this is having the side effect >of causing the debug binary to overwrite it. I move the flag to the flags >for the production build and added back the debug binaries to >debian/dbus-1-dbg.install and everything seems to be working again. I've confirmed your change fixes the aborts in the test suite. I double checked the change with pitti and we agreed that it looks good. The one problem is that the debug binaries get installed in bogus paths, but the good news is that it doesn't really affect anything :). So we agreed not to worry about that for now. I'll upload this to Precise, but since we have no open bugs on it (other than the one you opened :), I don't see any reason to back port it. Thanks again! (The freedesktop.org bug can be closed now.) It seems the bug was in Ubuntu's dbus packaging, so, closing. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.