Summary: | Valgrind warnings when running test-bus | ||
---|---|---|---|
Product: | dbus | Reporter: | Philip Withnall <bugzilla> |
Component: | core | Assignee: | D-Bus Maintainers <dbus> |
Status: | RESOLVED FIXED | QA Contact: | D-Bus Maintainers <dbus> |
Severity: | minor | ||
Priority: | medium | CC: | bugzilla |
Version: | git master | Keywords: | patch |
Hardware: | All | ||
OS: | All | ||
Whiteboard: | review+ | ||
i915 platform: | i915 features: | ||
Attachments: |
dbus: Fix writing off the end of an fd_set when testing with Valgrind
dbus: Fix writing off the end of an fd_set when testing with Valgrind |
Description
Philip Withnall
2017-02-16 15:07:31 UTC
(In reply to Philip Withnall from comment #0) > IIRC the epoll problems are false positives, but we might as well > zero-initialise the epoll event struct anyway. I think that's the one remaining patch on Bug #88808. I should finish that. (In reply to Philip Withnall from comment #0) > ==28587== Warning: invalid file descriptor 1024 in syscall close() That's the fix for Bug #83899 (see also Bug #73689) being confused by the cognitive dissonance between valgrind (which pretends we only have low fds) and the kernel as accessed directly via /proc (which knows we also have Valgrind's internal fds). We could just #include <dbus/dbus-valgrind-internal.h>, and ignore fds > 1024 in _dbus_close_all() when RUNNING_ON_VALGRIND is nonzero. > ==28587== Invalid read of size 8 > ==28587== at 0x47097B: _dbus_check_fdleaks_enter (dbus-message-util.c:185) > ==28587== by 0x46C4F8: test_pre_hook (test-main.c:76) > ==28587== by 0x46C786: main (test-main.c:129) > ==28587== Address 0x74b0800 is 0 bytes after a block of size 128 alloc'd Similarly, this is Valgrind's special high fds causing us to overrun the fd_set. We could just ignore fds above the bounds of a fd_set - we never deliberately allocate non-contiguous fds anyway. Created attachment 129822 [details] [review] dbus: Fix writing off the end of an fd_set when testing with Valgrind If the test-bus test is run under Valgrind, its code to detect FD leaks accidentally writes off the end of the fd_set it uses, as Valgrind opens some high FDs (≥1024) for internal use. Detect if it’s running under Valgrind and ignore those FDs; and otherwise assert that the FDs are within the size of an fd_set. Signed-off-by: Philip Withnall <withnall@endlessm.com> (In reply to Philip Withnall from comment #3) > Created attachment 129822 [details] [review] [review] > dbus: Fix writing off the end of an fd_set when testing with Valgrind > > If the test-bus test is run under Valgrind, its code to detect FD leaks > accidentally writes off the end of the fd_set it uses, as Valgrind opens > some high FDs (≥1024) for internal use. > > Detect if it’s running under Valgrind and ignore those FDs; and > otherwise assert that the FDs are within the size of an fd_set. > > Signed-off-by: Philip Withnall <withnall@endlessm.com> This fixes the Valgrind FD problems. For the epoll_event problems, I’ve attached a patch in bug #88808. I briefly ran a few other tests (test-corrupt, for example) in Valgrind and found some other leaks, which look like they’re in the tests themselves. I’ll try and find some time to actually investigate and fix them at some point. Comment on attachment 129822 [details] [review] dbus: Fix writing off the end of an fd_set when testing with Valgrind Review of attachment 129822 [details] [review]: ----------------------------------------------------------------- ::: dbus/dbus-message-util.c @@ +183,4 @@ > if (fd == dirfd (d)) > continue; > > +#ifdef WITH_VALGRIND There shouldn't be any need for this #ifdef, dbus-valgrind-internal.h unconditionally defines the various macros (for instance RUNNING_ON_VALGRIND expands to 0). I think I'd prefer "fd >= 1024" or whatever specific offset valgrind starts at. If you do need the #ifdef for whatever reason, shouldn't the assertion be outside it? Comment on attachment 129822 [details] [review] dbus: Fix writing off the end of an fd_set when testing with Valgrind Review of attachment 129822 [details] [review]: ----------------------------------------------------------------- ::: dbus/dbus-message-util.c @@ +183,4 @@ > if (fd == dirfd (d)) > continue; > > +#ifdef WITH_VALGRIND The #ifdef is needed for the (probably quite common) case where dbus-daemon is compiled without --with-valgrind. --with-valgrind=no is the default. In that case, RUNNING_ON_VALGRIND is 0 even when running under Valgrind, and the assertion causes the test to fail. My thinking was that: • The ultimate aim here is to add `_dbus_assert (fd < FD_SETSIZE)`, since otherwise we write off the end of the fd_set. • That works whenever we’re not running under Valgrind. Under Valgrind, we have high-numbered FDs and it aborts with a false positive. • RUNNING_ON_VALGRIND is not 100% reliable at telling us we’re under Valgrind, due to needing to be compiled with --with-valgrind. => The assertion has to only be enabled when we have WITH_VALGRIND. Comment on attachment 129822 [details] [review] dbus: Fix writing off the end of an fd_set when testing with Valgrind Review of attachment 129822 [details] [review]: ----------------------------------------------------------------- ::: dbus/dbus-message-util.c @@ +183,4 @@ > if (fd == dirfd (d)) > continue; > > +#ifdef WITH_VALGRIND > I think I'd prefer "fd >= 1024" or whatever specific offset valgrind starts at. Fair point. I’ll have to check what they use — there is a chance it’s FD_SETSIZE. I can’t immediately find anything relevant in the Valgrind code base or its installed headers. :-\ (In reply to Philip Withnall from comment #6) > My thinking was that: > • The ultimate aim here is to add `_dbus_assert (fd < FD_SETSIZE)`, since > otherwise we write off the end of the fd_set. > • That works whenever we’re not running under Valgrind. Under Valgrind, we > have high-numbered FDs and it aborts with a false positive. Well, if we write off the end of the fd_set, that's always a bug regardless. Since this is just best-effort stuff for detecting whether we leaked any fds, how about this? * If fd < FD_SETSIZE, we track it * If fd >= FD_SET_SIZE, we emit _dbus_verbose ("fd %d unexpectedly large, cannot track whether it is leaked\n", fd); or something, and do not track it This can all be unconditional; it doesn't need to test RUNNING_ON_VALGRIND. (Rationale: we know we don't actually allocate thousands of fds, if only because on many OSs and configurations we'd run out of fd quota if we tried; we don't go to any specific effort to open fds of large magnitude; and POSIX guarantees that things like open() and socket() will always allocate the lowest available fd. So if we leak anything, realistically it's going to be < 100.) (In reply to Philip Withnall from comment #0) > ==28587== Warning: invalid file descriptor 1024 in syscall close() > ==28587== Warning: invalid file descriptor 1025 in syscall close() > ==28587== Warning: invalid file descriptor 1026 in syscall close() > ==28587== Warning: invalid file descriptor 1027 in syscall close() > ==28587== Use --log-fd=<number> to select an alternative log fd. > ==28587== Warning: invalid file descriptor 1028 in syscall close() > ==28587== Warning: invalid file descriptor 1029 in syscall close() > ==28587== Warning: invalid file descriptor 1030 in syscall close() If you want to address these warnings, *this* is where we'd need to use RUNNING_ON_VALGRIND, I think. The log message led me via https://codesearch.debian.net/search?q=package%3Avalgrind+invalid+file+descriptor&page=1 to https://codesearch.debian.net/search?q=package%3Avalgrind+soft_limit and from there to https://sources.debian.net/src/valgrind/1:3.12.0-1.1/coregrind/m_main.c/?hl=1515#L1515 which indicates that valgrind adjusts the soft fd limit up a bit, then puts its own reserved fds just beyond that. But it seems like maybe overkill to have a block that gets the rlimit specifically so that we can skip closing fds beyond it... Created attachment 130024 [details] [review] dbus: Fix writing off the end of an fd_set when testing with Valgrind If the test-bus test is run under Valgrind, its code to detect FD leaks accidentally writes off the end of the fd_set it uses, as Valgrind opens some high FDs (≥1024) for internal use. Ignore those FDs. Realistically, they are never going to be leaks — in order to have a false negative from omitting this check, D-Bus would have to allocate and not leak all the FDs up to FD_SETSIZE, and then leak the first FD over that which it allocated. D-Bus never allocates anywhere near that number of FDs concurrently. Signed-off-by: Philip Withnall <withnall@endlessm.com> Fixed in git for 1.11.12, thanks (In reply to Simon McVittie from comment #9) > (In reply to Philip Withnall from comment #0) > > ==28587== Warning: invalid file descriptor 1024 in syscall close() > > ==28587== Warning: invalid file descriptor 1025 in syscall close() > > ==28587== Warning: invalid file descriptor 1026 in syscall close() > > ==28587== Warning: invalid file descriptor 1027 in syscall close() > > ==28587== Use --log-fd=<number> to select an alternative log fd. > > ==28587== Warning: invalid file descriptor 1028 in syscall close() > > ==28587== Warning: invalid file descriptor 1029 in syscall close() > > ==28587== Warning: invalid file descriptor 1030 in syscall close() > > If you want to address these warnings, *this* is where we'd need to use > RUNNING_ON_VALGRIND, I think. I'm assuming you don't want to bother with that. If you do, please reopen, or open another bug. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.