Summary: | XtAppPending can block uninterruptably. | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | xorg | Reporter: | Stephen Turnbull <stephenjturnbull> | ||||||
Component: | Lib/Xt | Assignee: | Jeremy Huddleston Sequoia <jeremyhu> | ||||||
Status: | RESOLVED MOVED | QA Contact: | Xorg Project Team <xorg-team> | ||||||
Severity: | critical | ||||||||
Priority: | medium | CC: | cosmos, jeremyhu | ||||||
Version: | unspecified | ||||||||
Hardware: | x86 (IA32) | ||||||||
OS: | Mac OS X (All) | ||||||||
Whiteboard: | 2011BRB_Reviewed | ||||||||
i915 platform: | i915 features: | ||||||||
Attachments: |
|
Description
Stephen Turnbull
2009-02-11 00:11:00 UTC
Confirmed with XEmacs 21.4.22 (latest stable release), using the MacPorts libraries (@1.0.5_0). On Leopard, both with the Mercurial HEAD version of XEmacs 21.5 and with XEmacs 21.4 the problem does not occur with Xquartz project libraries (version 2.3.2.1). Xquartz project libraries, version 2.4.0-alpha2, also work correctly for both XEmacs 21.5 HEAD and XEmacs 21.4.22. So it's related to the MacPorts build. I tried rebuilding libXt with the latest version of NextEvent.c, which rebuilds the fdList every time through the WaitLoop. This didn't help. I tried changing the unconditional goto WaitLoop at the end of _XTWaitForSomething() to if (block) goto WaitLoop; else { #ifdef USE_POLL XtStackFree ((XtPointer) wf.fdlist, fdlist); #endif return -1; } This helps in the sense that it now blocks in XtAppProcessEvent, which documents the behavior. I still get an infloop in _XtWaitForSomething but that is presumably an XEmacs or Mac OS X bug. I'll take that to appropriate channels, but any advice would be appreciated. Created attachment 22934 [details] [review] Patch to handle POLLNVAL condition as an exception. Created attachment 22935 [details] [review] Patch to prevent reentering the WaitLoop if block==False. In light testing, the attached patch patch-NextEvent.c-XPOLL_EXCEPT seems to win for me. I don't really understand all ramifications, so I can't actually recommend it, and I've only tried it on Mac OS X "Leopard" 10.5.6. On the other hand, it seems rather safe because it only triggers in the case that an app receives an fd with POLLNVAL set, which would seem to count as an exceptional condition, and is apparently very rare since nobody else is experiencing the infloop. I also attach a patch to respect block=False in the WaitLoop (patch-NextEvent.c-BLOCK_CORRECTLY), which seem to me to be The RightThang[tm]. for XtAppPending, but again I don't understand the other uses of _XtWaitForSomething so I can't recommend it whole-heartedly. The reason this works in the xquartz.macosforge.org build is that we actually do: --- libXt-1.0.5.orig/configure.ac 2008-02-12 19:02:35.000000000 -0800 +++ libXt-1.0.5/configure.ac 2008-02-12 19:02:41.000000000 -0800 @@ -43,8 +43,6 @@ PKG_CHECK_MODULES(XT, sm x11 xproto kbpr AC_CHECK_HEADER([alloca.h], AC_DEFINE(INCLUDE_ALLOCA_H, 1, [Define to 1 if Xalloca.h should include <alloca.h>])) # Map function checks to old Imake #defines -AC_CHECK_FUNC(poll, AC_DEFINE(USE_POLL,1, - [Define to 1 if you have the `poll' function.])) AC_CHECK_FUNC(snprintf, AC_DEFINE(USE_SNPRINTF,1, [Define to 1 if you have the `snprintf' function.])) I inherited that, but the previous maintainer said that emacs exhibited a bug when we used poll() in libXt, but neither he nor I had the time to trace it any further. OK, that doesn't help localize the bug very much, since it completely changes the behavior of the event loop with respect to IO. Do you have a pointer to the previous discussion offhand? I plan to keep looking since the USE_POLL code is much more straightforward, and XtAppPending should never block, even temporarily, but I guess using select() instead of poll() is a reasonable workaround for MacPorts. Not to mention that something is very likely wrong with the Emacsen, since somehow they're passing around dead fds. FWIW, I'm on the emacs-devel list and haven't seen anything there; nor have others on the XEmacs lists complained about it, so it's not an urgent problem. But I would prefer that it not be closed. Stephen, sorry I don't have a link to that discussion. As I mentioned, it was something that I inherited and didn't dig deep into (when you inherit 200+ bugs, it's hard to find time to satisfy curiosities when there's not really something "broken"). The patches look good for me... although the *real* problem is figuring out why POLLNVAL is being returned when the fd is still valid. (In reply to comment #9) > The patches look good for me... although the *real* problem is figuring out why > POLLNVAL is being returned when the fd is still valid. Unfortunately, xnu (the OS X kernel) does not properly support poll() on character devices (in this case, /dev/ptmx). This is being tracked by Apple bug number 3710161 -- if you Google for that, you'll find a couple other references to it e.g. http://lists.apple.com/archives/darwin-dev/2005/May/msg00220.html I think that POLLNVAL means "don't poll() me plz". The real solution for this is for autoconf to automatically detect that we are building for OS X and disable poll(). That's curious... so then why is Stephen eventually getting valid data at some point? Jamey, didn't you say you touched something like this recently? Is this still a problem? This symptom doesn't appear to be related to libX11 or libxcb, but we did fix a similar-looking bug in libxcb. Here's the patch that we used there: http://cgit.freedesktop.org/xcb/libxcb/commit/?id=f0565e8f06aadf760a9065a97b8cf5ab9cbd18de Perhaps that will inspire a solution? *** Bug 13471 has been marked as a duplicate of this bug. *** Resolved on git master in the same way as libxcb handled it: To ssh://git.freedesktop.org/git/xorg/lib/libXt 0d8ef50..70bb9e2 master -> master I've reverted this change from libXt as it caused some regressions. #47203 #47216 Reopening for further investigation, but I think this might not be necessary any more since I believe we should fall through to _XIOError() from _XEventsQueued() from XEventsQueued() from FindInputs() because xcb should be flagging the relevant connection as broken and we check xcb_connection_has_error() in _XEventsQueued. Comment on attachment 22935 [details] [review] Patch to prevent reentering the WaitLoop if block==False. >--- src/NextEvent.c.orig 2006-07-12 03:05:55.000000000 +0900 >+++ src/NextEvent.c 2009-02-14 17:22:36.000000000 +0900 >@@ -722,7 +722,14 @@ > #endif > return dpy_no; > } >- goto WaitLoop; >+ if (block) >+ goto WaitLoop; >+ else { >+#ifdef USE_POLL >+ XtStackFree ((XtPointer) wf.fdlist, fdlist); >+#endif >+ return -1; >+ } > } > > #define IeCallProc(ptr) \ I am also frequently getting same kind of error on a 32 bit Linux X application. When this error happened I forcefully dumped core and see the back trace. Below is the part of back trace. #11 0xf7381864 in _XIOError () from /lib/libX11.so.6 #12 0xf737ed2f in _XEventsQueued () from /lib/libX11.so.6 #13 0xf736f770 in XEventsQueued () from /lib/libX11.so.6 #14 0xf74a6c26 in _XtWaitForSomething () from /lib/libXt.so.6 #15 0xf74a7eda in XtAppProcessEvent () from /lib/libXt.so.6 Is this error related with this bug ?? Is this issue have some fix till now?? Please help. Hi, Subrata I'm the OP, not a developer, but it seems the developers aren't able to respond immediately so I'll tell you what I can. I'd like to begin by advising you that the wait loop is quite complex, so the partial trace you sent is not enough to confirm any relationship with this bug. Please add the whole trace you get so that the developers can more accurately diagnose the situation. Now, in the case that I reported, there were two interacting problems. The first was that the loop entry test didn't handle all possible conditions, only those that the developers recognized as valid. This is a general practice in Xorg, apparently, and I was told my patches attached to the issue which validate the condition would not be accepted. The second problem was that due to an incomplete implementation on Apple's part (which didn't get fixed in the 3 years I tracked this bug), an invalid file descriptor was left in the table, and so it was possible to enter the loop because of an invalid condition. Since nothing in the loop can fix the invalid condition, this led to an infinite loop. I couldn't find a way to fix or workaround the invalid file descriptor, either, and an alternative configure option to Xt that used select() rather than poll() gave acceptable results so I eventually gave up on this bug. AFAIK there have never been any bugs in Linux that would cause an invalid file descriptor to be left in the table, or a spurious report of POLLNVAL. So if there is any relation of your issue to this bug, the most likely cause is in the program that is freezing. That is, the application is allocating a file descriptor, it doesn't get closed but somehow becomes invalid, and thus causes this behavior. Of course it is not impossible that a new bug has been introduced into Linux or into Xorg, but I think your best use of time would be to check on the possibility that the application has a bug in its resource cleanup as described above. It appears that libxcb uses a patch similar in intent to my "handle POLLNVAL" patch above. Perhaps if you switch to a libxcb-based build of Xorg you'll get better results. (That's a wild guess since I have no idea what your Xorg configuration looks like.) Good luck, sorry I can't be more helpful. -- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/lib/libxt/issues/3. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.