The Xlib IO routines does not handle the EAGAIN (resource temporaly unavailable) correctly. It just treats it as a fatal IO error and quits. This error happens sometimes after running Xephyr for several hours. After that, it dies with the following message: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0.1" Apparently, the sockets used by the X clients are non-blocking, so I guess that's the reason of the error.
Sorry about the phenomenal bug spam, guys. Adding xorg-team@ to the QA contact so bugs don't get lost in future.
Created attachment 9380 [details] [review] Proposed patch to solve the problem
We've been using this patch in the University (UFPR, 600 terminals using X over XDMCP and Xephyr, a multiseat environment) and in the Paraná Digital Project (11.000 four-head multiseats, 44.000 seats, Xephyr with XDMCP too) since this bug was reported (more than 1 year ago). All the machines are i386 and we've never found problems with this patch. Without it, our Xephyrs keep randomdly dying. Here is our version: diff -Nru libx11-1.1.3-no-patch/src/XlibInt.c libx11-1.1.3/src/XlibInt.c --- libx11-1.1.3-no-patch/src/XlibInt.c 2008-07-01 16:50:06.000000000 -0300 +++ libx11-1.1.3/src/XlibInt.c 2008-07-01 16:51:49.000000000 -0300 @@ -135,11 +135,7 @@ #define ECHECK(err) (errno == err) #define ESET(val) #else -#ifdef ISC #define ECHECK(err) ((errno == err) || ETEST()) -#else -#define ECHECK(err) (errno == err) -#endif #define ESET(val) errno = val #endif #endif Is there any argument against the patch? There was a discussion about it here: http://lists.freedesktop.org/archives/xorg/2006-August/017402.html And no one actually said NO to the patch. Any plans to apply it? Make it conditional to ISC and i386? This is the only patch that keeps us away of making "multiseat with Xephyr" without any patches (we don't even need "-geometry" on Xephyr anymore). Thanks, Paulo.
Created attachment 17637 [details] [review] Patch to add ETEST() checks only where needed Looking through XlibInt.c there are a few places ECHECK is used to check for specific errors in which you wouldn't want the ETEST checks applied, such as: if (ECHECK(EPIPE)) { (void) fprintf (stderr, "X connection to %s broken (explicit kill or server shutdown).\r\n", DisplayString (dpy)); } I think it's better to add ETEST() checks to the places they're not currently already in place and make sense. I propose this patch instead, which adds ETEST() to the ECHECK() calls where it's not already in place and would seem to make sense.
Pushed my patch from comment #4 to git master so this starts getting tested by more people: src/XlibInt.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) New commits: commit 501f4e0ada1690783ada05ad412e4b191ad55336 Author: Alan Coopersmith <alan.coopersmith@sun.com> Date: Thu Mar 12 17:38:21 2009 -0700 Bug 6820: Xlib shouldn't handle EAGAIN as a fatal IO error X.Org Bug #6820 <http://bugs.freedesktop.org/show_bug.cgi?id=6820> Patch #17637 <http://bugs.freedesktop.org/attachment.cgi?id=17637> Signed-off-by: Alan Coopersmith <alan.coopersmith@sun.com>
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.