Bug 63072

Summary: allow Unicode non-characters as per Corrigendum 9
Product: dbus
Reporter: Simon McVittie <smcv>
Component: core
Assignee: Simon McVittie <smcv>
Status: RESOLVED FIXED
QA Contact: Havoc Pennington <hp>
Severity: normal
Priority: medium
CC: desrt, lennart, smcv, thiago, walters
Version: unspecified
Keywords: patch
Hardware: Other
OS: All
Attachments: [1.6, master] Accept non-characters when validating Unicode
[master] Specification: explicitly allow the Unicode noncharacters
[1.6, master] [v2] Accept non-characters when validating Unicode

Description Simon McVittie 2013-04-03 10:33:00 UTC
libdbus and the D-Bus Specification currently disallow Unicode non-characters (U+FDD0..U+FDEF, U+xFFFE, U+xFFFF) in UTF-8 strings. This is consistent with pre-2013 versions of GLib.
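For reference, the set of code points in question can be sketched in C as below. This is only an illustration of the ranges listed above; the function name is_noncharacter is hypothetical and is not taken from libdbus or GLib.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of libdbus: returns true if cp is one of
 * the 66 Unicode noncharacters (U+FDD0..U+FDEF, U+nFFFE, U+nFFFF). */
bool is_noncharacter(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return false;                /* outside the Unicode code space */
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;                 /* the contiguous block of 32 */
    return (cp & 0xFFFE) == 0xFFFE;  /* last two code points of each plane */
}
```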

There has been considerable discussion of this in the past, including:

<http://lists.freedesktop.org/archives/dbus/2010-February/012182.html>

<https://bugs.freedesktop.org/show_bug.cgi?id=40817>

<https://bugzilla.gnome.org/show_bug.cgi?id=107427>

However, Unicode Corrigendum 9 <http://www.unicode.org/versions/corrigendum9.html> clarifies that this was not the intention of the standard, and g_utf8_validate() has been changed <https://bugzilla.gnome.org/show_bug.cgi?id=694669> to consider noncharacters to be valid. This matches the interpretation Thiago advocated in our previous discussions.

We should consider changing the D-Bus Specification, the reference implementation, and any bindings that do their own validity checking (notably dbus-python, at least in git master) to allow non-characters.

As a practical note, GDBus uses g_utf8_validate() to check for validity, so it will happily send messages that dbus-daemon considers to be invalid (and get kicked off the bus as a result).
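To make the affected byte sequences concrete, here is a minimal, self-contained UTF-8 encoder sketch. The function utf8_encode is hypothetical (it is not the GLib or libdbus implementation); it only shows that noncharacters encode like any other scalar value, so U+FFFF becomes the bytes EF BF BF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes cp as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 for surrogates and for values
 * above U+10FFFF. Noncharacters are encoded like any other code point. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                    /* surrogates are never valid in UTF-8 */
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                        /* beyond the Unicode code space */
}
```

A string containing such a sequence passes a Corrigendum 9 style validator but fails one that still rejects noncharacters.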
Comment 1 Simon McVittie 2013-04-03 10:36:41 UTC
Should this change also be made in D-Bus 1.6? Answers on a postcard.

For: if an application using new-GDBus sends a message containing Corrigendum 9 UTF-8, making this change in D-Bus 1.6 means it won't get rejected.

Against: an application expecting a message in "GLib 2.34 UTF-8" could receive an unexpected message in "Corrigendum 9 UTF-8" via a stable-branch dbus-daemon, and crash.

If we're going to make this change at all then my inclination would be to say "yes, also change D-Bus 1.6".
Comment 2 Thiago Macieira 2013-04-03 14:59:58 UTC
"yes, also change D-Bus 1.6"

The number of applications that depend on not receiving non-characters via D-Bus must be vanishingly small.
Comment 3 Simon McVittie 2013-04-22 14:32:45 UTC
Created attachment 78331
[1.6, master] Accept non-characters when validating Unicode

Unicode Corrigendum #9 clarifies that the non-characters U+nFFFE
(for n in the range 0 to 0x10), U+nFFFF (for n in the same range),
and U+FDD0..U+FDEF are valid for interchange, and their presence
does not make a string ill-formed.

GLib 2.36 made the corresponding change in its definition of UTF-8
as used by g_utf8_validate() and similar functions.
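
The effect of the change on a per-code-point validity check can be sketched as follows. Both function names are hypothetical and this is not the attached patch; it assumes the check is made on decoded code points and only contrasts the stricter rule with the Corrigendum 9 rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the stricter rule: reject out-of-range values, surrogates,
 * and all noncharacters. */
bool code_point_valid_strict(uint32_t cp)
{
    return cp < 0x110000 &&
           (cp & 0xFFFFF800) != 0xD800 &&      /* not a surrogate */
           (cp < 0xFDD0 || cp > 0xFDEF) &&     /* not U+FDD0..U+FDEF */
           (cp & 0xFFFE) != 0xFFFE;            /* not U+nFFFE or U+nFFFF */
}

/* Sketch of the Corrigendum 9 rule: only out-of-range values and
 * surrogates are rejected; noncharacters are accepted. */
bool code_point_valid_corrigendum9(uint32_t cp)
{
    return cp < 0x110000 &&
           (cp & 0xFFFFF800) != 0xD800;
}
```

Under these assumptions the two functions differ only on the 66 noncharacters.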
Comment 4 Simon McVittie 2013-04-22 14:33:32 UTC
Created attachment 78332
[master] Specification: explicitly allow the Unicode noncharacters

This follows Unicode Corrigendum #9.
Comment 5 Simon McVittie 2013-04-22 14:37:18 UTC
Created attachment 78333
[1.6, master] [v2] Accept non-characters when validating Unicode

Unicode Corrigendum #9 clarifies that the non-characters U+nFFFE
(for n in the range 0 to 0x10), U+nFFFF (for n in the same range),
and U+FDD0..U+FDEF are valid for interchange, and their presence
does not make a string ill-formed.

GLib 2.36 made the corresponding change in its definition of UTF-8
as used by g_utf8_validate() and similar functions.

---

v2: also fix the comment above UNICODE_VALID().
Comment 6 Thiago Macieira 2013-04-22 14:56:03 UTC
Comment on attachment 78331
[1.6, master] Accept non-characters when validating Unicode

Review of attachment 78331:
-----------------------------------------------------------------

Ship it!
Comment 7 Thiago Macieira 2013-04-22 14:56:34 UTC
Comment on attachment 78332
[master] Specification: explicitly allow the Unicode noncharacters

Review of attachment 78332:
-----------------------------------------------------------------

Ship it!
Comment 8 Thiago Macieira 2013-04-22 14:57:02 UTC
Comment on attachment 78333
[1.6, master] [v2] Accept non-characters when validating Unicode

Review of attachment 78333:
-----------------------------------------------------------------

Ship it!
Comment 9 Simon McVittie 2013-04-22 15:28:38 UTC
Fixed in git for 1.7.2, 1.6.10.

Any chance you could review Bug #63166, which breaks the build on recent Linux systems, including mine? I think that's the only release blocker at the moment.
