Bug 63072

Summary:	allow Unicode non-characters as per Corrigendum 9
Product:	dbus	Reporter:	Simon McVittie <smcv>
Component:	core	Assignee:	Simon McVittie <smcv>
Status:	RESOLVED FIXED	QA Contact:	Havoc Pennington <hp>
Severity:	normal
Priority:	medium	CC:	desrt, lennart, smcv, thiago, walters
Version:	unspecified	Keywords:	patch
Hardware:	Other
OS:	All
Whiteboard:
Attachments:
	[1.6, master] Accept non-characters when validating Unicode
	[master] Specification: explicitly allow the Unicode noncharacters
	[1.6, master] [v2] Accept non-characters when validating Unicode

Description Simon McVittie 2013-04-03 10:33:00 UTC

libdbus and the D-Bus Specification currently disallow Unicode non-characters (U+FDD0..U+FDEF, U+xFFFE, U+xFFFF) in UTF-8 strings. This is consistent with pre-2013 versions of GLib.
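For reference, the set of code points under discussion can be written as a small predicate. This is only a sketch: `is_unicode_noncharacter` is a hypothetical name used for illustration and is not a function in libdbus or GLib.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of libdbus or GLib:
 * true if c is one of the 66 Unicode noncharacters. */
static bool
is_unicode_noncharacter (uint32_t c)
{
  if (c > 0x10FFFF)
    return false;   /* outside the Unicode code space entirely */

  if (c >= 0xFDD0 && c <= 0xFDEF)
    return true;    /* contiguous block of 32 code points */

  /* Last two code points of each of the 17 planes: U+nFFFE and U+nFFFF */
  return (c & 0xFFFE) == 0xFFFE;
}
```

The count of 66 comes from the 32 code points in U+FDD0..U+FDEF plus two per plane across 17 planes.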

There has been considerable discussion of this in the past, including:

<http://lists.freedesktop.org/archives/dbus/2010-February/012182.html>

<https://bugs.freedesktop.org/show_bug.cgi?id=40817>

<https://bugzilla.gnome.org/show_bug.cgi?id=107427>

However, Unicode Corrigendum 9 <http://www.unicode.org/versions/corrigendum9.html> clarifies that this was not the intention of the standard, and g_utf8_validate() has been changed <https://bugzilla.gnome.org/show_bug.cgi?id=694669> to consider noncharacters to be valid. This matches the interpretation Thiago advocated in our previous discussions.

We should consider changing the D-Bus Specification, the reference implementation, and any bindings that do their own validity checking (notably dbus-python, at least in git master) to allow non-characters.

As a practical note, GDBus uses g_utf8_validate() to check for validity, so it will happily send messages that dbus-daemon considers to be invalid (and get kicked off the bus as a result).
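To make the mismatch concrete: a noncharacter encodes to a structurally ordinary UTF-8 byte sequence, so whether a string containing one is "valid" depends only on the policy of the validator. The following is a minimal sketch of a UTF-8 encoder; `encode_utf8` is a hypothetical helper written for this illustration, not a libdbus or GLib API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode scalar value as UTF-8.
 * Writes up to 4 bytes into out and returns the number written,
 * or 0 if c is a surrogate or lies above U+10FFFF. */
static size_t
encode_utf8 (uint32_t c, unsigned char out[4])
{
  if (c < 0x80)
    {
      out[0] = (unsigned char) c;
      return 1;
    }

  if (c < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }

  if (c < 0x10000)
    {
      if (c >= 0xD800 && c <= 0xDFFF)
        return 0;   /* surrogates are never encodable */

      out[0] = (unsigned char) (0xE0 | (c >> 12));
      out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }

  if (c <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (c >> 18));
      out[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (c & 0x3F));
      return 4;
    }

  return 0;         /* beyond the Unicode code space */
}
```

With this encoder, U+FFFF becomes the three bytes EF BF BF, which is the kind of sequence a sender following Corrigendum 9 would emit and a stricter receiver would reject.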

Comment 1 Simon McVittie 2013-04-03 10:36:41 UTC

Should this change also be made in D-Bus 1.6? Answers on a postcard.

For: if an application using new-GDBus sends a message containing Corrigendum 9 UTF-8, making this change in D-Bus 1.6 means it won't get rejected.

Against: an application expecting a message in "GLib 2.34 UTF-8" could receive an unexpected message in "Corrigendum 9 UTF-8" via a stable-branch dbus-daemon, and crash.

If we're going to make this change at all then my inclination would be to say "yes, also change D-Bus 1.6".

Comment 2 Thiago Macieira 2013-04-03 14:59:58 UTC

"yes, also change D-Bus 1.6"

The number of applications that depend on not receiving non-characters via D-Bus must be vanishingly small.

Comment 3 Simon McVittie 2013-04-22 14:32:45 UTC

Created attachment 78331
[1.6, master] Accept non-characters when validating Unicode

Unicode Corrigendum #9 clarifies that the non-characters U+nFFFE
(for n in the range 0 to 0x10), U+nFFFF (for n in the same range),
and U+FDD0..U+FDEF are valid for interchange, and their presence
does not make a string ill-formed.

GLib 2.36 made the corresponding change in its definition of UTF-8
as used by g_utf8_validate() and similar functions.
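The effect of the change on per-code-point validation can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the function names are hypothetical, and the real check in libdbus is a macro, UNICODE_VALID(), whose exact definition is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; this is not the libdbus UNICODE_VALID() macro. */

/* Surrogates (U+D800..U+DFFF) and values above U+10FFFF are never
 * Unicode scalar values, before or after Corrigendum #9. */
static bool
is_scalar_value (uint32_t c)
{
  return c <= 0x10FFFF && (c < 0xD800 || c > 0xDFFF);
}

/* Stricter rule: additionally reject the noncharacters
 * U+FDD0..U+FDEF, U+nFFFE and U+nFFFF. */
static bool
valid_before_corrigendum9 (uint32_t c)
{
  bool nonchar = (c >= 0xFDD0 && c <= 0xFDEF) || (c & 0xFFFE) == 0xFFFE;

  return is_scalar_value (c) && !nonchar;
}

/* Relaxed rule: every scalar value is acceptable for interchange. */
static bool
valid_after_corrigendum9 (uint32_t c)
{
  return is_scalar_value (c);
}
```

The two rules differ only on the noncharacters; surrogates and out-of-range values remain invalid under both.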

Comment 4 Simon McVittie 2013-04-22 14:33:32 UTC

Created attachment 78332
[master] Specification: explicitly allow the Unicode noncharacters

This follows Unicode Corrigendum #9.

Comment 5 Simon McVittie 2013-04-22 14:37:18 UTC

Created attachment 78333
[1.6, master] [v2] Accept non-characters when validating Unicode

Unicode Corrigendum #9 clarifies that the non-characters U+nFFFE
(for n in the range 0 to 0x10), U+nFFFF (for n in the same range),
and U+FDD0..U+FDEF are valid for interchange, and their presence
does not make a string ill-formed.

GLib 2.36 made the corresponding change in its definition of UTF-8
as used by g_utf8_validate() and similar functions.

---

v2: also fix the comment above UNICODE_VALID().

Comment 6 Thiago Macieira 2013-04-22 14:56:03 UTC

Comment on attachment 78331
[1.6, master] Accept non-characters when validating Unicode

Review of attachment 78331:
-----------------------------------------------------------------

Ship it!

Comment 7 Thiago Macieira 2013-04-22 14:56:34 UTC

Comment on attachment 78332
[master] Specification: explicitly allow the Unicode noncharacters

Review of attachment 78332:
-----------------------------------------------------------------

Ship it!

Comment 8 Thiago Macieira 2013-04-22 14:57:02 UTC

Comment on attachment 78333
[1.6, master] [v2] Accept non-characters when validating Unicode

Review of attachment 78333:
-----------------------------------------------------------------

Ship it!

Comment 9 Simon McVittie 2013-04-22 15:28:38 UTC

Fixed in git for 1.7.2, 1.6.10.

Any chance you could review Bug #63166, which breaks the build on recent Linux systems, including mine? I think that's the only release blocker at the moment.
