Top Level Domain Name Specification
Autonomica AB
Franzengatan 5
SE-112 51 Stockholm
Sweden
liman@autonomica.se
http://www.autonomica.se/
Individual submission
The precise syntax allowed in top-level domain name labels
has been the subject of some debate. RFC 1123, for example,
makes the statement that top-level domain names are
"alphabetic". This document updates the definition of
allowable top-level domain names in order to support
internationalized domain names (IDNs), as encoded by the IDNA
protocols. This document focuses narrowly on the issue of
IDNs and does not make any other changes or clarifications to
existing domain name syntax rules.
The precise syntax allowed in top-level domain (TLD) name
labels has been the subject of some debate.
RFC 1123, for example, states
that TLD names must be "alphabetic", which is interpreted as
excluding the hyphen (or dash) character. This document
updates the definition of allowable top-level domain names to
support internationalized domain names that consist of Unicode
letters, as encoded by the IDNA protocols [RFCXXX]. In
particular, this document clarifies that ASCII TLDs beginning
with the IDN A-label prefix (currently "xn--"), as encoded by
IDNA, are permissible as DNS TLD names as long as they are
made from Unicode letters. This document focuses narrowly on
the issue of allowable ASCII labels encoded by the IDNA
protocols and does not (and is not intended to) make any other
changes or clarifications to existing domain name syntax
rules.
The terminology used in this document is as defined in
RFC 0952
and RFC 1035.
The key words "MUST", "MUST NOT",
"REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted
as described in RFC 2119.
RFC 0952 states (among other
things) that a host name is:
... a text string up to 24 characters drawn from the
alphabet (A-Z), digits (0-9), minus sign (-), and period
(.). Note that periods are only allowed when they serve
to delimit components of "domain style names". (See
RFC-921, "Domain Name System Implementation Schedule", for
background). No blank or space characters are permitted
as part of a name. No distinction is made between upper
and lower case. The first character must be an alpha
character. The last character must not be a minus sign or
period.
RFC 1123 reaffirms this
definition, making two additional changes to the syntax:
The syntax of a legal Internet host name was specified in
RFC-952 [DNS:4]. One aspect of host name syntax is hereby
changed: the restriction on the first character is relaxed
to allow either a letter or a digit. Host software MUST
support this more liberal syntax.
and
However, a valid host name can never have the
dotted-decimal form #.#.#.#, since at least the
highest-level component label will be alphabetic.
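The combined effect of the two quoted rules can be illustrated with a minimal Python sketch. It is not taken from either RFC: the function name is_rfc1123_hostname is hypothetical, length limits are deliberately left out, and the final check applies the strict reading of "alphabetic" (ASCII letters only) to the highest-level label.

```python
import re

# One label: letters, digits and hyphens, not starting or ending with a
# hyphen. A leading digit is accepted, per the RFC 1123 relaxation.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc1123_hostname(name: str) -> bool:
    """Hypothetical helper: syntax check only, no length limits."""
    labels = name.split(".")
    # Every dot-separated component must be a well-formed label.
    if not all(_LABEL.fullmatch(label) for label in labels):
        return False
    # Strict reading of RFC 1123: the highest-level label is alphabetic.
    # The regex above already guarantees that the characters are ASCII.
    return labels[-1].isalpha()
```

Under this strict reading, is_rfc1123_hostname("host.xn--p1ai") is false, because the last label contains hyphens and a digit.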
The restrictions on host names and specifically TLD names
have always been, at least in part, driven by human factors
considerations. Underscores in host names are avoided because
they are indistinguishable from hyphens when seen on a page or
written in longhand, and to some extent because of early
internationalization issues. The original "no leading digits"
rule was driven by wanting to make sure that even imprecise
programming or human thought errors didn't confuse addresses
with names.
The wish to express TLD names in scripts other than Latin
makes it necessary to relax the rules for TLD
names. However, the old motivations for keeping the TLD names
alphabetical still hold, and furthermore, certain
characteristics of some IDN names with digits in them make
them unsuitable as DNS labels. The problem is referred to as
"jumping digits", and is described in draft-ietf-idnabis-bidi.
In order to keep changes to existing specifications to a
minimum but to still allow for IDN TLD names, this document
hereby changes the existing specification to allow for IDN TLD
names in the "A-label form" as specified by the IDNA-2008
specifications, i.e., an ASCII-compatible-encoding, using
reversible Punycode conversion from valid IDN labels, with
IDN A-label prefix (currently "xn--"), but requiring that the
native-character ("Unicode") form consist of letters only.
Restricted-A-label is an A-label as defined in
draft-ietf-idna-defs converted from (and convertible to) a
U-label that is consistent with the definition in
draft-ietf-idna-defs and that is further restricted to
contain only Unicode characters of General Category "L". Note
that "L" contains several sub-categories. The list is:
Ll - Lowercase_Letter
Lo - Other_Letter
Lm - Modifier_Letter
Lu - Uppercase_Letter
Lt - Titlecase_Letter
although IDNA prohibits (categorizes as DISALLOWED) all
characters in the last two categories and several of the
characters that fall into the other categories.
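The letters-only restriction can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a conforming implementation: the function name is_restricted_a_label is hypothetical, and the sketch checks only the prefix, the Punycode round trip and the General Category of each character. It does not apply the other IDNA validity rules (DISALLOWED code points, contextual rules, Bidi, normalization).

```python
import unicodedata

ACE_PREFIX = "xn--"  # the IDN A-label prefix named in this document


def is_restricted_a_label(label: str) -> bool:
    """Hypothetical helper: prefix, round trip and letters-only check."""
    lowered = label.lower()
    if not lowered.isascii() or not lowered.startswith(ACE_PREFIX):
        return False
    encoded = lowered[len(ACE_PREFIX):]
    try:
        # Python's built-in "punycode" codec performs the bare conversion.
        u_label = encoded.encode("ascii").decode("punycode")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False
    # A U-label needs at least one non-ASCII character (also rejects "").
    if u_label.isascii():
        return False
    # The conversion must be reversible.
    if u_label.encode("punycode").decode("ascii") != encoded:
        return False
    # Letters only: every character is in General Category "L".
    return all(unicodedata.category(ch).startswith("L") for ch in u_label)
```

For example, "xn--p1ai" decodes to two Cyrillic lowercase letters and passes, while an A-label whose Unicode form contains a digit or a hyphen fails the last check.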
This new specification reflects current practice in
registration of TLD names by the IANA, and allows for IDNs.
It should be noted that there are many issues that must be
considered in making any changes to current restrictions on
DNS labels, especially at the top level. DNS software is
widely deployed, and some of that software contains embedded
assumptions that may not hold if names that depart from the
older rules are used at the top level. For example, when TLDs
longer than 3 characters became available (e.g., .info,
.museum, etc.), some deployed systems did not process such DNS
names properly. This document does not take the position that
no problems will result when IDN TLDs are created, but does
recognize that relaxing the syntax of allowed TLDs is
necessary in order to allow deployment of IDNs to happen.
Note also that the above specification is not
the only limiting factor on TLD labels. Entities other than
the IETF may have influence over TLD names and may decide to
restrict the names further. The above
technical specification is just one limiting factor.
This memo changes the specifications for TLD names registered
by the IANA, and the IANA is requested to change its
registration process to use the above specification.
This document is believed to have limited security
consequences.
It may introduce stability issues where names registered
under this new specification may inter-operate badly with old
software written to enforce a strict interpretation of the old
specification. This might also open up attack vectors
(e.g., from names being truncated). However, it is believed that
such software is scarce on the Internet, and since TLD names
that do not adhere to a strict interpretation of the old
specification are already used (including test IDNs) without
apparent problems, it is believed that this change of the
specification will not create major stability or security
problems on the Internet.
Clean up references. Check situation with references to
Internet Drafts. Are they/will they be published as RFCs
before this draft?
Verify quotations.
Get rid of the term "jumping digits" and replace with
appropriate wording. Also mention additional reasons not
to have digits that relate to Input Method Editors and
localization.
Substantial comments and improvements supplied by Thomas
Narten and John Klensin. Decided to go for a minimal change
approach. Also noted that U-labels have to be letters due to
jumping digit problem. Rewritten major parts.
First cut. Prompted by Olafur Gudmundsson and Tina Dam.