From 9dfc6f6427b4b44d5684dad1ae5cea45a82821ee Mon Sep 17 00:00:00 2001
From: Rob Pike
Each code point is distinct; for instance, upper and lower case letters
@@ -197,7 +198,7 @@ token is
integer, floating-point, imaginary,
-character, or
+rune, or
string literal
@@ -359,13 +360,15 @@ imaginary_lit = (decimals | float_lit) "i" .
-
-A character literal represents a character constant,
-typically a Unicode code point, as one or more characters enclosed in single
-quotes. Within the quotes, any character may appear except single
-quote and newline. A single quoted character represents itself,
+A rune literal represents a rune constant,
+an integer value identifying a Unicode code point.
+A rune literal is expressed as one or more characters enclosed in single quotes.
+Within the quotes, any character may appear except single
+quote and newline. A single quoted character represents the Unicode value
+of the character itself,
while multi-character sequences beginning with a backslash encode
values in various formats.
@@ -379,7 +382,7 @@ a literal a, Unicode U+0061, value 0x61, while
a literal a-dieresis, U+00E4, value 0xe4.
-Several backslash escapes allow arbitrary values to be represented
+Several backslash escapes allow arbitrary values to be encoded
as ASCII text. There are four ways to represent the integer value
as a numeric constant: \x followed by exactly two hexadecimal
digits; \u followed by exactly four hexadecimal digits;
@@ -408,11 +411,11 @@ After a backslash, certain single-character escapes represent special values:
\t U+0009 horizontal tab
\v U+000b vertical tab
\\ U+005c backslash
-\' U+0027 single quote (valid escape only within character literals)
+\' U+0027 single quote (valid escape only within rune literals)
\" U+0022 double quote (valid escape only within string literals)
-All other sequences starting with a backslash are illegal inside character literals.
+All other sequences starting with a backslash are illegal inside rune literals.
char_lit = "'" ( unicode_value | byte_value ) "'" .
@@ -438,6 +441,11 @@ escaped_char = `\` ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `
'\xff'
'\u12e4'
'\U00101234'
+'aa'         // illegal: too many characters
+'\xa'        // illegal: too few hexadecimal digits
+'\0'         // illegal: too few octal digits
+'\uDFFF'     // illegal: surrogate half
+'\U00110000' // illegal: invalid Unicode code point
@@ -452,7 +460,8 @@ raw string literals and interpreted string literals.
Raw string literals are character sequences between back quotes
``. Within the quotes, any character is legal except
back quote. The value of a raw string literal is the
-string composed of the uninterpreted characters between the quotes;
+string composed of the uninterpreted (implicitly UTF-8-encoded) characters
+between the quotes;
in particular, backslashes have no special meaning and the string may
contain newlines.
Carriage returns inside raw string literals
@@ -463,8 +472,9 @@ Interpreted string literals are character sequences between double
quotes "". The text between the quotes,
which may not contain newlines, forms the
value of the literal, with backslash escapes interpreted as they
-are in character literals (except that \' is illegal and
-\" is legal). The three-digit octal (\nnn)
+are in rune literals (except that \' is illegal and
+\" is legal), with the same restrictions.
+The three-digit octal (\nnn)
and two-digit hexadecimal (\xnn) escapes represent individual
bytes of the resulting string; all other escapes represent
the (possibly multi-byte) UTF-8 encoding of individual characters.
@@ -491,6 +501,8 @@ interpreted_string_lit = `"` { unicode_value | byte_value } `"` .
"日本語"
"\u65e5本\U00008a9e"
"\xff\u00FF"
+"\uD800" // illegal: surrogate half
+"\U00110000" // illegal: invalid Unicode code point
@@ -500,15 +512,15 @@ These examples all represent the same string:
"日本語"                                 // UTF-8 input text
`日本語`                                 // UTF-8 input text as a raw literal
-"\u65e5\u672c\u8a9e"                    // The explicit Unicode code points
-"\U000065e5\U0000672c\U00008a9e"        // The explicit Unicode code points
-"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // The explicit UTF-8 bytes
+"\u65e5\u672c\u8a9e"                    // the explicit Unicode code points
+"\U000065e5\U0000672c\U00008a9e"        // the explicit Unicode code points
+"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // the explicit UTF-8 bytes
If the source code represents a character as two code points, such as
a combining form involving an accent and a letter, the result will be
-an error if placed in a character literal (it is not a single code
+an error if placed in a rune literal (it is not a single code
point), and will appear as two code points if placed in a string
literal.
@@ -517,7 +529,7 @@ literal.
There are boolean constants,
-character constants,
+rune constants,
integer constants,
floating-point constants, complex constants,
and string constants. Character, integer, floating-point,
@@ -527,7 +539,7 @@ collectively called numeric constants.
A constant value is represented by a
-character,
+rune,
integer,
floating-point,
imaginary,
@@ -2392,7 +2404,7 @@ In all other cases, x.f is illegal.
If x is of pointer or interface type and has the value nil,
assigning to, evaluating, or calling x.f causes a run-time panic.
-
+
@@ -3586,7 +3598,7 @@ wherever it is legal to use an operand of boolean, numeric, or string type, respectively.
Except for shift operations, if the operands of a binary operation are
different kinds of untyped constants, the operation and, for non-boolean operations, the result use
-the kind that appears later in this list: integer, character, floating-point, complex.
+the kind that appears later in this list: integer, rune, floating-point, complex.
For example, an untyped integer constant divided by an
untyped complex constant yields an untyped complex constant.
@@ -3614,7 +3626,7 @@ const f = int32(1) << 33   // f == 0     (type int32)
const g = float64(2) >> 1  // illegal    (float64(2) is a typed floating-point constant)
const h = "foo" > "bar"    // h == true  (untyped boolean constant)
const j = true             // j == true  (untyped boolean constant)
-const k = 'w' + 1          // k == 'x'   (untyped character constant)
+const k = 'w' + 1          // k == 'x'   (untyped rune constant)
const l = "hi"             // l == "hi"  (untyped string constant)
const m = string(k)        // m == "x"   (type string)
const Σ = 1 - 0.707i       //            (untyped complex constant)
@@ -3624,7 +3636,7 @@ const Φ = iota*1i - 1/1i   //            (untyped complex constant)
Applying the built-in function complex to untyped
-integer, character, or floating-point constants yields
+integer, rune, or floating-point constants yields
an untyped complex constant.
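The kind ordering for untyped constants (integer, rune, floating-point, complex) can be observed through the default type each constant takes when assigned to an interface value. This is a hedged sketch, not part of the patch: `typeName` is a hypothetical helper, and the constant names other than `k` and `m` are chosen only for illustration.

```go
package main

import "fmt"

const k = 'w' + 1       // integer combined with rune: untyped rune constant
const m = string(k)     // conversion yields a constant of type string
const f = k + 0.5       // rune combined with floating-point: untyped floating-point constant
const c = complex(1, k) // complex applied to untyped constants: untyped complex constant

// typeName is a hypothetical helper that reports the dynamic type of v,
// which for an untyped constant is its default type.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	fmt.Println(typeName(k), m)
	fmt.Println(typeName(f))
	fmt.Println(typeName(c), imag(c))
}
```

The default type of an untyped rune constant is `rune`, an alias for `int32`, which is why `%T` reports `int32` for `k`.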