<!--{
"Title": "The Go Programming Language Specification",
- "Subtitle": "Version of August 17, 2012",
+ "Subtitle": "Version of August 29, 2012",
"Path": "/ref/spec"
}-->
canonicalized, so a single accented code point is distinct from the
same character constructed from combining an accent and a letter;
those are treated as two code points. For simplicity, this document
-will use the term <i>character</i> to refer to a Unicode code point.
+will use the unqualified term <i>character</i> to refer to a Unicode code point
+in the source text.
</p>
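<p>
The sketch below illustrates the point; it is not part of the specification. It writes the two spellings of <code>ä</code> with escapes so that the code points are visible. Because source text is not canonicalized, the two spellings remain different strings. The helper name <code>precomposedAndDecomposed</code> is hypothetical.
</p>

```go
package main

import "fmt"

// precomposedAndDecomposed is a hypothetical helper returning two
// spellings of "ä": the single code point U+00E4, and the letter 'a'
// followed by U+0308 COMBINING DIAERESIS.
func precomposedAndDecomposed() (string, string) {
	return "\u00e4", "a\u0308"
}

func main() {
	pre, dec := precomposedAndDecomposed()
	// No normalization is applied, so the code point sequences differ.
	fmt.Println(pre == dec) // false
}
```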
<p>
Each code point is distinct; for instance, upper and lower case letters
<a href="#Integer_literals">integer</a>,
<a href="#Floating-point_literals">floating-point</a>,
<a href="#Imaginary_literals">imaginary</a>,
- <a href="#Character_literals">character</a>, or
+ <a href="#Rune_literals">rune</a>, or
<a href="#String_literals">string</a> literal
</li>
</pre>
-<h3 id="Character_literals">Character literals</h3>
+<h3 id="Rune_literals">Rune literals</h3>
<p>
-A character literal represents a <a href="#Constants">character constant</a>,
-typically a Unicode code point, as one or more characters enclosed in single
-quotes. Within the quotes, any character may appear except single
-quote and newline. A single quoted character represents itself,
+A rune literal represents a <a href="#Constants">rune constant</a>,
+an integer value identifying a Unicode code point.
+A rune literal is expressed as one or more characters enclosed in single quotes.
+Within the quotes, any character may appear except single
+quote and newline. A single quoted character represents the Unicode value
+of the character itself,
while multi-character sequences beginning with a backslash encode
values in various formats.
</p>
a literal <code>a</code>-dieresis, U+00E4, value <code>0xe4</code>.
</p>
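<p>
Here is a minimal sketch of the value of a rune literal, assuming a Go 1 toolchain. The literal <code>'ä'</code> is simply the integer 0xe4. When the literal takes its default type, that type is <code>rune</code>, an alias for <code>int32</code>. The function <code>aDieresis</code> is a hypothetical name used only here.
</p>

```go
package main

import "fmt"

// aDieresis is a hypothetical helper returning the rune literal 'ä'.
func aDieresis() rune {
	return 'ä'
}

func main() {
	r := aDieresis()
	fmt.Println(r == 0xe4)            // true: the literal's value is U+00E4
	fmt.Printf("%T %d %U\n", r, r, r) // int32 228 U+00E4
}
```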
<p>
-Several backslash escapes allow arbitrary values to be represented
+Several backslash escapes allow arbitrary values to be encoded
as ASCII text. There are four ways to represent the integer value
as a numeric constant: <code>\x</code> followed by exactly two hexadecimal
digits; <code>\u</code> followed by exactly four hexadecimal digits;
\t U+0009 horizontal tab
\v U+000b vertical tab
\\ U+005c backslash
-\' U+0027 single quote (valid escape only within character literals)
+\' U+0027 single quote (valid escape only within rune literals)
\" U+0022 double quote (valid escape only within string literals)
</pre>
<p>
-All other sequences starting with a backslash are illegal inside character literals.
+All other sequences starting with a backslash are illegal inside rune literals.
</p>
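<p>
The following sketch writes U+0041 (<code>A</code>, decimal 65) with each of the four numeric escape forms. It also checks a few of the single-character escapes from the table. Every comparison is expected to be true. The helper <code>escapesForA</code> is hypothetical.
</p>

```go
package main

import "fmt"

// escapesForA is a hypothetical helper: 'A' written as \x (two hex
// digits), \ (three octal digits), \u (four hex), and \U (eight hex).
func escapesForA() []rune {
	return []rune{'\x41', '\101', '\u0041', '\U00000041'}
}

func main() {
	for _, r := range escapesForA() {
		fmt.Println(r == 'A') // true
	}
	// Single-character escapes denote fixed code points.
	fmt.Println('\t' == 0x09, '\\' == 0x5c, '\'' == 0x27) // true true true
}
```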
<pre class="ebnf">
char_lit = "'" ( unicode_value | byte_value ) "'" .
'\xff'
'\u12e4'
'\U00101234'
+'aa' // illegal: too many characters
+'\xa' // illegal: too few hexadecimal digits
+'\0' // illegal: too few octal digits
+'\uDFFF' // illegal: surrogate half
+'\U00110000' // illegal: invalid Unicode code point
</pre>
Raw string literals are character sequences between back quotes
<code>``</code>. Within the quotes, any character is legal except
back quote. The value of a raw string literal is the
-string composed of the uninterpreted characters between the quotes;
+string composed of the uninterpreted (implicitly UTF-8-encoded) characters
+between the quotes;
in particular, backslashes have no special meaning and the string may
contain newlines.
Carriage returns inside raw string literals
quotes <code>""</code>. The text between the quotes,
which may not contain newlines, forms the
value of the literal, with backslash escapes interpreted as they
-are in character literals (except that <code>\'</code> is illegal and
-<code>\"</code> is legal). The three-digit octal (<code>\</code><i>nnn</i>)
+are in rune literals (except that <code>\'</code> is illegal and
+<code>\"</code> is legal), with the same restrictions.
+The three-digit octal (<code>\</code><i>nnn</i>)
and two-digit hexadecimal (<code>\x</code><i>nn</i>) escapes represent individual
<i>bytes</i> of the resulting string; all other escapes represent
the (possibly multi-byte) UTF-8 encoding of individual <i>characters</i>.
"日本語"
"\u65e5本\U00008a9e"
"\xff\u00FF"
+"\uD800" // illegal: surrogate half
+"\U00110000" // illegal: invalid Unicode code point
</pre>
<p>
<pre>
"日本語" // UTF-8 input text
`日本語` // UTF-8 input text as a raw literal
-"\u65e5\u672c\u8a9e" // The explicit Unicode code points
-"\U000065e5\U0000672c\U00008a9e" // The explicit Unicode code points
-"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e" // The explicit UTF-8 bytes
+"\u65e5\u672c\u8a9e" // the explicit Unicode code points
+"\U000065e5\U0000672c\U00008a9e" // the explicit Unicode code points
+"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e" // the explicit UTF-8 bytes
</pre>
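<p>
The five literals in the preceding example all denote the same string. A sketch that checks this, assuming a Go 1 toolchain, confirms that each spelling yields the same nine bytes and three code points. The last lines contrast byte escapes with character escapes. They also show that a backslash has no special meaning in a raw literal. The names <code>spellings</code> and <code>counts</code> are hypothetical.
</p>

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// spellings is a hypothetical list of the five literals shown above.
var spellings = []string{
	"日本語",
	`日本語`,
	"\u65e5\u672c\u8a9e",
	"\U000065e5\U0000672c\U00008a9e",
	"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e",
}

// counts is a hypothetical helper returning the byte and code point counts of s.
func counts(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range spellings {
		b, r := counts(s)
		fmt.Println(s == spellings[0], b, r) // true 9 3
	}
	// \xNN is a single byte; \u00FF is the two-byte UTF-8 encoding of U+00FF.
	fmt.Println(len("\xff"), len("\u00FF")) // 1 2
	// Backslashes have no special meaning in a raw string literal.
	fmt.Println(len(`\n`), len("\n")) // 2 1
}
```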
<p>
If the source code represents a character as two code points, such as
a combining form involving an accent and a letter, the result will be
-an error if placed in a character literal (it is not a single code
+an error if placed in a rune literal (it is not a single code
point), and will appear as two code points if placed in a string
literal.
</p>
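<p>
The following sketch shows the distinction. The precomposed <code>ä</code> fits in a rune literal. The combining form, written here with escapes, decodes to two code points and can therefore appear only in a string. The helper <code>codePoints</code> is hypothetical.
</p>

```go
package main

import "fmt"

// codePoints is a hypothetical helper that decodes s into its code points.
func codePoints(s string) []rune {
	return []rune(s)
}

func main() {
	r := '\u00e4'                // one code point: a valid rune literal
	cps := codePoints("a\u0308") // letter plus combining diaeresis
	fmt.Println(r, cps)          // 228 [97 776]
	// Typing the combining form between single quotes would be a
	// compile-time error, since the literal would hold two code points.
}
```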
<h2 id="Constants">Constants</h2>
<p>There are <i>boolean constants</i>,
-<i>character constants</i>,
+<i>rune constants</i>,
<i>integer constants</i>,
<i>floating-point constants</i>, <i>complex constants</i>,
-and <i>string constants</i>. Character, integer, floating-point,
+and <i>string constants</i>. Rune, integer, floating-point,
<p>
A constant value is represented by a
-<a href="#Character_literals">character</a>,
+<a href="#Rune_literals">rune</a>,
<a href="#Integer_literals">integer</a>,
<a href="#Floating-point_literals">floating-point</a>,
<a href="#Imaginary_literals">imaginary</a>,
If <code>x</code> is of pointer or interface type and has the value
<code>nil</code>, assigning to, evaluating, or calling <code>x.f</code>
causes a <a href="#Run_time_panics">run-time panic</a>.
-</i>
+</li>
</ol>
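<p>
Here is a minimal sketch of the <code>nil</code> pointer case. It uses a hypothetical struct type <code>point</code> and a hypothetical helper <code>readX</code>. Evaluating <code>p.x</code> when <code>p</code> is <code>nil</code> causes a run-time panic, and a deferred <code>recover</code> observes that panic.
</p>

```go
package main

import "fmt"

// point is a hypothetical struct type used only for illustration.
type point struct{ x int }

// readX evaluates p.x and reports whether doing so caused a run-time panic.
func readX(p *point) (x int, panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	return p.x, false
}

func main() {
	fmt.Println(readX(&point{x: 1})) // 1 false
	fmt.Println(readX(nil))          // 0 true
}
```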
<p>
respectively.
Except for shift operations, if the operands of a binary operation are
different kinds of untyped constants, the operation and, for non-boolean operations, the result use
-the kind that appears later in this list: integer, character, floating-point, complex.
+the kind that appears later in this list: integer, rune, floating-point, complex.
For example, an untyped integer constant divided by an
untyped complex constant yields an untyped complex constant.
</p>
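<p>
The following sketch illustrates the rule, assuming a Go 1 toolchain. Each constant combines two kinds of untyped operand and takes the kind that appears later in the list. Printing with <code>%T</code> shows the default type of each resulting kind. The constant names are arbitrary.
</p>

```go
package main

import "fmt"

const (
	a = 1 + 'a'   // integer and rune: untyped rune constant 98
	b = 'a' + 0.5 // rune and floating-point: untyped floating-point constant 97.5
	c = 1 / 2i    // integer and complex: untyped complex constant -0.5i
)

func main() {
	// Passing the constants to Printf converts each to its default type.
	fmt.Printf("%T %T %T\n", a, b, c) // int32 float64 complex128
	fmt.Println(a, b, c)              // 98 97.5 (0-0.5i)
}
```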
const g = float64(2) >> 1 // illegal (float64(2) is a typed floating-point constant)
const h = "foo" > "bar" // h == true (untyped boolean constant)
const j = true // j == true (untyped boolean constant)
-const k = 'w' + 1 // k == 'x' (untyped character constant)
+const k = 'w' + 1 // k == 'x' (untyped rune constant)
const l = "hi" // l == "hi" (untyped string constant)
const m = string(k) // m == "x" (type string)
const Σ = 1 - 0.707i // (untyped complex constant)
<p>
Applying the built-in function <code>complex</code> to untyped
-integer, character, or floating-point constants yields
+integer, rune, or floating-point constants yields
an untyped complex constant.
</p>
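<p>
For instance, in the sketch below both arguments of <code>complex</code> are untyped: one is an integer and one is a rune. As a result, <code>z</code> is an untyped complex constant and is assignable to either complex type. The name <code>z</code> is arbitrary.
</p>

```go
package main

import "fmt"

// complex applied to an untyped integer and an untyped rune constant.
const z = complex(1, 'a')

func main() {
	fmt.Println(real(z), imag(z)) // 1 97

	// Being untyped, z is assignable to both complex64 and complex128.
	var c64 complex64 = z
	var c128 complex128 = z
	fmt.Println(c64 == complex64(c128)) // true
}
```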
to type <code>bool</code>, <code>rune</code>, <code>int</code>, <code>float64</code>,
<code>complex128</code> or <code>string</code>
respectively, depending on whether the value is a
-boolean, character, integer, floating-point, complex, or string constant.
+boolean, rune, integer, floating-point, complex, or string constant.
</p>
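<p>
The following sketch shows those defaults. Short variable declarations give each untyped constant its default type. Note that <code>%T</code> reports <code>rune</code> as <code>int32</code>, because <code>rune</code> is an alias for <code>int32</code>. The helper <code>defaultTypes</code> is hypothetical.
</p>

```go
package main

import "fmt"

// defaultTypes is a hypothetical helper reporting the types that untyped
// boolean, rune, integer, floating-point, complex, and string constants
// receive in a short variable declaration.
func defaultTypes() string {
	b, r, i, f, c, s := true, 'x', 1, 1.5, 2i, "go"
	return fmt.Sprintf("%T %T %T %T %T %T", b, r, i, f, c, s)
}

func main() {
	fmt.Println(defaultTypes()) // bool int32 int float64 complex128 string
}
```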
Calls to <code>Alignof</code>, <code>Offsetof</code>, and
<code>Sizeof</code> are compile-time constant expressions of type <code>uintptr</code>.
</p>
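<p>
Here is a minimal sketch, assuming a Go 1 toolchain. Because <code>unsafe.Sizeof</code> is a constant expression, it can initialize a constant and serve as an array length. The names <code>runeSize</code> and <code>buf</code> are arbitrary. The printed sizes follow from the size guarantees for <code>int32</code> and <code>int64</code>.
</p>

```go
package main

import (
	"fmt"
	"unsafe"
)

// Sizeof is a compile-time constant, so it can initialize a constant.
const runeSize = unsafe.Sizeof(rune(0))

// An array length must be a constant, so Sizeof may appear there too.
var buf [unsafe.Sizeof(int64(0))]byte

func main() {
	fmt.Printf("%T %d %d\n", runeSize, runeSize, len(buf)) // uintptr 4 8
}
```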
-<p>
<h3 id="Size_and_alignment_guarantees">Size and alignment guarantees</h3>