From: Rob Pike
Date: Tue, 19 Mar 2013 05:50:32 +0000 (-0700)
Subject: doc/go1.1.html: document the surrogate and BOM changes
X-Git-Tag: go1.1rc2~453
X-Git-Url: http://www.git.cypherpunks.su/?a=commitdiff_plain;h=b89a2bcf0186477b4f5070604920dfd156f50613;p=gostls13.git

doc/go1.1.html: document the surrogate and BOM changes

R=golang-dev, adg
CC=golang-dev
https://golang.org/cl/7853048
---

diff --git a/doc/go1.1.html b/doc/go1.1.html
index 9312e69f94..694b164409 100644
--- a/doc/go1.1.html
+++ b/doc/go1.1.html
@@ -34,13 +34,14 @@ In Go 1.1, an integer division by constant zero is not a legal program, so it is

Changes to the implementations and tools

-
  • TODO: more
  • -
  • TODO: unicode: surrogate halves in compiler, libraries, runtime
  • +

    +TODO: more +

    Command-line flag parsing

    -In the gc toolchain, the compilers and linkers now use the
    +In the gc tool chain, the compilers and linkers now use the
    same command-line flag parsing rules as the Go flag package, a departure from
    the traditional Unix flag parsing. This may affect scripts that invoke
    the tool directly.
    @@ -82,6 +83,52 @@ would instead say:
    i := int(int32(x))
    +

    Unicode

    + +

    +To make it possible to represent code points greater than 65535 in UTF-16,
    +Unicode defines surrogate halves,
    +a range of code points to be used only in the assembly of large values, and only in UTF-16.
    +The code points in that surrogate range are illegal for any other purpose.
    +In Go 1.1, this constraint is honored by the compiler, libraries, and run-time:
    +a surrogate half is illegal as a rune value, when encoded as UTF-8, or when
    +encoded in isolation as UTF-16.
    +When encountered, for example in converting from a rune to UTF-8, it is
    +treated as an encoding error and will yield the replacement rune,
    +utf8.RuneError,
    +U+FFFD.
    +

    + +

    +This program, +

    + +
    +import "fmt"
    +
    +func main() {
    +    fmt.Printf("%+q\n", string(0xD800))
    +}
    +
    + +

    +printed "\ud800" in Go 1.0, but prints "\ufffd" in Go 1.1. +
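Reviewer note (not part of the patch): the snippet in the diff omits the package clause, so it will not build as shown. The following is a minimal, self-contained sketch of the same behavior using standard library calls from unicode/utf8 and unicode/utf16. The helper name describe is hypothetical and only illustrates the point; the conversion goes through rune because current vet checks flag direct integer-to-string conversions.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// describe is a hypothetical helper (not from the patch). It reports whether
// r is a valid rune, the bytes utf8.EncodeRune produces for it, and whether
// r lies in the UTF-16 surrogate range.
func describe(r rune) string {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r) // invalid runes are encoded as utf8.RuneError
	return fmt.Sprintf("U+%04X valid=%t utf8=% x surrogate=%t",
		r, utf8.ValidRune(r), buf[:n], utf16.IsSurrogate(r))
}

func main() {
	// A surrogate half: rejected as a rune and encoded as U+FFFD.
	fmt.Println(describe(0xD800))
	// A code point above 65535: valid, encoded in four UTF-8 bytes.
	fmt.Println(describe(0x1F600))
	// Equivalent of the documented example, written with an explicit rune.
	fmt.Printf("%+q\n", string(rune(0xD800)))
}
```

Under Go 1.1 or later, the surrogate half should come back as the three bytes of U+FFFD (ef bf bd), while the code point above 65535 encodes normally.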

    + +

    +The Unicode byte order marks U+FFFE and U+FEFF, encoded in UTF-8, are now permitted as the first
    +character of a Go source file.
    +Even though their appearance in the byte-order-free UTF-8 encoding is clearly unnecessary,
    +some editors add them as a kind of "magic number" identifying a UTF-8 encoded file.
    +

    + +

    +Updating:
    +Most programs will be unaffected by the surrogate change.
    +Programs that depend on the old behavior should be modified to avoid the issue.
    +The byte-order-mark change is strictly backwards-compatible.
    +

    +

    Assembler

    @@ -127,7 +174,7 @@ package code.google.com/p/foo/quxx: cannot download, $GOPATH must not be set to

    The go fix command no longer applies fixes to update code from
    -before Go 1 to use Go 1 APIs. To update pre-Go 1 code to Go 1.1, use a Go 1.0 toolchain
    +before Go 1 to use Go 1 APIs. To update pre-Go 1 code to Go 1.1, use a Go 1.0 tool chain
    to convert the code to Go 1.0 first.

    @@ -176,7 +223,7 @@ The same is true of the other protocol-specific resolvers ResolveIPAddr
    The previous ListenUnixgram returned UDPConn as
    -arepresentation of the connection endpoint. The Go 1.1 implementation
    +a representation of the connection endpoint. The Go 1.1 implementation
    returns UnixConn to allow reading and writing
    with ReadFrom and WriteTo methods on the UnixConn.
    @@ -381,7 +428,7 @@ The new method os.FileMode.IsRegular
  • The regexp package
    -now supports Unix-original lefmost-longest matches through the
    +now supports Unix-original leftmost-longest matches through the
    Regexp.Longest method, while Regexp.Split slices