gob: add an internal commentary example showing how the

author Rob Pike <r@golang.org>

Fri, 7 May 2010 18:44:41 +0000 (11:44 -0700)

committer Rob Pike <r@golang.org>

Fri, 7 May 2010 18:44:41 +0000 (11:44 -0700)
diff --git a/src/pkg/gob/encode.go b/src/pkg/gob/encode.go

index fbea891b98855173a5b1187ee000ac0f211c87b2..6fd4c3be25a1b1a3146c3ab31bc5246f45f11ac0 100644 (file)
--- a/src/pkg/gob/encode.go
+++ b/src/pkg/gob/encode.go
@@ -2,8 +2,257 @@
  // Use of this source code is governed by a BSD-style
  // license that can be found in the LICENSE file.
  
+/*
+       The gob package manages streams of gobs - binary values exchanged between an
+       Encoder (transmitter) and a Decoder (receiver).  A typical use is transporting
+       arguments and results of remote procedure calls (RPCs) such as those provided by
+       package "rpc".
+
+       A stream of gobs is self-describing.  Each data item in the stream is preceded by
+       a specification of its type, expressed in terms of a small set of predefined
+       types.  Pointers are not transmitted, but the things they point to are
+       transmitted; that is, the values are flattened.  Recursive types work fine, but
+       recursive values (data with cycles) are problematic.  This may change.
+
+       To use gobs, create an Encoder and present it with a series of data items as
+       values or addresses that can be dereferenced to values.  (At the moment, these
+       items must be structs (struct, *struct, **struct etc.), but this may change.) The
+       Encoder makes sure all type information is sent before it is needed.  At the
+       receive side, a Decoder retrieves values from the encoded stream and unpacks them
+       into local variables.
+
+       The source and destination values/types need not correspond exactly.  For structs,
+       fields (identified by name) that are in the source but absent from the receiving
+       variable will be ignored.  Fields that are in the receiving variable but missing
+       from the transmitted type or value will be ignored in the destination.  If a field
+       with the same name is present in both, their types must be compatible. Both the
+       receiver and transmitter will do all necessary indirection and dereferencing to
+       convert between gobs and actual Go values.  For instance, a gob type that is
+       schematically,
+
+               struct { a, b int }
+
+       can be sent from or received into any of these Go types:
+
+               struct { a, b int }     // the same
+               *struct { a, b int }    // extra indirection of the struct
+               struct { *a, **b int }  // extra indirection of the fields
+               struct { a, b int64 }   // different concrete value type; see below
+
+       It may also be received into any of these:
+
+               struct { a, b int }     // the same
+               struct { b, a int }     // ordering doesn't matter; matching is by name
+               struct { a, b, c int }  // extra field (c) ignored
+               struct { b int }        // missing field (a) ignored; data will be dropped
+               struct { b, c int }     // missing field (a) ignored; extra field (c) ignored.
+
+       Attempting to receive into these types will draw a decode error:
+
+               struct { a int; b uint }        // change of signedness for b
+               struct { a int; b float }       // change of type for b
+               struct { }                      // no field names in common
+               struct { c, d int }             // no field names in common
+
+       Integers are transmitted two ways: arbitrary precision signed integers or
+       arbitrary precision unsigned integers.  There is no int8, int16 etc.
+       discrimination in the gob format; there are only signed and unsigned integers.  As
+       described below, the transmitter sends the value in a variable-length encoding;
+       the receiver accepts the value and stores it in the destination variable.
+       Floating-point numbers are always sent using IEEE-754 64-bit precision (see
+       below).
+
+       Signed integers may be received into any signed integer variable: int, int16, etc.;
+       unsigned integers may be received into any unsigned integer variable; and floating
+       point values may be received into any floating point variable.  However,
+       the destination variable must be able to represent the value or the decode
+       operation will fail.
+
+       Structs, arrays and slices are also supported.  Strings and arrays of bytes are
+       supported with a special, efficient representation (see below).
+
+       Interfaces, functions, and channels cannot be sent in a gob.  Attempting
+       to encode a value that contains one will fail.
+
+       The rest of this comment documents the encoding, details that are not important
+       for most users.  Details are presented bottom-up.
+
+       An unsigned integer is sent one of two ways.  If it is less than 128, it is sent
+       as a byte with that value.  Otherwise it is sent as a minimal-length big-endian
+       (high byte first) byte stream holding the value, preceded by one byte holding the
+       byte count, negated.  Thus 0 is transmitted as (00), 7 is transmitted as (07) and
+       256 is transmitted as (FE 01 00).
+
+       A boolean is encoded within an unsigned integer: 0 for false, 1 for true.
+
+       A signed integer, i, is encoded within an unsigned integer, u.  Within u, bits 1
+       upward contain the value; bit 0 says whether they should be complemented upon
+       receipt.  The encode algorithm looks like this:
+
+               var u uint
+               if i < 0 {
+                       u = (^uint(i) << 1) | 1 // complement i, bit 0 is 1
+               } else {
+                       u = (uint(i) << 1)      // do not complement i, bit 0 is 0
+               }
+               encodeUnsigned(u)
+
+       The low bit is therefore analogous to a sign bit, but making it the complement bit
+       instead guarantees that the largest negative integer is not a special case.  For
+       example, -129=^128=(^256>>1) encodes as (FE 01 01).
+
+       Floating-point numbers are always sent as a representation of a float64 value.
+       That value is converted to a uint64 using math.Float64bits.  The uint64 is then
+       byte-reversed and sent as a regular unsigned integer.  The byte-reversal means the
+       exponent and high-precision part of the mantissa go first.  Since the low bits are
+       often zero, this can save encoding bytes.  For instance, 17.0 is encoded in only
+       three bytes (FE 31 40).
+
+       Strings and slices of bytes are sent as an unsigned count followed by that many
+       uninterpreted bytes of the value.
+
+       All other slices and arrays are sent as an unsigned count followed by that many
+       elements using the standard gob encoding for their type, recursively.
+
+       Structs are sent as a sequence of (field number, field value) pairs.  The field
+       value is sent using the standard gob encoding for its type, recursively.  If a
+       field has the zero value for its type, it is omitted from the transmission.  The
+       field number is defined by the type of the encoded struct: the first field of the
+       encoded type is field 0, the second is field 1, etc.  When encoding a value, the
+       field numbers are delta encoded for efficiency and the fields are always sent in
+       order of increasing field number; the deltas are therefore unsigned.  The
+       initialization for the delta encoding sets the field number to -1, so an unsigned
+       integer field 0 with value 7 is transmitted as unsigned delta = 1, unsigned value
+       = 7 or (01 07).  Finally, after all the fields have been sent a terminating mark
+       denotes the end of the struct.  That mark is a delta=0 value, which has
+       representation (00).
+
+       The representation of types is described below.  When a type is defined on a given
+       connection between an Encoder and Decoder, it is assigned a signed integer type
+       id.  When Encoder.Encode(v) is called, it makes sure there is an id assigned for
+       the type of v and all its elements and then it sends the pair (typeid, encoded-v)
+       where typeid is the type id of the encoded type of v and encoded-v is the gob
+       encoding of the value v.
+
+       To define a type, the encoder chooses an unused, positive type id and sends the
+       pair (-type id, encoded-type) where encoded-type is the gob encoding of a wireType
+       description, constructed from these types:
+
+               type wireType struct {
+                       s       structType;
+               }
+               type fieldType struct {
+                       name    string; // the name of the field.
+                       id      int;    // the type id of the field, which must be already defined
+               }
+               type commonType struct {
+                       name    string; // the name of the struct type
+                       id      int;    // the id of the type, repeated so it's inside the type
+               }
+               type structType struct {
+                       commonType;
+                       field   []fieldType;    // the fields of the struct.
+               }
+
+       If there are nested type ids, the types for all inner type ids must be defined
+       before the top-level type id is used to describe an encoded-v.
+
+       For simplicity in setup, the connection is defined to understand these types a
+       priori, as well as the basic gob types int, uint, etc.  Their ids are:
+
+               bool            1
+               int             2
+               uint            3
+               float           4
+               []byte          5
+               string          6
+               wireType        7
+               structType      8
+               commonType      9
+               fieldType       10
+
+       In summary, a gob stream looks like
+
+               ((-type id, encoding of a wireType)* (type id, encoding of a value))*
+
+       where * signifies zero or more repetitions and the type id of a value must
+       be predefined or be defined before the value in the stream.
+*/
  package gob
  
+/*
+       For implementers and the curious, here is an encoded example.  Given
+               type Point struct {x, y int}
+       and the value
+               p := Point{22, 33}
+       the bytes transmitted that encode p will be:
+               1f ff 81 03 01 01 05 50 6f 69 6e 74 01 ff 82 00 01 02 01 01 78
+               01 04 00 01 01 79 01 04 00 00 00
+               07 ff 82 01 2c 01 42 00
+       They are determined as follows.
+
+       Since this is the first transmission of type Point, the type descriptor
+       for Point itself must be sent before the value.  This is the first type
+       we've sent on this Encoder, so it has type id 65 (0 through 64 are
+       reserved).
+
+               1f      // This item (a type descriptor) is 31 bytes long.
+               ff 81   // The negative of the id for the type we're defining, -65.
+                       // This is one byte (indicated by FF = -1) followed by
+                       // ^-65<<1 | 1.  The low 1 bit signals to complement the
+                       // rest upon receipt.
+
+               // Now we send a type descriptor, which is itself a struct (wireType).
+               // The type of wireType itself is known (it's built in, as is the type of
+               // all its components), so we just need to send a *value* of type wireType
+               // that represents type "Point".
+               // Here starts the encoding of that value.
+               // Set the field number implicitly to -1; this is done at the beginning
+               // of every struct, including nested structs.
+               03      // Add 3 to field number; now 2 (wireType.structType; this is a struct).
+                       // structType starts with an embedded commonType, which appears
+                       // as a regular structure here too.
+               01      // add 1 to field number (now 0); start of embedded commonType.
+               01      // add 1 to field number (now 0, the name of the type)
+               05      // string is (unsigned) 5 bytes long
+               50 6f 69 6e 74  // wireType.structType.commonType.name = "Point"
+               01      // add 1 to field number (now 1, the id of the type)
+               ff 82   // wireType.structType.commonType._id = 65
+               00      // end of embedded wireType.structType.commonType struct
+               01      // add 1 to field number (now 1, the field array in wireType.structType)
+               02      // There are two fields in the type (len(structType.field))
+               01      // Start of first field structure; add 1 to get field number 0: field[0].name
+               01      // 1 byte
+               78      // structType.field[0].name = "x"
+               01      // Add 1 to get field number 1: field[0].id
+               04      // structType.field[0].typeId is 2 (signed int).
+               00      // End of structType.field[0]; start structType.field[1]; set field number to -1.
+               01      // Add 1 to get field number 0: field[1].name
+               01      // 1 byte
+               79      // structType.field[1].name = "y"
+               01      // Add 1 to get field number 1: field[1].id
+               04      // structType.field[1].typeId is 2 (signed int).
+               00      // End of structType.field[1]; end of structType.field.
+               00      // end of wireType.structType structure
+               00      // end of wireType structure
+
+       Now we can send the Point value.  Again the field number resets to -1:
+
+               07 // this value is 7 bytes long
+               ff 82 // the type number, 65 (1 byte (-FF) followed by 65<<1)
+               01 // add one to field number, yielding field 0
+               2c // encoding of signed "22" (0x2c = 44 = 22<<1); Point.x = 22
+               01 // add one to field number, yielding field 1
+               42 // encoding of signed "33" (0x42 = 66 = 33<<1); Point.y = 33
+               00 // end of structure
+
+       The type encoding is long and fairly intricate but we send it only once.
+       If p is transmitted a second time, the type is already known so the
+       output will be just:
+
+               07 ff 82 01 2c 01 42 00
+*/
+
  import (
         "bytes"
         "io"
diff --git a/src/pkg/gob/encoder.go b/src/pkg/gob/encoder.go

index d65a710802f689cef2efe5406533a61881e03d9f..308c58d303b8e11f10151ce2a28217ef64a0491c 100644 (file)
--- a/src/pkg/gob/encoder.go
+++ b/src/pkg/gob/encoder.go
@@ -2,182 +2,6 @@
  // Use of this source code is governed by a BSD-style
  // license that can be found in the LICENSE file.
  
-/*
-       The gob package manages streams of gobs - binary values exchanged between an
-       Encoder (transmitter) and a Decoder (receiver).  A typical use is transporting
-       arguments and results of remote procedure calls (RPCs) such as those provided by
-       package "rpc".
-
-       A stream of gobs is self-describing.  Each data item in the stream is preceded by
-       a specification of its type, expressed in terms of a small set of predefined
-       types.  Pointers are not transmitted, but the things they point to are
-       transmitted; that is, the values are flattened.  Recursive types work fine, but
-       recursive values (data with cycles) are problematic.  This may change.
-
-       To use gobs, create an Encoder and present it with a series of data items as
-       values or addresses that can be dereferenced to values.  (At the moment, these
-       items must be structs (struct, *struct, **struct etc.), but this may change.) The
-       Encoder makes sure all type information is sent before it is needed.  At the
-       receive side, a Decoder retrieves values from the encoded stream and unpacks them
-       into local variables.
-
-       The source and destination values/types need not correspond exactly.  For structs,
-       fields (identified by name) that are in the source but absent from the receiving
-       variable will be ignored.  Fields that are in the receiving variable but missing
-       from the transmitted type or value will be ignored in the destination.  If a field
-       with the same name is present in both, their types must be compatible. Both the
-       receiver and transmitter will do all necessary indirection and dereferencing to
-       convert between gobs and actual Go values.  For instance, a gob type that is
-       schematically,
-
-               struct { a, b int }
-
-       can be sent from or received into any of these Go types:
-
-               struct { a, b int }     // the same
-               *struct { a, b int }    // extra indirection of the struct
-               struct { *a, **b int }  // extra indirection of the fields
-               struct { a, b int64 }   // different concrete value type; see below
-
-       It may also be received into any of these:
-
-               struct { a, b int }     // the same
-               struct { b, a int }     // ordering doesn't matter; matching is by name
-               struct { a, b, c int }  // extra field (c) ignored
-               struct { b int }        // missing field (a) ignored; data will be dropped
-               struct { b, c int }     // missing field (a) ignored; extra field (c) ignored.
-
-       Attempting to receive into these types will draw a decode error:
-
-               struct { a int; b uint }        // change of signedness for b
-               struct { a int; b float }       // change of type for b
-               struct { }                      // no field names in common
-               struct { c, d int }             // no field names in common
-
-       Integers are transmitted two ways: arbitrary precision signed integers or
-       arbitrary precision unsigned integers.  There is no int8, int16 etc.
-       discrimination in the gob format; there are only signed and unsigned integers.  As
-       described below, the transmitter sends the value in a variable-length encoding;
-       the receiver accepts the value and stores it in the destination variable.
-       Floating-point numbers are always sent using IEEE-754 64-bit precision (see
-       below).
-
-       Signed integers may be received into any signed integer variable: int, int16, etc.;
-       unsigned integers may be received into any unsigned integer variable; and floating
-       point values may be received into any floating point variable.  However,
-       the destination variable must be able to represent the value or the decode
-       operation will fail.
-
-       Structs, arrays and slices are also supported.  Strings and arrays of bytes are
-       supported with a special, efficient representation (see below).
-
-       Interfaces, functions, and channels cannot be sent in a gob.  Attempting
-       to encode a value that contains one will fail.
-
-       The rest of this comment documents the encoding, details that are not important
-       for most users.  Details are presented bottom-up.
-
-       An unsigned integer is sent one of two ways.  If it is less than 128, it is sent
-       as a byte with that value.  Otherwise it is sent as a minimal-length big-endian
-       (high byte first) byte stream holding the value, preceded by one byte holding the
-       byte count, negated.  Thus 0 is transmitted as (00), 7 is transmitted as (07) and
-       256 is transmitted as (FE 01 00).
-
-       A boolean is encoded within an unsigned integer: 0 for false, 1 for true.
-
-       A signed integer, i, is encoded within an unsigned integer, u.  Within u, bits 1
-       upward contain the value; bit 0 says whether they should be complemented upon
-       receipt.  The encode algorithm looks like this:
-
-               uint u;
-               if i < 0 {
-                       u = (^i << 1) | 1       // complement i, bit 0 is 1
-               } else {
-                       u = (i << 1)    // do not complement i, bit 0 is 0
-               }
-               encodeUnsigned(u)
-
-       The low bit is therefore analogous to a sign bit, but making it the complement bit
-       instead guarantees that the largest negative integer is not a special case.  For
-       example, -129=^128=(^256>>1) encodes as (01 82).
-
-       Floating-point numbers are always sent as a representation of a float64 value.
-       That value is converted to a uint64 using math.Float64bits.  The uint64 is then
-       byte-reversed and sent as a regular unsigned integer.  The byte-reversal means the
-       exponent and high-precision part of the mantissa go first.  Since the low bits are
-       often zero, this can save encoding bytes.  For instance, 17.0 is encoded in only
-       two bytes (40 e2).
-
-       Strings and slices of bytes are sent as an unsigned count followed by that many
-       uninterpreted bytes of the value.
-
-       All other slices and arrays are sent as an unsigned count followed by that many
-       elements using the standard gob encoding for their type, recursively.
-
-       Structs are sent as a sequence of (field number, field value) pairs.  The field
-       value is sent using the standard gob encoding for its type, recursively.  If a
-       field has the zero value for its type, it is omitted from the transmission.  The
-       field number is defined by the type of the encoded struct: the first field of the
-       encoded type is field 0, the second is field 1, etc.  When encoding a value, the
-       field numbers are delta encoded for efficiency and the fields are always sent in
-       order of increasing field number; the deltas are therefore unsigned.  The
-       initialization for the delta encoding sets the field number to -1, so an unsigned
-       integer field 0 with value 7 is transmitted as unsigned delta = 1, unsigned value
-       = 7 or (81 87).  Finally, after all the fields have been sent a terminating mark
-       denotes the end of the struct.  That mark is a delta=0 value, which has
-       representation (80).
-
-       The representation of types is described below.  When a type is defined on a given
-       connection between an Encoder and Decoder, it is assigned a signed integer type
-       id.  When Encoder.Encode(v) is called, it makes sure there is an id assigned for
-       the type of v and all its elements and then it sends the pair (typeid, encoded-v)
-       where typeid is the type id of the encoded type of v and encoded-v is the gob
-       encoding of the value v.
-
-       To define a type, the encoder chooses an unused, positive type id and sends the
-       pair (-type id, encoded-type) where encoded-type is the gob encoding of a wireType
-       description, constructed from these types:
-
-               type wireType struct {
-                       s       structType;
-               }
-               type fieldType struct {
-                       name    string; // the name of the field.
-                       id      int;    // the type id of the field, which must be already defined
-               }
-               type commonType {
-                       name    string; // the name of the struct type
-                       id      int;    // the id of the type, repeated for so it's inside the type
-               }
-               type structType struct {
-                       commonType;
-                       field   []fieldType;    // the fields of the struct.
-               }
-
-       If there are nested type ids, the types for all inner type ids must be defined
-       before the top-level type id is used to describe an encoded-v.
-
-       For simplicity in setup, the connection is defined to understand these types a
-       priori, as well as the basic gob types int, uint, etc.  Their ids are:
-
-               bool            1
-               int             2
-               uint            3
-               float           4
-               []byte          5
-               string          6
-               wireType        7
-               structType      8
-               commonType      9
-               fieldType       10
-
-       In summary, a gob stream looks like
-
-               ((-type id, encoding of a wireType)* (type id, encoding of a value))*
-
-       where * signifies zero or more repetitions and the type id of a value must
-       be predefined or be defined before the value in the stream.
-*/
  package gob
  
  import (