transparently.
@end itemize
+A non-goal is very fast, machine-friendly decoding. As a rule, that
+would mean a non-compact data representation.
+
Are not there any satisfiable codecs?
-@multitable @columnfractions .30 .10 .10 .10 .10 .10 .10 .10
+@multitable @columnfractions .30 .05 .05 .05 .05 .05
@headitem name @tab
Schemaless @tab
Simple @tab
Deterministic @tab
Streamable @tab
- Compact @tab
- Large strings @tab
- Many types
-
-@item ASN.1 DER @tab N @tab @strong{N} @tab Y @tab N @tab ~ @tab Y @tab ~
-@item ASN.1 CER @tab N @tab @strong{N} @tab Y @tab Y @tab ~ @tab Y @tab ~
-@item @url{https://protobuf.dev/, Protocol Buffers}
- @tab N @tab ~ @tab N @tab N @tab Y @tab N @tab Y
-@item @url{https://flatbuffers.dev/, FlatBuffers}
- @tab Y @tab N @tab N @tab N @tab N @tab Y @tab Y
+ Compact
+
+@item ASN.1 @url{https://en.wikipedia.org/wiki/Distinguished_Encoding_Rules#DER_encoding, DER} @tab
+ N @tab @strong{N} @tab Y @tab N @tab N
+@item ASN.1 @url{https://en.wikipedia.org/wiki/Distinguished_Encoding_Rules#CER_encoding, CER} @tab
+ N @tab @strong{N} @tab Y @tab Y @tab N
+@item @url{https://datatracker.ietf.org/doc/html/rfc1014, XDR} @tab
+ N @tab Y @tab N @tab N @tab N
@item @url{https://bsonspec.org/, BSON} @tab
- Y @tab Y @tab N @tab N @tab N @tab N @tab Y
+ Y @tab Y @tab N @tab N @tab N
@item @url{https://msgpack.org/, MessagePack} @tab
- Y @tab Y @tab N @tab N @tab N @tab N @tab Y
+ Y @tab Y @tab N @tab N @tab Y
@item @url{https://datatracker.ietf.org/doc/html/rfc8949, CBOR} @tab
- Y @tab N @tab N @tab Y @tab Y @tab Y @tab Y
-@item Deterministic Encoded CBOR @tab
- Y @tab N @tab Y @tab N @tab Y @tab Y @tab Y
+ Y @tab N @tab N @tab Y @tab Y
+@item @url{https://datatracker.ietf.org/doc/html/draft-mcnally-deterministic-cbor-11, dCBOR} @tab
+ Y @tab @strong{N} @tab Y @tab N @tab Y
@item @url{http://cr.yp.to/proto/netstrings.txt, Netstrings} @tab
- Y @tab Y @tab Y @tab N @tab N @tab Y @tab N
+ Y @tab Y @tab Y @tab N @tab N
@item @url{https://wiki.theory.org/BitTorrentSpecification#Bencoding, Bencode} @tab
- Y @tab Y @tab Y @tab Y @tab N @tab Y @tab N
-@item @url{https://en.wikipedia.org/wiki/Canonical_S-expressions, Canonical S-expression}
- @tab Y @tab Y @tab Y @tab Y @tab N @tab Y @tab N
+ Y @tab Y @tab Y @tab Y @tab N
+@item @url{https://en.wikipedia.org/wiki/Canonical_S-expressions, Canonical S-expression} @tab
+ Y @tab Y @tab Y @tab Y @tab N
@item YAC @tab
- Y @tab Y @tab Y @tab Y @tab Y @tab Y @tab Y
+ Y @tab Y @tab Y @tab Y @tab Y
@end multitable
-@itemize
-@item Streamable formats allow you to send a part of the data
- immediately, for example element of the list or map. Simplifying
- encoder and requiring less memory usage. All formats who needs to
- know the size of maps/lists are not streamable.
-@item Compactness means small amount of bytes overhead for the given
- data. For example any codec with ASCII decimal lengths of the
- strings or integers representation is not compact.
-@item "Large strings" is a strings bigger than 4GiB. Some codecs allow
- you to send only even 2GiB of data in a single chunk. That will
- force you code and structures be more complex when dealing with big
- data transfer
-@item "Many types" is a subjective thing of course. If codec can encode
- everything JSON can, then it is enough types. ASN.1 codecs support
- many various types, but they can not represent arbitrary map.
-@item Hardly you will find CBOR libraries supporting strict validation
- of deterministically encoded CBOR structures.
-@end itemize
+@multitable @columnfractions .30 .05 .05 .05 .05 .05 .05
+
+@headitem name @tab
+ Large strings @tab
+ Human strings @tab
+ Integers @tab
+ Lists @tab
+ Structures @tab
+ Datetime
+
+@item ASN.1 DER @tab
+ Y @tab Y @tab Y @tab Y @tab Y @tab Y
+@item ASN.1 CER @tab
+ Y @tab Y @tab Y @tab Y @tab Y @tab Y
+@item XDR @tab
+ N @tab Y @tab Y @tab Y @tab Y @tab N
+@item BSON @tab
+ N @tab Y @tab Y @tab Y @tab Y @tab Y
+@item MessagePack @tab
+ N @tab Y @tab Y @tab Y @tab Y @tab N
+@item CBOR @tab
+ Y @tab Y @tab Y @tab Y @tab Y @tab N
+@item dCBOR @tab
+ Y @tab Y @tab Y @tab Y @tab Y @tab N
+@item Netstrings @tab
+ Y @tab N @tab N @tab N @tab N @tab N
+@item Bencode @tab
+ Y @tab N @tab Y @tab Y @tab Y @tab N
+@item CSExp @tab
+ Y @tab N @tab N @tab Y @tab N @tab N
+@item YAC @tab
+ Y @tab Y @tab Y @tab Y @tab Y @tab Y
+
+@end multitable
+
+However, you will hardly find a wide range of CBOR libraries that
+support strict validation of deterministically encoded CBOR structures.
YAC deals with those problems by using only streaming deterministic
encoding. Its other important differences:
@insertcopying
+@include rationale.texi
@include install.texi
@include encoding/index.texi
@include schema.texi
--- /dev/null
+@node Rationale
+@unnumbered Rationale
+
+@itemize
+
+@item
+We do not want ASCII decimal parsing. Loading an integer that way is
+neither trivial nor very fast, although the result is human-readable and
+understandable. It is also not compact.
+
+@item
+We do not want varints (where the most significant bit means
+continuation) or zig-zag-like encoding. They require non-trivial code
+and prevent fast integer loading.
+
+@item
+We do not want formats where maps and lists need to know their
+lengths/sizes in advance. That rules out streaming, complicates the
+encoder and requires more memory. Containers can instead be terminated
+with an explicit signal tag.
+
+@item
+We want a format able to store maps/dictionaries/tables. Of course
+they can be emulated by reassembling lists, but that is a manual step
+after the codec has done its job.
+
+@item
+Distinguishing binary strings from human-readable ones (UTF-8, for
+example) is a must for a format that is intended to be inspected and
+analysed by a human.
+
+@item
+ISO-based (string) representation of dates is a no, because it requires
+complex parsing and takes much space. A naive UNIX timestamp
+representation raises questions about its length and about handling
+dates before 1970. Moreover, such timestamps are not suitable for tasks
+requiring monotonic clocks, because of UTC.
+
+@item
+No tagging, context specification, marking, hinting, extension
+mechanism or anything like that. Those greatly complicate the state and
+raise the question of what to do with unknown entities. Any unsupported
+data type must be a string, possibly enveloped in a map with additional
+data, for example @code{@{"cp": "koi8-r", "str": BIN(...)@}}.
+
+@item
+Support for large (>2GiB) strings is a must. Nowadays even a single
+multimedia file can easily exceed that size. A general-purpose codec
+must be able to send it without the complication of inventing your own
+chunked format.
+
+@item
+Does an embedded string length, as in YAC and CBOR, not make the code
+more complicated? Definitely. But a schemaless format contains so many
+short strings that specify map/structure keys, and so many algorithm
+identifiers that are also relatively short human-readable strings. So
+it is a worthwhile compromise between slightly larger code and much
+shorter resulting structures.
+
+@item
+We want a strong distinction between contiguous strings and streamable
+ones (BLOBs). ASN.1 CER does not distinguish them, which makes the
+in-memory representation of every string far from convenient and easy to
+work with. Different tasks have different constraints: many of them do
+not need streamable strings at all, while some may use them exclusively.
+YAC gives you the flexibility to choose the data type that fits your
+needs.
+
+@end itemize
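
Two of the points above, fixed-width integers instead of varints or ASCII decimals, and containers closed by an explicit terminator instead of a length prefix, can be illustrated with a small sketch. The tag values and the 8-byte integer layout below are assumptions made up for this example only; they are not YAC's actual wire format.

```python
import io
import struct
from typing import Iterable, Iterator, List

# Hypothetical tag values for this sketch only; they are NOT YAC's real tags.
TAG_END = 0x00   # terminates the open list
TAG_INT = 0x01   # followed by a fixed 8-byte big-endian signed integer
TAG_LIST = 0x02  # opens a list whose length is not stated anywhere


def encode_list(items: Iterable[int]) -> Iterator[bytes]:
    """Yield the encoding chunk by chunk, never asking for len(items)."""
    yield bytes([TAG_LIST])
    for item in items:
        # Fixed-width integer: one struct call, no varint loop, no ASCII digits.
        yield bytes([TAG_INT]) + struct.pack(">q", item)
    yield bytes([TAG_END])


def decode_list(stream: io.BufferedIOBase) -> List[int]:
    """Read one list back, stopping at the explicit terminator tag."""
    if stream.read(1) != bytes([TAG_LIST]):
        raise ValueError("expected list tag")
    result = []
    while True:
        tag = stream.read(1)
        if tag == bytes([TAG_END]):
            return result
        if tag != bytes([TAG_INT]):
            raise ValueError("unexpected tag or truncated input")
        raw = stream.read(8)
        if len(raw) != 8:
            raise ValueError("truncated integer")
        result.append(struct.unpack(">q", raw)[0])


def squares(limit: int) -> Iterator[int]:
    """A generator whose length the encoder cannot know in advance."""
    n = 0
    while n * n < limit:
        yield n * n
        n += 1


# Each chunk could be written to a socket as soon as it is produced;
# joining them here only keeps the example short.
encoded = b"".join(encode_list(squares(30)))
decoded = decode_list(io.BytesIO(encoded))
```

The encoder never buffers the whole list, so its memory use does not grow with the number of elements. A length-prefixed format would have to exhaust the generator before it could emit the first byte.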