From cfc44e995d921e4c7b457a139ba013cb0e118d2b057cca60a3565419c044e597 Mon Sep 17 00:00:00 2001
From: Sergey Matveev
Date: Wed, 13 Nov 2024 13:45:21 +0300
Subject: [PATCH] More marketing

---
 spec/{rationale.texi => design.texi} |  16 ++++-
 spec/index.texi                      | 102 +++++++++++++++------------
 2 files changed, 69 insertions(+), 49 deletions(-)
 rename spec/{rationale.texi => design.texi} (80%)

diff --git a/spec/rationale.texi b/spec/design.texi
similarity index 80%
rename from spec/rationale.texi
rename to spec/design.texi
index 1c091e0..9c62fef 100644
--- a/spec/rationale.texi
+++ b/spec/design.texi
@@ -1,5 +1,5 @@
-@node Rationale
-@unnumbered Rationale
+@node Design
+@unnumbered Design
 
 @itemize
 
@@ -61,4 +61,16 @@ easy to work with.
 Different tasks have different constraints: many of them do not need
 streamable strings at all, some of them may use it solely.
 
+@item
+Only string keys are allowed in maps, eliminating huge number of
+problems when keys like int/bigint/float collide.
+
+@item
+More compact strings encoding is more important than more compact
+integers. Short strings are often used as a keys in maps and as
+algorithm identifiers. Huge quantity of integers can be met only in very
+specialised use-cases, where you will likely use own specialised format
+with fixed integers for faster loading. For example in most cases
+cryptography related tasks do not involve integers at all in their formats.
+
 @end itemize
diff --git a/spec/index.texi b/spec/index.texi
index b04c87f..63dce6f 100644
--- a/spec/index.texi
+++ b/spec/index.texi
@@ -9,27 +9,41 @@ Copyright @copyright{} 2024 @email{stargrave@@stargrave.org, Sergey Matveev}
 @top YAC
 
 YAC (Yet Another Codec) -- yet another format for binary reprensentation
-of structured data. But why!?
+of structured data. But why!? Because there is no satisfiable codec for
+all set of requirements below.
 
 @itemize
-@item It must be schema-less format. Schema-aware ones have their
-    definite valuable advantages, but also a complication drawbacks.
-@item Its encoder/decoder must be very compact and small in terms of
-    code amount, to reduce attack surface on the codec itself.
-@item It must be frugal to CPU usage for both performance/memory
-    constrained and high data volume applications.
-@item Its encoding must be reasonably compact, to be friendly to storage
-    space constrained systems.
-@item Its encoding must be deterministic -- there must be only a single
-    representation of the structured data, allowing its usage in
-    cryptography-related contexts.
-@item It should support enough data types for being able to replace JSON
-    transparently.
+@item
+It @strong{must} be schema-less format. Schema-aware ones have
+their definite valuable advantages, but also a complication drawbacks
+and non-friendliness to humans.
+
+@item
+Its encoder/decoder @strong{must} be very compact and small in terms of
+code and branches amount, to reduce attack surface on the codec itself.
+
+@item
+It @strong{must} support enough data types for being able at
+least to replace JSON transparently.
+
+@item
+Its encoding @strong{must} be deterministic -- there must be only a
+single representation of the structured data, allowing its usage in
+cryptography-related contexts.
+
+@item
+Its encoder @strong{should} be streaming-friendly, making encoder
+simpler and allowing memory-constrained systems workability.
+
+@item
+Its encoding @strong{should} be reasonably compact, to be friendly to
+storage space constrained systems.
+
+@item
+It @strong{should} be frugal to CPU usage for both performance/memory
+constrained and high data volume applications.
 @end itemize
 
-Non goal is: very fast machine friendly decoding capability. As a rule,
-that means non-compact data representation.
-
 Are not there any satisfiable codecs?
 
 @multitable @columnfractions .30 .05 .05 .05 .05 .05
@@ -42,6 +56,8 @@ Are not there any satisfiable codecs?
     N @tab @strong{N} @tab Y @tab Y @tab N
 @item @url{https://datatracker.ietf.org/doc/html/rfc1014, XDR} @tab
     N @tab Y @tab N @tab N @tab N
+@item @url{https://www.JSON.org/json-en.html, JSON} @tab
+    Y @tab N @tab N @tab Y @tab N
 @item @url{https://bsonspec.org/, BSON} @tab
     Y @tab Y @tab N @tab N @tab N
 @item @url{https://msgpack.org/, MessagePack} @tab
@@ -61,58 +77,50 @@ Are not there any satisfiable codecs?
 
 @end multitable
 
-@multitable @columnfractions .30 .05 .05 .05 .05 .05 .05
+@multitable @columnfractions .20 .05 .05 .05 .05 .05 .05 .05 .05
 
-@headitem @tab Large strings @tab Human strings @tab Integers @tab Lists @tab Structures @tab Datetime
+@headitem @tab Big str @tab Bin str @tab Human str @tab Ints @tab Bigints @tab Lists @tab Structures @tab Datetime
 
 @item ASN.1 DER @tab
-    Y @tab Y @tab Y @tab Y @tab Y @tab Y
+    Y @tab Y @tab Y @tab Y @tab Y @tab Y @tab Y @tab Y
 @item ASN.1 CER @tab
-    Y @tab Y @tab Y @tab Y @tab Y @tab Y
+    Y @tab Y @tab Y @tab Y @tab Y @tab Y @tab Y @tab Y
 @item XDR @tab
-    N @tab Y @tab Y @tab Y @tab Y @tab N
+    N @tab Y @tab Y @tab Y @tab N @tab Y @tab Y @tab N
+@item JSON @tab
+    Y @tab N @tab Y @tab Y @tab Y @tab Y @tab Y @tab N
 @item BSON @tab
-    N @tab Y @tab Y @tab Y @tab Y @tab Y
+    N @tab Y @tab Y @tab Y @tab N @tab Y @tab Y @tab Y
 @item MessagePack @tab
-    N @tab Y @tab Y @tab Y @tab Y @tab N
+    N @tab Y @tab Y @tab Y @tab N @tab Y @tab Y @tab N
 @item CBOR @tab
-    Y @tab Y @tab Y @tab Y @tab Y @tab N
+    Y @tab Y @tab Y @tab Y @tab N @tab Y @tab Y @tab N
 @item dCBOR @tab
-    Y @tab Y @tab Y @tab Y @tab Y @tab N
+    Y @tab Y @tab Y @tab Y @tab N @tab Y @tab Y @tab N
 @item Netstrings @tab
-    Y @tab N @tab N @tab N @tab N @tab N
+    Y @tab Y @tab N @tab N @tab N @tab N @tab N @tab N
 @item Bencode @tab
-    Y @tab N @tab Y @tab Y @tab Y @tab N
+    Y @tab Y @tab N @tab Y @tab Y @tab Y @tab Y @tab N
 @item CSExp @tab
-    Y @tab N @tab N @tab Y @tab N @tab N
+    Y @tab Y @tab N @tab N @tab N @tab Y @tab N @tab N
 @item YAC @tab
-    Y @tab Y @tab Y @tab Y @tab Y @tab Y
+    Y @tab Y @tab Y @tab Y @tab Y @tab Y @tab Y @tab Y
 
 @end multitable
 
-But hardly you will find wide range of CBOR libraries supporting strict
-validation of deterministically encoded CBOR structures.
-
-YAC deals with those problems by using only streaming deterministic
-encoding. Its other important differences:
+Note about CBOR:
 
 @itemize
-@item It lacks any kind of tagging support, making the whole codec
-    simpler and less cumbersome when someone does not know how to deal
-    with the exact tag.
-@item MAP allows only string keys with strict length-first ordering,
-    eliminating questions about dealing with duplicate keys and making
-    code simpler.
-@item It has TAI64-based datetime representation as a base primitive
-    type. No more different and not always satisfying encodings.
-@item BLOB data type gives ability to transfer huge chunks of binary
-    data in a streaming way. But unlike ASN.1 CER, you still can use
-    continuous strings representation.
+@item Hardly you will find wide range of CBOR libraries supporting
+strict validation of deterministically encoded CBOR structures.
+@item I can not take tagged string/integer as a viable first-class
+bigint/datetime data support, because many decoders do not support tags
+and won't be able to interpret/validate them.
 @end itemize
 
 @insertcopying
 
-@include rationale.texi
+@include design.texi
 @include install.texi
 @include encoding/index.texi
 @include schema.texi
-- 
2.48.1