From: Sergey Matveev
Date: Tue, 15 Apr 2025 08:24:43 +0000 (+0300)
Subject: Design page is useless
X-Git-Url: http://www.git.cypherpunks.su/?a=commitdiff_plain;h=7014cda81f62a4c41f45d29d4e73102d90b09fa88132724e990040b2c4ffa479;p=keks.git

Design page is useless
---

diff --git a/spec/comparison.texi b/spec/comparison.texi
index ec18816..fe7539c 100644
--- a/spec/comparison.texi
+++ b/spec/comparison.texi
@@ -70,4 +70,6 @@ strict validation of deterministically encoded CBOR structures.
 @item Tagged string/integer can not be taken as a viable first-class
 bigint/datetime data support, because many decoders do not support tags
 and won't be able to interpret/validate them.
+@item Non-string map keys very complicates representation process for
+dynamically types languages.
 @end itemize
diff --git a/spec/design.texi b/spec/design.texi
deleted file mode 100644
index 57ff242..0000000
--- a/spec/design.texi
+++ /dev/null
@@ -1,76 +0,0 @@
-@node Design
-@unnumbered Design
-
-@itemize
-
-@item
-No ASCII decimal parsing. That is not trivial code, not fast, not
-compact. Although it is human readable and understandable.
-
-@item
-No varints (where most significant bit means continuation) and
-zig-zag-like encoding. That is not trivial code.
-
-@item
-No formats where maps and lists need to know their lengths/sizes in
-advance. That means no streaming possibility. Complicates encoder and
-requires more memory usage.
-
-@item
-No formats without ability to store maps/dictionaries/tables. Of course
-they can be emulated by reassembling lists, but that is manual action
-after the codec did his job.
-
-@item
-Differentiation of binary and human-readable strings (UTF-8 for example)
-is a must for a format that is intended to be looked and analysed by a human.
-
-@item
-No ISO-based (string) representation of datetime: it requires complex
-parsing and takes much space. Naive UNIX timestamp representation raises
-questions about its length and dealing with the dates before 1970.
-Moreover they are not suitable for tasks requiring monotonous clocks,
-because of UTC.
-
-@item
-No tagging ability, context specifying, marking, hinting, extension
-mechanism or anything like that. That brings complications to the state
-and questions with unknown entities. Any unsupported data type must be a
-string, possibly enveloped in a map with additional data.
-@code{@{"cp": "koi8-r", "str": BIN(...)@}}.
-
-@item
-Large (>2GiB) strings support is a must. Nowadays even a single
-multimedia file can easily exceed that size. General-purpose codec
-should be able to send it without complication of inventing your own
-chunked format.
-
-@item
-Is not embedded strings length, like in KEKS and CBOR, is a more
-complicated code? Definitely. But there are so many short strings in a
-schemaless format for specifying map/structure keys. So many algorithm
-identifiers, that are also relatively short human-readable strings. So
-that is a compromise between slightly larger code and much shorter
-resulting structures.
-
-@item
-There should be clear distinguishing of continuous strings and
-streamable ones (BLOBs). ASN.1 CER does not do that, making
-representation of every string in memory far from being convenient and
-easy to work with. Different tasks have different constraints: many of
-them do not need streamable strings at all, some of them may use it
-solely.
-
-@item
-Only string keys are allowed in maps, eliminating huge number of
-problems when keys like int/bigint/float collide.
-
-@item
-More compact strings encoding is more important than more compact
-integers. Short strings are often used as a keys in maps and as
-algorithm identifiers. Huge quantity of integers can be met only in a
-specialised use-cases, where you will likely use own specialised format
-with fixed integers for faster loading. For example in most cases
-cryptography related tasks do not involve integers at all in their formats.
-
-@end itemize
diff --git a/spec/index.texi b/spec/index.texi
index af4ef55..91d41ef 100644
--- a/spec/index.texi
+++ b/spec/index.texi
@@ -44,13 +44,16 @@ requirements below.
 @item
  It @strong{should} be frugal to CPU usage for both performance/memory
  constrained and high data volume applications.
+@item
+ It @strong{should} differentiate binary and human-readable strings.
+@item
+ It @strong{would} be nice to have human-editable intermediate representation.
 @end itemize
 
 @include comparison.texi
 
 @insertcopying
 
-@include design.texi
 @include install.texi
 @include encoding/index.texi
 @include schema/index.texi