For an earlier diary, not as well kept, view diary.md. In the same tree you will find redux.md, which is this document moved over it. I removed the old diary.md to replace it with this diary, which covers the development of the syntax bashed definition language. It was removed because I know my way around this diary and most of the older diary discusses the Perl pack inspired language.


Thu Nov 4 13:38:24 CDT 2021

Beginning to see that it is all pretty much covered using inlines and accumulators so it may be worthwhile to consider how you might name those as a more fundamental type. Inlines is probably fine, but accumulators, we could just call them variables and talk about their limitations. Perhaps $$ is a variable stack, because are you not already using $$ for something stack like anyway?

Want to note that I'm still wondering about transforms. Perhaps we can talk about the value of transformation without providing a generator for it. Inlines are nice but there are times when you should just adjust your structure before it goes into the serializer. I mean, if people can accept this, as they should, then we can have a simpler module.

Also, reading some Flux documentation where someone is saying something won't be implemented, basta, and really you should explain why, or try to make the case that Flux is more like git. A steep learning curve usually just means that there is a tree structure involved or something that is really not that hard to grasp.

Wed Nov 3 16:01:24 CDT 2021

We're starting to get into the weeds with this now. I'm wondering about literals, we want to have them parsed so we can assert their value, but do we want them to become part of the object? Can we have a way to get the value and perform an assertion, but not create a property in the object? Should this just be an option of the parser specified by, perhaps, adding Error to the definition? Would be more consistent to pass it to the user, so maybe we add 'null' to the definition or maybe we can add a parse side only function to the definition.

Fri Oct 29 02:07:47 CDT 2021

In split conditionals parse comes before serialize, because Packet is a packet parser, but when I write one I assume serialize comes first because you have to serialize a packet in order to have something parsed.

UPDATE: You goof. It is serialize first.

Fri Oct 29 00:18:34 CDT 2021

Need a way to have a separate parse and serialize path, but we currently do not have any good way of saying this except to create two conditionals.

Something that is simply two arrays is ambiguous.

const definition = {
    object: [[ 16 ], [ 8 ]]
}

We should allow an always true split conditional and just put it inline.

The specific problem I'm facing is an MQTT header which has a remaining length variable. We get this by first serializing the variable part of the message, then formatting a header that has the remaining length. This means two separate serializes, but we can combine them into a parser. The variable serialize simply writes the remaining buffer calculating the length by getting the payload length, not writing a length encoding. When parsing we get the length by subtracting an offset from the header length property.

const definition = {
    object: [
        [ true, [ $ => $.header.length - 8, [ Buffer ] ] ],
        [ true, [ $ => $.body.length, [ Buffer ] ] ]
    ]
}
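A minimal sketch of the two-pass serialize described above, written by hand rather than generated. It assumes a one byte fixed header followed by the MQTT variable byte integer for the remaining length. The names `encodeRemainingLength` and `serializePacket` are made up for illustration and are not part of Packet.

```javascript
// Hypothetical helper: encode a length as an MQTT variable byte integer,
// seven bits per byte, least significant group first.
function encodeRemainingLength (length) {
    const bytes = []
    do {
        let byte = length % 128
        length = Math.floor(length / 128)
        // Set the continuation bit when more bytes follow.
        if (length > 0) {
            byte |= 0x80
        }
        bytes.push(byte)
    } while (length > 0)
    return Buffer.from(bytes)
}

// Serialize the variable part first, then format a header that records how
// many bytes the variable part turned out to need.
function serializePacket (type, body) {
    const variable = Buffer.from(body)
    const header = Buffer.concat([
        Buffer.from([ type << 4 ]),
        encodeRemainingLength(variable.length)
    ])
    return Buffer.concat([ header, variable ])
}

const packet = serializePacket(3, Buffer.alloc(321))
console.log(packet.subarray(0, 3))
```

The parse side goes the other way around: the header is read first and its length tells the parser how many bytes of body to take.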

Fri Oct 29 00:11:40 CDT 2021

The suggested language for limits will not work. We already use the array wrapping for calculated lengths which are already proving useful.

However, a calculated length takes this form.

const definition = {
    object: [[ $ => $.header.length ], [ 8 ]]
}

The second element is always an array with a single element.

Provided there is no other root language element that is an array with a single element, they all seem to be arrays with multiple elements, we can define a test for a function followed by an array with a single element. A switch body will always have elements divisible by 2.
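A sketch of what that test might look like, assuming the shape shown in the example above where the function is itself wrapped in an array. The name `isCalculatedLength` is hypothetical and the real language test may need more cases.

```javascript
// Hypothetical test: a wrapped function followed by an array with exactly one
// element, as in `[[ $ => $.header.length ], [ 8 ]]`.
function isCalculatedLength (node) {
    return Array.isArray(node) &&
        node.length == 2 &&
        Array.isArray(node[0]) &&
        node[0].length == 1 &&
        typeof node[0][0] == 'function' &&
        Array.isArray(node[1]) &&
        node[1].length == 1
}

console.log(isCalculatedLength([[ $ => $.header.length ], [ 8 ]]))
console.log(isCalculatedLength([[ 16 ], [ 8 ]]))
```

The second call shows why the function check matters: two plain arrays stay ambiguous and are not claimed by this test.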

What is needed is a language.js file that has an example of everything, or perhaps a language.md and you can scan it for space in the language.

You should allow yourself language hacks, something like.

const definition = {
    object: { hack: 'length', get: $ => $.header.length }
}

So you can start work on the feature and see if it does or does not make life easier.

TODO Create an issue and then remove this comment.

Furthermore, some thoughts on the implementation.

We will only implement the length checking on the parser side, initially at least. It is assumed that the application is not trying to resource starve itself, or that that would be a programming error and would be resolved by debugging and testing, whereas someone could feed our parser with an endless terminated array, making for a simple denial-of-service attack.

We can use the existing checkpoint logic to generate checkpoints that instead of breaking into an incremental parser, raise an exception.

An ugly rough draft of the incremental parser would add a test to every one of the parsing loops that compares the $offset to a limit. A wiser implementation would skip this test for fixed length fields, checking only that there are enough bytes remaining. Smarter still, we check once for the sum of a series of fixed width fields. The smartest of them all would adjust the limit, decreasing it by the number of bytes necessary to contain the smallest possible remaining packet: a sum of fixed width fields, encodings, terminators, the smallest option in a conditional, etc. Note that this smallest size can be calculated in the language and made a property of the AST. You can sum the remaining entries in the tree as you process, but that may mean we have to return up the tree. Or, no, we can pass a parent minimum as we descend the tree.
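A rough sketch of the minimum size calculation, assuming a simplified AST. The node shapes (`integer`, `terminated`, `lengthEncoded`, `conditional`, `structure`) and the `after` property are invented for this example and do not reflect the real AST.

```javascript
// Smallest number of bytes a node can occupy on the wire.
function minimum (node) {
    switch (node.type) {
    case 'integer':
        return node.bits / 8
    case 'terminated':
        // An empty array still needs its terminator.
        return node.terminator.length
    case 'lengthEncoded':
        // An empty array still needs its length encoding.
        return minimum(node.encoding)
    case 'conditional':
        return Math.min(...node.branches.map(minimum))
    case 'structure':
        return node.fields.reduce((sum, field) => sum + minimum(field), 0)
    default:
        throw new Error(`unknown node type: ${node.type}`)
    }
}

// Record on each node the minimum number of bytes that must follow it,
// passing the parent's remainder down as we descend. Array elements are not
// visited in this sketch.
function annotate (node, after = 0) {
    node.after = after
    if (node.type == 'structure') {
        let remaining = after
        for (let i = node.fields.length - 1; i >= 0; i--) {
            annotate(node.fields[i], remaining)
            remaining += minimum(node.fields[i])
        }
    } else if (node.type == 'conditional') {
        for (const branch of node.branches) {
            annotate(branch, after)
        }
    }
    return node
}

const ast = annotate({
    type: 'structure',
    fields: [
        { type: 'integer', bits: 8 },
        { type: 'terminated', terminator: [ 0x0 ], element: { type: 'integer', bits: 8 } },
        { type: 'conditional', branches: [ { type: 'integer', bits: 16 }, { type: 'integer', bits: 32 } ] },
        { type: 'integer', bits: 8 }
    ]
})
console.log(minimum(ast), ast.fields.map(field => field.after))
```

With `after` on every node, a limit check for a field could compare the current offset plus the field's own minimum plus `after` against the limit, without returning up the tree.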

Wed Sep 29 02:52:53 CDT 2021

Limiting parsers. Would like to limit the length of a parser so it will stop, and we already have length checks, but how would we use the language to limit?

const definition = {
    object: [[ $ => $.header.length ], {
    }]
}

The mnemonic could be that we are using the array to represent a constraint. This would require a whole bunch of new code generation, though. We can start by seeing if it fits in the language or creates ambiguities.

Wed Sep 29 02:49:59 CDT 2021

Partial arguments would allow for the reuse of the variable MQTT body in a whole parser.

const definition = {
    $object: [{ counter: [ 0 ] }, [[[
        ({ $start, $end, counter }) => counter[0] += $end - $start
    ]], {
        number: 8,
        string: [ [ 8 ], 0x0 ]
    }]],
    composed: {
        value: 8,
        object: [ '$object', $ => $.value ]
    }
}

So an array with an accumulator as a first argument is a parameterized accumulator, which also defines parameters, and we can pass those parameters using functions or constants, but probably functions.

Sat Sep 25 04:01:30 CDT 2021

const definition = {
    '$mqtt': [
        ({ $_, $count, $buffer, $start }) => {
            const value = $_ >>> (7 * $count) & 0x7f
            if ($count == 0 || value != 0) {
                return value
            }
            return 256
        },
        () => {
            let multiplier = 1, value = 0
            return ($buffer, $start) => {
                const bite = $buffer[$start]
                value += (bite & 0x7f) * multiplier
                if (multiplier > 128 * 128 * 128) {
                    throw new Error
                }
                // Advance to the next base-128 digit.
                multiplier *= 128
                return bite & 128 ? null : value
            }
        }
    ]
}

Idea for an encoding that goes byte by byte so I don't have to write a loop declaratively, which is ugly. Could add an initial function that is sizeof.

Okay, if there are no arguments, then we are returning a function. This allows for the initialization of any local variables and we can detect this from the argument parsing, and invoke it to get the inner function.
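A small sketch of that detection. It uses `Function.prototype.length` as a stand-in for the argument parsing mentioned above, which is an assumption of this example. Note that `length` is also zero for functions whose only parameters have defaults, so the real detection would have to look at the parsed arguments. The names `resolve`, `builder` and `parse` are hypothetical.

```javascript
// Hypothetical: a function with no arguments is treated as a builder and is
// invoked to obtain the inner function along with fresh local state.
function resolve (f) {
    return f.length == 0 ? f() : f
}

// Builder for a byte-by-byte MQTT variable byte integer decoder.
const builder = () => {
    let multiplier = 1, value = 0
    return ($buffer, $start) => {
        const bite = $buffer[$start]
        value += (bite & 0x7f) * multiplier
        multiplier *= 128
        // Return null while the continuation bit says more bytes follow.
        return bite & 128 ? null : value
    }
}

// Feed the inner function one byte at a time until it yields a value.
function parse (buffer) {
    const step = resolve(builder)
    for (let i = 0; i < buffer.length; i++) {
        const result = step(buffer, i)
        if (result != null) {
            return result
        }
    }
    return null
}

console.log(parse(Buffer.from([ 0xc1, 0x02 ])))
```

Because `resolve` invokes the builder on every parse, each parse starts with its own `multiplier` and `value`.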

The user may perhaps want an accumulator passed to the function builder function at some point, but let's hope not, because then we are going to have a much harder time getting to that inner function.

const definition = {
    '$mqtt': [
        $_ => {
            return [
                ($buffer, $start) => {
                    const bite = $_ & 0x7f
                    $_ = $_ >>> 7
                    return bite
                },
                ($buffer, $start) => {
                    if ($_ == 0) {
                        return 256
                    }
                    const bite = $_ & 0x7f
                    $_ = $_ >>> 7
                    return bite
                },
            ]
        },
        () => {
            let multiplier = 1, value = 0
            return ($buffer, $start) => {
                const bite = $buffer[$start]
                value += (bite & 0x7f) * multiplier
                if (multiplier > 128 * 128 * 128) {
                    throw new Error
                }
                multiplier *= 128
                return bite & 128 ? null : value
            }
        }
    ]
}

Look how ugly it gets, though. Trying to get rid of that test for the first $count. Would only solve one problem encountered with MQTT at this point, but perhaps also a UTF8 encoding.

Sat Sep 25 03:33:00 CDT 2021

Revisiting thoughts on sizeof.

Seems that asynchronous serialization works as expected. The sizeof function presents a problem in that any transforms will have to be run, then when we serialize they will be run again.

I'm at the point where this simply needs to be documented and someone will have to decide for themselves if this is what they want to have happen. If not, then they can provide an object with the conversion already performed. That's somewhat disappointing. The transforms are clever. Wish they would always Just Work™.

The idea of having a transform step that creates an intermediate object is a no go. The checksum accumulator wouldn't work, because you need to transform the checksum value based on the buffered writes.

Synchronous writes could be implemented with the incremental serializer, but then you never have a use for the whole serializer.

We could create an intermediate object that transforms only those entities that need to be transformed, but do you run assertions in that case?

Seems that sizeof needs to have caveats and limitations. It also seems that there are only a handful of cases where it might matter.
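To make the double transform concrete, here is a hand-written sketch of a length encoded string field. The functions `transform`, `sizeOf` and `serialize` are stand-ins for generated code and are assumptions of this example.

```javascript
let transforms = 0

// Hypothetical serialize-side transform: string to UTF-8 buffer.
function transform (value) {
    transforms++
    return Buffer.from(value, 'utf8')
}

// sizeof has to run the transform to learn the encoded length.
function sizeOf (object) {
    return 4 + transform(object.value).length
}

// serialize runs the transform again to get the bytes to write.
function serialize (object, buffer) {
    const encoded = transform(object.value)
    buffer.writeUInt32BE(encoded.length, 0)
    encoded.copy(buffer, 4)
    return buffer
}

const object = { value: 'string' }
const buffer = serialize(object, Buffer.alloc(sizeOf(object)))
console.log(transforms)
```

Passing in an object whose `value` is already a `Buffer` would avoid the second conversion, which is the workaround described above.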

Also, we could do this...

const definition = {
    value: [[
        [ $_ => Buffer.byteLength($_), $_ => Buffer.from($_) ]
    ], [ 32, [ Buffer ]], [
        $_ => $_.toString()
    ]]
}

const object = {
    value: 'string'
}

But that makes me wonder why we can't copy the buffer ourselves. That is, we're creating a buffer, then copying the buffer. Wasted step. Obviously because we can't easily define a function that would work incrementally, and we're moving to just having a bunch of functions we call.

But anyway, in the above we've combined a sizeof function with a transform function.

Can't think ahead to what it means to create an intermediate object where we run some transforms and not others. Did have this in mind though...

const definition = {
    value: [[
        ($_ = []) => Buffer.from('x' + $_)
    ], [ 32, [ Buffer ]], [
        $_ => $_.toString().substring(1)
    ]]
}

const object = {
    value: 'string'
}

The [] means that this creates a variable length object, an array or a buffer, and that you need to run it to determine the size, you can't use the underlying object to determine the size.

Again, not sure what it means to have some functions run in sizeof or a pre-transform while others do not. What if the assertion is necessary to perform the conversion for sizeof?

Maybe the intermediate object is explained to the user, and a default of {} is used to indicate that a function needs to be a part of the intermediate object?

const definition = {
    value: [[
        ($_ = {}) => assert(/^\d+$/.test($_)),
        ($_ = {}) => Buffer.from($_)
    ], [ 32, [ Buffer ]], [
        $_ => $_.toString()
    ]]
}

const object = {
    value: 'string'
}

This might be the least worst idea you've had so far.

Regarding this use case, which is probably common, encoding UTF8.

const definition = {
    lengthEncoded: [ 32, [ 'utf8' ] ],
    terminated: [ [ 'utf8' ], 0x0 ]
}

However the Buffer.write() method will not write a partial character. We would have to serialize UTF8 ourselves, probably very slow, or else find the character that was cut off, but that would also be very slow. Also, we need to know the byte length to check to see if the write was truncated. Nothing about this is cheap. Copy is probably cheaper.

We can use write in a whole serialize, though, and in best foot forward, so if we have an internal implementation, yes, we can use write for the best case, we can create buffers for the incremental parser.

Okay, that's worth doing.
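A sketch of both paths using Node's `Buffer`. It relies on the documented behavior that `buf.write()` returns the number of bytes written and does not write partially encoded characters. The `copySome` helper is hypothetical and only illustrates the create-a-buffer-then-copy fallback.

```javascript
const string = 'a€'
// Bytes needed to hold the whole string: one for 'a', three for '€'.
const needed = Buffer.byteLength(string, 'utf8')

// Whole serialize, best case: try to write directly into the target.
const small = Buffer.alloc(3)
const written = small.write(string, 0, small.length, 'utf8')
// We only know the write was cut short by comparing against the byte length.
const truncated = written < needed

// Hypothetical incremental fallback: encode once into a buffer, then copy
// whatever fits into each target and remember where to resume.
function copySome (encoded, offset, target) {
    const length = Math.min(encoded.length - offset, target.length)
    encoded.copy(target, 0, offset, offset + length)
    return offset + length
}

const encoded = Buffer.from(string, 'utf8')
const first = Buffer.alloc(3)
const resume = copySome(encoded, 0, first)
console.log({ needed, written, truncated, resume })
```

The copy can split a character across two targets because it works on bytes, which is exactly what `write()` refuses to do.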

Wed Sep 22 21:20:27 CDT 2021

Regarding offsetof. Could be a function that calculates on an object or it could be a class that takes a transformed object and calculates if necessary, so getters, or has static values. Creating the object assigns the values. If the values are all constant then we could have a single object, or the constant values could be in a superclass.

Could make the user request specific offsets.

({ value, $offset: { header: { encoding } } }) => {}

It would also be useful to have $size in this case which may make doing goofy length encodings easier.

The parser based solution means more parsing, but we can insist that the contents of the $offset destructuring be kept simple, no defaults, they wouldn't be honored.

This parser based solution also requires a long hard think about what to do if the value is nested in an array. Any solution requires a long hard think about what to do if the value is nested in an array. It may make the offset class expensive to generate and calculated array getters require Proxy.

({ value, $offset: { header: { encoding } } }) => {}

That creates a path with a depth and we can assume that we're not looking inside a previous object, so we can avoid creating a complete path language and just move upward or else...

Yeah, imagine a header for the root packet and a header in each object, which one wins. We'd have to do some sort of pattern matching. No full paths without array indices.

({ value, $offset: { header: { encoding }, array: { header: { encoding } } } }) => {}

An object, well, imagine we have an array of 1000 items to serialize, we'd have to create 1000 entries in an array of offsets. We could assume that it would be used once for each entry in the array.

({ value, $offset, $i }) => {
    return $offset.array[$i[0]].value - $offset.array[$i[0]].header.encoding
}

This generates code that never gets executed though. If there is an array and the offsets within are not referenced there is a getter that does not get covered in the unit tests.

({ value, $offset, $i }) => {
    return $offset('value') - $offset('header.encoding')
}

And now we've introduced a path language.

({ value, $offset, $i }) => {
    return $offset('value') - $offset('array', $i[0], 'header', 'encoding')
}

Could make that move faster using maps to get values, so that if the first three fields are constant you can just pull it out of a map. You could get this unit tested by calling the $offsetof mechanics independently. You could specify which offsets you need in the definition language.

Or you could use an accumulator to snag the offset as needed, which we can use as an example in the interim.

You really need to implement like a dozen protocols before you try to optimize the language.

Thu Sep 22 12:59:14 CDT 2021

Transforms are disturbing my assumptions about performance. The sizeof functionality is important because synchronous serialization is probably the norm, but perhaps that is a mistake. Perhaps it should always be incremental.

If you do transforms on arrays, it is necessary to run the transform to determine the size of the array. It could be the case that the transform does not alter the length of the array, but it is more likely that the transform will surprise users when they transform a string to a UTF8 buffer, their tests will pass and their code will work in production until they serialize an emoji.
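The emoji surprise is easy to reproduce. In this sketch `naiveSizeOf` and `actualSizeOf` are made-up names: the first measures the string before the transform, the second measures what the transform would actually produce.

```javascript
// Size taken from the untransformed value: counts UTF-16 code units.
function naiveSizeOf (string) {
    return string.length
}

// Size taken after the transform: counts UTF-8 bytes.
function actualSizeOf (string) {
    return Buffer.byteLength(string, 'utf8')
}

for (const string of [ 'hello', 'hello 😀' ]) {
    console.log(string, naiveSizeOf(string), actualSizeOf(string))
}
```

For plain ASCII the two agree, which is why tests pass until something outside ASCII shows up.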

A buffer transform is relatively expensive and should not be performed twice.

We do not have to duplicate any tests that perform assertions. For sizeof we do not have to include any transforms for fixed length items. Difficult to say if we should perform the assertions in sizeof or wait until serialization. Perhaps the assertion is supposed to assert a condition before a transformation, but the sizeof forgoes the assertion and then performs the transformation.

We could separate transformation and make it generate a transformed object that is then used in sizeof or synchronous serialize.

If we just want sizeof we still do not need to transform any fixed width objects, but it is unlikely that this is a common use case, I just want to know how big this would be if I serialize it, I have no intention of doing so.

At this point I am leaning toward a transform, but having things be as they are encountered during asynchronous serialization. There are problems there too with strangely length encoded formats, where the length is in a header and in advance of the body of a packet. I'd use an accumulator for that but it would be better to have some sort of $sizeof object or sizeof function that you could call to get the length, and that would require an entirely transformed object.

Parsing cannot transform and if someone has developed a protocol that needs to seek back and forth on the wire that really isn't my problem.

Thu Jul 22 00:45:05 CDT 2021

TODO Need warnings for recursive includes.

Wed Aug 12 02:38:26 CDT 2020

On occasion you'll worry yourself that there will be some fatal ambiguity that will scrap this project, or some missing feature that will not fit within the constraints of the language, but at the moment, as you're resolving an ambiguity by simply making the test more stringent, you have a strong sense that things will always sort themselves out, the project will always be a little peculiar, but it will always be okay in the end.

Sun Aug 2 02:40:15 CDT 2020

Regarding unsipping in incremental parser. Upon sip, we have all the values in memory and it doesn't seem to make sense to make a call to the parser to parse a buffer constructed from the sips. For the common case, those values will be read into an integer, the common case of sipping to determine the length of an encoded integer. If the sip results somehow in a decision to read a variable length field, then we can't unsip from the registers we've used to gather the sip, it would be no more or less complicated than an incremental parse.

But, how is that supposed to work? Who is going to encode a zero terminated string that might also be a 32-bit integer? Who wants to document that certain string values are going to be misinterpreted as an integer?

If the only time we ever unsip is into integers, then it does make sense to build the integer from the sip. If we don't see an integer as the first value that can consume the sip, we can recurse. This is frightfully complicated though. It wouldn't be so daunting if there was a function you could write on the tree that would tell you the number of fixed bits that are consumed by integers of any sort starting from a node in the tree. If it consumes the sip, then proceed with unwinding the sip.

If we do unwind the sip, we now have to pass the sip in from the best-foot-forward parser.

Then there is the case where a branch of a sip conditional may sip itself. Also, we now start keeping the sip stack around until the sip is consumed, and a sip can be consumed by a sip, since a sip is always an integer.

But, really, the common case is length encoded integers, so sips of sips are a bridge to cross when we get there. At first we could simply ensure that the resolved field is an integer that consumes the entire sip and do an unsip, then we can advance to more complicated conditions like a structure that nests integers, an array whose length encoding or first word if terminated consumes the sip, etc.

Unsip will have to be a stretch goal.

Thu Jul 23 13:15:23 CDT 2020

Mon Jul 20 07:15:55 CDT 2020

Will not do calculated terminals.

cycle(okay, {
    name: 'terminated/calculated',
    define: {
        object: {
            nudge: 8,
            array: [ [ 8 ], $_ => [ 0xd, 0xa ], $_ => $_[$_.length - 2] == 0xd && $_[$_.length - 1] == 0xa ],
            sentry: 8
        }
    },
    objects: [{
        nudge: 0xaa, array: [ Buffer.from('abcdefghij\r') ], sentry: 0xaa
    }],
    stopAt: 'serialize.chk'
})

I'm imagining how it would be done. Could pass the array and have it fuss with values at the end of the array, which would mean that functions would never really fit inline, so you'll end up writing modules to use this feature. Could specify a maximum length and pass an array with that many bytes to the function, but do you wait for that many bytes to accumulate before you call it?

Once I start to see how it will work, I want to implement it so I don't forget how it would work. But, these are components that I may implement in time anyway, so they would be available to implement this. The same logic as parsing a sipped integer, a byte reader object in the generator that would shift bytes off the integer until it was empty then fall back to incrementing the buffer. This same object could shift from an accumulator array. We would have a separate stack, imagine reading four bytes into a calculated terminator buffer, then terminating, sipping a 16-bit integer for a conditional, then when you enter the conditional branch you have a stack of sources: sipped integer, terminator buffer, and underlying buffer.
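A sketch of such a byte reader as a runtime object, which is an assumption of this example since the diary imagines it inside the generator. The class and method names are invented, and sources are simply consulted from the top of the stack down before the reader falls back to the underlying buffer.

```javascript
// Hypothetical byte reader that drains a stack of pending sources before
// falling back to the underlying buffer.
class ByteReader {
    constructor (buffer, start = 0) {
        this.buffer = buffer
        this.start = start
        this.sources = []
    }

    // Push back an integer gathered by a sip, most significant byte first.
    unsip (value, bytes) {
        const exploded = []
        for (let i = bytes - 1; i >= 0; i--) {
            exploded.push(value >>> (i * 8) & 0xff)
        }
        this.sources.push(exploded)
    }

    // Push back bytes accumulated while looking for a terminator.
    unread (bytes) {
        this.sources.push([ ...bytes ])
    }

    // Next byte from the topmost non-empty source, else from the buffer.
    // Returns null when everything is exhausted.
    next () {
        while (this.sources.length != 0) {
            const source = this.sources[this.sources.length - 1]
            if (source.length != 0) {
                return source.shift()
            }
            this.sources.pop()
        }
        return this.start < this.buffer.length ? this.buffer[this.start++] : null
    }
}

const reader = new ByteReader(Buffer.from([ 0x05 ]))
reader.unread([ 0x03, 0x04 ])
reader.unsip(0x0102, 2)
const bytes = []
for (let byte = reader.next(); byte != null; byte = reader.next()) {
    bytes.push(byte)
}
console.log(bytes)
```

Every field parser would call `next()` instead of indexing the buffer directly, which hints at how much generated code this would touch.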

This project has been drudgery. It was a clever idea that I wanted to have a go at, but it has proven to be quite tedious. Every little change means walking through the seven different generators looking for off-by-one errors. Copious code is generated and it has little formatting errors, duplicated assignments, many little annoyances that will have to be addressed someday. The copious code means I just want the tests to pass, but bugs hide. I've seen where an advance calculates to a retreat, but we already overran anyway so the retreat saves the parse. undefined shows up in the code, but we short circuited that path with best-foot-forward.

Calculated terminators would add lots of complexity to an already complex part of the code and the legibility of the generated code would decrease. It would become a mass of catenated strings. (Which makes me wonder if I should add some sort of special handling for looping to Prolific. There, use that to get your mind off this set of problems.) At this point I want to get to where I'm going over all this generated code looking for bugs and poor formatting and finalizing, instead of piling on more disarray for the finalization.

At this point? What is this point? The point at which it does appear to be complete and could parse most of the formats I'm aware of. I've addressed MySQL integers and those Minecraft terminated arrays of structures. I'm encountering ambiguities I've not considered during testing when I'd much rather encounter them during application so I can make choices about the priority of language terms based on the frequency with which they are encountered.

I sense a failure of discipline with this issue. I've been using GitHub Issues in order to work from a list, to give myself the game mechanics to pull me through the tedium. From the vantage point of this frame of mind marking an issue as "won't do" would appear to be an unmet challenge.

Because I subject myself to the behavior modification of this game, and because it is effective in getting me to push through the tedium, I surrender my thought process to its influence. When procrastination raises its voice I dismiss it by reviewing the GitHub Issues, speaking over that voice by counting the remaining issues that I'd have to complete to get the issues down to a single page of issues. Procrastination, being clever and charismatic, finds new and better ways to make itself heard, and so when I hear a voice telling me that an issue should be a "won't do" I clutch my GitHub issues like a rosary.

It would seem like Procrastination is trying to con me into kicking a can down the road, to walk away from the project by calling it done when it is not actually done. A "won't do" issue will resurface or duplicate. Procrastination has won and has done so by insinuating itself into my game, playing along with me, occasionally making a move on my behalf, so I don't even notice that I'm procrastinating.

But, my procrastination is, like myself, incredibly short sighted, and really not altogether clever. It just wants to get me to open up YouTube now, not reduce the overall hours of work on the project. This nagging voice is not procrastination, but something else. The nagging voice of reason.

What decided this for me was realizing that if I implement calculated terminals I'm going to have to document it. I am a bad writer, as you can see, but I am an absolutely terrible technical writer and this project is especially difficult to document as it is not a collection of classes and functions like a normal module, but rather an enormous language hack. The thought of having to explain use and function of calculated terminators settles the matter. Won't do.

And this is not procrastination. It is realizing that the documentation would really fall apart at that point and the module would no longer be clever and elegant but bursting past the boundaries of the syntax bashed language.

Upon further consideration, I began to see that calculated terminals actually mean the beginning of pattern matching, which may well be another three months of my life and has already been designated out-of-scope for the module.

And so, I hereby close issue 490 and if you consider reopening it, consider whether you shouldn't start here (Snort content matching rules).

Do not let visions of pattern matching dance in your head, you have better things to do.

Sun Jul 19 11:28:29 CDT 2020

Literal and include ambiguity. Could resolve by requiring the $. However array wrapping makes the problem go away quickly, and I don't have special naming conventions that disambiguate yet.

const definition = {
    facade: 16,
    literal: [ 'ab', 8, 'cd' ],
    array: [[ 'facade' ], [ 8 ] ],
    horrible: [ 'facade', 'facade' ]
}

Sat Jul 18 10:11:44 CDT 2020

Should the calculation of accumulators be everywhere?

Wed Jul 15 12:45:09 CDT 2020 ~ todo

What about some sort of limit if there is a header length? Stop when we reach the limit, but only really works if we're incremental, or best-foot-forward.

Seems possible. Make it a special accumulator. It can be scoped using structures. Would probably raise an exception. Not sure how the user would resume a stream in this case.

Wed Jul 15 02:41:25 CDT 2020 ~ todo

Easiest way to resolve the double jump on terminated incremental parse would be to count the steps in a conditional first, then pass a done property to the descent, or maintain a stack, so that the terminated parse would use that done instead of using one it calculated itself.

Fun problem to consider, nested if statements so we probably have to jump to the outer end.

Only nested jump that should be reset appears to be within conditionals. Nested terminated arrays need to jump to their terminator check.

Tue Jul 14 03:26:09 CDT 2020 ~ todo

Another little thing to fix. If the size of a condition is less than or equal to the size of a sip, we can remove the checkpoint from the condition.

Mon Jul 13 23:01:43 CDT 2020

Checkpoints are hard to test because I have to devise tests that force checkpoint creation, which usually means injecting a conditional somewhere. Literal tests were passing, but the best-foot-forward parser was not advancing the checkpoint for the literal correctly. Because the literal structure was fixed width, we always fell back to the best-foot-forward parser immediately, so no way to detect that the literal was not correctly generating a jump in its best-foot-forward parser.

Makes me want to consider a switch that would tell the best-foot-forward generation to add a checkpoint for each field. Otherwise, I'm almost certain to get feedback from the wild. Would mean generating two new tests and worst of all, trying to come up with a new three letter file suffix prefix. Meh, just use chk and try not to worry about it, it's only for testing.

Mon Jul 13 12:40:35 CDT 2020 ~ todo

Not liking the recursive call to the incremental parser for the common case of literals and integers. Seems that for an integer we could unwind the sip and skip the literal somehow, but if the conditional resolves to anything else besides a literal surrounding an integer we do the recursive parse.

In order to do this, $sip will need to be an array and passed to the incremental parser so it can unwind the $sip gathered by the best-foot-forward parser.

Mon Jul 13 06:46:02 CDT 2020

Thoughts on sipping. I'd imagined that UTF-8 would parse by looking at each byte for a terminator, but this is not the case. The first byte determines the length, but the subsequent bytes have the first two bits ignored. Can't simply logical and them away either. It would change the nature of the shift.
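For reference, a hand-written sketch of decoding a single UTF-8 code point, showing why the shift is by six bits rather than eight. The function names are made up, and the sketch does not validate continuation bytes or reject overlong encodings.

```javascript
// Number of bytes in a UTF-8 sequence, determined by the first byte alone.
function sequenceLength (first) {
    if ((first & 0x80) == 0x00) return 1
    if ((first & 0xe0) == 0xc0) return 2
    if ((first & 0xf0) == 0xe0) return 3
    if ((first & 0xf8) == 0xf0) return 4
    throw new Error('invalid leading byte')
}

function decodeCodePoint (buffer, start) {
    const first = buffer[start]
    const length = sequenceLength(first)
    // Payload bits in the leading byte: 7, 5, 4 or 3 depending on length.
    let codePoint = length == 1 ? first : first & (0xff >>> (length + 1))
    for (let i = 1; i < length; i++) {
        // Continuation bytes contribute only their low six bits.
        codePoint = (codePoint << 6) | (buffer[start + i] & 0x3f)
    }
    return { codePoint, length }
}

const euro = decodeCodePoint(Buffer.from('€', 'utf8'), 0)
console.log(euro.codePoint.toString(16), euro.length)
```

The first byte acts as the sip here: it decides how many more bytes to read, and it also carries part of the value.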

Thus, we have to consider whether we want to support this, or how. Would be able to do it now using calculated terminated arrays, to find the end by checking the start, then a fixup function that would calculate. Or can still use a sip to determine a fixed array length.

Sipping each byte definitely calls for using terminated arrays, though. Nested sips would not be of much use, they would produce an array and have to be constructed, so nested sipping is gone.

There's an advantage to the sip, but it would be nice if we could preserve the buffer on parse, and have the user tell us the actual value.

const conditional = {
    object: {
        mysqlInteger: [[
            $_ => $_ < 251n, 8n,
            $_ => $_ >= 251n && $_ < 2n ** 16n, [ 'fc', 16n ],
            $_ => $_ >= 2n ** 16n && $_ < 2n ** 24n, [ 'fd', 24n ],
            true, [ 'fe', 64n ]
        ], [ 8n, [
            $sip => $sip < 251, 8n,
            $sip => $sip == 0xfc, [ 'fc', 16n ],
            $sip => $sip == 0xfd, [ 'fd', 24n ],
            $sip => $sip == 0xfe, [ 'fe', 64n ]
        ]]]
    }
}

But to do so means we have to rewind, which is always annoying. Simple for the incremental parser, simply keep an array of the sip and call the parse function.

Oh, wait, also simple for the synchronous parser. Just reset the $start index.

Ah, we could keep an array for the incremental parser, or we could just explode the integer we gathered for the sip.

And you can see that we can also skip some parsing in the case of MySQL where we don't really need to parse the literal, but we need to specify it in order to skip the bytes, but the skip is an increment and if we want to be really crafty, after a sip we can deduct any literals from the $start rewind.

Anyway, here's the imagined protocol, the one I thought I had to make conditionals nested to support.

Parser side only, bytes are 7-bits, the top bit is used to indicate an end of number with a 0. Seems like something like that is going to be out there.

const definition = {
    object: {
        value: [[], [
            // Parenthesized because `!=` binds tighter than `&`.
            [ 8 ], array => (array[array.length - 1] & 0x80) != 0x80
        ], [ function (array) {
            // Every byte, the last included, contributes its low 7 bits.
            return array.reduce((sum, value) => {
                return (sum << 7) + (value & 0x7f)
            }, 0)
        }]]
    }
}
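A minimal sketch of what the generated parser for that imagined protocol might boil down to, assuming most significant group first as in the reduce above. The name `parseVarint` and its return shape are hypothetical, not part of Packet.

```javascript
// Hypothetical helper: decode one 7-bit-per-byte integer starting at `start`.
// A byte with the top bit clear ends the number.
function parseVarint (buffer, start = 0) {
    let value = 0
    let index = start
    for (;;) {
        if (index >= buffer.length) {
            throw new RangeError('ran out of bytes before the terminating byte')
        }
        const byte = buffer[index++]
        // Multiply rather than shift so values past 31 bits do not wrap.
        value = value * 128 + (byte & 0x7f)
        if ((byte & 0x80) == 0) {
            return { value, end: index }
        }
    }
}

console.log(parseVarint(Buffer.from([ 0x81, 0x00 ]))) // { value: 128, end: 2 }
```

The returned `end` is where the next field would start, which is all a synchronous parser needs to carry forward.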

Will I ever need to nest sipping?

What is sipping? It's when you read a byte to determine the type, but the byte you read is part of the type. In the case of MySQL the leading byte could be a flag indicating the length of the integer that follows the byte, but if the byte is below the flag range (less than 251), the value is the byte itself.
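Written by hand, a sip for the MySQL integer might look like the sketch below. The function name is hypothetical, and the little-endian reads are an assumption taken from the MySQL wire protocol rather than from anything in this diary.

```javascript
// Hypothetical hand-written equivalent of a sip: the first byte is either the
// value itself or a flag giving the width of the integer that follows.
function parseLengthEncodedInteger (buffer, start = 0) {
    const sip = buffer[start]
    if (sip < 251) {
        // The sipped byte is the value.
        return { value: BigInt(sip), end: start + 1 }
    }
    switch (sip) {
    case 0xfc: // 16-bit integer follows.
        return { value: BigInt(buffer.readUInt16LE(start + 1)), end: start + 3 }
    case 0xfd: // 24-bit integer follows.
        return { value: BigInt(buffer.readUIntLE(start + 1, 3)), end: start + 4 }
    case 0xfe: // 64-bit integer follows.
        return { value: buffer.readBigUInt64LE(start + 1), end: start + 9 }
    default:
        throw new Error('not a length-encoded integer: 0x' + sip.toString(16))
    }
}

console.log(parseLengthEncodedInteger(Buffer.from([ 0xfc, 0xfb, 0x00 ])))
```

Returning `BigInt` in every branch keeps the same type in the parsed object regardless of which width was on the wire.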

Now, about stripping those bytes. If sipping gets us to a specific width, then a parse inline can assemble an array near as quickly as anything generated in the parser. Otherwise, I need to define how the bytes are unpacked.

const definition = {
    object: {
        value: [ 24, 4, 6, 6 ]
    }
}

That would be a 24-bit UTF-8 encoded character, which is really 16 bits. It's saying a 24-bit integer but using 4 bits from the first byte and 6 from each of the remaining bytes.

Ugly, but the space in the language is available. How about we parse UTF-8 last, after parsing a bunch of other stuff, parse it for now with special functions, because we may need this space, and we're never going to really use Packet to parse UTF-8.
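For now, a special function along these lines could do the unpacking. This is only a sketch: `unpackBits` is a hypothetical name, and the widths used in the example assume the standard three-byte UTF-8 layout of 4, 6 and 6 payload bits.

```javascript
// Hypothetical helper: take the low `widths[i]` bits of byte `i` and
// concatenate them, most significant byte first.
function unpackBits (bytes, widths) {
    return widths.reduce((sum, width, index) => {
        const mask = (1 << width) - 1
        return (sum << width) | (bytes[index] & mask)
    }, 0)
}

// U+20AC, the euro sign, is E2 82 AC in UTF-8.
console.log(unpackBits([ 0xe2, 0x82, 0xac ], [ 4, 6, 6 ]).toString(16)) // 20ac
```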

Sun Jul 12 17:33:00 CDT 2020

The following is ambiguous.

const definition = {
    packet: {
        type: 8,
        value: [
            $ => $.type == 1, { first: 16, second: 16 },
            32
        ]
    }
}

May appear to be a conditional but it is also a switch statement.

const definition = {
    packet: {
        type: 8,
        value: [
            $ => $.type == 1, {
                first: 16,
                second: 16
            }, 32
        ]
    }
}

We can resolve this ambiguity by keeping our rule that a conditional must have at least two conditions, and by using an explicit value for else.

const definition = {
    packet: {
        type: 8,
        value: [
            $ => $.type == 1, { first: 16, second: 16 },
            true, 32
        ]
    }
}
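The rule could be checked mechanically when the definition is read. A rough sketch, where `isConditional` is a hypothetical name and the test simply encodes the rule as stated: flat pairs of test and type, at least two pairs, and `true` standing in for else.

```javascript
// Hypothetical check: a conditional is a flat array of (test, type) pairs,
// with at least two pairs, where each test is a function or a literal `true`.
function isConditional (definition) {
    if (!Array.isArray(definition) || definition.length < 4 || definition.length % 2 != 0) {
        return false
    }
    return definition.every((element, index) => {
        // Odd positions hold types, which can be anything.
        return index % 2 != 0 || typeof element == 'function' || element === true
    })
}

console.log(isConditional([ $ => $.type == 1, { first: 16, second: 16 }, true, 32 ])) // true
console.log(isConditional([ $ => $.type == 1, { first: 16, second: 16 }, 32 ]))       // false
```

With the bare `32` as else the array has three elements and is rejected, which leaves that shape free to mean a switch.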

If we ever want to have a value that is not there, we can just use null.

const definition = {
    packet: {
        type: 8,
        value: [
            $ => $.type == 1, { first: 16, second: 16 },
            true, null
        ]
    }
}

Maybe even also allow [] if an array isn't there.

const definition = {
    packet: {
        type: 8,
        value: [
            $ => $.type == 1, [ [ 8 ], [ 16 ] ],
            true, []
        ]
    }
}

Could also be expressed as...

const definition = {
    packet: {
        type: 8,
        value: [
            $ => $.type == 1, [ [ 8 ], [ 16 ] ],
            true, [ [ 0 ], [ 16 ] ]
        ]
    }
}

But that generates dead code.

Thu Jul 9 14:20:56 CDT 2020

It is decided that fixed buffers will strip after the buffers are read into memory in every case. Trying to reuse the terminated code is aesthetically unpleasing. With it, I could stop when I hit a padding, but it complicates the code such that the terminated code has to also stop at a fixed point. Fixed length buffers with padding are for older protocols, specifically tar which has these strange null terminated and fixed strings, that I believe are sometimes unfixed in later implementations of tar.

Although, upon consideration, maybe reusing the terminated code is not so bad.

The reasoning going into this diary entry is that the fixed width means some reasonable size and the data is always there, so you may as well read it all in, then trim with buffer slice. This would allow for the same slice code in both the incremental parser and synchronous parser. There ought not to be a protocol that has an enormous fixed length, but commonly uses it for a handful of bytes.

However, it occurs to me that the ugliness comes from duplicating the termination code. My latest foray into this mess is to implement special handling for buffers. Padded fixed termination is implemented as terminated, but with checking to see if we've reached the end of the width of the field in addition to checking for the terminator. This was basically copy and paste from the terminator implementation.

Rather than having the padding be a property of a fixed field, why not make the limited length of the field a property of a terminated field? If the width is not zero, then we adjust the generated terminator code to stop at a limit.

Note that we do not want to use this limit to prevent starvation from a client sending us a terminated field without a terminator. That sort of checking should be external to the parser. You'll want to add that to the documentation. That is maximum length terminated versus fixed length padded. We could add a counter to the API externally counting how many bytes have been fed to a particular parser.

So, the documentation and language will call this a fixed length padded, but the AST calls it terminated and I can corral this mess into a single generator function. The direction I was going was making helper functions that both functions call. This was getting so very ugly. Actually, routing everything to the single terminator function would probably be more performant, in theory: cancel the continuation of the parse and proceed to skip.

Seems like this should be done during expansion, since when we are doing a synchronous parse we can always slice out the full buffer, or else we'd have already gone incremental, and then slice out anything after the terminator. The existing implementation works fine.

Should note that you should test that you don't overrun by placing a field afterward that has the terminator character in it.
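A sketch of terminated-with-a-limit for the synchronous case, including the overrun test just described. `parsePadded` and its arguments are hypothetical names, not the generated code.

```javascript
// Hypothetical sketch: scan for the terminator but never past the field
// width, then skip to the end of the field regardless of where we stopped.
function parsePadded (buffer, start, width, terminator) {
    const limit = start + width
    if (limit > buffer.length) {
        throw new RangeError('field extends past the end of the buffer')
    }
    let end = start
    while (end < limit && buffer[end] != terminator) {
        end++
    }
    return { value: buffer.subarray(start, end), end: limit }
}

// A full-width field followed by a field containing the terminator byte.
const record = Buffer.from([ 0x61, 0x62, 0x63, 0x64, 0x00, 0x01 ])
console.log(parsePadded(record, 0, 4, 0x00).value.toString('ascii')) // abcd
```

The following field begins with the terminator byte, so a scan that ignored the width would still return `abcd` but would report the wrong end.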

Tue Jul 7 07:52:20 CDT 2020 ~ buffer, streaming

Buffer based byte arrays might be nice, but then why not do TypedArrays as well? You could be making life difficult for yourself.

Actually, you could declare these rather easily.

const definition = {
    lengthEncodedBuffer: [ 32, Buffer ],
    fixedBuffer: [[ 32 ], Buffer ],
    terminatedBuffer: [ Buffer, 0x0 ]
}

From what I can see, there's no point in creating different TypedArrays, just the underlying ArrayBuffer which is simple enough for fixed and length encoded but would require catenation for terminated.

Terminated would be best as an array of segments so that the user could decide if the user needs to catenate them, in case the user can do without the catenation. Perhaps the user is just writing to file so they can write one buffer and then the next, not an entire catenated buffer.

For now, we can let this be an edge case.

Oh, well, this is too easy. Buffer instances are also Uint8Array instances, which is the language’s built-in class for working with binary data. Let's support buffers and let's concat zero terminated strings.

Anyway, when I looked at this at first, I thought I'd have to support all the different types, and then do I want to support endianness, etc? Now I can see that I return an ArrayBuffer and the user can cast it.
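To make that concrete, here is a small sketch using only standard Node.js and JavaScript calls. The segment contents are made up for illustration.

```javascript
// Terminated data may arrive as segments. Catenate only if the user needs to.
const segments = [ Buffer.from([ 0x01, 0x02 ]), Buffer.from([ 0x03, 0x04 ]) ]
const joined = Buffer.concat(segments)

// A Buffer is a Uint8Array, so typed array consumers accept it as is.
console.log(joined instanceof Uint8Array) // true

// The user decides on width and endianness after the fact. A DataView over
// the same memory has no alignment requirement, unlike a Uint16Array.
const view = new DataView(joined.buffer, joined.byteOffset, joined.byteLength)
console.log(view.getUint16(0))       // 258, big-endian 0x0102
console.log(view.getUint16(0, true)) // 513, little-endian 0x0201
```

Passing `byteOffset` and `byteLength` matters because a small Buffer may be a window into a larger pooled ArrayBuffer.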

Which brings us to streaming. This shouldn't be specified in the language, but the APIs should allow the user to pull a chunk of bytes or write a chunk of bytes to the underlying stream somehow, not have to worry about how to switch from parsing headers to streaming bodies.

Sat Jul 4 12:04:12 CDT 2020

The API should implement strategies, and be a wrapper around a definition. Thus, BestFootForwardParse or SynchronousSerialize. This way the API can probably be the same across implementations and you're not providing different function names for different strategies, which would make for terse or wavy camel-case names like parseBFF or parseBestFootForward.

Fri Jul 3 19:41:57 CDT 2020 ~ todo

Just occurred to me that I might want to have 8n, in case I have some variable-width number that ranges from 8 bits to 64 bits but I want the same type to appear in the deserialized structure. Do this instead of inferring BigInt.

Fri Jul 3 19:30:25 CDT 2020

Felt I had a challenge where I'd have to maintain a counter and that counter would have to update in order to implement MySQL integers, but I realize now that I only need to get the header value and the start of a particular field.

It would be akin to this.

define({
    packet: {
        length: 32,             // Total length of packet.
        string: [[ 8 ], 0x0 ],  // Null terminated string.
        number: [               // Number occupying remaining bytes.
            ({ $, $start }) => $.length - $start == 1, 8,
            ({ $, $start }) => $.length - $start == 2, 16,
            32
        ]
    }
})

If you can parse that now, you should be okay to parse MySQL when the time comes.

Wait... Oh, you silly git. Start is not from the start of the packet, it is the current start of the buffer which may not be the same buffer as when the length is recorded. We do need running calculations.

define({
    packet: [{ counter: [ 0 ] }, [[[ ({ $start, $end, counter }) => {
        counter[0] += $end - $start
    } ]], {
        length: 32,             // Total length of packet.
        string: [[ 8 ], 0x0 ],  // Null terminated string.
        number: [               // Number occupying remaining bytes.
            ({ $, counter }) => $.length - counter[0] == 1, 8,
            ({ $, counter }) => $.length - counter[0] == 2, 16,
            32
        ]
    }]]
})

Which makes it appear that we'll need to have the count so far if we reference it in a conditional or switch, and maybe not in a transform or assertion.
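The running calculation can be sketched outside of the language to show why the buffer-relative `$start` is not enough. `createCounter` and `numberWidth` are hypothetical names, and the chunk boundaries are invented for the example.

```javascript
// Hypothetical running counter: accumulates bytes consumed across chunks,
// because $start and $end are only relative to the current buffer.
function createCounter () {
    let count = 0
    return {
        update ({ $start, $end }) {
            count += $end - $start
        },
        get count () {
            return count
        }
    }
}

// Choose the width of the trailing number from the bytes remaining.
function numberWidth (length, consumed) {
    switch (length - consumed) {
    case 1: return 8
    case 2: return 16
    default: return 32
    }
}

const counter = createCounter()
counter.update({ $start: 0, $end: 4 }) // length field, first chunk
counter.update({ $start: 0, $end: 2 }) // terminated string, second chunk
console.log(numberWidth(7, counter.count)) // 8
```

In the second chunk `$start` is back at zero, so only the accumulated count of 6 tells us that a packet of length 7 has one byte left.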

Sun Jun 28 20:49:02 CDT 2020

Disambiguation.

const define = {
    // If we require always three, then this could be ambiguous. However, the
    // middle part is not a valid type on its own.
    conditional: [
        [ $ => $.type == 0, 32 ],
        [ $ => $.type == 0, 32 ],
        [ $ => $.type == 0, 32 ]
    ],
    // This is a valid type for an inline, it would be considered a switch
    // statement instead of a condition. So this could either be an inline
    // transforming the result of a switch statement, or a conditional with a
    // middle case that indicates a nested structure.
    conditional: [
        [ $ => $.type == 0, 32 ],
        [ $ => $.type == 1, { first: 16, second: 16 } ],
        [ $ => $.type == 2, 32 ]
    ],
    // This would remove the ambiguity for conditionals. Now the conditional
    // cannot be interpreted as an inline.
    conditional: [
        $ => $.type == 0, 32,
        $ => $.type == 1, { first: 16, second: 16 },
        $ => $.type == 2, 32
    ],
    // Do we still require that an inline is defined with both before and after?
    inline: [
        32,
        [ max, 8 ]
    ],
    switched: [ $ => $.type, {
    }],
    switched: [ $ => $.type, [
        [ 0, 8 ],
        [ 1, 16 ],
        [ 32 ]
    ]],
    // Yes, here because we won't know which element is a conditional statement
    // and which element is a series of inline functions.
    inline: [[
        min, 1, max, 8
    ], [
        $ => $.type == 0, 32,
        $ => $.type == 1, { first: 16, second: 16 },
        $ => $.type == 2, 32
    ]],
    // With this empty array it is resolved.
    inline: [[
        min, 1, max, 8
    ], [
        $ => $.type == 0, 32,
        $ => $.type == 1, { first: 16, second: 16 },
        $ => $.type == 2, 32
    ], []],
    // Is it also resolved with the bi-directional notation? It could begin to
    // be mistaken for a terminated array containing a conditional with a
    // calculated terminator, but that would expect the termination function to
    // be a member of the array.
    inline: [[[
        min, 1, max, 8
    ]], [
        $ => $.type == 0, 32,
        $ => $.type == 1, { first: 16, second: 16 },
        $ => $.type == 2, 32
    ]]
}

Sun Jun 21 08:24:13 CDT 2020

Looking at a benchmark, it appears that the real performance cost comes from a slice of the buffer, if that's how we want to do it, and in reality we probably want to build an object that has $buffer, $start and $end, where maybe instead of $start we have some sort of marker indicating some point in the parse, a place greater than the start of a marker in the language or the last time the value was referenced. Ah, not necessary. We can use the scope concept and track it automatically.

What if the scopes get nested? Same thing with sips. Makes me feel that sip should be an array, but that is so ugly for the common case. At least it is going to be constant so we don't have to do sip[sip.length - 1], except maybe in blind helper functions.

Thought I'd resolved ambiguities because length-encoded and terminated arrays are both identified by an array with a single element at a certain position, but switch statements that use maps are going to be indistinguishable from a fixup only. Could decide to put the switch condition in an array, but then putting an array around a fixup for coming and going doesn't work.

Could say coming and going is an array around after, but that's kind of a weird rule and will be hard to remember and from my latest sketches won't look right. Seems like you want to see this funny thing we're going to do right off.

Could say that the mnemonic is that these fixups are parenthetical, operating on the objects, values or buffers outside of the context of the language and leave it the way it is, even as we're getting rid of Buffer and Object and other things.

Had a look now at using { switch: $ => $.type, cases: [] } and that would be such a departure. Now we have names that are part of the language whereas before there are no names in the language. All names are provided by the user.

Putting an array around the case map makes it into a length-encoded array of structures.

Map for switch statements is a go for sure, though, since we already have map translations and I already have a place to use it in MQTT. Function definitely leads. No other way to envision that aspect of it. Only other possibility conceivable is to insist on a default value, but no.

Updated my latest examples. It isn't so bad. I can live with it. We're at the I can get used to it stage of resolving ambiguities.

Thu Jun 18 17:42:34 CDT 2020

For this checksum nonsense I need to recall how to do service discovery.

Thu Jun 18 04:57:18 CDT 2020

Transforms saved here so I can reference them if I decide to parse tar.

// The default transforms built into Packet.
var transforms =
// Convert the value to and from the given encoding.
{ str: function (encoding, parsing, field, value) {
    var i, I, ascii = /^ascii$/i.test(encoding)
        if (parsing) {
            value = new Buffer(value)
            // Broken and waiting on [297](http://github.com/ry/node/issues/issue/297).
            // If the top bit is set, it is not ASCII, so we zero the value.
            if (ascii) {
                for (i = 0, I = value.length; i < I; i++) {
                    if (value[i] & 0x80) value[i] = 0
                }
                encoding = 'utf8'
            }
            var length = value.length
            return value.toString(encoding, 0, length)
        } else {
            var buffer = new Buffer(value, encoding)
            if (ascii) {
                for (var i = 0, I = buffer.length; i < I; i++) {
                    if (value.charAt(i) == '\u0000') buffer[i] = 0
                }
            }
            return buffer
        }
    }
// Convert to and from ASCII.
, ascii: function (parsing, field, value) {
        return transforms.str('ascii', parsing, field, value)
    }
// Convert to and from UTF-8.
, utf8: function (parsing, field, value) {
        return transforms.str('utf8', parsing, field, value)
    }
// Add padding to a value before you write it to stream.
, pad: function (character, length, parsing, field, value) {
        if (! parsing) {
            while (value.length < length) value = character + value
        }
        return value
    }
// Convert a text value from alphanumeric to integer.
, atoi: function (base, parsing, field, value) {
        return parsing ? parseInt(value, base) : value.toString(base)
    }
// Convert a text value from alphanumeric to float.
, atof: function (parsing, field, value) {
        return parsing ? parseFloat(value) : value.toString()
    }
}

Mon Jun 15 02:19:56 CDT 2020

I'd added an issue to create string constants that would be added to packets to indicate the packet type determined at parse, but don't see a need for it. With mapped translations, we can turn a flag into a string, and sips that are used to determine the length of a variable length integer, or otherwise the binary type of an integer, as in UTF-8 or MySQL integers, do not convey information that is useful to the parse. Plus, what do you do during serialization?

#9

Thu Apr 30 01:08:34 CDT 2020 ~ todo

Terminated arrays peek for a terminator, but if the array is an array of words and the terminator is the same length as the element being read, we can read the terminator as a word and break when the word equals the word value of the terminator.

Furthermore, if the array is an array of bytes, we can use Buffer.indexOf to look for the terminator.

Implies that we ought to return Buffer when the value is an array of bytes.
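A sketch of the byte array case, using `Buffer.indexOf` and `Buffer.subarray` from Node.js. The wrapper `parseTerminatedBytes` and its return shape are hypothetical.

```javascript
// Hypothetical helper: find a multi-byte terminator with Buffer.indexOf and
// return the bytes before it as a Buffer, without copying.
function parseTerminatedBytes (buffer, start, terminator) {
    const index = buffer.indexOf(terminator, start)
    if (index == -1) {
        // Terminator not in this chunk. An incremental parser would wait.
        return null
    }
    return {
        value: buffer.subarray(start, index),
        end: index + terminator.length
    }
}

const crlf = Buffer.from([ 0x0d, 0x0a ])
const parsed = parseTerminatedBytes(Buffer.from('abc\r\ndef'), 0, crlf)
console.log(parsed.value.toString(), parsed.end) // abc 5
```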

Note that I'm not in favor of fixups in code at the moment; the user can pre-process before serialization and post-process after parse.

Wed Apr 29 08:34:26 CDT 2020

At some point I'm going to have to determine if there is a requirement to have a user namespace outside of the field names, like for temporary variables. As it stands, all variables are prefixed with $.

Thu Feb 20 23:42:09 CST 2020

Going to page this project into memory and attempt to leave a roadmap in this diary entry as I do.

  • Length-encoded arrays containing length-encoded arrays.
  • Conditionals.
  • Conditional packing.
  • Nested conditionals.
  • Packed integers.
  • Two's complement.
  • Checksums.
  • Terminated arrays.
  • Fixed arrays.
  • Terminated fixed arrays.
  • BigInt.
  • Floating point.

That was the general order of things and would allow me to run through and delete the rest of the legacy which I'm still keeping around because there may be some things I need to swipe.

Remember that this is much easier because of let. You're considering how to use the namespace of the function with sigils. You have rules about nested names.

Remember to optimize last. Someone might be able to write a parser by hand that is faster than Packet at first, but you'll catch up over time.

The last thing you were thinking is that you wanted to normalize the descent logic across all the different generators. Some of them have a different concept for their dispatch function.

Also, I know there is confusion about how to do $sip and the like, so we need a special sigil for internals like _$ as a prefix, which makes for a simple rule that only causes problems on rare occasions.

Note that at the outset, I'm not going to worry about namespaces in whole parsers and use the constructed object as the namespace. A place where performance may be improved in the future by using local variables.

Sun Jan 19 04:21:24 CST 2020

{
    packet: {
        mysqlInteger: {
            $parse: {
                $sip: 8n,
                $return: [
                    $sip => $sip < 251, $sip => $sip,
                    $sip => $sip == 0xfc, 16n,
                    $sip => $sip == 0xfd, 24n,
                    $sip => $sip == 0xfe, 64n,
                ]
            },
            // Oops, not putting down the flag.
            // TODO ^^^ What?
            $serialize: [
                $_ => $_ < 251n, 8n,
                $_ => $_ >= 251n && $_ < 2n ** 16n, 16n,
                $_ => $_ >= 2n ** 16n && $_ < 2n ** 24n, 24n,
                64n
            ]
        },
        utf8: {
            $parse: {
                $sip: 8,
                $return: [
                    $sip => ($sip & 0x80) == 0, $sip => $sip,
                    $sip => ($sip & 0xe0) == 0xc0, {
                        $sip: $sip => $sip,
                        $first: 8,
                        $return: ($sip, $first) => (($sip & 0x1f) << 6) | ($first & 0x3f)
                    }
                ]
            },
            $serialize: [
                $_ => $_ < 0x80, 8,
                $_ => $_ >= 0x80 && $_ < 0x800, [
                    16, $_ => ((0xc0 | ($_ >>> 6 & 0x1f)) << 8) | (0x80 | ($_ & 0x3f))
                ]
            ]
        },
        utf8: [[
            (utf8) => utf8 < 0x80, 8
        ], [
            (utf8) => utf8 >= 0x80 && utf8 < 0x800, [
                16, utf8 => ((0xc0 | (utf8 >>> 6 & 0x1f)) << 8) | (0x80 | (utf8 & 0x3f))
            ]
        ]], [{
            sip: 8,
            utf8: [[
                sip => (sip & 0x80) == 0, sip => sip
            ], [
                sip => (sip & 0xe0) == 0xc0, {
                    sip: sip => sip,
                    first: 8,
                    utf8: (sip, first) => ((sip & 0x1f) << 6) | (first & 0x3f)
                }
            ]]
        }, [ 8, [[
            first => (first & 0x80) == 0, first => first
        ], [
            first => (first & 0xe0) == 0xc0, [ 8, [[
                (first, second) => ((first & 0x1f) << 6) | (second & 0x3f)
            ]]]
        ]]],
        string: [ 'mysqlInteger', [ 'utf8' ] ] // sensible chuckle
    }
}

And with that, minification of the definitions is not allowed. It would destroy the information in the functions. Not sure what this means for anyone using something like Webpack. Not sure I care. Seems like you ought to be able to sort out your own tools and not minify a particular file, still be able to source it somehow.

Whew. UTF-8 is a beast. Yes, we have to have separate parse and serialize. Yes, we have to have some way of referencing parsers stored as functions. Yikes, how are you going to do best-foot-forward parsers with this mess?

With some rules, we could parse the function bodies and convert the logic to different languages.

This is the point where I look at future of this project and decide it is a project for a later date. Will probably push through to length-encoded nested structures and some sort of conditional, but this hill on the horizon, well, it looks better on the horizon than under foot.

Sun Jan 12 16:04:33 CST 2020

The differentiation between lookup and nested structures was going to be that a lookup has multiple values and a nested structure has a single variable. Why would you lookup when it always maps to a single variable?

As I write, I realize that we could have conditionals always be started with a function and that could indicate a conditional.

{
    packet: {
        header: {
            type: 4,
            name: [ $ => $.header.type, [ 'connect', 'connack' ] ],
            flags: [ $ => $.header.name, {
                connect: [ 4, 0x0 ],
                connack: [ 4, 0x0 ]
            } ]
        }
    }
}

Or maybe even...

{
    packet: {
        header: {
            type: 4,
            name: [ $ => $.header.type, [{
                name: 'connect',
                value: [ 4, 0x0 ]
            }, {
                name: 'connack',
                value: [ 4, 0x0 ]
            }] ]
        }
    }
}

Or both. This way we don't have to document meanings based on variations of length.

Can't quite fathom how to convert conditionals to C. If I parse things and find that it is always lookup and never calculation, then the $.header.type can be converted into a package lookup.

Tue Oct 29 05:09:00 CDT 2019

ES6 block scope variables (aka let) are a boon to parser and serializer generation. No more hoisting vars and you can declare the variables you need in a block and not have to worry about collisions. Generated code looks cleaner (perhaps slightly more generated) and the generation code is going to be much cleaner.

Tue Oct 29 02:29:23 CDT 2019

Packet is designed to run on Node.js because the syntax bashing was implemented in Node.js with Google V8. There may be some aspects of the language that depend on Google V8, specifically the representation of JavaScript snippets which depends on the implementation of Function.prototype.toString(). It is probably possible to shim this for other JavaScript engines.

To the best of my knowledge, the remainder of the language is based on JavaScript as it is specified. If you find yourself starting a port to another JavaScript engine, please share your build set up, I'd like to follow along.

The rule for JavaScript code snippets shall be: if the function fits in one line of code, then we will inline it, if not then we will declare the function and call it, hopefully the JIT compiler will inline it.
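The one-line rule depends only on `Function.prototype.toString()`, so it can be sketched in a few lines. `snippet` is a hypothetical name, and how the generator would actually splice the source is not shown.

```javascript
// Hypothetical helper: recover the source of a user-supplied function and
// decide whether the generator should inline it or declare and call it.
function snippet (f) {
    const source = f.toString()
    return { source, inline: !source.includes('\n') }
}

const short = snippet($ => $.type == 1)
const long = snippet(function (value) {
    return value + 1
})

console.log(short.source)              // $ => $.type == 1
console.log(short.inline, long.inline) // true false
```

This is also why minifying a definition file changes what the generator sees: the source text returned is whatever text the engine was given.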

Mon Oct 28 09:36:37 CDT 2019

Parsing seven things.

  • MQTT ~
  • DNS ~
  • MySQL ~
  • tar ~
  • Cap'n Proto ~
  • Minecraft ~

What is the last one? WebSockets, but I'm not really interested in that. Leaving it open and maybe it is WebSockets and maybe by then it is a doodle, but look for something that has some really ugly properties.

Sun Oct 27 20:38:43 CDT 2019

This is a library I sketched out many years ago, but I never got around to completing it. For those of you who were enamoured of the original language, you'll find that I've departed from it significantly. I've returned to this project with JavaScript ES6 and I'm syntax bashing a new parser definition language taking advantage of contemporary features to get more sigils to play with. Everything is expressed through syntax bashing JavaScript, no more parsing strings to determine properties of fields.

New language...

define({
    first: 16,
 // ^^^^^ field name
    second: 16,
 //         ^^ size in bits, must be multiple of 8
    // Smallest representation without packing, 8-bit byte.
    byte: 8,
    // 32-bit integer, unsigned, network byte order aka big-endian.
    integer: 32,
    // For 64-bit integers we can use `BigInt`.
    integer: 64n,
    // 32-bit integer, little-endian for parsing C structs. The tilde sigil is
    // squiggly and we're squiggling the bits around.
    littleEndian: ~32,
    // 16-bit two's complement signed integer.
    signed: -16,
    // 16-bit two's complement signed integer, little-endian.
    signedLittleEndian: -~16,
    // The sign and endian sigils also work for `BigInt`.
    longSignedLittleEndian: -~64n,
    // 16-bit length encoded array of 16-bit integers.
    lengthEncoded: [ 16, [ 16 ]],
    // TODO Thinking about how to do lengths that are encoded by a packed value.
    // The repeated bit will always be an array with a single something in it,
    // so that could be the disambiguation we need to use the array around a
    // function universally. Note that `eval` is available as a keyword. Note
    // that symbols are available as well.
    extractedEncoded: [ $ => $.header.length, [ 8 ] ],
    // Zero terminated array of 16-bit integers, terminator is 16-bit.
    zeroTerminated: [[ 16 ], 0x0 ],
    // TODO Similarly, how do you do a calculated termination? Say you actually
    // want the terminator to be a part of the value?
    calculatedTerminated: [[ 8 ], value => value[value.length - 1] == 0xa ],
    // Zero terminated array of 16-bit integers, terminator is 8-bit.
    zeroTerminatedByByte: [[ 16 ], 0x0 ],
    // Carriage-return, newline terminated array of bytes.
    crlfTerminated: [[ 8 ], 0x0d, 0x0a ],
    // Fixed width array of bytes.
    fixed: [[ 8 ], [ 16 ]],
    // Fixed width array of bytes, zero padded. (Essentially zero terminated.)
    fixedZeroed: [[ 8 ], [ 16 ], 0x0 ],
    // Fixed width array of bytes, ASCII space padded.
    fixedSpaced: [[ 8 ], [ 16 ], 0x20 ],
    // CR-LF filled. Wonder if such a thing exists in the wild?
    fixedCRLF: [[ 8 ], [ 16 ], 0x0d, 0x0a ],
    // Bit-packed 16-bit integer, note that bit-fields are always big-endian.
    flags: [{
        temperature: -4,     // two's complement signed
        height: 8,
        running: 1,
        resv: 3
    }, 16 ],
    // We can make the entire field little-endian by explicitly specifying. I
    // don't know that the packed values are supposed to be little endian, so
    // I'm not going to worry about that for now.
    flags: [{
        temperature: -4,     // two's complement signed
        height: 8,
        running: 1,
        resv: 3
    }, ~16 ],
    // 4-byte IEEE floating point, a C float.
    float: 32.32,
    // 4-byte IEEE little endian floating point, a C float.
    float: 32.23,
    // 8-byte IEEE floating point, a C double.
    double: 64.64,
    // 8-byte IEEE little endian floating point, a C double.
    double: 64.46,
    // Literals.
    literal: [ 'fc' ],
    // Skip 30, fill with ASCII spaces? No different from literal.
    skip: [ '20', 30 ],
    // Otherwise. Strings indicate a padding.
    literal: [ 'fc', 16 ],
    skip: [[ '20', 30 ], 16, [ '20', 3 ]],
    // Would want to import encoding and decoding functions.
    fixup: [[ value => encode(value) ], [
        [ [ 8 ], 0x0 ]
    ], [ value => decode(value) ]],
    skipAndFixup: [
        [ '00', 16 ], [
            [[ value => encode(value) ], [
                [ [ 8 ], 0x0 ]
            ], [ value => decode(value) ]]
        ]
    ],
    // I've run out of sigils, so might have to use strings. Can't pass in an
    // object unless the object wants to be a very special sort of object.
    // Usually you checksum a run of bytes, but what if the checksum is supposed
    // to skip some bytes? Hmm... We can disambiguate between references to
    // other patterns in the definition and includes easily enough and insist
    // that a package does not have the same name as a hexadecimal integer.
    checksumed: {
        body: [ crc32.create, [ crc32.update, 32, [ 8 ], crc32.update ],
        footer: {
            crc32.digest
        }
    },
    // The major difference for a checksum is that it operates on the underlying
    // buffer and not the values.
    $checksum: 'packet/crc32',
    // Always insist on a sub-object?
    // TODO We can always parse the source to determine if we are being passed
    // an object, so that's actually well determined. If we are being passed an
    // object, we might be able to parse the destructuring well enough to know
    // which objects we need to pass in, so we don't waste time slicing buffers.
    checksumed: [{ $checksum: 'packet/crc32' }, {
        body: [[ ({ $checksum, buffer }) => $checksum.update(buffer) ], {
            value: 32
        }, [ ({ $checksum, buffer }) => $checksum.update(buffer) ]],
        checksum: [[ ({ $checksum }) => $checksum.digest() ], 32, [ ({ $checksum }) => $checksum.digest() ]]
    }],
    example: {
        checksummed: [{ $checksum: 'packet/crc32' }, {
            body: [[ ($checksum, buffer) => $checksum.update(buffer), Buffer ], {
                value: 32
            }, [ ($checksum, buffer) => $checksum.update(buffer), Buffer ]],
            checksum: [[ $checksum => $checksum.digest() ], 32, [ $checksum => $checksum.digest() ]]
        }]
    },
    // A checksum-like helper might define what it is supposed to do in a scope.
    // Oof, but then how would you get the value when it leaves scope?
    example: {
        checksummed: [[ { $checksum: 'packet/crc32' }, 'md5' ], {
            value: 32
        }],
        checksum: [[ $checksum => $checksum.digest() ], 32, [ $checksum => $checksum.digest() ]]
    },
    // Maybe something like this...
    // Or maybe it exists until it is overwritten?
    // And maybe something ugly like double arrays means forward and backward?
    example: {
        // This would have a special definition saying apply this function to
        // everything here as a buffer coming and going.
        body: [[ { $checksum: 'packet/crc32' }, 'md5' ], {
            first: 32,
            second: 32,
            third: 32
        }],
        // Still exists outside scope, not overwritten, so we get the digest.
        checksum: [[[ ({ $checksum }) => $checksum.digest() ]], 32 ]
    },
    example: {
        // This would be more explicit. A declaration, followed by a function
        // applied coming and going.
        body: [[
            { $checksum: 'packet/crc32' }, 'md5'
        ], [[
            // And maybe `$body` to get a specific start?
            ({ $checksum, $buffer, $start, $end }) => $checksum.update($buffer, $start, $end)
        ]], {
            first: 32,
            second: 32,
            third: 32
        }],
        // Still exists outside scope, not overwritten, so we get the digest.
        checksum: [[[ ({ $checksum }) => $checksum.digest() ]], 32 ]
    },
    // What about the weird case of interpreting something differently if there
    // is not enough space remaining in the packet? So, there is a header in
    // MySQL and worst case I'd have to be decrementing a count from the start
    // of the packet. This could be done with a counter as designed above.
    mysql: {
        packet: {
            header: {
            }
        }
    }
})

That covers most of the definitions from the days of yore.
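The sign and endian sigils are ordinary JavaScript operators, so by the time the definition is read they have already been evaluated: `~32` is `-33` and `-~16` is `17`. One way they could be recovered, as a sketch only. `describeInteger` is a hypothetical name, it assumes whole-field widths that are multiples of 8, and it ignores packed fields and the floating point notation.

```javascript
// Hypothetical decoder: undo each sigil combination and keep the one that
// yields a positive multiple of 8.
function describeInteger (sigil) {
    const bigint = typeof sigil == 'bigint'
    const n = Number(sigil)
    if (!Number.isInteger(n)) {
        throw new TypeError('not an integer sigil')
    }
    const candidates = [
        { bits: n, signed: false, endianness: 'big' },     // 32
        { bits: ~n, signed: false, endianness: 'little' }, // ~32 is -33
        { bits: -n, signed: true, endianness: 'big' },     // -16
        { bits: ~-n, signed: true, endianness: 'little' }  // -~16 is 17
    ]
    const found = candidates.find(({ bits }) => bits > 0 && bits % 8 == 0)
    if (found == null) {
        throw new RangeError('width is not a multiple of 8')
    }
    return { ...found, bigint }
}

console.log(describeInteger(~32))
console.log(describeInteger(-~64n))
```

Only one of the four candidates can be a positive multiple of 8 for any given input, so the order of the candidates does not matter.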