Gunnar_Zarncke comments on Open thread, Jul. 11 - Jul. 17, 2016 - Less Wrong

4 Post author: MrMind 11 July 2016 07:09AM

Comment author: Daniel_Burfoot 12 July 2016 05:39:13PM *  4 points [-]

I have some software I am thinking about packaging up and releasing as open-source, but I'd like to gauge how interesting it is to people other than me.

The software is a highly usable implementation of arithmetic encoding (AE). AE completely handles the problem of encoding, so in order to build a custom compressor for some data set, all you have to do is supply a probability model for the data type(s) you are compressing (I call this "BYOM" - Bring Your Own Model).
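
To make the "BYOM" point concrete, here is a minimal sketch of why the model is the only part that matters: an arithmetic coder spends roughly -log2(p) bits on an event the model assigned probability p, so the total approaches the sum computed below (plus a couple of bits of overhead). The function names and the two toy models are assumptions for illustration, not part of the software described above.

```python
import math


def ideal_code_length_bits(message, prob):
    """Sum of -log2 p(symbol): the length an arithmetic coder approaches."""
    return sum(-math.log2(prob(sym)) for sym in message)


def uniform_model(sym):
    # Hypothetical model: both symbols equally likely.
    return 0.5


def skewed_model(sym):
    # Hypothetical model: 'a' is expected 7 times out of 8.
    return 0.875 if sym == "a" else 0.125


message = "aaaaaaab"
uniform_bits = ideal_code_length_bits(message, uniform_model)
skewed_bits = ideal_code_length_bits(message, skewed_model)
print(uniform_bits, skewed_bits)
```

The model that matches the data better yields the shorter code, which is the whole job left to the person supplying the model.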

One of the key technical difficulties of data compression is that you need to keep the encoder and decoder in exact sync, or the whole procedure goes entirely off the rails. This problem is especially acute for the use case of AE, where you are potentially changing the model in response to every event. My software makes it very easy to guarantee that the sender and receiver stay in sync, and at the same time it reduces the amount of code you have to write: you don't write a separate encoder and decoder, you just write one class that is used for both, depending on the configuration.
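
A minimal sketch of the single-code-path idea, under several assumptions: it is not the API of the software described above, the names `AdaptiveModel`, `Codec`, and `run_text` are hypothetical, it uses exact rational arithmetic instead of the fixed-precision integers a practical coder would use, and the decoder is simply told the message length. The point is that `run_text` and the model update are written once and run identically in both directions.

```python
import math
from fractions import Fraction


class AdaptiveModel:
    """Hypothetical adaptive frequency model over a fixed alphabet."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = {s: 1 for s in self.alphabet}  # start uniform

    def intervals(self):
        """Yield (symbol, low, high) slices of [0, 1) sized by current counts."""
        total = sum(self.counts.values())
        low = 0
        for s in self.alphabet:
            high = low + self.counts[s]
            yield s, Fraction(low, total), Fraction(high, total)
            low = high

    def update(self, symbol):
        self.counts[symbol] += 1


class Codec:
    """Exact-rational arithmetic coder: encodes if code is None, else decodes."""

    def __init__(self, code=None):
        self.low, self.width = Fraction(0), Fraction(1)
        self.code = code

    def event(self, model, symbol=None):
        """Encode `symbol`, or (when decoding) recover and return the next symbol."""
        for s, lo, hi in model.intervals():
            a = self.low + self.width * lo
            b = self.low + self.width * hi
            chosen = (s == symbol) if self.code is None else (a <= self.code < b)
            if chosen:
                self.low, self.width = a, b - a
                model.update(s)  # the only place the model changes
                return s
        raise ValueError(f"symbol {symbol!r} not in model alphabet")

    def finish(self):
        """Return (k / 2**n, n) inside the final interval, smallest n with 2**-n <= width."""
        n = 0
        while Fraction(1, 2 ** n) > self.width:
            n += 1
        return Fraction(math.ceil(self.low * 2 ** n), 2 ** n), n


def run_text(codec, length, text=None):
    """Single code path for both directions; `text` is given only when encoding."""
    model = AdaptiveModel("abcdr")
    out = []
    for i in range(length):
        out.append(codec.event(model, text[i] if text is not None else None))
    return "".join(out)


message = "abracadabra"
encoder = Codec()
run_text(encoder, len(message), message)
code, nbits = encoder.finish()

decoded = run_text(Codec(code), len(message))
print(nbits, decoded)
```

Because the decoder walks through the very same `run_text` and `update` calls as the encoder, there is no second implementation that could drift out of step.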

Comment author: Gunnar_Zarncke 13 July 2016 07:13:54PM 0 points [-]
Comment author: Daniel_Burfoot 13 July 2016 07:48:41PM *  2 points [-]

Thanks for posting this link, it contains a good illustration of the problem of using separate encoder/decoder implementations.

See how they have separate encoder/decoder implementations on page 8/9 of the document? That strategy is very error-prone. It is very hard for the programmer to ensure that the encoder and decoder are performing exactly the same updates, and even the slightest off-by-one error will cause the process to fail completely (I spent many hours trying to debug sync problems like this). This problem becomes more painful as you attempt to build more and more sophisticated compressors.

With my library, there is no separation of encoder and decoder logic; it is effectively the same code. That basically guarantees there will be no sync problems. Since I developed this technique I haven't had any sync problems.