Gunnar_Zarncke comments on Open thread, Jul. 11 - Jul. 17, 2016 - Less Wrong Discussion
I have some software I am thinking about packaging up and releasing as open-source, but I'd like to gauge how interesting it is to people other than me.
The software is a highly usable implementation of arithmetic encoding (AE). AE completely handles the encoding step, so to build a custom compressor for some data set, all you have to do is supply a probability model for the data type(s) you are compressing (I call this "BYOM": Bring Your Own Model).
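To make the BYOM idea concrete, here is a minimal sketch of what the model side of such a contract could look like, assuming a coder that only asks the model for integer cumulative frequencies. The interface `ProbabilityModel`, its method names, and `AdaptiveFrequencyModel` are hypothetical illustrations, not the API of the software described above.

```java
import java.util.Arrays;

/** Hypothetical BYOM contract: the coder only needs cumulative frequency information. */
interface ProbabilityModel {
    int totalFrequency();          // sum of all symbol frequencies
    int frequency(int symbol);     // frequency of one symbol (kept >= 1)
    int cumulativeLow(int symbol); // sum of frequencies of all symbols below `symbol`
    int symbolFor(int target);     // symbol whose interval contains target, 0 <= target < total
    void update(int symbol);       // adapt after each coded symbol
}

/** Illustrative adaptive order-0 model over symbols 0..alphabetSize-1. */
final class AdaptiveFrequencyModel implements ProbabilityModel {
    private static final int MAX_TOTAL = 1 << 16; // assumed precision limit of the coder
    private final int[] freq;
    private int total;

    AdaptiveFrequencyModel(int alphabetSize) {
        freq = new int[alphabetSize];
        Arrays.fill(freq, 1); // start every symbol at 1 so none has zero probability
        total = alphabetSize;
    }

    @Override public int totalFrequency() { return total; }
    @Override public int frequency(int symbol) { return freq[symbol]; }

    @Override public int cumulativeLow(int symbol) {
        int sum = 0;
        for (int i = 0; i < symbol; i++) sum += freq[i];
        return sum;
    }

    @Override public int symbolFor(int target) {
        int sum = 0;
        for (int s = 0; s < freq.length; s++) {
            sum += freq[s];
            if (target < sum) return s;
        }
        throw new IllegalArgumentException("target out of range: " + target);
    }

    @Override public void update(int symbol) {
        freq[symbol]++;
        total++;
        if (total > MAX_TOTAL) { // halve all counts, keeping each >= 1
            total = 0;
            for (int i = 0; i < freq.length; i++) {
                freq[i] = (freq[i] + 1) / 2;
                total += freq[i];
            }
        }
    }
}
```

Halving the counts is one common way to keep the total below a coder's precision limit while letting the model keep adapting; the 2^16 cap here is an assumption, not a figure from the thread.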
One of the key technical difficulties of data compression is that you need to keep the encoder and decoder in exact sync, or the whole procedure goes entirely off the rails. This problem is especially acute with AE, where you may be changing the model in response to every event. My software makes it very easy to guarantee that the sender and receiver stay in sync, and at the same time it reduces the amount of code you have to write (basically, you don't write a separate encoder and decoder; you write one class that is used for both, depending on the configuration).
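The one-class-for-both idea can be sketched with a classic integer arithmetic coder in the Witten-Neal-Cleary style, assuming 32-bit state held in `long`s and a simple adaptive frequency array standing in for the user's model. `SymmetricCoder`, `code`, `run`, and `finish` are hypothetical names, not the poster's API. In encode mode `code` narrows the interval around the given symbol and emits bits; in decode mode it ignores its argument, consumes bits, and returns the symbol it finds. Because `run` is the only place the model is consulted and updated, both sides perform identical updates in identical order.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical single class that encodes or decodes depending on how it was constructed. */
final class SymmetricCoder {
    private static final long FULL = 1L << 32, HALF = FULL >>> 1, QUARTER = FULL >>> 2;
    private final boolean encoding;
    private final BitSet bits;
    private int bitPos = 0;                  // next bit to write (encoder) or read (decoder)
    private long low = 0, high = FULL - 1, value = 0;
    private int pending = 0;                 // encoder only: deferred opposite bits (underflow)

    /** Encoder: starts with an empty bit buffer. */
    SymmetricCoder() { encoding = true; bits = new BitSet(); }

    /** Decoder: primes `value` with the first 32 bits of the encoded stream. */
    SymmetricCoder(BitSet encoded) {
        encoding = false;
        bits = encoded;
        for (int i = 0; i < 32; i++) value = (value << 1) | readBit();
    }

    private int readBit() { return bits.get(bitPos++) ? 1 : 0; } // past the end reads as 0

    private void writeBit(int bit) { if (bit == 1) bits.set(bitPos); bitPos++; }

    /** Encoder: writes a resolved bit plus any pending opposite bits. Decoder: no-op. */
    private void emit(int bit) {
        if (!encoding) return;
        writeBit(bit);
        for (; pending > 0; pending--) writeBit(1 - bit);
    }

    /** Encodes `symbol` (encoder) or decodes and returns a symbol (decoder, argument ignored). */
    int code(int symbol, int[] freq) {
        long total = 0;
        for (int f : freq) total += f;
        long range = high - low + 1;
        if (!encoding) { // find the symbol whose cumulative interval contains the scaled value
            long target = ((value - low + 1) * total - 1) / range;
            long acc = 0;
            symbol = 0;
            while (acc + freq[symbol] <= target) acc += freq[symbol++];
        }
        long cumLow = 0;
        for (int i = 0; i < symbol; i++) cumLow += freq[i];
        long cumHigh = cumLow + freq[symbol];
        // Narrow [low, high] identically on both sides (high first, since it uses the old low).
        high = low + range * cumHigh / total - 1;
        low = low + range * cumLow / total;
        renormalize();
        return symbol;
    }

    private void renormalize() {
        while (true) {
            long offset;
            if (high < HALF) { emit(0); offset = 0; }
            else if (low >= HALF) { emit(1); offset = HALF; }
            else if (low >= QUARTER && high < 3 * QUARTER) { if (encoding) pending++; offset = QUARTER; }
            else break;
            low = 2 * (low - offset);
            high = 2 * (high - offset) + 1;
            if (!encoding) value = 2 * (value - offset) + readBit();
        }
    }

    /** Encoder only: writes enough bits to pin down the final interval and returns the stream. */
    BitSet finish() {
        pending++;
        emit(low < QUARTER ? 0 : 1);
        return bits;
    }

    /** The single routine shared by both sides: same model, same updates, same order. */
    static int[] run(SymmetricCoder coder, int[] input, int count, int alphabetSize) {
        int[] freq = new int[alphabetSize];
        Arrays.fill(freq, 1);                  // stand-in adaptive order-0 model
        int[] output = new int[count];
        for (int i = 0; i < count; i++) {
            int s = coder.code(coder.encoding ? input[i] : -1, freq);
            output[i] = s;
            freq[s]++;                         // identical update keeps encoder and decoder in sync
        }
        return output;
    }
}
```

As written, the decoder has to be told how many symbols to decode; a real format would store the length or model an end-of-stream symbol. The total frequency also has to stay at or below 2^30 (a quarter of the 32-bit range) so every symbol keeps a nonzero interval.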
Which language?
Java.
Great, I'm interested. Performance-wise Java may not be the best choice, but for reusability it's good. I wonder about the overhead of your abstraction.
Thanks for the feedback!
Re: performance, my implementation is not performance-optimized, but in my experience Java is very fast. According to this benchmark, Java is only about 2x slower than pure C (also known as "portable assembly").
Yeah, the benchmark game. But arithmetic coding and the bit twiddling it implies aren't exactly Java's strength. On the other hand, in this case the overhead of your in-sync de/encoding abstraction may be decisive.
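Most of the bit twiddling an arithmetic coder needs comes down to packing individual bits into bytes and reading them back. A minimal sketch in Java, assuming MSB-first bit order; `BitPacker` is a hypothetical helper, not part of the software discussed. The one Java-specific wrinkle is that `byte` is signed, so reads mask with `& 0xFF` before shifting.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical MSB-first bit writer/reader of the kind an arithmetic coder sits on top of. */
final class BitPacker {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int current = 0; // bits accumulated for the next byte
    private int filled = 0;  // how many bits of `current` are in use (0..7)

    void writeBit(int bit) {
        current = (current << 1) | (bit & 1);
        if (++filled == 8) { out.write(current); current = 0; filled = 0; }
    }

    /** Pads the final partial byte with zero bits and returns the packed bytes. */
    byte[] toByteArray() {
        if (filled > 0) { out.write(current << (8 - filled)); current = 0; filled = 0; }
        return out.toByteArray();
    }

    /** Reads bit `index` (MSB-first); bits past the end read as 0. */
    static int readBit(byte[] data, long index) {
        int byteIndex = (int) (index >>> 3);
        if (byteIndex >= data.length) return 0;
        // `byte` is signed in Java, so mask to 0..255 before shifting
        return ((data[byteIndex] & 0xFF) >>> (7 - (int) (index & 7))) & 1;
    }
}
```

`ByteArrayOutputStream`'s write methods are synchronized, so a performance-minded version would more likely fill a plain `byte[]` directly; how much that matters next to the per-symbol model work is a question for profiling.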