
IlyaShpitser comments on Algorithmic Progress in Six Domains - Less Wrong Discussion

24 Post author: lukeprog 03 August 2013 02:29AM




Comment author: IlyaShpitser 06 August 2013 12:41:48PM 0 points [-]

Shannon's bounds mean that you won't see crazy progress in general in data compression.
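A quick numerical sketch of what that bound means (an illustration, with zlib standing in for a real compressor): an i.i.d. Bernoulli(p) bit source has entropy H(p) = -p·log2(p) - (1-p)·log2(1-p) bits per bit, and no lossless compressor can beat n·H(p) bits on average, however clever it gets.

```python
import math
import random
import zlib

def bernoulli_entropy(p: float) -> float:
    """Shannon entropy, in bits per symbol, of a Bernoulli(p) source."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

random.seed(0)
p = 0.1
n_bytes = 10_000  # 80,000 bits of sample data

# Pack i.i.d. Bernoulli(p) bits into bytes.
data = bytes(
    sum(int(random.random() < p) << b for b in range(8))
    for _ in range(n_bytes)
)

# Source-coding bound: ~0.469 bits per bit at p = 0.1.
entropy_bound_bits = 8 * n_bytes * bernoulli_entropy(p)
compressed_bits = 8 * len(zlib.compress(data, 9))

# zlib lands above the entropy bound, as any lossless code must on average.
print(compressed_bits >= entropy_bound_bits)
```

The gap between `compressed_bits` and `entropy_bound_bits` is exactly the room left for "progress" on this source, and it is bounded.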

Comment author: timtyler 08 August 2013 12:33:50AM 2 points [-]

Compression is closely related to forecasting - which in turn is a large part of intelligence (my estimate is 80% forecasting vs 20% pruning and evaluation). So your thesis bears on the idea of an intelligence explosion.

Comment author: IlyaShpitser 08 August 2013 09:19:09PM 0 points [-]

I guess it's very weak negative evidence (like the fact that NP-complete problems exist, and lots of AI problems are NP-complete).

The steelman of a pro-explosion argument is probably very sophisticated and easily avoids these issues.

Comment author: timtyler 09 August 2013 10:08:57AM *  0 points [-]

Well, creatures able to compress their sense data to 0.22 of its original size might drive to extinction creatures who can only manage a compression ratio of 0.23 - in an evolutionary contest. 'Small' differences in modeling ability - as measured by compression ratios - could thus have large effects on the world.

However, compression (and AI) are hard problems that run rapidly into diminishing returns - at least if you measure them in this way.

Comment author: IlyaShpitser 10 August 2013 06:28:47PM -2 points [-]

Well, creatures able to compress their sense data to 0.22 of its original size might drive to extinction creatures who can only manage a compression ratio of 0.23 - in an evolutionary contest.

Sounds pretty speculative.

Comment author: saturn 09 August 2013 03:35:08PM 0 points [-]

Unless you have a model that exactly describes how a given message was generated, its Shannon entropy is not known but estimated... and typically estimated based on the current state of the art in compression algorithms. So unless I misunderstood, this seems like a circular argument.

Comment author: IlyaShpitser 09 August 2013 05:47:25PM *  0 points [-]

You need to read about universal coding, e.g. start here:

http://en.wikipedia.org/wiki/Universal_code_(data_compression)

I highly recommend Cover and Thomas's book, a very readable intro to information theory. The point is that we don't need to know the distribution the bits came from in order to do very well in the limit. (There are gains to be had in the region before "in the limit," but these will track the kinds of gains you get in statistics if you want to move beyond asymptotic theory.)
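A minimal sketch of one such universal code: the Elias gamma code maps any positive integer to a self-delimiting bit string of about 2·log2(N) + 1 bits, with no knowledge of the source distribution.

```python
def elias_gamma_encode(n: int) -> str:
    """Elias gamma code: (len-1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("gamma code is defined for positive integers")
    binary = bin(n)[2:]                      # e.g. 9 -> '1001'
    return "0" * (len(binary) - 1) + binary  # e.g. 9 -> '0001001'

def elias_gamma_decode(bits: str) -> int:
    """Inverse: count k leading zeros, then read k+1 bits as binary."""
    k = 0
    while bits[k] == "0":
        k += 1
    return int(bits[k : 2 * k + 1], 2)

# Self-delimiting and prefix-free: no distribution was needed.
print(elias_gamma_encode(1))          # '1'
print(elias_gamma_encode(9))          # '0001001'
print(elias_gamma_decode("0001001"))  # 9
```

The code length tracks log2(N) to within a constant factor for every distribution, which is the "do very well in the limit" property in miniature.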

Comment author: Alsadius 06 August 2013 03:43:39PM 0 points [-]

I'm not as familiar with Shannon's work as I'd like to be, but don't Shannon's bounds limit compression efficiency and not compression speed? This is a speed test, not an efficiency test.

Comment author: IlyaShpitser 07 August 2013 01:32:18AM *  0 points [-]

???

From the url:

Create a compressed version (self-extracting archive) of the 100MB file enwik8 of less than about 16MB. More precisely:

[etc]

There are restrictions on run speed, but they seem to be there to limit brute-forcing all possible approaches. There is also talk of "compressing intelligently." I am not even sure what a pure speed test of compression would mean -- the identity function is pretty fast!


There are linear-time implementations of compression schemes that reach the Shannon bound, so any runtime improvement over an asymptotically optimal encoder will not be dramatic either...
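A sketch of how little ratio headroom is left on a memoryless source (using per-symbol Huffman coding as the stand-in scheme, not the schemes the comment alludes to): Huffman codes are guaranteed to average between H and H + 1 bits per symbol, and encoding with the resulting table is a single linear pass.

```python
import heapq
import math
from collections import Counter

def huffman_lengths(freqs):
    """Return {symbol: code length} for a Huffman code over freqs."""
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        fa, _, la = heapq.heappop(heap)
        fb, _, lb = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol one level deeper.
        merged = {s: d + 1 for s, d in {**la, **lb}.items()}
        heapq.heappush(heap, (fa + fb, counter, merged))
        counter += 1
    return heap[0][2]

text = "abracadabra" * 100
freqs = Counter(text)
n = len(text)
lengths = huffman_lengths(freqs)

entropy = -sum((f / n) * math.log2(f / n) for f in freqs.values())
avg_len = sum(freqs[s] * lengths[s] for s in freqs) / n

# Classic bound: H <= average Huffman length < H + 1.
# Here H is about 2.04 bits/symbol and Huffman averages about 2.09.
print(round(entropy, 3), round(avg_len, 3))
```

The optimal average length is already within a fraction of a bit of the entropy, which is why further gains on such sources are marginal rather than explosive.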

Comment author: Alsadius 07 August 2013 01:47:32AM *  1 point [-]

This is my point - the algorithm challenge is to do a given compression task quickly, and Shannon has no limits I'm aware of on how fast that can be done. So there's no problem of approaching physical limits the way there would be for, say, "Encode this file in the smallest space possible".

Edit: The stuff below the line either wasn't there when I replied or I didn't see it. You're right though, if linear-time implementations already exist, then you're basically hard-bounded.

Comment author: IlyaShpitser 07 August 2013 04:49:35PM 0 points [-]

To be sure, constant factors matter a lot! But I think in this case the curve would be pretty boring.