Since this challenge was solved pretty quickly, I thought I'd try to make a substantially more difficult program induction challenge.

The Challenge

Find a short program which generates this binary file.

Hints (because I expect this to be extremely hard)

  • For a sense of how simple the program is, I generated the file with a python script of a similar length and complexity to the solution to the prior challenge.
    • Other than writing out to the file at the end, the generator uses only basic primitives of the python language (integer arithmetic, logic and flow control, arrays)
  • The only numeric constants used in the generator script are .
  • The generator script takes nearly 10 minutes to run on my laptop!
    • So if you have any good hypotheses, you might want to test them with a compiled language for a 100-fold speedup over python.
New Comment
8 comments, sorted by Click to highlight new comments since:

Meta: in the future, it would probably be better to space these challenges out more. I think that a lot of people blew their effort and interest on the previous one. It might be interesting if these came out once a month like the D&D.Sci challenges.

Do you think it would be a good idea to delete this and repost it at the beginning of August?

Alternatively, I could repost an easier version in a month, since I'd be shocked if anyone solved this one. Though I guess that was part of the point -- to show that induction of fairly simple programs is super hard for humans in general. The previous challenge was too easy because each element in the output sequence was a simple function of two elements earlier in the sequence (and the 'elements' were easy to identify as they were output in the standard floating point format). On the other hand, it would be neat to make tough-but-solvable program induction challenges a recurring monthly thing as you suggested. Thoughts?

What I'd like to see is an ordinary piece of media (i.e. an image), but reformatted in an 'alien' way. Change RGB to the NTSC color system, then order the pixels in a non-cartesian way (or just swap rows and columns, or something), then write the data in your own invented number format, then compress it using an atypical algorithm, then add some arbitrary bytes to the front representing metadata, etc. Maybe when delivering it to the community, tell them it's an image, and see how long before people figure it out it's an image of a tiger or whatever.

I have something in the pipeline, but it'll take a while... if it's trying to be "actually" alien, it's kinda important that it's internally consistent. "Add some arbitrary bytes to [...represent] metadata" is exactly what you don't want to do. Because if you do, sure, it'll be hard, it'll (probably) be eventually solvable, but it'll be... somewhat dissatisfying. Same for using stuff like NTSC, it's just... why would they come up with exactly that? It just doesn't make any sense!

So, in case anyone else wants to also make a good challenge in this style, here's some guidelines. The first and most important one:

  • Ensure that every. single. layer. has some clearly recognizable artifact / regularity. Blocks + checksums, pilot tones, carrier waves, whatever. It can occasionally be rather obscure / hard to identify, but never completely absent. If you have to get through ≈10 layers of stuff to solve the whole thing, getting stuck for a few hours (or even days) with no indication of whether you're even heading in the right direction is very unsexy and will likely kill the mood. (It's also internally consistent w.r.t. the back story of an alien message: For an intentional transmission, they want you to understand what they're sending. Even an accidental or amateur message would likely use (their) standard protocols / file formats, which will likely have some regularity – headers, blocks, trailers, ... – because that makes it easier to debug problems.)

The rest falls into the "do some world building" and "ensure internal consistency" buckets:

  • Weird senses are cool, not everyone needs to have a focus on vision, especially not on exactly 3 types of receptors focusing on exactly the same 400–700 nm range as we do – the electromagnetic spectrum is way larger. (Even animals already have pretty crazy stuff going on!) Vision doesn't have to be the primary sense (with low-resolution "eyes", other senses may be preferable.) Having a two-dimensional arrangement of identical building blocks (a.k.a. "images" made of "pixels") isn't necessarily ideal for all senses. Different senses likely also result in other things being obvious, considered likely or unlikely, or being missed entirely when trying to find message formats that are most widely understood. (Beings whose senses revolve entirely around sound and movements may not have static things like books or writing at all, a cave-dwelling culture near a very aggressive star may consider all EM / "light" exposure as dangerous or deadly (compare: the popular image of radioactivity for us) and strongly avoid developments in that direction.)

  • Try not to copy "historical accidents" blindly. Most existing media formats are harshly optimized towards being barely good enough to fool human senses. (Especially lossy compression generally exploits shortcomings of human senses to remove stuff that we would be unable to sense in the presence of stronger surrounding stimuli, but even simple "lossless" formats are generally human-centric. Historically, memory/storage was not cheap at all, and you really wanted to get the most out of your bits.) An alien message will likely try to avoid optimizing in the wrong way, but may well fail at that in interesting ways.
    RGB bakes material properties and lighting into just three numbers and ignores all other things, even though we are able to see e.g. metamerism effects in the real world. Stereo audio bakes sound waves into two channels, even though we can detect much more finely where sounds are coming from in the real world. (Your very personal ear shape influences how things sound coming from different directions, so while you can shove approximate directions and distances into two channels, it sounds quite different from what you're used to in the real world.) Different priorities will result in different trade-offs.
    Even for computers, binary isn't required. (There were decimal computers, that's just way more complex than 2 states and so eventually that won out. If the prevailing number base is 3 or 4 instead of 10, that's much more doable.) Same goes for 8-bit groups as the next building block. (16 is "more square" and behaves nicer in some contexts. But we also had what's basically 9-bit computers, that were using a 36-bit word size, so the next level of grouping doesn't have to cleanly divide at all. It also provides space for e.g. a parity bit if you do want to stick to "base-powered" block sizes.)

  • Look at how things would actually transmit – you don't get a convenient blob of bits. (That would already lock in a binary base, potentially bytes as the next level too, making it much harder to do actual fun stuff with weird senses.) Actually, you'd get some noisy recording (likely with burst errors blotting out information here and there) from which you have to recover the frequency at which symbols are coded, then recover that sequence and try to recognize and fix errors, and so on... beyond parity bits, there's full error correction codes, interleaving, and all the rest of coding theory. You may want to repeat the same message in a loop, or have a sequence of messages to make it easier to decode the main message (e.g. stylized geometric shapes, simple but anisotropic patterns, or chirps to help fix directions and scales before the "messy" recorded data like images or sound.)

...and so on like that, but I hope that's enough to send anyone considering to make such a challenge at the "actual" alien message level in approximately the right direction. Eagerly anticipating silly alien selfies! ;)

I don't know if it's necessary to delete, but I'll bet you'll get a lot more uptake if you repost another in August.

Not the main point of the post, just a small comment

So if you have any good hypotheses, you might want to test them with a compiled language for a 100-fold speedup over python

 

This used to be a kind of common knowledge that python is 100 times slower than C++ for example - at least I heard it quite a lot in the world of competitive programming - but I think this is kind of false nowaday

Libraries like NumPy are basically compiled C code accessible to the python user; thus, using such libraries will yield comparable performance to C/C++. Since a lot of operations can be written using NumPy or similar libraries, python is in fact no longer that slow.

I don't have yet access to the code of this challenge, so maybe this one cannot be written this way, and therefore it would genuinely be 100 times faster using another language. But, in general, many standard operations can be done in python without such a big slowdown.

I doubt you could use numpy to compute this efficiently, since (afaik) numpy only gives you a big speedup on very regular vector/matrix type computations, which this is not.

[+][comment deleted]10