Thanks to everyone who participated in the Alien Message contest, and congratulations on successfully being able to decode the message. I gave a time limit of 2 weeks at the start of the contest, but you completely solved it in a mere 2 days.

Solution

Here is the code that generated the message:

/*
 * generate_message.cxx
 * 
 * This code operates a continuous tag-system, and writes the resulting
 * data to a file.
 * 
 */

#include <iostream>
using std::cout, std::endl;
#include <fstream>
using std::ofstream;
#include <vector>
using std::vector;
#include <cmath>
using std::rint;

using std::string;


ofstream& open_wb(string filename) {
  return *new std::ofstream(filename, std::ios::binary);
}

double abs(double x) {
  return x >= 0 ? x : -x;
}

class TagSystem {
  public:
  int pos{ 0 };
  vector<double> data;
  
  TagSystem(double x, double y, double z) {
    data.push_back(x);
    data.push_back(y);
    data.push_back(z);
  }
  
  void iterate() {
    uint readsize = 1;
    if (data.size() - pos >= 2 && abs(data[pos]) < 2 && abs(data[pos+1]) < 2) {
      readsize = 2;
    }
    if (data.size() - pos < readsize) {
      cout << "Tag system halted at position " << pos << endl;
    }
    switch (readsize) {
      case 1:
        data.push_back(data[pos] / 2);
        break;
      case 2:
        data.push_back(data[pos]*data[pos+1] + 1.0);
        data.push_back(data[pos] - data[pos+1]);
        data.push_back(data[pos+1] - data[pos]);
        break;
    }
    pos += readsize;
  }
};

void write_double_to_file(double x, ofstream& file) {
  file.write((char*)&x, sizeof(double));
}

int main(int argc, char **argv)
{
  cout << "Writing file..." << endl;
  ofstream& fout = open_wb("./out.bin");
  TagSystem ts(1.3, 2.1, -0.5);
  for (int i = 0; i < 1000; i++) {
    ts.iterate();
  }
  for (double elem : ts.data) {
    write_double_to_file(elem, fout);
    cout << elem << endl;
  }
  fout.close();
  return 0;
}

The message is generated by a continuous version of a tag system (actually a slight generalization, since the number of items consumed at each step can vary). Instead of a discrete set of symbols from an alphabet, the items are actually numbers, and the relationship between the numbers consumed and the numbers written is defined by a simple mathematical function:

To keep the numbers bounded (so I wouldn't end up with a bunch of NaNs), I chose to divide the number by 2 before putting it back on the end of the queue if it had an absolute value greater than 2, rather than feed it to the above function. The file was generated by just writing down all the numbers that passed through the tag system, in order. The actual encoding of the numbers in the file was as double-precision floats (more on this later).

I thought I was constructing a fairly tricky puzzle by having a large distance in the file between numbers and their children. Nevertheless, a full and correct solution to the contest was posted by harfe here, a mere 2 days after the start of the contest. Much credit also goes to gjm for identifying the encoding of the file to be floating point and to both gjm and Rafael Harth for noticing various local numerical relationships. Also, blf was early in identifying some of the non-local structure created by the tag system. Many other commenters also had import insights, but as of now, the original post has nearly 100 comments on it, so I can't list you all here.

Reflections

Floating point

Many people were unhappy with the fact that encoding of the file was standard double precision floating point, rather than anything more exotic and alien. I can provide a few reasons for this:

Watsonian reasons for the file being floating point encoded:

  • The aliens sent their message using a continuous transmission channel, like the frequency shift of a pulsar relative to its average or something like that. NASA measured this continuous value and stored the result as floating point data.
  • Alternatively, and probably more likely: Your friend was indeed pranking you.

Doylist reasons for the file being floating point encoded:

  • When I created the contest, I was not thinking of working out the encoding as being an interesting part of the challenge, as opposed to working out the higher level structure of the tag system.
  • I asked for a solution in the format of a program that generated the values. In general when compressing a stream of data generated by a system based on real numbers, the program would need to contain a table of corrections due to imprecision in the representation of the numbers. The compression is still very good, since the number of bits required to specify each correction is small, but it's rather inconvenient. If I use floating point, though, then I can generally expect that everyone's computer is going to round off the errors in the same way, and so no table of corrections is needed.

All that being said, I regret this choice, and next time I would probably try to use a more "alien" way of representing the data. (Probably I would choose a discrete system to generate the data, to avoid the issue of rounding.) I suspect that the backstory caused many people to reject the "this is just floating point" hypothesis, despite the evidence for it, so I think that aspect of the contest ended up being a little unfair.

Other

Overall, this challenge was a little too easy. Besides using a standard and well-known encoding, the message consisted of the entire execution of the system, rather than being a series of measurements from a much larger system. There's probably also room for that larger system to be a little more complicated. So I'd advise anyone trying to run their own version of this contest to use weirder encodings, to not have all the data in your program included in the message, and to consider bumping up the trickyness of the program a few notches up from what I did here. (Though you should also try not to err too far in the other direction of making the task pointlessly difficult. It's trivial to make the task nearly impossibly by letting the message be the output of a cryptographic pseudorandom number generator.)

Still, considering the great speed with which this was solved, I'm counting this as one point for the power of intelligence to unravel weird-but-structured data. Thanks again to everyone who participated.

New Comment
2 comments, sorted by Click to highlight new comments since:

>The aliens sent their message using a continuous transmission channel, like the frequency shift of a pulsar relative to its average or something like that. NASA measured this continuous value and stored the result as floating point data.

 

Then it makes no sense for them to publish it in binary without mentioning the encoding, or making it part of the puzzle to begin with.

It was a fun challenge! Fun to see it solved, as well! Thanks