Very nice. This is the cleanest result on cognitive (or rationality) costs in co-operative systems that I've seen. Modal combat seems kind of esoteric compared to, say, iterated prisoner's dilemma tournaments with memory, but it pays off nicely here. It gives you the outcomes of a whole set of other-modelling agents (without, e.g., doing a lot of simulation), and the box-operator depth then plugs in as a natural modelling-cost measure.
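To check I've understood: the cost measure is something like box-nesting depth over the agents' defining formulas? A quick Haskell mock-up of how I picture it (my own sketch with made-up names, not your formalism):

```haskell
-- A modal agent's rule as a formula; Box reads as "it is provable that ...".
data ModalFormula
  = Top | Bot                      -- constants
  | Var String                     -- e.g. "the opponent cooperates"
  | Not ModalFormula
  | And ModalFormula ModalFormula
  | Or  ModalFormula ModalFormula
  | Box ModalFormula

-- Box-operator depth: how many levels of proofs-about-proofs the agent uses,
-- i.e. what I take to be the modelling-cost measure.
boxDepth :: ModalFormula -> Int
boxDepth f = case f of
  Top     -> 0
  Bot     -> 0
  Var _   -> 0
  Not g   -> boxDepth g
  And g h -> max (boxDepth g) (boxDepth h)
  Or  g h -> max (boxDepth g) (boxDepth h)
  Box g   -> 1 + boxDepth g

-- FairBot's rule "cooperate iff provably the opponent cooperates" costs 1;
-- an agent that reasons about FairBot's reasoning costs 2; and so on.
fairBotRule :: ModalFormula
fairBotRule = Box (Var "opponentCooperates")
```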
Did you ever publish any of your modal combat code? (I have a vague recollection that you had some Haskell code.)
Don't humans have to give up on doing their own science then (at least fundamental physics)?
I guess I can have the FAI make me a safe "real physics box" to play with inside the system; something that emulates what it finds out about real physics.
If you failed, you'd want to distinguish between (a) rationalism sucking, (b) your rationalism sucking, and (c) EVE already being full of rationalists.
Whether or not success in EVE is relevant outside EVE is debatable, but I think the complexity, politics, and intense competition mean that it would be hard to find a better online proving ground.
Good advice, but I would go further. Don't use your inbox as a to-do list at all. I maintain a separate to-do list for roughly three reasons.
(1) You can't have your inbox in chronological and priority order at the same time. Keeping the inbox and email folders in chronological order is good for searching and for keeping track of email conversations.
(2) Possibly just my own psychological quirk, but inbox emails feel like someone waiting for me and getting impatient. I can't seem to get away from my inbox fundamentally representing a communications channel with people on the other end. Watching me.
(3) When I "do email", I know I'm done when I literally hit inbox zero, and I get the satisfaction of that several times a day.
I have found that I need scrupulous email and task accounting, though. Every email either gets deleted (and that advice on unsubscribing is good), handled right away (within, say, 2 minutes), or turned into a task on the to-do list, with the email filed in a subject folder until it comes to be dealt with.
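Schematically, the rule amounts to something like this (a toy Haskell sketch; the types are invented for illustration):

```haskell
-- The three-way triage rule applied to each incoming email.
data Action
  = Delete              -- junk: delete (and unsubscribe while you're at it)
  | HandleNow           -- anything that takes under ~2 minutes
  | FileAndTrack String -- otherwise: a to-do entry, email filed by subject

triage :: Bool -> Bool -> String -> Action
triage isJunk under2Min subject
  | isJunk    = Delete
  | under2Min = HandleNow
  | otherwise = FileAndTrack subject
```

The point is that there is no fourth option: nothing is allowed to just sit in the inbox.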
It's not just the environment in which you share your goals that matters, but also how you suspect you will react to the responses you get.
When reading through these two scenarios, I can just as easily imagine someone reacting in exactly the opposite way. That is, in the first case, thinking "gosh, I didn't know I had so many supportive friends", "I'd better not let them down", and generally getting a self-reinforcing high when making progress.
Conversely, say phase 1 had failed and got the responses stated above. I can imagine someone thinking "hey, my friends are a bunch of jerks" and "they're right, I'm probably going to fail again", then developing a flinch whenever they think about weight loss, and losing interest in trying.
My five minutes' worth of thoughts:
Metrics that might be useful (on the grounds that, in hindsight, people would say they had made bad decisions): traffic accident rates, deaths due to smoking, bankruptcy rates, consumer debt levels.
Experiments you could do if you could randomly sample people and get enough of their attention: simple reasoning tests (e.g. confirmation bias), getting people to make some concrete predictions and following them up a year later.
Maybe something measuring people's level of surprise at real vs. fake Facebook news (on the grounds that people should be more surprised at fake news)?
Doing theoretical research that ignores practicalities sometimes turns out to be valuable in practice. It can open a door to something you assumed to be impossible, or save a lot of wasted effort on a plan that turns out to have an impossible sub-problem.
A concrete example of the first category might be quantum error-correcting codes. Prior to that theoretical work, a lot of people thought that quantum computers were not worth pursuing because noise and decoherence would be an insurmountable problem. The quantum fault-tolerance theorems did nothing to help solve the very tough practical problems of building a quantum computer, but they did show people that it might be worth pursuing - and here we are, 20 years later, closing in on practical quantum computers.
I think source-code-based decision theory might have something of this flavour. It doesn't address all those practical issues, such as how one machine comes to trust that another machine's source code is what it says it is. That might indeed scupper the whole thing. But it does clarify where the theoretical boundaries of the problem are.
You might have thought "well, two machines could co-operate if they had identical source code, but that's too restrictive to be practical". But it turns out that you don't need identical source code if you have the other machine's source code and can prove things about it. Then you might have thought "OK, but those proofs will never work because of non-termination and self-reference" ... and it turns out that that is wrong too.
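To illustrate why the self-reference bottoms out, here's a toy Haskell reconstruction of the Kripke-frame trick for GL (my own sketch with made-up names, not anyone's actual code). Truth at world n only depends on worlds below n, so as long as every reference to an agent's action sits under a Box ("provably ..."), evaluation terminates:

```haskell
-- An agent's action as a modal formula over both agents' actions.
data Formula
  = Bot          -- defect unconditionally
  | Act Char     -- 'a' / 'b': "agent A (resp. B) cooperates"
  | Box Formula  -- "it is provable that ..."

-- A system maps each agent name to the formula defining its action.
type System = [(Char, Formula)]

-- Truth at world n of the frame where world n sees exactly the worlds m < n.
holds :: System -> Int -> Formula -> Bool
holds sys n f = case f of
  Bot   -> False
  Act x -> case lookup x sys of
             Just g  -> holds sys n g             -- unfold the fixed point
             Nothing -> error ("unknown agent " ++ [x])
  Box g -> all (\m -> holds sys m g) [0 .. n - 1] -- vacuously true at world 0

-- FairBot vs FairBot: each cooperates iff it can prove the other cooperates.
fbVsFb :: System
fbVsFb = [('a', Box (Act 'b')), ('b', Box (Act 'a'))]

-- FairBot vs DefectBot.
fbVsDb :: System
fbVsDb = [('a', Box (Act 'b')), ('b', Bot)]

-- The values are already stable by world 1 here; world 5 is a safe margin.
main :: IO ()
main = do
  print (holds fbVsFb 5 (Act 'a'))  -- True:  mutual cooperation
  print (holds fbVsDb 5 (Act 'a'))  -- False: FairBot defects in response
  print (holds fbVsDb 5 (Act 'b'))  -- False: DefectBot always defects
```

Even though 'a' and 'b' are defined in terms of each other, every cycle through the mutual reference passes through a Box, which strictly decreases the world index, so nothing diverges - which is the non-obvious part.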
Theoretical work like this can inform you about what you could hope to achieve if you solved the practical issues; and, conversely, about what problems are going to come up that you will absolutely have to solve.
Will second "Good and Real" as worth reading (haven't read any of the others).
Maybe translating AI safety literature into Japanese would be a high-value use of your time?
I’m worried about this cracked team.