Kyre — LessWrong

Ilya Sutskever created a new AGI startup

I’m worried about this cracked team.

Prisoners' Dilemma with Costs to Modeling

Very nice. This is the cleanest result on cognitive (or rationality) costs in co-operative systems that I've seen. Modal combat seems kind of esoteric compared to, say, iterated prisoners' dilemma tournaments with memory, but it pays off nicely here. It gives you the outcomes of a set of other-modelling agents (without e.g. doing a whole lot of simulation), and the box-operator depth then plugs in as a natural modelling-cost measure.

Did you ever publish any of your modal combat code? (I have a vague recollection that you had some Haskell code.)

Thought experiment: coarse-grained VR utopia

Kyre9y50

Don't humans have to give up on doing their own science then (at least fundamental physics)?

I guess I can have the FAI make me a safe "real physics box" to play with inside the system; something that emulates what it finds out about real physics.

Becoming a Better Community

Kyre9y00

If you failed you'd want to distinguish between (a) rationalism sucking, (b) your rationalism sucking, or (c) EVE already being full of rationalists.

Whether or not success in EVE is relevant outside EVE is debatable, but I think the complexity, politics and intense competition mean that it would be hard to find a better online proving ground.

Inbox zero - A guide - v2 (Instrumental behaviour)

Kyre9y20

Good advice, but I would go further. Don't use your inbox as a to-do list at all. I maintain a separate to-do list for roughly three reasons.

(1) You can't have your inbox in both chronological and priority order at once. Keeping an inbox and email folders in chronological order is good for searching and keeping track of email conversations.

(2) Possibly just my own psychological quirk, but inbox emails feel like someone waiting for me and getting impatient. I can't seem to get away from my inbox fundamentally representing a communications channel with people on the other end. Watching me.

(3) When I "do email", I know I'm done when I have literally inbox zero, and I get the satisfaction of that several times a day.

I have found that I need scrupulous email and task accounting though. Every email gets deleted (and that advice on unsubscribing is good), or handled right away (within say 2 minutes), or gets a task on a to-do list and the email goes into a subject folder for when it comes to be dealt with.

Should you share your goals

Kyre9y20

Not just the environment in which you share your goals, but also how you suspect you will react to the responses you get.

When reading through these two scenarios, I can just as easily imagine someone reacting in exactly the opposite way. That is, in the first case, thinking "gosh, I didn't know I had so many supportive friends", "I'd better not let them down", and generally getting a self-reinforcing high when making progress.

Conversely, say phase 1 had failed and got the responses stated above. I can imagine someone thinking "hey my friends are a bunch of jerks" and "they're right, I'm probably going to fail again", and then developing a flinch thinking about weight loss, and losing interest in trying.

Measuring the Sanity Waterline

Kyre9y00

My five minutes' worth of thoughts.

Metrics that might be useful (on the grounds that in hindsight people would say that they made bad decisions): traffic accident rate, deaths due to smoking, bankruptcy rates, consumer debt levels.

Experiments you could do if you could randomly sample people and get enough of their attention: simple reasoning tests (e.g. confirmation bias), getting people to make some concrete predictions and following them up a year later.

Maybe something measuring people's level of surprise at real vs fake Facebook news (on the grounds that people should be more surprised at fake news)?

My problems with Formal Friendly Artificial Intelligence work

Kyre9y90

Doing theoretical research that ignores practicalities sometimes turns out to be valuable in practice. It can open a door to something you assumed to be impossible, or save a lot of wasted effort on a plan that turns out to have an impossible sub-problem.

A concrete example of the first category might be something like quantum error correcting codes. Prior to that theoretical work, a lot of people thought that quantum computers were not worth pursuing because noise and decoherence would be an insurmountable problem. Quantum fault tolerance theorems did nothing to help solve the very tough practical problems of building a quantum computer, but they did show people that it might be worth pursuing - and here we are 20 years later closing in on practical quantum computers.

I think source code based decision theory might have something of this flavour. It doesn't address all those practical issues such as how one machine comes to trust that another machine's source code is what it says. That might indeed scupper the whole thing. But it does clarify where the theoretical boundaries of the problem are.

You might have thought "well, two machines could co-operate if they had identical source code, but that's too restrictive to be practical". But it turns out that you don't need identical source code if you have the source code and can prove things about it. Then you might have thought "ok, but those proofs will never work because of non-termination and self-reference" ... and it turns out that that is wrong too.
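To make the "identical source code" baseline concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: programs are represented as source strings, and the names `CLIQUE_BOT_SOURCE`, `DEFECT_BOT_SOURCE`, `run` and `play` are hypothetical, not taken from any published modal combat code. It shows the naive approach only, not the proof-based one.

```python
# Hypothetical illustration: each "machine" is a source string defining
# decide(my_source, opponent_source) -> "C" (co-operate) or "D" (defect).

CLIQUE_BOT_SOURCE = '''
def decide(my_source, opponent_source):
    # Co-operate only with an exact textual copy of myself.
    return "C" if opponent_source == my_source else "D"
'''

DEFECT_BOT_SOURCE = '''
def decide(my_source, opponent_source):
    # Always defect, whatever the opponent looks like.
    return "D"
'''


def run(source, opponent_source):
    """Execute a program's source and ask it for a move against the opponent."""
    namespace = {}
    exec(source, namespace)
    return namespace["decide"](source, opponent_source)


def play(source_a, source_b):
    """Return the pair of moves (a's move, b's move) for a one-shot game."""
    return run(source_a, source_b), run(source_b, source_a)


if __name__ == "__main__":
    print(play(CLIQUE_BOT_SOURCE, CLIQUE_BOT_SOURCE))
    print(play(CLIQUE_BOT_SOURCE, DEFECT_BOT_SOURCE))
    # A behaviourally identical copy with one extra comment is rejected.
    print(play(CLIQUE_BOT_SOURCE, CLIQUE_BOT_SOURCE + "# trivial comment\n"))
```

The last case is the "too restrictive" part: a copy that differs only by a comment behaves identically but is still treated as a stranger. Getting past that is what requires proving things about the other program's source rather than comparing text.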

Theoretical work like this could inform you about what you could hope to achieve if you could solve the practical issues; and conversely what problems are going to come up that you are absolutely going to have to solve.

Non-Fiction Book Reviews

Kyre9y00

Will second "Good and Real" as worth reading (haven't read any of the others).

Earning money with/for work in AI safety

Kyre10y30

Maybe translating AI safety literature into Japanese would be a high-value use of your time?
