Comments

The threat model here seems basically wrong and focused on sins of commission when sins of omission are, if anything, an even larger space of threats and which apply to 'safe' solutions reported by the Oracle.

Sure, I mostly agree with the distinction you're making here between "sins of commission" and "sins of omission". Contrary to you, though, I believe that getting rid of the threat of "sins of commission" is extremely useful. If the output of the Oracle is optimized only to fulfill your satisfaction goal and not for anything else, you've basically gotten rid of the superintelligent adversary in your threat model.

'Devising a plan to take over the world' for a misaligned Oracle is not difficult, it is easy, because the initial steps like 'unboxing the Oracle' are the default convergent outcome of almost all ordinary non-dangerous use which in no way mentions 'taking over the world' as the goal. ("Tool AIs want to be Agent AIs.") To be safe, an Oracle has to have a goal of not taking over the world.

I agree that for many ambitious goals, 'unboxing the Oracle' is an instrumental goal. It's overwhelmingly important that we use such an Oracle setup only for goals that can be achieved without a large fraction of the satisficing outputs pursuing such instrumental goals. (I mentioned this in footnote 2, but probably should have highlighted it more.) I think this is a common limitation of all soft-optimization approaches.
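To make the worry about "a large fraction of the satisficing outputs" concrete, here is a minimal sketch of a quantilizer-style satisficer. This is my own illustrative framing, not the specific proposal under discussion; `candidates`, `utility`, `is_dangerous` and the threshold are assumed names for the sake of the example:

```python
import random

def satisficing_sample(candidates, utility, threshold, rng=random):
    """Soft-optimization toy: instead of returning argmax(utility),
    sample uniformly from every candidate output that clears the
    satisfaction threshold."""
    satisficing = [c for c in candidates if utility(c) >= threshold]
    return rng.choice(satisficing) if satisficing else None

def dangerous_fraction(candidates, utility, threshold, is_dangerous):
    """Fraction of the satisficing set that pursues a dangerous
    instrumental goal (e.g. 'unbox the Oracle'); this is roughly the
    failure probability of the sampler above."""
    satisficing = [c for c in candidates if utility(c) >= threshold]
    if not satisficing:
        return 0.0
    return sum(is_dangerous(c) for c in satisficing) / len(satisficing)
```

The setup is only as safe as `dangerous_fraction` is small, and the satisficer itself does nothing to make it small; that has to come from the choice of goal.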

There are many, many orders of magnitude more ways to be insecure than to be secure, and insecure is the wide target to hit.

This is talking about a different threat model than mine. You're talking here about security in a more ordinary sense, as in "secure from being hacked by humans" or "secure from accidentally leaking dangerous information". I feel like this type of security concern should be much easier to address, as you're defending yourself not against superintelligences but against humans and accidents.

The example you gave about the Oracle producing a complicated plan that leaks the source of the Oracle is an example of this: It's trivially defended against by not connecting the device the Oracle is running on to the internet and not using the same device to execute the great "cure all cancer" plan. (I don't believe that either you or I would have made that mistake!)

Ah, I think there was a misunderstanding. I (and maybe also quetzal_rainbow?) thought that in the inverted world, no "apparently-very-lucrative deals" that turn out to be scams are known either, whereas you drew a distinction between that kind of deal and Ponzi schemes in particular.

I think my interpretation is more in the spirit of the inversion; otherwise, the Epistemologist should really have answered as you suggested, and the whole premise of the discussion (that people seem to have trouble understanding what the Spokesperson is doing) is broken.

I think this would be a good argument against Said Achmiz's suggested response, but I feel the text doesn't completely support it; e.g., the Epistemologist says "such schemes often go through two phases" and "many schemes like that start with a flawed person", suggesting that such schemes are known to him.

In Section 5 we discuss why expect oversight and control of powerful AIs to be difficult.

Another typo, probably missing a "we".

The soft optimization post took 24 person-weeks (assuming 4 people half-time for 12 weeks) plus some of Jeremy's time.

Team member here. I think this is a significant overestimate; I'd guess 12-15 person-weeks. If it's relevant, I can ask all former team members how much time they spent; it was around 10 h per week for me. Given that we were beginners and spent a lot of time learning about the topic, I feel we were doing fine and learnt a lot.
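(As a rough back-of-envelope check, assuming roughly 40-hour person-weeks and that my ~10 h/week was typical: 4 people × 10 h/week × 12 weeks ≈ 480 hours, i.e. about 12 person-weeks.)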

Working on this part-time was difficult for me, and the fact that people in the camp are not working on these things full-time should be considered when judging research output.

Missile attacks are not piracy, though, right?

It's good that you learned a few things from these incidents, but I'm sceptical of the (different) claim implied by the headline, namely that Peter Zeihan was meaningfully correct here. If you interpret "directions" loosely enough, it's not hard to be directionally correct sometimes.

I know this answer doesn't qualify, but very likely the best you can currently do is: Don't do it. Don't train the model.

(I downvoted your comment because it's just complaining about downvotes to unrelated comments/posts and not meaningfully engaging with the topic at hand.)

"Powerful AIs Are Black Boxes" seems like a message worth sending out

Everybody knows what (computer) scientists and engineers mean by "black box", of course.

I guess it's hard to keep "they are experimenting with / building huge numbers of tanks" and "they are conducting combined arms exercises" secret from France and Russia, so they would have a lot of advance warning and could then also develop tanks.

But if you have a lot more than a layman's understanding of tank design / combined arms doctrine, you could still come out ahead in this.
