All of sstich's Comments + Replies

sstich (30)

This is interesting — maybe the "meta Lebowski" rule should be something like "No superintelligent AI is going to bother with a task that is harder than hacking its reward function in such a way that it doesn't perceive itself as hacking its reward function." One goes after the cheapest shortcut that one can justify.

Donald Hobson (2)
Firstly, "bother" and "harder" are strange words to use. Are we assuming a lazy AI? Suppose action X would hack the AI's reward signal. If the AI is totally clueless about this, it has no reason to consider X and doesn't do X. If the AI knows what X does, it still doesn't do it. I think the AI would need some sort of doublethink to realize that X hacked its reward, yet also not realize this. I also think the claim is factually false: many humans can and do set out towards goals far harder than accessing large amounts of psychoactives.
Phenoca (1)
No. People with free will pursue activities we consider meaningful, even when those activities aren't a form of escapism.
avturchin (3)
I encountered the idea of the Lebowski theorem as an argument that explains the Fermi paradox: all advanced civilizations or AIs wirehead themselves. But here I am not convinced. For example, if a civilization consists of many advanced individuals and many of them wirehead themselves, then the remaining ones will be under Darwinian evolutionary pressure, and eventually only those survive who find ways to perform space exploration without wireheading. Maybe they will be limited, specialized minds with very specific ways of thinking – and this could explain the absurdity of observed UAP behaviour. Actually, I explored wireheading in more detail here: "Wireheading as a Possible Contributor to Civilizational Decline".
avturchin (2)
I submitted my above comment to the following competition and recommend you submit your post too: https://ftxfuturefund.org/announcing-the-future-funds-ai-worldview-prize/
avturchin (3)
Yes, a very good formulation. I would add: "and most AI alignment failures are instances of the meta Lebowski rule".
sstich (30)

"Not sure if anyone made a review of this, but it seems to me that if you compare the past visions of the future, such as science fiction stories, we have less of everything except for computers."

Yes! J Storrs Hall did an (admittedly subjective) study of this, and argues that the probability of a futurist prediction from the 1960s being right decreases logarithmically with the energy intensity required to achieve it. This is from Appendix A in "Where's my Flying Car?"

So it is not obvious, in my opinion, in which direction to update. I am tempted to say t…
sstich (20)

Thanks — I'm not arguing for this position; I just want to understand the anti-AGI-x-risk arguments as well as I can. I think success would look like me being able to state all the arguments as strongly/coherently as their proponents would.