All of nicholashalden's Comments + Replies

It strikes me that you're wearing a lot of risk beyond the face value bet. Even if we assume everyone is acting in good faith, there's likely credit risk across 10 different people promising a $100k+ payout (because most people don't have that much cash, and even among those who do, there's some likelihood of falling below that level of liquidity after a 5 year period). On your side, it looks like you're just sending people your side of the bet before resolution, so they wear zero credit risk, even though the credit risk on your end was smaller to begin wi... (read more)

Out of curiosity, is there anywhere you've written about your object-level view on this? The EY post thoroughly fleshes out what I would call the strong consensus on LW; is there some equivalent that you've put together?

I disagree with the idea that true things necessarily have explanations that are both convincing and short.

I don't think a short, convincing explanation is necessary for something to be true (there's no such explanation of e.g. quantum mechanics), but I think accurate forecasts tend to have such explanations (Tetlock's work strongly argues for this).

I agree there is a balance to be struck between losing your audience and being exhaustive; it's just that the vast majority of material I've read falls on one side of it.

On that point, have you seen any of my videos, and do you have th... (read more)

This seems to violate common sense. Why would you think about this in log space? 99% and 1% are identical in if(>0) space, but they have massively different implications for how you think about a risk (just like 20% and 70% do!).

It's a much more natural way to think about it (cf. e.g. E. T. Jaynes, Probability Theory, examples in Chapter IV).

In this specific case of evaluating hypotheses, the distance in log-odds space indicates the strength of the evidence you would need to see to update. A short distance implies you don't need that much evidence to update between the positions (note that the distance between 0.7 and 0.2 is smaller than between 0.9 and 0.99). If you need only a small amount of evidence to update, it is easy to imagine some other observer as reasonable as you having accumulated a bit or two so... (read more)
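For concreteness, here is a minimal sketch of the log-odds distances mentioned above, measured in bits of evidence (not part of the original comment; the function names are just illustrative):

```python
import math

def logit_bits(p):
    """Log-odds of probability p, expressed in bits."""
    return math.log2(p / (1 - p))

def evidence_distance_bits(p, q):
    """How many bits of evidence separate credence p from credence q."""
    return abs(logit_bits(p) - logit_bits(q))

print(evidence_distance_bits(0.2, 0.7))    # ~3.2 bits
print(evidence_distance_bits(0.9, 0.99))   # ~3.5 bits
```

On this measure, 20% and 70% are slightly closer together than 90% and 99%, even though they look much farther apart on the probability scale; that is the sense in which nearby credences don't require much evidence to move between.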

It's very strange to me that there isn't a central, accessible "101" version of the argument given how much has been written.

I don't think anyone should make false claims, and this is an uncharitable mischaracterization of what I wrote. I am telling you that, from the outside view, what LW/rationalism gets attention for is the "I am sure we are all going to die" claim, which I don't think most of its members actually hold, and which repels the average person because it violates common sense.

The object level responses you gave are so minimal and dismissive that ... (read more)

6RobertM
Yeah, I probably should have explicitly clarified that I wasn't going to be citing my sources there. I agree that the fact that it's costly to do so is a real problem, but as Robert Miles points out, some of the difficulty here is insoluble. There are several such write-ups, in fact; but as I mentioned above, none of them will cover all the bases for all possible audiences (and the last one isn't exactly short, either). Off the top of my head, here are a few:
* An artificially structured argument for expecting AGI ruin
* The alignment problem from a deep learning perspective
* AGI safety from first principles: Introduction

Thanks for your reply. I welcome an object-level discussion, and appreciate people reading my thoughts and showing me where they think I went wrong.

  1. The hidden complexity of wishes stuff is not persuasive to me in the context of an argument that AI will literally kill everyone. If we wish for it not to, there might be some problems with the outcome, but it won't kill everyone. In terms of Bay Area Lab 9324 doing something stupid, I think by the time thousands of labs are doing this, if we have been able to successfully wish for stuff without catastrophe bei... (read more)
4lukemarks
Soft upvoted your reply, but have some objections. I will respond using the same numbering system you did, such that point 1 in my reply addresses point 1 of yours.
1. I agree with this in the context of short-term extinction (i.e. at or near the deployment of AGI), but would offer that an inability to remain competitive and a loss of control is still likely to end in extinction, just in a less cinematic and instantaneous way. In accordance with this, the potential horizon for extinction-contributing outcomes is expanded massively. Although Yudkowsky is most renowned for hard takeoff, soft takeoff has a very differently shaped extinction-space and (I would assume) is a partial reason for his high doom estimate. Although I cannot know this for sure, I would imagine he has a >1% credence in soft takeoff. 'Problems with the outcome' seem highly likely to extend to extinction given time.
2. There are (probably) an infinite number of possible mesa-optimizers. I don't see any reason to assume an upper bound on potential mesa-optimization configurations, and yes, this is not a 'slam dunk' argument. Rather, as derived from the notion that even slightly imperfect outcomes can extend to extinction, I was suggesting that you are trying to search an infinite space for a quark that fell out of your pocket some unknown amount of time ago whilst you were exploring said space. This can be summed up as 'it is not probable that some mesa-optimizer selected by gradient descent will ensure a Good Outcome'.
3. This still does not mean that the only form of brain hacking is via highly immersive virtual reality. I recall the Tweet that this comment came from, and I interpreted it as a highly extreme and difficult form of brain hacking used to prove a point (the point being that if ASI could accomplish this, it could easily accomplish psychological manipulation). Eliezer's breaking-out-of-the-sandbox experiments circa 2010 (I believe?) are a good example of this.
4. Alternatively you

Thank you for the reply. I agree we should try and avoid AI taking over the world.

On "doom through normal means"--I just think there are very plausibly limits to what superintelligence can do. "Persuasion, hacking, and warfare" (appreciate this is not a full version of the argument) don't seem like doom to me. I don't believe something can persuade generals to go to war in a short period of time, just because it's very intelligent. Reminds me of this.
 

On values--I think there's a conflation between us having ambitious goals, and whatever is actually b... (read more)

4Daniel Kokotajlo
Thanks to you likewise!
On doom through normal means: "Persuasion, hacking, and warfare" aren't by themselves doom, but they can be used to accumulate lots of power, and then that power can be used to cause doom. Imagine a world in which humans are completely economically, militarily, and politically obsolete, thanks to armies of robots directed by superintelligent AIs. Such a world could and would do very nasty things to humans (e.g. let them all starve to death) unless the superintelligent AIs managing everything specifically cared about keeping humans alive and in good living conditions. Because keeping humans alive & in good living conditions would, ex hypothesi, not be instrumentally valuable to the economy, or the military, etc.
How could such a world arise? Well, if we have superintelligent AIs, they can do some hacking, persuasion, and maybe some warfare, and create that world. How long would this process take? IDK, maybe years? Could be much less. But I wouldn't be surprised if it takes several years, even maybe five years.
I'm not conflating those things. We have ambitious goals and are trying to get our AIs to have ambitious goals -- specifically we are trying to get them to have our ambitious goals. It's not much of a stretch to imagine this going wrong, and them ending up with ambitious goals that are different from ours in various ways (even if somewhat overlapping).
4AnthonyC
Remember that persuasion from an ASI doesn't need to look like "text-based chatting with a human." It includes all the tools of communication available. Actually-near-flawless forgeries of any and every form of digital data you could ever ask for, as a baseline, all based on the best possible inferences made from all available real data. How many people today are regularly persuaded of truly ridiculous things by perfectly normal human-scale-intelligent scammers, cults, conspiracy theorists, marketers, politicians, relatives, preachers, and so on? The average human, even the average IQ 120-150 human, just isn't that resistant to persuasion in favor of untrue claims.
5metachirality
A few things I've seen give pretty worrying lower bounds for how persuasive a superintelligence would be:
* How it feels to have your mind hacked by an AI
* The AI in a box boxes you (content warning: creepy blackmail-y acausal stuff)
Remember that a superintelligence will be at least several orders of magnitude more persuasive than character.ai or Stuart Armstrong.