Is there a better way of discovering strong arguments for a non-expert than asking for them publicly?
Also, it assumes there is a separate module for making predictions, one the agent cannot manipulate. I don't find that assumption very plausible.
Isn't this a blocker for any discussion of particular utility functions?
If a simple philosophical argument can cut the expected odds of AI doom by an order of magnitude, we might not change our current plans, but it suggests that we have a lot of confusion on the topic that further research might alleviate.
And more generally, "the world where we almost certainly get killed by ASI" and "the world where we have an 80% chance of getting killed by ASI" are different worlds, and, setting aside motives to lie for propaganda purposes, if we actually live in the latter we should not say we live in the former.
I don't think wireheading is "myopic" when it overlaps with self-maintenance. A classic example would be painkillers: they do ~nothing but make you "feel good now" (or at least less bad), but sometimes feeling less bad is necessary to function properly and achieve long-term value. I think gratitude journaling is also part of this overlap area. That said, I don't know many people's experiences with it, so maybe it's more prone to "abuse" than I expect.
A corrigible AI is one that cooperates with attempts to modify it to bring it more in line with what its creators/users want it to be. Some people think this is a promising direction for alignment research: if an AI could be guaranteed to be corrigible, then even if it ends up with wild/dangerous goals, we could in principle just modify it to not have those goals, and it wouldn't try to stop us.
"Alignment win condition," as far as I know, is a phrase I just made up. I mean it as something that, regardless of whether it "solves" alignment in a specifi...
I don't trust a hypothetical arbitrary superintelligence, but I agree that a superintelligence is too much power for any extant organization, which means that "corrigibility" is not an alignment win condition. An AI that resists being modified to do bad things (whatever that might mean on reflection) seems like a feature, not a bug.
Do you believe or allow for a distinction between value and ethics? Intuitively it feels like metaethics should take into account the Goodness of Reality principle, but I think my intuition comes from a belief that if there's some objective notion of Good, ethics collapses to "you should do whatever makes the world More Gooder," and I suppose that that's not strictly necessary.
I do draw a distinction between value and ethics. Although my current best guess is that decision theory does in some sense reduce ethics to a subset of value, I do think it's a subset worth distinguishing. For example, I still have a concept of evaluating how ethical someone is, based on how good they are at paying causal costs for larger acausal gains.
I think the Goodness of Reality principle is maybe a bit confusingly named, because it's not really a claim about the existence of some objective notion of Good that applies to reality per se, and is ...
The adulterer, the slave owner and the wartime rapist all have solid evolutionary reasons to engage in behaviors most of us might find immoral. I think their moral blind spots are likely not caused by trapped priors, like an exaggerated fear of dogs is.
I don't think the evopsych and trapped-prior views are incompatible. A selection pressure towards immoral behavior could select for genes/memes that tend to result in certain kinds of trapped prior.
I also suspect something along the lines of "Many (most?) great spiritual leaders were making a good-faith effort to understand the same ground truth with the same psychological equipment and got significantly farther than most normal people do." But in order for that to be plausible, you would need a reason why the almost-truths they found are so goddamn antimemetic that the most studied and followed people in history weren't able to make them stick. Some of the selection pressure surely comes down to social dynamics. I'd like to think that people who hav...
But in order for that to be plausible, you would need a reason why the almost-truths they found are so goddamn antimemetic that the most studied and followed people in history weren't able to make them stick.
A few thoughts:
That "so now what" doesn't sound like a dead end to me. The question of how to mitigate risk when normal risk-mitigation procedures are themselves risky seems like an important one.
Alright, based on your phrasing I had thought it was something you believed. I'm open to moral realism and I don't immediately see how phenomena being objectively bad would imply that physics is objectively bad.
Why does something causing something bad make that thing itself bad?
Then I'd like to see some explanation of why it doesn't have an answer, which would itself be adding back to normality.
I'm not saying it doesn't, I'm saying it's not obvious that it does. Normalcy requirements don't mean all our possibly-confused questions have answers; they just put restrictions on what those answers should look like. So, if the idea of successors-of-experience is meaningful at all, our normal intuition gives us desiderata like "chains of successorship are continuous across periods of consciousness" and "chains of successorship do not fork or mer...
[...]only autonomous (driven by internal will) actions, derived from duty to the moral law, can be considered moral.
[...] a belief in a thing has a totalising effect on the will of the subject.
What makes this totalizing effect distinct from the "duty to moral law" explicitly called for?
People tend to agree that one should care about the successor of one's subjective experience. The question is whether there will be one or not. And this is a question of fact.
But the question of "what, if anything, is the successor of your subjective experience" does not obviously have a single factual answer.
I can conceptualize a world where a soul always stays tied to the initial body, and as soon as the body is destroyed, the soul is destroyed as well.
If souls are real (and the Hard Problem boils down to "it's the souls, duh"), then a teleporter that doesn't rea...
I'm not convinced that there is a single "way" one should expect to wake up in the morning. If we're talking about things like observer-moments and exotic theories of identity, I don't think we can reliably communicate by analogy to mundane situations, since our intuitions might differ in subtle ways that don't matter in those situations.
For instance, should I believe that I will wake up because that will lead me to make decisions that lead to world-states I prefer, or should I expect to wake up because it is true that I will probably wake up? If the latte...
What does it mean to "should expect" something, if your identity is transmitted across multiple universes with different ground truths?
I think the key to approaches like this is to eschew pre-existing, complex concepts like "human flourishing" and look for a definition of Good Things that is actually amenable to constructing an agent that Does Good Things. There's no guarantee that this would lead anywhere; it relies on some weak form of moral realism. But an AGI that follows some morality-you-largely-agree-with by its very structure is a lot more appealing to me than an AGI that dutifully maximizes the morality-you-punched-into-its-utility-function-at-bootup, appealing enough that I think it's worth wading into moral philosophy to see if the idea pans out.
Should it make a difference? Same iterative computation.
Not necessarily; a lot of information is discarded when you're only looking at the paper/verbal output. As an extreme example, if the emulated brain had been instructed (or had the memory of being instructed) to say the number of characters written on the paper and nothing else, the computational properties of the system as a whole would be much simpler than those of the emulation.
I might be missing the point. I agree with you that an architecture that predicts tokens isn't necessarily non-conscious. ...
I don't think that in the example you give, you're making a token-predicting transformer out of a human emulation; you're making a token-predicting transformer out of a virtual system with a human emulation as a component. In the system, the words "what's your earliest memory?" appearing on the paper are going to trigger all sorts of interesting (emulated) neural mechanisms that eventually lead to a verbal response, but the token predictor doesn't necessarily need to emulate any of that. In fact, if the emulation is deterministic, it can just memorize what...
The number of poor people is much larger than the number of billionaires, but the number of poor people who THINK they're billionaires probably isn't that much larger. Good point about needing to forget the technique, though.
Is this an independent reinvention of the law of attraction? There doesn't seem to be anything special about "stop having a disease by forgetting about it" compared to the general "be in a universe by adopting a mental state compatible with that universe." That said, becoming completely convinced I'm a billionaire seems more psychologically involved than forgetting I have some disease, and the ratio of universes where I'm a billionaire versus I've deluded myself into thinking I'm a billionaire seems less favorable as well.
Anyway, this doesn't seem like a g...
What does it mean when one "should anticipate" something? At least in my mind, it points strongly to a certain intuition, but the idea behind that intuition feels confused. "Should" in order to achieve a certain end? To meet some criterion? To boost a term in your utility function?
I think the confusion here might be important, because replacing "should anticipate" with a less ambiguous "should" seems to make the problem easier to reason about, and supports your point.
For instance, suppose that you're going to get your brain copied next week. After you get ...
I read this paragraph as saying ~the same thing as the original post in a different tone.