Hi. Checking back on this account on a whim after a long time of not using it. You're right. 2012!Mestroyer was a noob and I am still cleaning up his bad software.
I would need a bunch of guarantees about the actual mechanics of how the AI was forced to answer before I stopped seeing vague classes of ways this could go wrong. And even then, I'd assume there were some I'd missed, and if the AI has a way to show me anything other than "yes" or "no", or I can't prevent myself from thinking about long sequences of bits instead of just single bits separately, I'd be afraid it could manipulate me.
An example of a vague class of ways this could go wrong is if the AI figures out what my CEV would want usin...
That's the goal, yeah.
It doesn't have to know what my CEV would be to know what I would want in those bits, which is a compressed seed of an FAI targeted (indirectly) at my CEV.
But there are problems like, "How much effort is it required to put into this?" (Clearly I don't want it to spend far more compute power than it has to, trying to come up with the perfect combination of bits that will make my FAI unfold a little bit faster, but I also don't want it to spend no time optimizing. How do I get it to pick somewhere in between without it already wanting to pick the optimal amount of optimization for me?) And "What decision theory is my CEV using to decide those bits?" (Hopefully not something exploitable, but how do I specify that?)
Given that I'm turning the 10KiB-long stream of bits I'm about to extract from you into an executable file, through this exact process, which I will run on this particular computer (describe specifics of computer, which is not the computer the AI is currently running on) to create your replacement, would my CEV prefer that this next bit be a 1 or a 0? By CEV, would I rather that the bit after that be a 1 or a 0, given that I have permanently fixed the preceding bit as what I made it? By CEV, would I rather that the bit after that be a 1 or a 0, given that I have permanently fixed the preceding bit as what I made it? ...
(Note: I would not actually try this.)
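For concreteness, a purely illustrative sketch of the loop I'm describing; `ask_oracle` is a hypothetical stand-in for the constrained yes/no channel, and none of this is a safe design (again, not something I'd actually run):

```python
TOTAL_BITS = 10 * 1024 * 8  # a 10 KiB seed, extracted one bit at a time

def ask_oracle(fixed_prefix):
    # "Given that the preceding bits are permanently fixed as fixed_prefix,
    # would my CEV prefer the next bit to be a 1 or a 0?"
    return 0  # placeholder answer so the sketch runs end to end

bits = []
for _ in range(TOTAL_BITS):
    bits.append(ask_oracle(bits))  # each answer fixes one more bit forever

# Pack the bits into bytes and write them out as the seed executable.
seed = bytes(
    int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, TOTAL_BITS, 8)
)
with open("seed_fai.bin", "wb") as f:
    f.write(seed)
```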
~5, huh? Am I to credit?
This reminds me of this SMBC. There are fields (modern physics comes to mind too) whose work no one outside of them can understand anymore, yet which appear to have remained sane. There are more safeguards against the postmodernists' failure mode than this one. In fact, I think there is a lot more wrong with postmodernism than that its practitioners don't have to justify themselves to outsiders. Math and physics have mechanisms determining which ideas within them get accepted that imbue them with their sanity. In math, there are proofs. In physics, there are ex...
CFAR seems to be trying to use (some of) our common beliefs to produce something useful to outsiders. And they get good ratings from workshop attendees.
The last point is particularly important, since on one hand, with the current quasi-Ponzi mechanism of funding, the position of preserved patients is secured by the arrival of new members.
Downvoted because if I remember correctly, this is wrong; the cost of preservation of a particular person includes a lump of money big enough for the interest to pay for their maintenance. If I remember incorrectly and someone points it out, I will rescind my downvote.
I use text files. (.txt, because I hate waiting for a rich text editor to open, and I hate autocomplete for normal writing) It's the only way to be able to keep track of them. I sometimes write paper notes when I don't have a computer nearby, but I usually don't keep those notes. Sometimes if I think of something I absolutely have to remember as I'm dozing off to sleep, I'll enter it in my cell phone because I use that as an alarm clock and it's always close to my bed. But my cell phone's keyboard makes writing notes really slow, so I avoid it under normal...
In response to your first paragraph,
Human morality is indeed the complex unfolding of a simple idea in a certain environment. It's not the one you're thinking of though. And if we're talking about hypotheses for the fundamental nature of reality, rather than a sliver of it (because a sliver of something can be more complicated than the whole) you have to include the complexity of everything that contributes to how your simple thing will play out.
Note also that we can't explain reality with a god with a utility function of "maximize the number of copie...
I agree. "AGI Safety"/"Safe AGI" seems like the best option. if people say, "Let me save you some time and tell you right now that's impossible" half of the work is done. The other half is just convincing them that we have to do it anyway because otherwise everyone is doomed. (This is of course, as long as they are using "impossible" in a loose sense. If they aren't, the problem can probably be fixed by saying "our definition of safety is a little bit more loose than the one you're probably thinking of, but not so much more loose that it becomes easy").
Time spent doing any kind of work with a high skill cap.
Edit: Well, okay not any kind of work meeting that criterion, to preempt the obvious LessWrongian response. Any kind you can get paid for is closer to true.
One of my old CS teachers defended treating the environment as adversarial and as knowing your source code, because of hackers. See median-of-3 killers. (I'd link something, but besides a paper, I can't find a nice link explaining what they are in a small amount of googling.)
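For anyone unfamiliar, here is a minimal sketch of the kind of quicksort a median-of-3 killer targets; this is my own illustration, not the construction from the paper:

```python
# Quicksort that picks its pivot as the median of the first, middle, and last
# elements. The pivot rule is deterministic and public, which is exactly what
# "median-of-3 killer" inputs exploit: an adversary who knows this code can
# craft inputs that degrade it toward O(n^2).
def median_of_3_quicksort(a):
    if len(a) <= 1:
        return a
    first, mid, last = a[0], a[len(a) // 2], a[-1]
    pivot = sorted([first, mid, last])[1]  # median of the three samples
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return median_of_3_quicksort(less) + equal + median_of_3_quicksort(greater)

print(median_of_3_quicksort([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]
```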
I don't see why Yudkowsky makes superintelligence a requirement for this.
Also, it doesn't even have to be source code they have access to (which they could if it was open-source software anyway). There are such things as disassemblers and decompilers.
[Edit: removed implication that Yudkowsky thought source code was necessary]
A lot of stuff on LessWrong is relevant to picking which charity to donate to. Doing that correctly is of overwhelming importance. Far more important than working a little bit more every week.
This is the kind of thing that when I take the outside view about my response, it looks bad. There is a scholarly paper refuting one of my strongly-held beliefs, a belief I arrived at due to armchair reasoning. And without reading it, or even trying to understand their argument indirectly, I'm going to brush it off as wrong. Merely based on the kind of bad argument (Bad philosophy doing all the work, wrapped in a little bit of correct math to prove some minor point once you've made the bad assumptions) I expect it to be, because this is what I think it wou...
To be fair to yourself, would you reject it if it were a proof of something you agreed with?
If they had gone out and 'proven' mathematically that sentient robots ARE possible, I'd be equally skeptical - not of the conclusion, but of the validity of the proof, because the core of the question is not mathematical in nature.
Actually, if you do this with something besides a test, this sounds like a really good way to teach a third-grader probabilities.
We're human beings with the blood of a million savage years on our hands. But we can stop it. We can admit that we're killers, but we're not going to kill today.
My impression is that they don't, because I haven't seen people who do this as low status. But they've all been people who are clearly high status anyway, due to their professional positions.
This is a bad template for reasoning about status in general, because of countersignaling.
Omniscience and omnipotence are nice and simple, but "morally perfect" is a phrase that hides a lot of complexity. Complexity comparable to that of a human mind.
I would allow ideal rational agents, as long as their utility functions were simple (Edit: by "allow" I mean they don't get the very strong prohibition that a human::morally_perfect agent does), and their relationship to the world was simple (omniscience and omnipotence are a simple relationship to the world). Our world does not appear to be optimized according to a utility func...
A first approximation to what I want to draw a distinction between is parts of a hypothesis that are correlated with the rest of the parts, and parts that aren't, so that adding them decreases the probability of the hypothesis more. In the extreme case, if a part of a hypothesis is logically deduced from the other parts, then it's perfectly correlated and doesn't decrease the probability at all.
When we look at a hypothesis, (to simplify, assume that all the parts can be put into groups such that everything within a group has probability 1 conditioned o...
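As a gloss of the chain-rule point above, in my notation rather than the comment's:

```latex
% Hypothesis H made of parts A, B, C:
P(H) = P(A)\,P(B \mid A)\,P(C \mid A, B)
% If C is logically deduced from A and B, then P(C \mid A, B) = 1, so adding
% C costs nothing; an uncorrelated extra part with P(C \mid A, B) < 1
% strictly lowers P(H).
```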
Perhaps I'm misusing the phrase "ontologically basic," I admit my sole source for what it means is Eliezer Yudkowsky's summary of Richard Carrier's definition of the supernatural, "ontologically basic mental things, mental entities that cannot be reduced to nonmental entities." Minds are complicated, and I think Occam's razor should be applied to the fundamental nature of reality directly. If a mind is part of the fundamental nature of reality, then it can't be a result of simpler things like human minds appear to be, and there is no lessening the complexity penalty.
It seemed pretty obvious to me that MIRI thinks defenses cannot be made, whether or not such a list exists, and wants easier ways to convince people that defenses cannot be made. Thus the part that said: "We would especially like suggestions which are plausible given technology that normal scientists would expect in the next 15 years. So limited involvement of advanced nanotechnology and quantum computers would be appreciated. "
I think theism (not to be confused with deism, simulationism, or anything similar) is a position only a crazy person could defend because:
God is an ontologically basic mental entity. Huge Occam penalty.
The original texts of the theisms these philosophers probably adhere to require extreme garage-dragoning to avoid making a demonstrably false claim. What's left after the garage-dragoning is either deism or an agent with an extremely complicated utility function, with no plausible explanation for why this utility function is as it is.
I've already listened
Theistic philosophers raised as atheists? Hmm, here is a question you could ask:
"Remember your past self, 3 years before you became a theist. And think, not of the reasons for being a theist you know now, but the one that originally convinced you. What was the reason, and if you could travel back in time and describe that reason, would that past self agree that that was a good reason to become a theist?"
Mestroyer keeps saying this is a personality flaw of mine
An imaginary anorexic says: "I don't eat 5 supersize McDonalds meals a day. My doctor keeps saying this is a personality flaw of mine."
I don't pay attention to theistic philosophers (at least not anymore, and I haven't for a while). There's seeking evidence and arguments that could change your mind, and then there's wasting your time on crazy people as some kind of ritual because that's the kind of thing you think rationalists are supposed to do.
If a few decades is enough to make an FAI, we could build one and either have it deal with the aliens, or have it upload everyone, put them in static storage, and send a few Von Neumann probes, faster than it would be economical for the aliens to send theirs if they are interested in maximum spread instead of maximum speed, to galaxies which will soon be outside the aliens' cosmological horizon.
Can't answer any of the bolded questions, but...
When you did game programming how much did you enjoy it? For me, it became something that was both productive (relatively, because it taught me general programming skills) and fun (enough that I could do it all day for several days straight, driven by excitement rather than willpower). If you are like me and the difference in fun is big enough, it will probably outweigh the benefit of doing programming exercises designed to teach you specific things. Having a decent-sized codebase that I wrote myself to refac...
The downside to not reading what I write is that when you write your own long reply, it's an argument against a misunderstood version of my position.
I am done with you. Hasta nunca.
3 AM? Y'all are dedicated.
Two regular attendees. Two people who sometimes show up. One person who's new and whose attendance rate hasn't been well-established.
You, personally, probably don't care about all sentient beings. You probably care about other things. It takes a very rare, very special person to truly care about "all sentient beings," and I know of 0 that exist.
I care about other things, yes, but I do care quite a bit about all sentient beings as well (though not really on the level of "something to protect", I'll admit). And I have cared about them since before I even heard of Eliezer Yudkowsky. In fact, when I first encountered EY's writing, I figured he did not care about all sentien...
The problem with "Act on what you feel in your heart" is that it's too generalizable. It proves too much, because of course someone else might feel something different and some of those things might be horrible.
It looks, from the outside, like there's all this undefined behavior, and demons coming out of the nose, because you aren't looking at the exact details of what's going on with the feelings that are choosing their beliefs. Though a C compiler given an undefined construct may cause your program to crash, it will never literally cause demons...
Is Year 2000-era computing power your true estimate for a level of computing power that is significantly safer than what comes after?
This quote seems like it's lumping every process for arriving at beliefs besides reason into one. "If you don't follow the process I understand and that is guaranteed not to produce beliefs like that, then I can't guarantee you won't produce beliefs like that!" But there are many such processes besides reason that could be going on in their "hearts" to produce their beliefs. Just because they are all opaque and non-negotiable, and not this particular one you trust not to make people murder Sharon Tate, does not mean that they all have the same pr...
To have a thing to protect is rare indeed. (Aside: If your thing-to-protect is the same as a notable celebrity, or as the person you learned the concept from, it is not your thing-to-protect.)
Really? What if the thing you protect is "all sentient beings," and that happens to be the same as the thing the person who introduced it to you or a celebrity protects? There're some pretty big common choices (Edited to remove inflationary language) for what a human would want to protect.
Beware value hipsterism.
Or, if by "thing to protect", you ...
Thou shalt never engage in solipsism or defeatism, nor wallow in ennui or existential angst, or in any other way declare that thy efforts are pointless and that exerting thyself is entirely without merit. For just as it is true that matters may never get to the point where they cannot possibly get any worse, so is it true that no situation is impossible to improve upon. Verily, the most blessed of silver linings is the fact that the inherent incertitude of one’s own beliefs also implies that there is never cause for complete hopelessness and despair.
Ab...
Overthinking issues that are really very simple
Counter-signalling as a smart-person mistake
Valuing intelligence above all other qualities
Rigidly adhering to rules -- compare the two endings of "Three Worlds Collide" and the decision by which they diverge.
Expecting other people to always be rational
Got nothing for the last two. I don't think the last one is a mistake that very many people at all make. (I think being right about things has surprising benefits well past the point that most people can see it having benefits).
Other smart person mistak...
Context: Aang ("A") is a classic Batman's Rule (never kill) hero, as a result of his upbringing in Air Nomad culture. It appears to him that he must kill someone in order to save the world. He is the only one who can do it, because he's currently the one and only avatar. Yangchen ("Y") is the last avatar to have also been an Air Nomad, and has probably faced similar dilemmas in the past. Aang can communicate with her spirit, but she's dead and can't do things directly anymore.
The story would have been better if Aang had listened to her advice, in my opinion.
And anyhow, why didn't they forcibly sedate every human until after the change? Then if they decided it wasn't worthwhile they could choose to die then.
It wouldn't be their own value system making the decision. It would be the modified version after the change.
Unrelatedly, you like Eliezer Yudkowsky's writing, huh? You should read HPMOR.
Something that's helped me floss consistently: (a) getting a plastic holder thing, not the little kind where it's still extremely difficult to reach your back teeth, but a reusable one with a long handle that you wrap floss onto, and (b) keeping it next to my computer, within arms reach.
If you are told a billion dollars hasn't been taxed from people in a city, how many people getting to keep a thousand dollars (say) do you imagine? Probably not a million of them. How many hours not worked, or small things that they buy do you imagine? Probably not any.
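Spelling out the arithmetic behind that "million of them" (my own gloss):

```latex
\frac{\$1{,}000{,}000{,}000}{\$1{,}000 \text{ per person}} = 1{,}000{,}000 \text{ people}
```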
But now that I think about it, I'd rather have an extra thousand dollars than be able to drink at a particular drinking fountain.
But I don't think fairness, the morality center, is necessarily fairness over differing amounts of harm. It could be fairness over differing social status. You could have an inflated sense of fairness, so that you cared much more than the underlying difference in what people get warrants.
You're familiar with the idea of anthropomorphization, right? Well, by analogy to that, I would call what you did here "rationalistomorphization," a word I wish was added to LessWrong jargon.
This reaction needs only scope insensitivity to explain it; you don't need to invoke purity. Though I actually agree with you that liberals have a disgust moral center.
What is the best textbook on datamining? I solemnly swear that upon learning, I intend to use my powers for good.
This sounds like bad instrumental rationality. If your current option is "don't publish it in paperback at all," and you are presented with an option you would be willing to take (publishing at a certain quality) if that quality were the best available, then the fact that there may be better options you haven't explored should never return your "best choice to make" to "don't publish it in paperback at all." Your only viable candidates should be: "Publish using a suboptimal option" and "Do verified research about what is the best option and then do that."
As they say, "The perfect is the enemy of the good."
Downvoted for the fake utility function.
"I wont let the world be destroyed because then rationality can't influence the future" is an attempt to avoid weighing your love of rationality against anything else.
Think about it. Is it really that rationality isn't in control any more that bugs you, not everyone dying, or the astronomical number of worthwhile lives that will never be lived?
If humanity dies to a paperclip maximizer, which goes on to spread copies of itself through the universe to oversee paperclip production, each of those copies being rational beyond what any human can achieve, is that okay with you?
Whether or not the lawful-goods of the world like Yvain are right, they are common. There are tons of people who want to side with good causes, but who are repulsed by the dark side even when it is used in favor of those causes. Maybe they aren't playing to win, but you don't play to win by saying you hate them for following their lawful code.
For many people, the lawful code of "I'm siding with the truth" comes before the good code of "I'm going to press whatever issue." When these people see a movement playing dirty, advocating arguments...
Science is tailored to counteract human cognitive biases. Aliens might or might not have the same biases. AIs wouldn't need science.
For example, science says you make the hypothesis, then you run the test. You're supposed to make a prediction, not explain why something happened in retrospect. This is to prevent hindsight bias and rationalization from changing what we think is a consequence of our hypotheses. But the One True Way does not throw out evidence because humans are too weak to use it.
But how do you avoid those problems? Also, why should contemplating tradeoffs between how much we can get of different values force us to pick just one? I bet you can imagine tradeoffs between bald people being happy and people with hair being happy, but that doesn't mean you should change your value from "happiness" to one of the two. Which way you choose in each situation depends on how many bald people there are and how many non-bald people there are. Similarly, with the right linear combination, these are just tradeoffs, and there is no reason to stop caring about one term because you care about the other more. And you didn't answer my last question. Why would most people meta-reflectively endorse this method of reflection?
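A toy rendering of that "right linear combination," in my notation and purely illustrative:

```latex
U = w_{\text{bald}}\, H_{\text{bald}} + w_{\text{hair}}\, H_{\text{hair}},
\qquad w_{\text{bald}},\, w_{\text{hair}} > 0
% Tradeoffs between the two terms are settled by the weights and by how many
% people fall in each group; neither weight has to be driven to zero.
```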
Don't know, sorry.