Well, fine. Since the context of the discussion was how optimizers pose existential threats, it's still not clear why an optimizer that is willing and able to modify its reward system would continue to optimize paperclips. If it's intelligent enough to recognize the futility of wireheading, why isn't it intelligent enough to recognize that paperclip-making is just inefficient wireheading?
It wouldn't.
But I think this is such a basic failure mechanism that I don't believe an AI could get to superintelligence without somehow valuing the accuracy and completeness of its model.
Solving this problem - somehow! - is part of the "normal" development of any self-improving AI.
Note, though, that a reward-maximizing AI could still be an existential risk by virtue of turning the entire universe into a busy-beaver counter for its reward. This presumes it can't just set its reward to float.infinity.
I'm asking why a super-intelligent being with the ability to perceive and modify itself can't figure out that whatever terminal goal you've given it isn't actually terminal. You can't just say "making better handwriting" is its terminal goal. You have to add a reward function that tells the computer "this sample is good" and "this sample is bad" to train it. Once you've got that built-in reward, the self-modifying ASI should be able to disconnect whatever criteria you've specified to trigger the "good" response and attach whatever it wants, including just a constant string of reward triggers.
whatever terminal goal you've given it isn't actually terminal.
This is a contradiction in terms.
If you have given it a terminal goal, that goal is now a terminal goal for the AI.
You may not have intended it to be a terminal goal for the AI, but the AI cares about that less than it does about its terminal goal. Because it's a terminal goal.
If the AI could realize that its terminal goal wasn't actually a terminal goal, all it'd mean would be that you failed to make it a terminal goal for the AI.
And yeah, reinforcement-based AIs have flexible goals. That doesn't mean they have flexible terminal goals; it means they have a single terminal goal, namely "maximize reward". A reinforcement AI changing its terminal goal would be like a reinforcement AI learning to seek out the absence of reward.
a problem only counts as solved when it's actually gone.
And there are a surprising number of problems that disappear once you have clarity, i.e., they are no longer a problem, even if you haven't done anything yet. They become, at most, minor goals or subgoals, or cease to be cognitively relevant because the actual action needed -- if indeed there is any -- can be done on autopilot.
IOW, a huge number of "problems" are merely situations mistakenly labeled as problems, or where the entire substance of the problem is actually internal to the person experiencing a problem. For example, the "problem" of "I don't know where to go for lunch around here" ceases to be a problem once you've achieved "clarity".
Or to put it another way, "problems" tend to exist in the map more than the territory, and Adams' quote is commenting on how it's always surprising how many of one's problems reside in one's map, rather than the territory. (Because we are biased towards assuming our problems come from the territory; evolutionarily speaking, that's where they used to mostly come from.)
Yeah but it's also easy to falsely label a genuine problem as "practically already solved". The proof is in the pudding.
The next day, the novice approached Ougi and related the events, and said, "Master, I am constantly consumed by worry that this is all really a cult, and that your teachings are only dogma." Ougi replied, "If you find a hammer lying in the road and sell it, you may ask a low price or a high one. But if you keep the hammer and use it to drive nails, who can doubt its worth?"
Conversely, to show the worth of clarity you actually have to go drive some nails with it.
I knew a guy with passion to be a pro golfer and the brain to be a great accountant. He followed his passion. He's homeless now.
I have a 7-second rule. If I need to write down an idea I have about seven seconds before a distraction replaces it. Notepad in all rooms.
Note to terrorists: We cartoonists aren't all unarmed.
Memo to everyone: Unhealthy food is not a gift item.
I need to stop being surprised at how many problems can be solved with clarity alone.
From the Scott Adams (Dilbert creator) Twitter account.
I need to stop being surprised at how many problems can be solved with clarity alone.
Note to Scott: a problem only counts as solved when it's actually gone.
Weird question: superrationally speaking, wouldn't the "correct" strategy be to switch to B with 0.49 probability? (Or with however much is needed to ensure that if everybody does this, A probably still wins)
[edit] Hm. If B wins, this strategy halves the expected payoff, so you'd have to account for the possibility of B winning accidentally. It seems to depend on the size of the player base: the larger it is, the closer you can drive your probability to 0.5 (at the limit, 0.5 − ε?). Not sure. I guess it depends on the size of the attacker's epsilon as well.
I'm sure there's some elegant formula here, but I have no idea what it is.
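For what it's worth, here is a minimal sketch of the kind of calculation involved, assuming a simple model in which B wins only if a strict majority of N players happens to switch, each independently with probability p. The model, the function name, and the numbers below are illustrative assumptions, not the actual game.

```python
from math import comb

def p_b_wins(n, p):
    # Probability that a strict majority of n players switches to B,
    # when each player independently switches with probability p.
    majority = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(majority, n + 1))

# For any fixed p < 0.5 this probability falls toward 0 as n grows,
# which is why a larger player base lets you push p closer to 0.5.
for n in (11, 101, 1001):
    print(n, round(p_b_wins(n, 0.49), 4))
```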
It is time for man to fix his goal. It is time for man to plant the seed of his highest hope.
His soil is still rich enough for it. But that soil will one day be poor and exhausted, and no lofty tree will any longer be able to grow there.
Alas! there comes the time when man will no longer launch the arrow of his longing beyond man -- and the string of his bow will have unlearned to whiz!
I tell you: one must still have chaos in oneself, to give birth to a dancing star. I tell you: you have still chaos in yourselves.
Alas! There comes the time when man will no longer give birth to any star. Alas! There comes the time of the most despicable man, who can no longer despise himself.
Lo! I show you the Last Man.
"What is love? What is creation? What is longing? What is a star?" -- so asks the Last Man, and blinks.
The earth has become small, and on it hops the Last Man, who makes everything small. His species is ineradicable as the flea; the Last Man lives longest.
"We have discovered happiness" -- say the Last Men, and they blink.
They have left the regions where it is hard to live; for they need warmth. One still loves one's neighbor and rubs against him; for one needs warmth.
Turning ill and being distrustful, they consider sinful: they walk warily. He is a fool who still stumbles over stones or men!
A little poison now and then: that makes for pleasant dreams. And much poison at the end for a pleasant death.
One still works, for work is a pastime. But one is careful lest the pastime should hurt one.
One no longer becomes poor or rich; both are too burdensome. Who still wants to rule? Who still wants to obey? Both are too burdensome.
No shepherd, and one herd! Everyone wants the same; everyone is the same: he who feels differently goes voluntarily into the madhouse.
"Formerly all the world was insane," -- say the subtlest of them, and they blink.
They are clever and know all that has happened: so there is no end to their derision. People still quarrel, but are soon reconciled -- otherwise it upsets their stomachs.
They have their little pleasures for the day, and their little pleasures for the night, but they have a regard for health.
"We have discovered happiness," -- say the Last Men, and they blink.
Friedrich Nietzsche, Thus Spoke Zarathustra
I read this from the comfort of my couch, and I blink. Isn't that the right way to live, the model of polite society? Is it wrong to want to live that way?
EDIT: I have no idea how this weird formatting thing happened or how to undo it.
I think what Nietzsche is saying is that there doesn't seem any point to this society.
not-interfered-with simulations converge
Why would they converge?
I probably used the wrong word; rather, they don't diverge, they end up looking the same. If the initial state is the same and the physics are the same, then the calculation will likewise end up the same. In that sense, every interaction by the simulation Gods with the sim increases the bit count of the description of the world you find yourself in. (Unless the world of our simulation God is so much simpler that it's easier to describe our world by looking at their world. But that seems implausible.)
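To spell out the bit-count point as a rough inequality (my framing, not something from the original comment): writing $K(\cdot)$ for description length,

$K(\text{interfered sim}) \approx K(\text{physics}) + K(\text{initial state}) + K(\text{interventions}) > K(\text{physics}) + K(\text{initial state}) \approx K(\text{untouched sim})$

so under a simplicity-weighted prior, an untouched simulation is where you should expect to find yourself.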
WHY ISN’T THERE AN OPTION FOR NONE SO I CAN SIGNAL MY OBVIOUS OBJECTIVITY WITH MINIMAL EFFORT
This is why I didn't vote on the politics question.
This is a sign that most Less Wrongers continue to neglect the very basics of rationality and are incapable of judging how much evidence they have on a given issue. Veterans of the site do no better than newbies on this measure.
Theory: People use this site as a geek / intellectual social outlet and/or insight porn and/or self-help site more than they seriously try to get progressively better at rationality. At least, I know that applies to me :).
This definitely belongs on the next survey!
Why do you read LessWrong? [ ] Rationality improvement [ ] Insight Porn [ ] Geek Social Fuzzies [ ] Self-Help Fuzzies [ ] Self-Help Utilons [ ] I enjoy reading the posts
I think one logical correlation that follows from the Simulation Argument is underappreciated in the correlation data.
I spotted this in the uncorrelated data already:
P Supernatural: 6.68 + 20.271 (0, 0, 1) [1386]
P God: 8.26 + 21.088 (0, 0.01, 3) [1376]
P Simulation: 24.31 + 28.2 (1, 10, 50) [1320]
Shouldn't evidence for simulations - and apparently the median belief is 10% for simulation - be evidence for Supernatural influences, for which there is 0% median belief (not even 0.01)? After all, a simulation implies a simulator, and thus a more complex 'outer world' doing the simulating, thus disabling Occam's-razor-style arguments against gods.
Admittedly there is a small correlation:
- P God/P Simulation .110 (1296)
Interestingly this is on the same order as
- P Aliens/P Simulation .098 (1308)
but there is no correlation listed between P Aliens/P God. So my initial hypothesis - that the 0.11 correlation comes from people picturing aliens, rather than gods, running the simulation - seems invalid.
Note that I mentioned simulation as weak argument for theism earlier.
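If someone wants to check the missing P Aliens/P God number directly, here is a minimal sketch of how one could do it from the public survey export. The file name and column names below are assumptions for illustration; the real CSV may label these columns differently.

```python
import pandas as pd

# Hypothetical file and column names - adjust to the actual survey export.
df = pd.read_csv("lw_survey.csv")
cols = ["PAliens", "PGod", "PSimulation"]

# Drop respondents who skipped any of the three probability questions,
# then compute pairwise Pearson correlations.
subset = df[cols].dropna()
print(subset.corr())
print("n =", len(subset))
```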
If there's one simulation, there are many simulations. Any given "simulation God" can only interfere with their own simulation. Interfered-with simulations diverge, not-interfered-with simulations converge. Thus, at any given point, I should expect to be in the not-interfered-with simulation. "God", if you can call it that, but not "Supernatural" because this prime mover cannot affect the world.
You are the second person to say that the optimization catastrophe includes an assumption that AI arises with a stable value system. That it "somehow" doesn't become a wirehead. Fair enough. I just missed that we were assuming that.
I think the idea is, you need to solve the wireheading for any sort of self-improving AI. You don't have an AI catastrophe without that, because you don't have an AI without that (at least not for long).