PhilosophyTutor comments on Siren worlds and the perils of over-optimised search - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (411)
It seems based on your later comments that the premise of marketing worlds existing relies on there being trade-offs between our specified wants and our unspecified wants, so that the world optimised for our specified wants must necessarily be highly likely to be lacking in our unspecified ones ("A world with maximal bananas will likely have no apples at all").
I don't think this is necessarily the case. If I only specify that I want low rates of abortion, for example, then I think it highly likely that 'd get a world that also has low rates of STD transmission, unwanted pregnancy, poverty, sexism and religiousity because they all go together, I think you could specify any one of those variables and almost all of the time you would get all the rest as a package deal without specifying them.
Of course a malevolent AI could probably deliberately construct a siren world to maximise one of those values and tank the rest but such worlds seem highly unlikely to arise organically. The rising tide of education, enlightenment, wealth and egalitarianism lifts most of the important boats all at once, or at least that is how it seems to me.
Yes, certainly. That's a problem of optimisation with finite resources. If A is a specified want and B is an unspecified want, then we shouldn't confuse "there are worlds with high A and also high B" with "the world with the highest A will also have high B".
You would get a world with no conception, or possibly with no humans at all.
I don't think you have highlighted a fundamental problem since we can just specify that we mean a low percentage of conceptions being deliberately aborted in liberal societies where birth control and abortion are freely available to all at will.
My point, though, is that I don't think it is very plausible that "marketing worlds" will organically arise where there are no humans, or no conception, but which tick all the other boxes we might think to specify in our attempts to describe an ideal world. I don't see how there being no conception or no humans could possibly be a necessary trade-off with things like wealth, liberty, rationality, sustainability, education, happiness, the satisfaction of rational and well-informed preferences and so forth.
Of course a sufficiently God-like malevolent AI could presumably find some way of gaming any finite list we give it, since there are probably an unbounded number of ways of bringing about horrible worlds, so this isn't a problem with the idea of siren worlds. I just don't find the idea of market worlds very plausible because so many of the things we value are fundamentally interconnected.
The "no conception" example is just to illustrate that bad things happen when you ask an AI to optimise along a certain axis without fully specifying what we want (which is hard/impossible).
A marketing world is fully optimised along the "convince us to choose this world" axis. If at any point, the AI in confronted with a choice along the lines of "remove genuine liberty to best give the appearance of liberty/happiness", it will choose to do so.
That's actually the most likely way a marketing world could go wrong - the more control the AI has over people's appearance and behaviour, the more capable it is of making the world look good. So I feel we should presume that discrete-but-total AI control over the world's "inhabitants" would be the default in a marketing world.
I think this and the "finite resources therefore tradeoffs" argument both fail to take seriously the interconnectedness of the optimisation axes which we as humans care about.
They assume that every possible aspect of society is an independent slider which a sufficiently advanced AI can position at will, even though this society is still going to be made up of humans, will have to be brought about by or with the cooperation of humans and will take time to bring about. These all place constraints on what is possible because the laws of physics and human nature aren't infinitely malleable.
I don't think discreet but total control over a world is compatible with things like liberty, which seem like obvious qualities to specify in an optimal world we are building an AI to search for.
I think what we might be running in to here is less of an AI problem and more of a problem with the model of AI as an all-powerful genie capable of absolutely anything with no constraints whatsoever.
Precisely and exactly! That's the whole of the problem - optimising for one thing (appearance) results in the loss of other things we value.
Next challenge: define liberty in code. This seems extraordinarily difficult.
So we do agree that there are problem with an all-powerful genie? Once we've agreed on that, we can scale back to lower AI power, and see how the problems change.
(the risk is not so much that the AI would be an all powerful genie, but that it could be an all powerful genie compared with humans).
This just isn't always so. If you instruct an AI to optimise a car for speed, efficiency and durability but forget to specify that it has to be aerodynamic, you aren't going to get a car shaped like a brick. You can't optimise for speed and efficiency without optimising for aerodynamics too. In the same way it seems highly unlikely to me that you could optimise a society for freedom, education, just distribution of wealth, sexual equality and so on without creating something pretty close to optimal in terms of unwanted pregnancies, crime and other important axes.
Even if it's possible to do this, it seems like something which would require extra work and resources to achieve. A magical genie AI might be able to make you a super-efficient brick-shaped car by using Sufficiently Advanced Technology indistinguishable from magic but even for that genie it would have to be more work than making an equally optimal car by the defined parameters that wasn't a silly shape. In the same way an effectively God-like hypothetical AI might be able to make a siren world that optimised for everything except crime and create a world perfect in every way except that it was rife with crime but it seems like it would be more work, not less.
I think if we can assume we have solved the strong AI problem, we can assume we have solved the much lesser problem of explaining liberty to an AI.
We've got a problem with your assumptions about all-powerful genies, I think, because I think your argument relies on the genie being so ultimately all-powerful that it is exactly as easy for the genie to make an optimal brick-shaped car or an optimal car made out of tissue paper and post-it notes as it is for the genie to make an optimal proper car. I don't think that genie can exist in any remotely plausible universe.
If it's not all-powerful to that extreme then it's still going to be easier for the genie to make a society optimised (or close to it) across all the important axes at once than one optimised across all the ones we think to specify while tanking all the rest. So for any reasonable genie I still think market worlds don't make sense as a concept. Siren worlds, sure. Market worlds, not so much, because the things we value are deeply interconnected and you can't just arbitrarily dump-stat some while efficiently optimising all the rest.
The strong AI problem is much easier to solve than the problem of motivating an AI to respect liberty. For instance, the first one can be brute forced (eg AIXItl with vast resources), the second one can't. Having the AI understand human concepts of liberty is pointless unless it's motivated to act on that understanding.
An excess of anthropomophisation is bad, but an analogy could be about creating new life (which humans can do) and motivating that new life to follow specific rules are requirements if they become powerful (which humans are pretty bad at at).
I don't believe that strong AI is going to be as simple to brute force as a lot of LessWrongers believe, personally, but if you can brute force strong AI then you can just get it to run a neuron-by-neuron simulation of the brain of a reasonably intelligent first year philosophy student who understands the concept of liberty and tell the AI not to take actions which the simulated brain thinks offend against liberty.
That is assuming that in this hypothetical future scenario where we have a strong AI we are capable of programming that strong AI to do any one thing instead of another, but if we cannot do that then the entire discussion seems to me to be moot.
I've met far too many first-year philosophy students to be comfortable with this program.
How? "tell", "the simulated brain thinks" "offend": defining those incredibly complicated concepts contains nearly the entirety of the problem.
And therein lies the rub. Current research-grade AGI formalisms don't actually allow us to specifically program the agent for anything, not even paperclips.
Unless you start by removing the air, in some way that doesn't count against the car's efficiency.
This also creates some interesting problems... Suppose a very powerful AI is given human liberty as a goal (or discovers that this is a goal using coherent extrapolated volition). Then it could quickly notice that its own existence is a serious threat to that goal, and promptly destroy itself!
yes, but what about other AIs that might be created, maybe without liberty as a top goal - it would need to act to prevent them from being built! It's unlikely that "destroy itself" is the best option it can find...
Except that acting to prevent other AIs from being built would also encroach on human liberty, and probably in a very major way if it was to be effective! The AI might conclude from this that liberty is a lost cause in the long run, but it is still better to have a few extra years of liberty (until the next AI gets built), rather than ending it right now (through its own powerful actions).
Other provocative questions: how much is liberty really a goal in human values (when taking the CEV for humanity as a whole, not just liberal intellectuals)? How much is it a terminal goal, rather than an instrumental goal? Concretely, would humans actually care about being ruled over by a tyrant, as long as it was a good tyrant? (Many people are attracted to the idea of an all-powerful deity for instance, and many societies have had monarchs who were worshipped as gods.) Aren't mechanisms like democracy, separation of powers etc mostly defence mechanisms against a bad tyrant? Why shouldn't a powerful "good" AI just dispense with them?
A certain impression of freedom is valued by humans, but we don't seem to want total freedom as a terminal goal.
I think Asimov did this first with his Multivac stories, although rather than promptly destroy itself Multivac executed a long-term plan to phase itself out.