Dweomite

I'm confused about how continuity poses a problem for "This sentence has truth value in [0,1)" without also posing an equal problem for "this sentence is false", which was used as the original motivating example. 

I'd intuitively expect "this sentence is false" == "this sentence has truth value 0" == "this sentence does not have a truth value in (0,1]".
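Spelling that out in my own notation (assuming the scheme assigns each sentence a truth value in [0,1]; v(S) is mine, not from the original post):

```latex
% Writing v(S) for the truth value assigned to sentence S (notation mine):
%   "S is false"  is read as  v(S) = 0,
% and since [0,1] \setminus (0,1] = \{0\}, this is the same condition as
%   v(S) \notin (0,1].
v(S) = 0
  \;\iff\; v(S) \in \{0\}
  \;\iff\; v(S) \in [0,1] \setminus (0,1]
  \;\iff\; v(S) \notin (0,1]
```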

On my model, the phrase "I will do X" can be either a plan, a prediction, or a promise.

A plan is what you intend to do.

A prediction is what you expect will happen.  ("I intend to do my homework after dinner, but I expect I will actually be lazy and play games instead.")

A promise is an assurance.  ("You may rely upon me doing X.")

How about this: I train on all available data, but only report performance for the lots predicted to be <$1000?

This still feels squishy to me (even after your footnote about separately tracking how many lots were predicted <$1000). You're giving the model partial control over how it is tested.

The only concrete abuse I can immediately come up with is that maybe it cheats like you predicted by submitting artificially high estimates for hard-to-estimate cases, but you miss it because it also cheats in the other direction by rounding down its estimates for easier-to-predict lots that are predicted to be just slightly over $1000.
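To make that concrete with made-up numbers (a toy sketch; the lot values, cutoff, and names are all hypothetical, not anything from your pipeline): shifting one hard lot above the threshold and one easy lot below it changes which errors get reported at all.

```python
# Toy illustration with made-up numbers: which lots get reported is decided
# by the model's own predictions relative to the $1000 cutoff.

THRESHOLD = 1000  # only lots predicted below this are included in the report

# (true_value, honest_prediction) for three hypothetical lots
lots = [
    (800,  820),   # easy lot, honestly predicted under the threshold
    (950, 1020),   # easy lot, honestly predicted just over the threshold
    (400,  900),   # hard lot the model badly overestimates
]

def reported_mae(predictions):
    """Mean absolute error over only the lots predicted below THRESHOLD."""
    kept = [(true, pred) for (true, _), pred in zip(lots, predictions)
            if pred < THRESHOLD]
    return sum(abs(true - pred) for true, pred in kept) / len(kept)

honest_preds = [pred for _, pred in lots]
# "Gamed" predictions: nudge the easy near-threshold lot down into the
# reported set, and inflate the hard lot so its big miss drops out of it.
gamed_preds = [820, 999, 1500]

print("honest:", reported_mae(honest_preds))  # ~260, includes the 500-dollar miss
print("gamed: ", reported_mae(gamed_preds))   # ~35, the big miss is hidden
```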

But just like you say that it's easier to notice leakage than to say exactly how (or how much) it'll matter, I feel like we should be able to say "you're giving the model partial control over which problems it is evaluated on; this seems bad" without necessarily predicting how it will matter.

My instinct would be to try to move the grading closer to the model's ultimate impact on the client's interests.  For example, if you can determine what each lot in your data set was "actually worth (to you)", then perhaps you could calculate how much money would be made or lost if you'd submitted a given bid (taking into account whether that bid would've won), and then train the model to find a bidding strategy with the highest expected payout.
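Roughly the kind of scoring I have in mind, if the historical data did include actual worth and the competing bids (a sketch under assumptions I'm inventing: the highest bid wins and pays its own bid, and a losing bid simply scores zero; every field and function name here is hypothetical):

```python
# Hypothetical sketch: score a bidding policy by the profit it would have
# realized on historical lots, instead of by prediction error alone.
# Assumptions (mine): we know each lot's actual worth to us and the best
# competing bid, the highest bid wins, and the winner pays its own bid.

from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Lot:
    features: dict             # whatever the model sees before bidding
    actual_worth: float        # what the lot turned out to be worth to us
    best_competing_bid: float  # highest bid anyone else submitted

def total_payout(bid_policy: Callable[[dict], float], lots: Iterable[Lot]) -> float:
    """Total profit a bidding policy would have earned on historical lots."""
    profit = 0.0
    for lot in lots:
        bid = bid_policy(lot.features)
        if bid > lot.best_competing_bid:       # we would have won this lot
            profit += lot.actual_worth - bid   # negative if we overpaid
        # a losing bid scores 0 here; real opportunity costs are ignored
    return profit

# Example: the bolder policy wins more lots but overpays on the second one.
history = [
    Lot({"est": 900},  actual_worth=1150, best_competing_bid=940),
    Lot({"est": 1200}, actual_worth=900,  best_competing_bid=1300),
]
cautious   = lambda f: f["est"] * 1.05   # wins only the first lot: +205
aggressive = lambda f: f["est"] * 1.15   # wins both: 115 - 480 = -365

print(total_payout(cautious, history), total_payout(aggressive, history))
```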

But I can imagine a lot of reasons you might not actually be able to do that: maybe you don't know the "actual worth" in your training set, maybe unsuccessful bids have a hard-to-measure opportunity cost, maybe you want the model to do something simpler so that it's more likely to remain useful if your circumstances change.

Also, you sound like you do this for a living, so I put about 30% probability on you telling me that my concerns are wrong-headed for some well-studied reason I've never heard of.


I think you're still thinking in terms of something like formalized political power, whereas other people are thinking in terms of "any ability to affect the world".

Suppose a fantastically powerful alien called Superman comes to earth, and starts running around the city of Metropolis, rescuing people and arresting criminals. He has absurd amounts of speed, strength, and durability. You might think of Superman as just being a helpful guy who doesn't rule anything, but as a matter of capability he could demand almost anything from the rest of the world and the rest of the world couldn't stop him. Superman is de facto ruler of Earth; he just has a light touch.

If you consider that acceptable, then you aren't objecting to "god-like status and control", you just have opinions about how that control should be exercised.

If you consider that UNacceptable, then you aren't asking for Superman to behave in certain ways, you are asking for Superman to not exist (or for some other force to exist that can check him).

Most humans (probably including you) are currently a "prisoner" of a coalition of humans who will use armed force to subdue and punish you if you take any actions that the coalition (in its sole discretion) deems worthy of such punishment. Many of these coalitions (though not all of them) are called "governments". Most humans seem to consider the existence of such coalitions to be a good thing on balance (though many would like to get rid of certain particular coalitions).

I will grant that most commenters on LessWrong probably want Superman to take a substantially more interventionist approach than he does in DC Comics (because frankly his talents are wasted stopping petty crime in one city).

Most commenters here still seem to want Superman to avoid actions that most humans would disapprove of, though.

Then we're no longer talking about "the way humans care about their friends", we're inventing new hypothetical algorithms that we might like our AIs to use. Humans no longer provide an example of how that behavior could arise naturally in an evolved organism, nor a case study of how it works out for people to behave that way.

My model is that friendship is one particular strategy for alliance-formation that happened to evolve in humans. I expect this is natural in the sense of being a local optimum (in the ancestral environment), but probably not in the sense of being simple to formally define or implement.

I think friendship is substantially more complicated than "I care some about your utility function". For instance, you probably stop valuing the other person's utility function if they betray you (friendship can "break"). I also think the friendship algorithm includes a bunch of signalling to help with coordination (so that you understand the other person is trying to be friends), and some less-pleasant stuff like evaluations of how valuable an ally the other person is and how the friendship will affect your social standing.

Friendship also appears to include some sort of check that the other person is making friendship-related-decisions using system 1 instead of system 2--possibly as a security feature to make it harder for people to consciously exploit (with the unfortunate side-effect that we penalize system-2-thinkers even when they sincerely want to be allies), or possibly just because the signalling parts evolved for system 1 and don't generalize properly.

(One could claim that "the true spirit of friendship" is loving someone unconditionally or something, and that might be simple, but I don't think that's what humans actually implement.)
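To be concrete about just the first two pieces (caring some about the other person's payoff, and that caring breaking on betrayal), here's a toy model I'm making up on the spot; it deliberately ignores the signalling, ally-evaluation, and system-1 parts:

```python
# Toy model (purely illustrative): an agent that weights its friend's payoff
# in its decisions, but drops that weight to zero after a betrayal.

class FriendlyAgent:
    def __init__(self, friend_weight: float = 0.5):
        self.friend_weight = friend_weight  # how much the friend's payoff counts
        self.betrayed = False

    def observe_betrayal(self):
        """Friendship 'breaks': stop valuing the other party's payoff."""
        self.betrayed = True

    def value(self, own_payoff: float, friend_payoff: float) -> float:
        w = 0.0 if self.betrayed else self.friend_weight
        return own_payoff + w * friend_payoff

    def choose(self, options):
        """Pick the (own_payoff, friend_payoff) option with the highest value."""
        return max(options, key=lambda o: self.value(*o))

agent = FriendlyAgent()
options = [(3, 0), (2, 4)]      # (my payoff, friend's payoff)
print(agent.choose(options))    # (2, 4): gives up 1 so the friend gains 4
agent.observe_betrayal()
print(agent.choose(options))    # (3, 0): after betrayal, only own payoff counts
```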

You appear to be thinking of power only in extreme terms (possibly even as an on/off binary).  Like, that your values "don't have power" unless you set up a dictatorship or something.

But "power" is being used here in a very broad sense. The personal choices you make in your own life are still a non-zero amount of power to whatever you based those choices on. If you ever try to persuade someone else to make similar choices, then you are trying to increase the amount of power held by your values.  If you support laws like "no stealing" or "no murder" then you are trying to impose some of your values on other people through the use of force.

I mostly think of government as a strategy, not an end. I bet you would too, if push came to shove; e.g. you are probably stridently against murdering or enslaving a quarter of the population, even if the measure passes by a two-thirds vote.  My model says almost everyone would endorse tearing down the government if it went sufficiently off the rails that keeping it around became obviously no longer a good instrumental strategy.

Like you, I endorse keeping the government around, even though I disagree with it sometimes. But I endorse that on the grounds that the government is net-positive, or at least no worse than [the best available alternative, including switching costs]. If that stopped being true, then I would no longer endorse keeping the current government. (And yes, it could become false due to a great alternative being newly-available, even if the current government didn't get any worse in absolute terms. e.g. someone could wait until democracy is invented before they endorse replacing their monarchy.)

I'm not sure that "no one should have the power to enforce their own values" is even a coherent concept. Pick a possible future--say, disassembling the earth to build a Dyson sphere--and suppose that at least one person wants it to happen, and at least one person wants it not to happen. When the future actually arrives, it will either have happened, or not--which means at least one person "won" and at least one person "lost". What exactly does it mean for "neither of those people had the power to enforce their value", given that one of the values did, in fact, win? Don't we have to say that one of them clearly had enough power to stymie the other?

You could say that society should have a bunch of people in it, and that no single person should be able to overpower everyone else combined. But that doesn't prevent some value from being able to overpower all other values, because a value can be endorsed by multiple people!

I suppose someone could hypothetically say that they really only care about the process of government and not the result, such that they'll accept any result as long as it is blessed by the proper process. Even if you're willing to go to that extreme, though, that still seems like a case of wanting "your values" to have power, just where the thing you value is a particular system of government. I don't think that having this particular value gives you any special moral high ground over people who value, say, life and happiness.

I also think that approximately no one actually has that as a terminal value.

In the context of optimization, values are anything you want (whether moral in nature or otherwise).

Any time a decision is made based on some value, you can view that value as having exercised power by controlling the outcome of that decision.

Or put more simply, the way that values have power, is that values have people who have power.

I feel like your previous comment argues against that, rather than for it. You said that people who are trapped together should be nice to each other because the cost of a conflict is very high. But now you're suggesting that ASIs that are metaphorically trapped together would aggressively attack each other to enforce compliance with their own behavioral standards. These two conjectures do not really seem allied to me.

Separately, I am very skeptical of aliens warring against ASIs to acausally protect us. I see multiple points where this seems likely to fail:

  • Would aliens actually take our side against an ASI merely because we created it? If humans hear a story about an alien civilization creating a successor species, and then the successor species overthrowing its creators, I do not expect humans to automatically be on the creators' side in this story. I expect humans will take a side mostly based on how the two species were treating each other (overthrowing abusive masters is usually portrayed as virtuous in our fiction), and that which one of them is the creator will have little weight. I do not think "everyone should be aligned with their creators" is a principle that humans would actually endorse (except by motivated reasoning, in situations where it benefits us).
    • Also note that humans are not aligned with the process that produced us (evolution), and approximately no humans think this is a problem.
  • Even if the aliens sympathize with us, would they care enough to take expensive actions about it?
  • Even if the aliens would war to save us, would the ASI predict that?  Their policy can only acausally save us if the ASI successfully predicts it.  Otherwise, the war might still happen, but that doesn't help us.
  • Even if the ASI predicts this, will it comply?  This seems like what dath ilan would consider a "threat", in that the aliens are punishing the ASI rather than enacting their own BATNA.  It may be decision-theoretically correct to ignore the threat.
  • This whole premise, of us being saved at the eleventh hour by off-stage actors, seems intuitively like the sort of hypothesis that would be more likely to be produced by wishful thinking than by sober analysis, which would make me distrust it even if I couldn't see any specific problems with it.

I don't see why either expecting or not-expecting to meet other ASIs would make it instrumental to be nice to humans.
