All of Paperclip Minimizer's Comments + Replies

How does this interact with time preference? As stated, an elementary consequence of this theorem is that either lending (and pretty much every other capitalist activity) is unprofitable, or arbitrage is possible.

5johnswentworth
Great question. The setup here assumes zero interest rates - in particular, I'm implicitly allowing borrowing without interest via short sales (real-world short sales charge interest). Once we allow for nonzero interest, there's a rate charged to borrow, and the price of each asset is its discounted expected value rather than just expected value. That's one of several modifications needed in order to use this theorem in real-world finance. (The same applies to the usual presentation of the Dutch Book arguments, and the same modification is possible.)
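(A minimal numerical sketch of the "discounted expected value" modification mentioned above; the payoffs, probabilities, and 5% rate are made-up numbers for illustration only.)

```python
# Sketch: with a nonzero interest rate r, an asset's no-arbitrage price is its
# discounted expected value rather than its raw expected value.
def discounted_expected_value(payoffs, probabilities, r, t):
    expected = sum(p * x for p, x in zip(probabilities, payoffs))
    return expected / (1 + r) ** t

# An asset paying 100 or 0 with equal probability in one year, with a 5% rate,
# prices at ~47.62 instead of the raw expected value of 50.
print(discounted_expected_value([100, 0], [0.5, 0.5], r=0.05, t=1))
```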

That would be a good argument if it were merely a language model, but if it can answer complicated technical questions (and presumably any other question), then it must have the necessary machinery to model the external world, predict what it would do in such and such circumstances, etc.

[This comment is no longer endorsed by its author]
6TheWakalix
I'm confused. I already addressed the possibility of modeling the external world. Did you think the paragraph below was about something else, or did it just not convince you? (If the latter, that's entirely fine, but I think it's good to note that you understand my argument without finding it persuasive. Conversational niceties like this help both participants understand each other.) Or to put it another way, it understands how the external world works, but not that it's part of the external world. It doesn't self-model in that way. It might even have a model of itself, but it won't understand that the model is recursive. Its value function doesn't assign a high value to words that its model says will result in its hardware being upgraded, because the model and the goals aren't connected in that way. T-shirt slogan: "It might understand the world, but it doesn't understand that it understands the world." You might say "this sort of AI won't be powerful enough to answer complicated technical questions correctly." If so, that's probably our crux. I have a reference class of Deep Blue and AIXI, both of which answer questions at a superhuman level without understanding self-modification, but the former doesn't actually model the world and AIXI doesn't belong in discussions of practical feasibility. So I'll just point at the crux and hope you have something to say about it. You might say, as Yudkowsky has before, "this design is too vague and you can attribute any property to it that you like; come back when you have a technical description". If so, I'll admit I'm just a novice speculating on things they don't understand well. If you want a technical description then you probably don't want to talk to me; someone at OpenAI would probably be much better at describing how language models work and what their limitations are, but honestly anyone who's done AI work or research would be better at this than me. Or you can wait a decade and then I'll be in the class of "peop

My point is, if it can answer complicated technical questions, then it is probably a consequentialist that models itself and its environment.

[This comment is no longer endorsed by its author]
3TheWakalix
Why do you think that non-consequentialists are more limited than humans in this domain? I could see that being the case, but I could also have seen that being the case for chess, and yet Deep Blue won't take over the world even with infinite compute. (Possible counterpoint: chess is far simpler than language.) "But Deep Blue backchains! That's not an example of a superhuman non-consequentialist in a technical domain." Yes, it's somewhat consequentialist, but in a way that doesn't have to do with the external world at all. The options it generates are all of the form "move [chess piece] to [location]." Similarly, language models only generate options of the form "[next word] comes next in [context]." No [next word] will result in the model attempting to seize more resources and recursively self-improve. This is why I said "a consequentialist that models itself and its environment". But it goes even further than that. An AI might model a location that happens to be its environment, including its own self. But if this model is not connected in the right way to its consequentialism, it still won't take over the world. It has to generate actions within its environment to do that, and language models simply don't work that way. Another line of thought: AIXI will drop an anvil on its head - it doesn't understand self-change. FOOM/Computronium is actually even more stringent: it has to be a non-Cartesian consequentialist that models itself in its environment. You need to have solved the Embedded Agent problems. Now, people will certainly want to solve these at some point and build a FOOM-capable AI. It's probably necessary to solve them to build a generally intelligent AI that interacts sensibly with the world on its own. But I don't think you need to solve them to build a language model, even a superintelligent language model.

But this leads to a moral philosophy question: are time-discounting rates okay, and is your future self actually less important in the moral calculus than your present self?

If an AI can answer a complicated technical question, then it evidently has the ability to use resources to further its goal of answering said complicated technical question, else it couldn't answer a complicated technical question.

[This comment is no longer endorsed by its author]
5TheWakalix
It has the ability to use resources, but not necessarily the ability to get more of them. This is because it is not a consequentialist that models itself and its environment. Omohundro's convergent instrumental goals only apply to consequentialists.
2John_Maxwell
"If my calculator can multiply two 100-digit numbers, then it evidently has the ability to use resources to further its goal of doing difficult arithmetic problems, else it couldn't do difficult arithmetic problems." This is magical thinking.

But don't you need to get a gears-level model of how blackmail is bad to think about how dystopian a hypothetical legal-blackmail society is?

5TheWakalix
Charitable interpretation of dystopianists: they're using the outside view. Uncharitable interpretation of dystopianists: they can come up with persuasive young adult novels against any change, regardless of whether the change would be bad or not. "I can make a narrative in which this is a bad thing" != "this is a bad thing".

The world being turned into computronium in order to solve the AI alignment problem would certainly be an ironic end to it.

[This comment is no longer endorsed by its author]
5John_Maxwell
I'm not sure if you're being serious or not, but in case you are: Do you know much about how language models work? If so, which part of the code is the part that's going to turn the world into computronium? We already have narrow AIs that are superhuman in their domains. To my knowledge, nothing remotely like this "turn the world to computronium in order to excel in this narrow domain" thing has ever happened. This post might be useful to read. In Scott Alexander jargon, a language model seems like a behavior-executor, not a utility-maximizer.

My point is that it would be a better idea to use the prompt "What follows is a transcript of a conversation between two people:".

0Gurkenglas
I doubt it, but it sure sounds like a good idea to develop a theory of what prompts are more useful/safe.
2Pattern
That makes sense.

Note the framing. Not “should blackmail be legal?” but rather “why should blackmail be illegal?” Thinking for five seconds (or minutes) about a hypothetical legal-blackmail society should point to obviously dystopian results. This is not subtle. One could write the young adult novel, but what would even be the point.

Of course, that is not an argument. Not evidence.

What? From a consequentialist point of view, of course it is. If a policy (and "make blackmail legal" is a policy) probably has bad consequences, then it is a bad policy.

3TheWakalix
I'm not sure how much of this is projection, but I got the impression that people wanted a more gears-level model of how blackmail is bad.
9hazel
"It's obviously bad. Think about it and you'll notice that. I could write a YA dystopian novel about how the consequences are bad." <-- isn't an argument, at all. It assumes bad consequences rather than demonstrating or explaining how the consequences would be bad. That section is there for other reasons, partially (I think?) to explain Zvi's emotional state and why he wrote the article, and why it has a certain tone.

It was how it was trained, but Gurkenglas is saying that GPT-2 could make human-like conversation because Turing test transcripts are in the GPT-2 dataset, whereas it's the conversations between humans in the GPT-2 dataset that would make it possible for GPT-2 to make human-like conversation and thus potentially pass the Turing Test.

1Gurkenglas
I think Pattern thought you meant "GPT-2 was trained on sentences generated by dumb programs.". I expect that a sufficiently better GPT-2 could deduce how to pass a Turing test without a large number of Turing test transcripts in its training set, just by having the prompt say "What follows is the transcript of a passing Turing test." and having someone on the internet talk about what a Turing test is. If you want to make it extra easy, let the first two replies to the judge be generated by a human.

But if the blackmail information is a good thing to publish, then blackmailing is still immoral, because it should be published and people should be incentivized to publish it, not to not publish it. We, as a society, should ensure that if, say, someone routinely engages in kidnapping children to harvest their organs, and someone knows this information, then she should be incentivized to send this information to the relevant authorities and not to keep it to herself, for reasons that are, I hope, obvious.

I'm not sure what you're trying to say. I'm only saying that if your goal is to have an AI generate sentences that look like they were written by humans, then you should get a corpus with a lot of sentences that were written by humans, not sentences written by other, dumber, programs. I do not see why anyone would disagree with that.

6Pattern
That's not how it was trained?

It would make much more sense to train GPT-2 using discussions between humans if you want it to pass the Turing Test.

6Gurkenglas
A reason GPT-2 is impressive is that it performs better on some specialized tasks than specialized models do.

You need to define the terms you use so that what you are saying is useful, i.e. has pragmatic consequences in the real world of actual things, and is not simply on the same level as arguing by definition.

-1Andaro
I observe that you are communicating in bad faith and with hostility, so I will exercise my right to exit from any further communication with you.

If you have such a broad definition of the right to exit being blocked, then there is practically no such thing as the right to exit not being blocked, and the claim in your original comment is useless.

1Andaro
What? Why? No sane person would classify "he will murder me if I leave" as "the right to exit isn't blocked". I don't expect much steelmanning from the downvote-bots here, but if you're strawmanning on a rationalist board, good-faith communication becomes disincentivized. It's not like I have skin in the game; all my relationships are nonviolent and I don't give a shit about either feminism or anti-feminism. Still, if "she's such a nice person but sometimes she explodes" isn't compatible with revealed preference for the overall relationship, I don't know what is. My argument was never an argument that such relationships are great or that you should absolutely never use your right to exit. It's just a default interpretation of many relationships that are being maintained even though they contain abuse. Obviously if you're ankle-chained to a wall without a phone, that doesn't qualify as revealed preference. And while I don't object to ways government can buffer against the suffering of homelessness or socioeconomic hardship, it's still a logical necessity that the socioeconomic advantages of a relationship are a part of that relationship's attractiveness, just like good pay is a reason for people to stay in shitty jobs: it doesn't violate the concept of revealed preference, it doesn't make those jobs nonconsensual, and it wouldn't necessarily make people better off if those jobs didn't exist. And by the way, it's right to exit, not right to exist. There's a big difference.
-4Andaro
I didn't read the whole post, but most of that is just the right to exit being blocked by various mechanisms, including socioeconomic pressure and violence. And the socioeconomic ones aren't even necessarily incompatible with revealed preference; if the alternative is homelessness, this may suck, but the partner still has no obligation to continue the relationship and the socioeconomic advantages are obviously a part of the package.

Excellent article! You might want to add some trigger warnings, though.

edit: why so many downvotes in so little time?

Hey admins: The "ë" in "Michaël Trazzi" is weird, probably a bug in your handling of Unicode.

2Said Achmiz
It looks just fine on both Less Wrong and GreaterWrong. (In both places, it correctly appears as U+00EB “Latin Small Letter E with Diaeresis”.) Perhaps it is a bug on your end? What platform / browser / version are you running?

Actually we all fall prey to this particular one without realizing it, in one aspect or another.

At least, you do. (With apologies to Steven Brust)

4Apollo13
I quite agree with Paperclip Minimizer: you succumbed to the typical mind fallacy even as you talked about the typical mind fallacy. How ironic.

A high-Kolmogorov-complexity system is still a system.

I'm not sure what it would even mean to not have a Real Moral System. The actual moral judgments must come from somewhere.

7gjm
Anyone who makes moral judgements has a Real Moral Something. But suppose there's no human-manageable way of predicting your judgements; nothing any simpler or more efficient than presenting them to your brain and seeing what it does. You might not want to call that a system. And suppose that for some questions, you don't have an immediate answer, and what answer you end up with depends on irrelevant-seeming details: if we were somehow able to rerun your experience from now to when we ask you the question and you decide on an answer, we would get different answers on different reruns. (This might be difficult to discover, of course.) In that case, you might not want to say that you have a real opinion on those questions, even though it's possible to induce you to state one.

Using PCA on utility functions could be an interesting research subject for wannabe AI risk experts.
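A minimal sketch of what that might look like, assuming each utility function is represented as a vector of utilities over a fixed finite set of outcomes. Everything below (the random placeholder data, the array names) is illustrative, not anything from the comment.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_outcomes = 50, 10

# Placeholder data: each row is one agent's utility function over the same outcomes.
U = rng.normal(size=(n_agents, n_outcomes))

# PCA via SVD of the mean-centered matrix: the principal components are the
# directions in outcome-space along which the utility functions vary the most.
U_centered = U - U.mean(axis=0)
_, singular_values, components = np.linalg.svd(U_centered, full_matrices=False)
explained_variance = singular_values**2 / np.sum(singular_values**2)
print(explained_variance[:3])   # how much variation the first few components capture
print(components[0])            # the single direction explaining the most variation
```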

I don't see the argument. I have an actual moral judgement that painless extermination of all sentient beings is evil, and so is tiling the universe with meaningless sentient beings.

4gjm
I have no trouble believing that you do, but I don't understand how that relates to the point at issue here. (I wasn't meaning to imply that no one has actual moral judgements, at all; nor that no one has actual moral judgements that match their immediate instinctive reactions; if the problem is that it seemed like I meant either of those, then I apologize for being insufficiently clear.) The argument I was making goes like this: -1. Scott suggests that there may not be any such thing as his Real Moral System, because different ways of systematizing his moral judgements may be indistinguishable when asked about the sort of question he has definite moral judgements about, but all lead to different and horrifying conclusions when pushed far beyond that. 0. Paul says that if Scott didn't have a Real Moral System then he wouldn't be horrified by those conclusions, but would necessarily feel indifferent to them. 1. No: he might well still feel horror at those conclusions, because not having a Real Moral System doesn't mean not having anything that generates moral reactions; one can have immediate reactions of approval or disapproval to things, but not reflectively endorse them. Scott surely has some kind of brain apparatus that can react to whatever it's presented with, but that's not necessarily a Real Moral System because he might disavow some of its reactions; if so, he presumably has some kind of moral system (which does that disavowing), but there may be some questions to which it doesn't deliver answers. All of this is perfectly consistent with there being other people whose Real Moral System does deliver definite unambiguous answers in all these weird extreme cases.

don’t trust studies that would be covered in the Weird News column of the newspaper

-- Ozy

Good post. Some nitpicks:

There are many models of rationality from which a hypothetical human can diverge, such as VNM rationality of decision making, Bayesian updating of beliefs, certain decision theories or utilitarian branches of ethics. The fact that many of them exist should already be a red flag on any individual model’s claim to “one true theory of rationality.”

VNM rationality, Bayesian updating, decision theories, and utilitarian branches of ethics all cover different areas. They aren't incompatible and actually fit rather neatly into each other.

...

While this may seem like merely a niche issue, given the butterfly effect and a sufficiently long timeline with the possibility of simulations, it is almost guaranteed that any decision will change.

I think you accidentally words.

2Chris_Leong
"will change the reference class"
5Ben Pace
[Mod notice] This comment is needlessly aggressive and seems to me to be out of character for you. If you write a comment this needlessly aggressive again I’ll give you a temporary suspension.

Noticing an unachievable goal may force it to have an existential crisis of sorts, resulting in self-termination.

Do you have reasoning behind this being true, or is this baseless anthropomorphism?

It should not hurt an aligned AI, as it by definition conforms to the humans' values, so if it finds itself well-boxed, it would not try to fight it.

So it is a useless AI?

Your whole comment is founded on a false assumption. Look at Bayes' formula. Do you see any mention of whether your probability estimate is "just your prior" or "the result of a huge amount of investigation and very strong reasoning"? No? Well, this means that it doesn't affect how much you'll update.
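(The formula in question, written out for reference:)

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$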

0Richard_Ngo
This is untrue. Consider a novice and an expert who both assign 0.5 probability to some proposition A. Let event B be a professor saying that A is true. Let's also say that both the novice and the expert assign 0.5 probability to B. But the key term here is P(B|A). For a novice, this is plausibly quite high, because for all they know there's already a scientific consensus on A which they just hadn't heard about yet. For the expert, this is probably near 0.5, because they're confident that the professor has no better source of information than they do. In other words, experts may update less on evidence because the effect of that evidence is "screened off" by things they already knew. But it's difficult to quantify this effect.
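A minimal numerical sketch of that point. The 0.9 / 0.1 likelihoods for the novice are made-up values, chosen so that P(B) still comes out to 0.5 for both people, matching the setup above.

```python
# Bayes' rule, with P(B) expanded by the law of total probability.
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b

# Both start at P(A) = 0.5 and assign P(B) = 0.5, but differ in P(B|A).
novice = posterior(0.5, p_b_given_a=0.9, p_b_given_not_a=0.1)  # -> 0.9: large update
expert = posterior(0.5, p_b_given_a=0.5, p_b_given_not_a=0.5)  # -> 0.5: no update
print(novice, expert)
```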

"self-aware" can also be "self-aware" as in, say, "self-aware humor"

I don't see why negative utilitarians would be more likely than positive utilitarians to support animal-focused effective altruism over (near-term) human-focused effective altruism.

This actually made me not read the whole sequence.

[1] It would be rather audacious to claim that this is true for each of the four axioms. For instance, do please demonstrate how you would Dutch-book an agent that does not conform to the completeness axiom!

How can an agent not conform to the completeness axiom? It literally just says "either the agent prefers A to B, or prefers B to A, or doesn't prefer either". Offer me an example of an agent that doesn't conform to the completeness axiom.
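For reference, the standard statement of the axiom, writing $\preceq$ for weak preference over lotteries (indifference is the case where both directions hold):

$$\forall A, B:\quad A \preceq B \;\text{ or }\; B \preceq A$$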

Obviously it’s true that we face trade-offs. What is not so obvious is literally the entire rest of the section I quoted.

The

...

(Note: I ask that you not take this as an invitation to continue arguing the primary topic of this thread; however, one of the points you made is interesting enough on its own, and tangential enough from the main dispute, that I wanted to address it for the benefits of anyone reading this.)

[1] It would be rather audacious to claim that this is true for each of the four axioms. For instance, do please demonstrate how you would Dutch-book an agent that does not conform to the completeness axiom!

How can an agent not conform the completeness axiom ? It li

...
5Said Achmiz
Though there’s a great deal more I could say here, I think that when accusations of “looking for Internet debate points” start to fly, that’s the point at which it’s best to bow out of the conversation.

This one is not a central example, since I’ve not seen any VNM-proponent put it in quite these terms. A citation for this would be nice. In any case, the sort of thing you cite is not really my primary objection to VNM (insofar as I even have “objections” to the theorem itself rather than to the irresponsible way in which it’s often used), so we can let this pass.

VNM is used to show why you need to have utility functions if you don't want to get Dutch-booked. It's not something the OP invented, it's the whole point of VNM. One wonder what you thought VN

...

VNM is used to show why you need to have utility functions if you don't want to get Dutch-booked. It's not something the OP invented, it's the whole point of VNM. One wonders what you thought VNM was about.

This is a confused and inaccurate comment.

The von Neumann-Morgenstern utility theorem states that if an agent’s preferences conform to the given axioms, then there exists a “utility function” that will correspond to the agent’s preferences (and so that agent can be said to behave as if maximizing a “utility function”).
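In symbols, a sketch of the standard formulation (added here for reference): if the preference relation $\preceq$ over lotteries satisfies completeness, transitivity, continuity, and independence, then there exists a utility function $u$ on outcomes, unique up to positive affine transformation, such that

$$L \preceq M \;\iff\; \mathbb{E}_{L}[u] \le \mathbb{E}_{M}[u].$$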

We may then ask whether there is a

...

Is Aumann robust to untrustworthiness?

This bidimensional model is weird.

  • I can imagine pure mania: assigning a 100% probability to everything going right
  • I can imagine pure depression: assigning a 100% probability to everything going wrong
  • I can imagine pure anxiety: a completely flat probability distribution over things going right or wrong

But I can't imagine a pure top-left mood. This leads me to think that the mood square is actually a mood triangle, and that there is no top-left mood, only a spectrum of moods between anxiety and mania.

This is excellent advice. Are you a moderator?

6Dagon
Nope, just a consumer making use of a feedback channel to help a producer (you) make more stuff I want to read :)

I don't know. This makes me anxious about writing critical posts in the future. I was about to begin writing another post that is similarly a criticism of an article written by someone else, and I don't think I'm going to do so.

Dagon110

I'm torn. I don't want you to be anxious about writing posts which happen to disagree with a point made by someone. Overall, writing more is better and I hope you don't feel punished by your honorable (IMO) removal of the post. However, I don't think LW is a good place for rebuttals of posts made elsewhere.

If you're making a point of interest to rationalists, I'd recommend making it stand alone, referring in passing to the incorrect/misleading posts only as a pointer to a different take. I wouldn't generally address it specifically to the outside poster; I'd make it more general than that.

Can I ask you what you mean by this?

7ryan_b
What I would normally expect in online forums is for a critical post to stay up, and if another person expressed unhappiness for that to change nothing, and for this to result in less engagement or to persist as ongoing acrimony which might spill over into other posts. Purely selfishly, I don't like seeing those kinds of things happen; as a consequence my engagement would tend to decrease. From only seeing your retraction, I can update in favor of that not happening and instead have a soft assumption that similar interactions are being handled, and if I see one that isn't it is probably an exception and not the rule, which addresses my selfish concerns. Further, it increases my confidence that you and Ozy will both continue to write and therefore I expect good contributors to be preserved going forward. Since we are less likely to lose good contributors over time, and through that lose passive observers over time (insofar as I am a stand-in), it increases my confidence in the long-term health of the community. Maintaining good will takes challenging maintenance work, I think this qualifies, and I wanted to recognize it.

Never heard of a prank like this; it sounds weird.

More generally, commenting isn't a good way to train oneself as a rationalist, but blogging is.

8cousin_it
Not sure about rationality, but blogging certainly makes you better at writing what your audience wants. That's not always a good thing though. I'm pretty sure Scott's audience has made him more political, he wasn't that way before. It's like one of those pranks where all students agree to act attentive when the professor walks right and act distracted when the professor walks left, so by the end of the lecture the professor is huddled in the right corner without knowing why. A better test of rationality would be noticing when it happens to you :-)

This isn't what "conflict theory" means. Conflict theory is a specific theory about the nature of conflict, one that says conflict is inevitable. Conflict theory doesn't simply mean that conflict exists.

I don't agree with your pessimism. To re-use your example, if you formalize the utility created by freedom and equality, you can compare both and pick the most efficient policies.

2cousin_it
Yeah, you can do that if you try. The only problem is that something like "freedom of association is important" itself feels important. The same thing happens with personal importance judgments, like "I care about becoming a published writer" or "being a good Christian matters to me". They are self-defending.

The author explains very clearly what the difference is between "people hate losses more than they like gains" and loss aversion. Loss aversion is people hating losing $1 while having $2 more than they like gaining $1 while having $1, even though in both cases this is the difference between having $1 and $2.
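A minimal sketch of the distinction (the concave utility function and the 2.25 loss-weighting coefficient below are illustrative choices, not anything from the linked post):

```python
import math

# Diminishing marginal utility alone: the utility gap between $1 and $2 is one
# number, whether traversed upward (a gain from $1) or downward (a loss from $2).
u = math.log
gain_from_1 = u(2) - u(1)   # gaining $1 while having $1
loss_from_2 = u(2) - u(1)   # losing $1 while having $2 is the same gap, in reverse
assert gain_from_1 == loss_from_2

# Loss aversion adds an asymmetry around the reference point: losses are weighted
# more heavily than equally-sized gains relative to where you currently stand.
def prospect_value(change, loss_weight=2.25):
    return change if change >= 0 else loss_weight * change

print(prospect_value(+1), prospect_value(-1))   # 1 vs -2.25
```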

I think we do disagree on whether it's a good idea to widely spread the message "HEY SUICIDAL PEOPLE HAVE YOU REALIZED THAT IF YOU KILL YOURSELF EVERYONE WILL SAY NICE THINGS ABOUT YOU AND WORK ON SOLVING PROBLEMS YOU CARE ABOUT LET’S MAKE SURE TO HIGHLIGHT THIS EXTENSIVELY".
