That would be a good argument if it were merely a language model, but if it can answer complicated technical questions (and presumably any other question), then it must have the necessary machinery to model the external world, predict what it would do in such and such circumstances, etc.
My point is, if it can answer complicated technical questions, then it is probably a consequentialist that models itself and its environment.
But this leads to a moral philosophy question: are time-discounting rates okay, and is your future self actually less important in the moral calculus than your present self?
If an AI can answer a complicated technical question, then it evidently has the ability to use resources to further its goal of answering said complicated technical question, else it couldn't answer a complicated technical question.
But don't you need to have a gears-level model of how blackmail is bad in order to think about how dystopian a hypothetical legal-blackmail society is?
The world being turned into computronium in order to solve the AI alignment problem would certainly be an ironic end to it.
My point is that it would be a better idea to use as a prompt "What follows is a transcript of a conversation between two people:".
Note the framing. Not “should blackmail be legal?” but rather “why should blackmail be illegal?” Thinking for five seconds (or minutes) about a hypothetical legal-blackmail society should point to obviously dystopian results. This is not subtle. One could write the young adult novel, but what would even be the point.
Of course, that is not an argument. Not evidence.
What? From a consequentialist point of view, of course it is. If a policy (and "make blackmail legal" is a policy) probably has bad consequences, then it is a bad policy.
That was how it was trained, but Gurkenglas is saying that GPT-2 could hold a human-like conversation because Turing test transcripts are in the GPT-2 dataset, whereas it is the conversations between humans in the dataset that would make it possible for GPT-2 to hold human-like conversations and thus potentially pass the Turing Test.
But if the blackmail information is a good thing to publish, then blackmailing is still immoral, because the information should be published and people should be incentivized to publish it, not to withhold it. We, as a society, should ensure that if, say, someone routinely engages in kidnapping children to harvest their organs, and someone knows this, then she is incentivized to send that information to the relevant authorities rather than keep it to herself, for reasons that are, I hope, obvious.
I'm not sure what you're trying to say. I'm only saying that if your goal is to have an AI generate sentences that look like they were written by humans, then you should get a corpus with a lot of sentences that were written by humans, not sentences written by other, dumber programs. I do not see why anyone would disagree with that.
It would make much more sense to train GPT-2 using discussions between humans if you want it to pass the Turing Test.
You need to define the terms you use in a way that makes what you are saying useful, i.e. such that it has pragmatic consequences for the real world of actual things, and is not simply on the same level as arguing by definition.
If you have such a broad definition of the right to exit being blocked, then there is practically no such thing as the right to exit not being blocked, and the claim in your original comment is useless.
Excellent article! You might want to add some trigger warnings, though.
edit: why so many downvotes in so little time?
Hey admins: The "ë" in "Michaël Trazzi" is weird, probably a bug in your handling of Unicode.
Actually we all fall prey to this particular one without realizing it, in one aspect or another.
At least, you do. (With apologies to Steven Brust)
A high-Kolmogorov-complexity system is still a system.
I'm not sure what it would even mean to not have a Real Moral System. The actual moral judgments must come from somewhere.
Using PCA on utility functions could be an interesting research subject for wannabe AI risk experts.
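Something like this, maybe (a toy sketch; the outcome set, the sampled utilities, and the use of scikit-learn are all made up for illustration):

```python
# Toy sketch: represent each agent's utility function as a vector of utilities
# assigned to a fixed list of outcomes, then look at the principal components
# of variation across those vectors.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

n_agents = 200     # hypothetical number of sampled utility functions
n_outcomes = 50    # hypothetical fixed set of outcomes they are evaluated on

# Made-up data: each row is one agent's utilities over the outcomes.
utilities = rng.normal(size=(n_agents, n_outcomes))

pca = PCA(n_components=5)
components = pca.fit_transform(utilities)

# How much of the variation in "values" the first few axes capture.
print(pca.explained_variance_ratio_)
```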
I don't see the argument. I have an actual moral judgement that painless extermination of all sentient beings is evil, and so is tiling the universe with meaningless sentient beings.
Good post. Some nitpicks:
There are many models of rationality from which a hypothetical human can diverge, such as VNM rationality of decision making, Bayesian updating of beliefs, certain decision theories or utilitarian branches of ethics. The fact that many of them exist should already be a red flag on any individual model’s claim to “one true theory of rationality.”
VNM rationality, Bayesian updating, decision theories, and utilitarian branches of ethics all cover different areas. They aren't incompatible and actually fit rather neatly into each other.
...While this may seem like merely a niche issue, given the butterfly effect and a sufficiently long timeline with the possibility of simulations, it is almost guaranteed that any decision will change.
I think you accidentally words.
Noticing an unachievable goal may force it to have an existential crisis of sorts, resulting in self-termination.
Do you have reasoning behind this being true, or is this baseless anthropomorphism?
It should not hurt an aligned AI, as it by definition conforms to the humans' values, so if it finds itself well-boxed, it would not try to fight it.
So it is a useless AI?
Your whole comment is founded on a false assumption. Look at Bayes' formula. Do you see any mention of whether your probability estimate is "just your prior" or "the result of a huge amount of investigation and very strong reasoning"? No? Well, this means that it doesn't affect how much you'll update.
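For reference, the formula is

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

and there is no term in it for where $P(H)$ came from.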
"self-aware" can also be "self-aware" as in, say, "self-aware humor"
I don't see why negative utilitarians would be more likely than positive utilitarians to support animal-focused effective altruism over (near-term) human-focused effective altruism.
[1] It would be rather audacious to claim that this is true for each of the four axioms. For instance, do please demonstrate how you would Dutch-book an agent that does not conform to the completeness axiom!
How can an agent not conform to the completeness axiom? It literally just says "either the agent prefers A to B, or B to A, or has no preference between them". Offer me an example of an agent that doesn't conform to the completeness axiom.
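In the standard preference notation, completeness just requires that for any two lotteries $A$ and $B$,

$$A \succeq B \quad \text{or} \quad B \succeq A,$$

i.e. at least one of "weakly prefers $A$" or "weakly prefers $B$" holds; indifference is the case where both hold.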
Obviously it’s true that we face trade-offs. What is not so obvious is literally the entire rest of the section I quoted.
...(Note: I ask that you not take this as an invitation to continue arguing the primary topic of this thread; however, one of the points you made is interesting enough on its own, and tangential enough from the main dispute, that I wanted to address it for the benefit of anyone reading this.)
...[1] It would be rather audacious to claim that this is true for each of the four axioms. For instance, do please demonstrate how you would Dutch-book an agent that does not conform to the completeness axiom!
How can an agent not conform to the completeness axiom? It literally just says "either the agent prefers A to B, or B to A, or has no preference between them".
This one is not a central example, since I’ve not seen any VNM-proponent put it in quite these terms. A citation for this would be nice. In any case, the sort of thing you cite is not really my primary objection to VNM (insofar as I even have “objections” to the theorem itself rather than to the irresponsible way in which it’s often used), so we can let this pass.
VNM is used to show why you need to have utility functions if you don't want to get Dutch-booked. It's not something the OP invented, it's the whole point of VNM. One wonders what you thought VNM was about.
...VNM is used to show why you need to have utility functions if you don’t want to get Dutch-booked. It’s not something the OP invented, it’s the whole point of VNM. One wonders what you thought VNM was about.
This is a confused and inaccurate comment.
The von Neumann-Morgenstern utility theorem states that if an agent’s preferences conform to the given axioms, then there exists a “utility function” that will correspond to the agent’s preferences (and so that agent can be said to behave as if maximizing a “utility function”).
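In symbols, the conclusion is that there exists a function $u$ over outcomes such that, for any lotteries $L$ and $M$,

$$L \succeq M \iff \mathbb{E}_L[u] \geq \mathbb{E}_M[u],$$

i.e. the agent's preferences over lotteries are exactly those of an expected-utility maximizer for $u$.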
We may then ask whether there is a
...Is Aumann robust to untrustworthiness?
This two-dimensional model is weird.
But I can't imagine a pure top-left mood. This leads me to think that the mood square is actually a mood triangle, and that there is no top-left mood, only a spectrum of moods between anxiety and mania.
This is excellent advice. Are you a moderator?
I don't know. This makes me anxious about writing critical posts in the future. I was about to start writing another post that is similarly a criticism of an article written by someone else, and I don't think I'm going to do so.
I'm torn. I don't want you to be anxious about writing posts which happen to disagree with a point made by someone. Overall, writing more is better and I hope you don't feel punished by your honorable (IMO) removal of the post. However, I don't think LW is a good place for rebuttals of posts made elsewhere.
If you're making a point of interest to rationalists, I'd recommend making it stand alone, and referring in passing to the incorrect/misleading posts only as a pointer to a different take. I wouldn't generally address it specifically to the outside poster; I'd make it more general than that.
Can I ask you what you mean by this?
Never heard of a prank like this; it sounds weird.
More generally, commenting isn't a good way to train oneself as a rationalist, but blogging is.
I'm not sure what you mean.
This isn't what "conflict theory" means. Conflict theory is a specific theory about the nature of conflict, one that says conflict is inevitable. Conflict theory doesn't simply mean that conflict exists.
I don't agree with your pessimism. To reuse your example, if you formalize the utility created by freedom and equality, you can compare the two and pick the most efficient policies.
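A toy sketch of what I mean (the utility functions, weights, and policies are all made up; the point is only that once they are formalized, the comparison is mechanical):

```python
# Toy sketch: once "freedom" and "equality" are formalized as utility functions
# over policies, picking the best policy is just a comparison.

def utility_freedom(policy):
    # Hypothetical formalization; stands in for whatever measure you settle on.
    return policy["freedom_score"]

def utility_equality(policy):
    return policy["equality_score"]

def total_utility(policy, w_freedom=1.0, w_equality=1.0):
    return w_freedom * utility_freedom(policy) + w_equality * utility_equality(policy)

policies = [
    {"name": "A", "freedom_score": 0.9, "equality_score": 0.3},
    {"name": "B", "freedom_score": 0.6, "equality_score": 0.7},
]

best = max(policies, key=total_utility)
print(best["name"])
```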
Fixed ;)
The author explains very clearly what the difference is between "people hate losses more than they like gains" and loss aversion. Loss aversion is people hating losing $1 while having $2 more than they like gaining $1 while having $1, even though in both cases it is the difference between having $1 and $2.
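In notation (my own choice of symbols, following the standard prospect-theory framing): "hating losses more than liking gains" can come from a concave utility $u$ over total wealth, e.g. $u(\$2) - u(\$1) < u(\$1) - u(\$0)$, while loss aversion proper is about a reference-dependent value function $v$ with $|v(-\$1)| > |v(+\$1)|$, so the same $\$1 \leftrightarrow \$2$ transition is weighed differently depending on whether it is framed as a loss from $\$2$ or a gain from $\$1$.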
I think we do disagree on whether it's a good idea to widely spread the message "HEY SUICIDAL PEOPLE HAVE YOU REALIZED THAT IF YOU KILL YOURSELF EVERYONE WILL SAY NICE THINGS ABOUT YOU AND WORK ON SOLVING PROBLEMS YOU CARE ABOUT LET’S MAKE SURE TO HIGHLIGHT THIS EXTENSIVELY".
How does this interact with time preference? As stated, an elementary consequence of this theorem is that either lending (and pretty much every other capitalist activity) is unprofitable, or arbitrage is possible.