All of zulupineapple's Comments + Replies

Maybe I should just let you tell me what framework you are even using in the first place.

I'm looking at the Savage theory from your own https://plato.stanford.edu/entries/decision-theory/ and I see U(f) = ∑_i u(f(s_i))P(s_i), so at least they have no problem with the domains (O and S) being different. Now I see the confusion is that to you Omega=S (and also O=S), but to me Omega=dom(u)=O.

Furthermore, if O={o_0, o_1}, then I can group the terms into u(o_0)P("we're in a state where f evaluates to o_0") + u(o_1)P("we're in a state where f evaluates to o_1"), I'm just movin... (read more)
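For reference, this is just the standard regrouping of the Savage sum by outcome (a restatement of the argument above, in the same notation):

$$U(f) \;=\; \sum_i u(f(s_i))\,P(s_i) \;=\; \sum_{o \in O} u(o) \sum_{s_i : f(s_i)=o} P(s_i) \;=\; \sum_{o \in O} u(o)\,P\big(f^{-1}(o)\big),$$

so with O = {o_0, o_1} the two terms are u(o_0)P(f^{-1}(o_0)) + u(o_1)P(f^{-1}(o_1)).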

2abramdemski
(Just to be clear, I did not write that article.) I think the interpretation of Savage is pretty subtle. The objects of preference ("outcomes") and objects of belief ("states") are treated as distinct sets. But how are we supposed to think about this?

* The interpretation Savage seems to imply is that both outcomes and states are "part of the world", but the agent has somehow segregated parts of the world into matters of belief and matters of preference. But however the agent has done this, it seems to be fundamentally beyond the Savage representation; clearly within Savage, the agent cannot represent meta-beliefs about which matters are matters of belief and which are matters of preference. So this seems pretty weird.
* We could instead think of the objects of preference as something like "happiness levels" rather than events in the world. The idea of the representation theorem then becomes that we can peg "happiness levels" to real numbers. In this case, the picture looks more like standard utility functions; S is the domain of the function that gives us our happiness level (which can be represented by a real-valued utility).
* Another approach which seems somewhat common is to take the Savage representation but require that S=O. Savage's "acts" then become maps from world to world, which fits well with other theories of counterfactuals and causal interventions.

So even within a Savage framework, it's not entirely clear that we would want the domain of the utility function to be different from the domain of the belief function. I should also have mentioned the super-common VNM picture, where utility has to be a function of arbitrary states as well. The question is, what math-speak is the best representation of the things we actually care about?

A classical probability distribution over Ω with a utility function understood as a random variable can easily be converted to the Jeffrey-Bolker framework, by taking the JB algebra as the sigma-algebra, and V as the expected value of U.

Ok, you're saying that JB is just a set of axioms, and U already satisfies those axioms. And in this construction "event" really is a subset of Omega, and "updates" are just updates of P, right? Then of course U is not more general, I had the impression that JB is a more distinct and specific thing.

Regarding the o... (read more)
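As a concrete reading of the conversion quoted above (a sketch of the standard construction, written for the discrete case; nothing here goes beyond the quote): start from a probability space (Ω, Σ, P) and a utility function U: Ω → R treated as a random variable, take the JB algebra to be Σ, and let the value of an event be its conditional expected utility,

$$V(A) \;=\; \mathbb{E}[U \mid A] \;=\; \frac{\sum_{\omega \in A} U(\omega)\,P(\omega)}{P(A)}, \qquad A \in \Sigma,\ P(A) > 0.$$

Updating on evidence E then just replaces P with P(· | E), which is the sense in which "updates" are just updates of P.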

2abramdemski
Ah, if you don't see 'worlds' as meaning any such thing, then I wonder, are we really arguing about anything at all? I'm using 'worlds' that way in reference to the same general setup which we see in propositions-vs-models in model theory, or in Ω vs the σ-algebra in the Kolmogorov axioms, or in Kripke frames, and perhaps some other places. We can either start with a basic set of "worlds" (eg, Ω) and define our "propositions" or "events" as sets of worlds, where that proposition/event 'holds' or 'is true' or 'occurs'; or, equivalently, we could start with an algebra of propositions/events (like a σ-algebra) and derive worlds as maximally specific choices of which propositions are true and false (or which events hold/occur).

Maybe I should just let you tell me what framework you are even using in the first place.

There are two main alternatives to the Jeffrey-Bolker framework which I have in mind: the Savage axioms, and also the thing commonly seen in statistics textbooks where you have a probability distribution which obeys the Kolmogorov axioms and then you have random variables over that (random variables being defined as functions of type Ω→R). A utility function is then treated as a random variable. It doesn't sound like your notion of utility function is any of those things, so I just don't know what kind of framework you have in mind.

Answering out of order:

<...> then I think the Jeffrey-Bolker setup is a reasonable formalization.

Jeffrey is a reasonable formalization, it was never my point to say that it isn't. My point is only that U is also reasonable, and possibly equivalent or more general. That there is no "case against" it. Although, if you find Jeffrey more elegant or comfortable, there is nothing wrong with that.

do you believe that any plausible utility function on bit-strings can be re-represented as a computable function (perhaps on some other representation, rather than

... (read more)
2abramdemski
I do agree that my post didn't do a very good job of delivering a case against utility functions, and actually only argues that there exists a plausibly-more-useful alternative to a specific view which includes utility functions as one of several elements.  Utility functions definitely aren't more general. A classical probability distribution over Ω with a utility function understood as a random variable can easily be converted to the Jeffrey-Bolker framework, by taking the JB algebra as the sigma-algebra, and V as the expected value of U. Technically the sigma-algebra needs to be atomless to fit JB exactly, but Zoltan Domotor (Axiomatization of Jeffrey Utilities) generalizes this considerably. I've heard people say that there is a way to convert in the other direction, but that it requires ultrafilters (so in some sense it's very non-constructive). I haven't been able to find this construction yet or had anyone explain how it works. So it seems to me, but I recognize that I haven't shown in detail, that the space of computable values is strictly broader in the JB framework; computable utility functions + computable probability gives us computable JB-values, but computable JB-values need not correspond to computable utility functions. Thus, the space of minds which can be described by the two frameworks might be equivalent, but the space of minds which can be described by computations does not seem to be; the JB space, there, is larger. Well, the Jeffrey-Bolker kind of explanation is as follows: agents really only need to consider and manipulate the probabilities and expected values of events (ie, propositions in the agent's internal language). So it makes some sense to assume that these probabilities and expected values are computable. But this does not imply (as far as I know) that we can construct 'worlds' as maximal specifications of which propositions are true/false and then define a utility function on those worlds which is consistent with the computable

If you actually do want to work on AI risk, but something is preventing you, you can just say "personal reasons", I'm not going to ask for details.

I understand that my style is annoying to some. Unfortunately, I have not observed polite and friendly people getting interesting answers, so I'll have to remain like that.

3Mitchell_Porter
Your questions opened multiple wounds, but I'll get over it.  I "work on" AI risk, in the sense that I think about it when I can. Under better circumstances, I suspect I could make important contributions. I have not yet found a path to better circumstances. 

OK, there are many people writing explanations, but if all of them are rehashing the same points from Superintelligence book, then there is not much value in that (and I'm tired of reading the same things over and over). Of course you don't need new arguments or new evidence, but it's still strange if there aren't any.

Anyone who has read this FAQ and others, but isn't a believer yet, will have some specific objections. But I don't think everyone's objections are unique, a better FAQ should be able to cover them, if their refutations exist to begin with.

Als... (read more)

4Mitchell_Porter
etc.  I presume you have no idea how enraging these questions are, because you know less than nothing about my life.  I will leave it to you to decide whether this "Average Redditor" style of behavior (look it up, it's a Youtube character) is something you should avoid in future. 

Stampy seems pretty shallow, even more so than this FAQ. Is that what you meant by it not filling "this exact niche"?

By the way, I come from AGI safety from first principles, where I found your comment linking to this. Notably, that sequence says "My underlying argument is that agency is not just an emergent property of highly intelligent systems, but rather a set of capabilities which need to be developed during training, and which won’t arise without selection for it." which is reasonable and seems an order of magnitude more conservative than this FAQ, which doesn't really touch the question of agency at all.

I'm talking specifically about discussions on LW. Of course in reality Alice ignores Bob's comment 90% of the time, and that's a problem in its own right. It would be ideal if people who have distinct information would choose to exchange that information.

I picked a specific and reasonably grounded topic, "x-risk", or "the probability that we all die in the next 10 years", which is one number, so not hard to compare, unless you want to break it down by cause of death. In contrived philosophical discussions, it can certainly be hard to determine who agrees ... (read more)

I want neither. I observe that Raemon cannot find an up to date introduction that he's happy with, and I point out that this is really weird. What I want is an explanation to this bizarre situation.

Is your position that Raemon is blind, and good, convincing explanations are actually abundant? If so, I'd like to see them, it doesn't matter where from.

2Mitchell_Porter
Expositions of AI risk are certainly abundant. There have been numerous books and papers. Or just go to Youtube and type in "AI risk". As for whether any given exposition is convincing, I am no connoisseur. For a long time, I have taken it for granted that AI can be both smarter than humans and dangerous to humans. I'm more interested in details, like risk taxonomies and alignment theories.

But whether a given exposition is convincing depends on the audience as well as on the author. Some people have highly specific objections. In our discussion, you questioned whether adversarial relations between AI and human are likely to occur, and with Raemon you bring up the topic of agency, so maybe you specifically need an argument that AIs would ever end up acting against human interests?

As for Raemon, I suspect he would like a superintelligence FAQ that acknowledges the way things are in 2023 - e.g. the rise of a particular AI paradigm to dominate discussion (deep learning and large language models), and the existence of a public debate about AI safety, all the way up to the UN Security Council.

I don't know if you know, but after being focused for 20 years on rather theoretical issues of AI, MIRI has just announced it will be changing focus to "broad public communication". If you look back at their website, in the 2000s their introductory materials were mostly aimed at arguing that smarter-than-human AI is possible and important. Then in the 2010s (which is the era of Less Wrong), the MIRI homepage was more about their technical papers and workshops and so on, and didn't try to be accessible to a general audience. Now in the mid-2020s, they really will be aiming at a broader audience.

"The world is full of adversarial relationships" is pretty much the weakest possible argument and is not going to convince anyone.

Are you saying that the MIRI website has convincing introductory explanations of AI risk, the kind that Raemon wishes he had? Surely he would have found them already? If there aren't any, then, again, why not?

2Mitchell_Porter
Let me first clarify something. Are you asking because you want to understand MIRI's specific model of AI risk; or do you just want a simple argument that AI risk is real, and it doesn't matter who makes the argument?  You're writing as if the reality of AI risk depends on whether or not there's an up-to-date FAQ about it, on this website. But you know that Less Wrong does not have a monopoly on AI doom, right? Everyone from the founders of deep learning to the officials of the deep state are worried about AI now, because it has become so powerful. This issue is somewhere in the media every day now; and it's just common sense, given the way of the world, that entities which are not human and smarter than human potentially pose a risk to the human race. 

If our relationship to them is adversarial, we will lose. But you also need to argue that this relationship will (likely) be adversarial.

Also, I'm not asking you to make the case here; I'm asking why the case is not being made on the front page of LW and on every other platform. Would that not help with advocacy and recruitment? No idea what "keeping up with current events" means.

5Mitchell_Porter
The world is full of adversarial relationships, from rivalry among humans, to machines that resist doing what we want them to do. There are many ways in which AIs and humans might end up clashing.  Superintelligent AI is of particular concern because you probably don't get a second chance. If your goals clash with the goals of a superintelligent AI, your goals lose. So we have a particular incentive to get the goals of superintelligent AI correct in advance.  Less Wrong was set up to be a forum for discussion of rationality, not a hub of AI activism specifically. Eliezer's views on AI form just a tiny part of his "Sequences" here. People wanting to work on AI safety could go to the MIRI website or the "AI Alignment" forum.  Certainly Less Wrong now overflows with AI news and discussion. It wasn't always like that! Even as recently as 2020, I think there was more posting about Covid than there was about AI. A turning point was April last year, when the site founder announced that he thought humanity was on track to fail at the challenge of AI safety. Then came ChatGPT, and ecstasy and dread about AI became mainstream. If the site is now all AI, all the time, that simply reflects the state of the world. 

I certainly don't evaluate my U on quarks. Omega is not the set of worlds, it is the set of world models, and we are the ones who decide what that model should be. In the "procrastination" example you intentionally picked a bad model, so it proves nothing (if the world only has one button we care about, then maybe |Omega|=2 and everything is perfectly computable).

Further on, it seems to me that if we set our model to be a list of "events" we've observed, then we get the exact thing you're talking about. Although you're imprecise and inconsistent about what an ... (read more)
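A minimal sketch of the coarse-model point above (the two-element Omega and the names are illustrative assumptions, not anything from the original exchange):

```python
# A deliberately coarse world model for the one-button world: Omega is chosen
# by the modeller to contain just two "world models", not maximally specific worlds.
OMEGA = ("button eventually pressed", "button never pressed")

def U(world_model: str) -> float:
    """Utility over the coarse model; trivially computable because |Omega| = 2."""
    return 1.0 if world_model == "button eventually pressed" else 0.0

for w in OMEGA:
    print(w, U(w))
```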

2abramdemski
I agree that it makes more sense to suppose "worlds" are something closer to how the agent imagines worlds, rather than quarks. But on this view, I think it makes a lot of sense to argue that there are no maximally specific worlds -- I can always "extend" a world with an extra, new fact which I had not previously included. IE, agents never "finish" imagining worlds; more detail can always be added (even if only in separate magisteria, eg, imagining adding epiphenomenal facts). I can always conceive of the possibility of a new predicate beyond all the predicates which a specific world-model discusses. If you buy this, then I think the Jeffrey-Bolker setup is a reasonable formalization.

If you don't buy this, my next question would be whether you really think that the sort of "world" ("world model", as you called it) which an agent attaches value to always are "closed off" (ie specify all the facts one way or the other; do not admit further detail) -- or, perhaps, you merely want to argue that this can sometimes be the case but not always. (Because if it's sometimes the case but not always, this argues against both the traditional view where Omega is the set which the probability is a measure over & the utility function is a function of, and against the Jeffrey-Bolker picture.)

I find it implausible that the sort of "world model" which we can model humans as having-values-as-a-function-of is "closed off" -- we can appreciate ideas like atoms and quarks, adding these to our ontology, without necessarily changing other aspects of our world-model. Perhaps sometimes we can "close things off" like this -- we can consider the possibility that there "is nothing else" -- but even so, I think this is better-modeled as an additional assertion which we add to the set of propositions defining a possibility rather than modeling us as having bottomed out in an underlying set of "worlds" which inherently decide all propositions.

You seem to be suggesting that any such example cou

Seems like a red flag. How can there not be a more up-to-date one? Is advocacy and recruitment not a goal of AI-risk people? Are they instrumentally irrational? What is preventing you from writing such a post right now?

Most importantly, could it be that people struggle to write a good case for AI-risk, because the case for it is actually pretty weak, when you think about it?

7Raemon
People have made tons of slightly-different things all tackling this sort of goal (for example: https://stampy.ai/ ), they just didn't happen to fill this exact niche. I do think maybe it'd actually just be good for @Scott Alexander to write an up-to-date one.  A lot of why I like this one is Scott's prose, which I feel awkward completely copying and making changes to, and writing a new thing from scratch is a pretty high skill operation.
5Mitchell_Porter
The case for AI risk, is the same as the case for computers beating humans at chess. If the fate of the world depended on unaided humans being able to beat the best chess computers, we would have fought and lost about 25 years ago. Computers long ago achieved supremacy in the little domain of chess. They are now going to achieve supremacy in the larger domain of everyday life. If our relationship to them is adversarial, we will lose as surely as even the world champion of human chess loses to a moderately strong chess program.  If this FAQ is out of date, it might be because everyone is busy keeping up with current events. 

The link is broken. I was only able to find the article here, with the wayback machine.

In the examples, sometimes the problem is people having different goals for the discussion, sometimes it is having different beliefs about what kinds of discussions work, and sometimes it might be about almost object-level beliefs. If "frame" refers to all of that, then it's way too broad and not a useful concept. If your goal is to enumerate and classify the different goals and different beliefs people can have regarding discussions, that's great, but possibly too broad to make any progress.

My own frustration with this topic is lack of ... (read more)

Making long term predictions is hard. That's a fundamental problem. Having proxies can be convenient, but it's not going to tell you anything you don't already know.

That's what I think every time I hear "history repeats itself". I wish Scott had considered the idea.

The biggest claim Turchin is making seems to be about the variance of the time intervals between "bad" periods. A random walk would imply that it is high, and "cycles" would imply that it is low.
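A toy illustration of that variance comparison (the distributions and numbers are invented for the sketch, not taken from Turchin or from Scott's review):

```python
# Compare the spread of inter-event intervals under a "cycle" model
# versus a memoryless "random" model with the same mean.
import numpy as np

rng = np.random.default_rng(0)
cycle_intervals = rng.normal(50, 5, size=1000)      # ~50-year cycle with small jitter
random_intervals = rng.exponential(50, size=1000)   # memoryless process, same mean

for name, x in [("cycle", cycle_intervals), ("random", random_intervals)]:
    # Coefficient of variation: low for the cycle model, close to 1 for the random one.
    print(name, round(x.std() / x.mean(), 2))
```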

For example, say I wanted to know how good/enjoyable a specific movie would be.

My point is that "goodness" is not a thing in the territory. At best it is a label for a set of specific measures (ratings, revenue, awards, etc). In that case, why not just work with those specific measures? Vague questions have the benefit of being short and easy to remember, but beyond that I see only problems. Motivated agents will do their best to interpret the vagueness in a way that suits them.

Is your goal to find a method to generate specific interpretations an... (read more)

1ozziegooen
Hm... At this point I don't feel like I have a good intuition for what you find intuitive. I could give more examples, but don't expect they would convince you much right now if the others haven't helped. I plan to eventually write more about this, and eventually hopefully we should have working examples up (where people are predicting things). Hopefully things should make more sense to you then. Short comments back and forth are a pretty messy communication medium for such work.
"What is the relative effectiveness of AI safety research vs. bio risk research?"

If you had a precise definition of "effectiveness" this shouldn't be a problem. E.g. if you had predictions for "will humans go extinct in the next 100 years?" and "will we go extinct in the next 100 years, if we invest 1M into AI risk research?" and "will we go extinct, if we invest 1M in bio risk research?", then you should be able to make decisions with that. And these questions should work fine in existing forecasting ... (read more)
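A minimal sketch of the decision procedure this suggests (all of the probabilities below are invented purely for illustration):

```python
# Choose between two grants by comparing conditional extinction forecasts.
p_baseline = 0.10          # P(extinction within 100 years), hypothetical
p_if_ai_grant = 0.099990   # P(extinction | $1M into AI risk research), hypothetical
p_if_bio_grant = 0.099995  # P(extinction | $1M into biorisk research), hypothetical

# "Effectiveness" of each grant = reduction in extinction probability it buys.
effect = {
    "AI risk research": p_baseline - p_if_ai_grant,
    "biorisk research": p_baseline - p_if_bio_grant,
}
best = max(effect, key=effect.get)
print(effect, best)  # fund whichever option buys the larger reduction
```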

1Tetraspace
There's something of a problem with sensitivity; if the x-risk from AI is ~0.1, and the difference in x-risk from some grant is ~10^-6, then any difference in the forecasts is going to be completely swamped by noise. (while people in the market could fix any inconsistency between the predictions, they would only be able to look forward to 0.001% returns over the next century)
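For concreteness, the arithmetic behind that parenthetical, using the ~0.1 and ~10^-6 figures above:

$$\frac{10^{-6}}{0.1} \;=\; 10^{-5} \;=\; 0.001\%,$$

so even a trader who perfectly corrects the grant-sized mispricing earns only about a 0.001% relative return on the base question.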
3ozziegooen
Coming up with a precise definition is difficult, especially if you want multiple groups to agree. Those specific questions are relatively low-level; I think we should ask a bunch of questions like that, but think we may also want some more vague things as well. For example, say I wanted to know how good/enjoyable a specific movie would be. Predicting the ratings according to movie reviewers (evaluators) is an approach I'd regard as reasonable. I'm not sure what a precise definition for movie quality would look like (though I would be interested in proposals), but am generally happy enough with movie reviews for what I'm looking for. Agreed that that itself isn't a forecast, I meant in the more general case, for questions like, "How much value will this organization create next year" (as you pointed out). I probably should have used that more specific example, apologies. Can you be more explicit about your definition of "clearly"? I'd imagine that almost any proposal at a value function would have some vagueness. Certificates of Impact get around this by just leaving that for the review of some eventual judges, kind of similar to what I'm proposing. The goal for this research isn't fixing something with prediction markets, but just finding more useful things for them to predict. If we had expert panels that agreed to evaluate things in the future (for instance, they are responsible for deciding on the "value organization X has created" in 2025), then prediction markets and similar could predict what they would say.

While it's true that preferences are not immutable, the things that change them are not usually debate. Sure, some people can be made to believe that their preferences are inconsistent, but then they will only make the smallest correction needed to fix the problem. Also, sometimes debate will make someone claim to have changed their preferences, just so that they can avoid social pressures (e.g. "how dare you not care about starving children!"), but this may not reflect in their actions.

Regardless, my claim is that many (or most) people discount a lot, and that this would be stable under reflection. Otherwise we'd see more charity, more investment and more work on e.g. climate change.

Ok, that makes the real incentives quite different. Then, I suspect that these people are navigating facebook using the intuitions and strategies from the real world, without much consideration for the new digital environment.

Yes, and you answered that question well. But the reason I asked for alternative responses, was so that I could compare them to unsolicited recommendations from the anime-fan's point of view (and find that unsolicited recommendations have lower effort or higher reward).

Also, I'm not asking "How did your friend want the world to be different", I'm asking "What action could your friend have taken to avoid that particular response?". The friend is a rational agent, he is able to consider alternative strategies, but he shouldn't expect that other people will change their behavior when they have no personal incentive to do so.

What is the domain of U? What inputs does it take? In your papers you take a generic Markov Decision Process, but which one will you use here? How exactly do you model the real world? What is the set of states and the set of actions? Does the set of states include the internal state of the AI?

You may have been referring to this as "4. Issues of ontology", but I don't think the problem can be separated from your agenda. I don't see how any progress can be made without answering these questions. Maybe you can start with naive answers, an... (read more)

Answer by zulupineapple00

Discounting. There is no law of nature that can force me to care about preventing human extinction years from now, more than eating a tasty sandwich tomorrow. There is also no law that can force me to care about human extinction much more than about my own death.

There are, of course, more technical disagreements to be had. Reasonable people could question how bad unaligned AI will be or how much progress is possible in this research. But unlike those questions, the reasons for discounting are not debatable.
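For concreteness, the standard exponential-discounting arithmetic (the discount factor and horizon are illustrative, not a claim about anyone's actual values):

$$V_{\text{now}} \;=\; \delta^{T}\, V_{\text{future}}, \qquad \text{e.g. } \delta = 0.95,\ T = 50 \;\Rightarrow\; 0.95^{50} \approx 0.08,$$

so a payoff 50 years out gets less than a tenth of the weight of the same payoff today, and a longer horizon or a smaller δ shrinks it further.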

2Adam Scholl
"Not debatable" seems a little strong. For example, one might suspect both that some rational humans disprefer persisting, and also that most who think this would change their minds upon further reflection.

I do things my way because I want to display my independence (not doing what others tell me) and intelligence (ability to come up with novel solutions), and because I would feel bored otherwise (this is a feature of how my brain works, I can't help it).

"I feel independent and intelligent", "other people see me as independent and intelligent", "I feel bored" are all perfectly regular outcomes. They can be either terminal or instrumental goals. Either way, I disagree that these cases somehow don't fit in the usual preference model. You're only having this problem because you're interpreting "outcome" in a very narrow way.

Yes. The latter seems to be what OP is asking about: "If one wanted it to not happen, how would one go about that?". I assume OP is taking the perspective of his friends, who are annoyed by this behavior, rather than the perspective of the anime-fans, who don't necessarily see anything wrong with the situation.

2DanielFilan
In the literal world, I'm an anime fan, but the situation seems basically futile: the people recommending anime seem like they're accomplishing nothing but generating frustration. More metaphorically, I'm mostly interested in how to prevent the behaviour either as somebody complaining about anime or as a third party, and secondarily interested in how to restrain myself from recommending anime.
2Matt Goldenberg
Note that my response was responding to this original question: It wasn't obvious to me that this was asking "How did your friend want the world to be different such that the incentives were to respond differently?"

That sounds reasonable, but the proper thing is not usually the easy thing, and you're not going to make people do the proper thing just by saying that it is proper.

If we want to talk about this as a problem in rationality, we should probably talk about social incentives, and possible alternative strategies for the anime-hater (you're now talking about a better strategy for the anime-fan, but it's not good to ask other people to solve your problems). Although I'm not sure to what extent this is a problem that needs solving.

2Raemon
It sounds like you two are currently talking about two different problems: mr-hire is asking "how do I avoid being That Guy Who Pressures People about Anime" and you're asking the question "If I want to avoid people pestering me with anime questions, or people in general to stop this behavior, what would have to change?"

And then the other person says "no thanks", and you both stand in awkward silence? My point is that offering recommendations is a natural thing to say, even if not perfect, and it's nice to have something to say. If you want to discourage unsolicited recommendations, then you need to propose a different trajectory for the conversation. Changing topic is hard, and simply going away is rude. People give unsolicited recommendations because it seems to be the best option available.

3DanielFilan
At this juncture, it seems important to note that all examples I can think of took place on Facebook, where you can just end interactions like this without it being awkward.
9Matt Goldenberg
I think I would probably change the subject in a case like this. Good "vibing" conversation skill here is to "fractionate" the conversation, frequently cut topics before they reach their natural conclusion so that when you reach a conversation dead end like this, you have somewhere to go back to. Ditto with being able to make situational observations to restart a conversation, and having in your back pocket a list of topics and questions to go to. I don't think the proper thing to do here is to make someone else feel awkward or annoyed so that you feel less awkward, the proper thing to do is to learn the conversational skills to make people not feel awkward.

Sure, but it remains unclear what response the friend wanted from the other person. What better options are there? Should they just go away? Change topic? I'm looking for specific answers here.

2Matt Goldenberg
My response in this case would be to say something like "Well, I've got some shows that might change your mind if you're ever interested." Then leave it to them to continue that thread if interested. This goes with my general policy to try to avoid giving unsolicited advice.
a friend of mine observed that he couldn’t talk about how he didn’t like anime without a bunch of people rushing in to tell him that anime was actually good and recommending anime for him to watch

What response did your friend want? The reaction seems very natural to me (especially from anime fans). Note that your friend has at some point tried watching anime, and he has now chosen to talk about anime, which could easily mean that on some level he wants to like anime, or at least understand why others like it.

2Matt Goldenberg
Possible scenario where this comes up: Your friends are talking about anime, they ask you if you watch anime, you say "I don't like anime," they say "well you just haven't watched the right shows, have you tried..."
I got this big impossibility result

That's a part of the disagreement. In the past you clearly thought that Occam's razor was an "obvious" constraint that might work. Possibly you thought it was a unique such constraint. Then you found this result, and made a large update in the other direction. That's why you say the result is big - rejecting a constraint that you already didn't expect to work wouldn't feel very significant.

On the other hand, I don't think that Occam's razor is a unique such constraint. So when I ... (read more)

So it seems that there was progress in applied rationality and in AI. But that's far from everything LW has talked about. What about more theoretical topics, general problems in philosophy, morality, etc? Do you feel that discussing some topics resulted in no progress and was a waste of time?

There's some debate about which things are "improvements" as opposed to changes.

Important question. Does the debate actually exist, or is this a figure of speech?

1 is trivial, so yes. But I don't agree with 2. Maybe the disagreement comes from "few" and "obvious"? To be clear, I count evaluating some simple statistic on a large data set as one constraint. I'm not so sure about "obvious". It's not yet clear to me that my simple constraints aren't good enough. But if you say that more complex constraints would give us a lot more confidence, that's reasonable.

From OP I understood that you want to throw out IRL entirely. e.g.

If we give up the assumption of human ra
... (read more)
4Stuart_Armstrong
Ok, we strongly disagree on your simple constraints being enough. I'd need to see these constraints explicitly formulated before I had any confidence in them. I suspect (though I'm not certain) that the more explicit you make them, the more tricky you'll see that it is. And no, I don't want to throw IRL out (this is an old post), I want to make it work. I got this big impossibility result, and now I want to get around it. This is my current plan: https://www.lesswrong.com/posts/CSEdLLEkap2pubjof/research-agenda-v0-9-synthesising-a-human-s-preferences-into
But it's not like there are just these five preferences and once we have four of them out of the way, we're done.

My example test is not nearly as specific as you imply. It discards large swaths of harmful and useless reward functions. Additional test cases would restrict the space further. There are still harmful Rs in the remaining space, but their proportion must be much lower than in the beginning. Is that not good enough?

What you're seeing as "adding enough clear examples" is actually "hand-crafting R(0) in totality".
... (read more)
2Stuart_Armstrong
We may not be disagreeing any more. Just to check, do you agree with both these statements:

1. Adding a few obvious constraints rules out many different R, including the ones in the OP.
2. Adding a few obvious constraints is not enough to get a safe or reasonable R.

This is true, but it doesn't fit well with the given example of "When will [country] develop the nuclear bomb?". The problem isn't that people can't agree what "nuclear bomb" means or who already has them. The problem is that people are working from different priors and extrapolating them in different ways.

Are you going to state your beliefs? I'm asking because I'm not sure what that looks like. My concern is that the statement will be very vague or very long and complex. Either way, you will have a lot of freedom to argue that actually your actions do match your statements, regardless of what those actions are. Then the statement would not be useful.

Instead I suggest that you should be accountable to people who share your beliefs. Having someone who disagrees with you try to model your beliefs and check your actions against that model seems like a source of conflict. Of course, stating your beliefs can be helpful in recognizing these people (but it is not the only method).

What's the motivation? In what case is lower accuracy for higher consistency a reasonable trade off? Especially consistency over time sounds like something that would discourage updating on new evidence.

3ozziegooen
I attempted to summarize some of the motivation for this here: https://www.lesswrong.com/posts/Df2uFGKtLWR7jDr5w/?commentId=tdbfBQ6xFRc7j8nBE
3Elizabeth
Some examples are where people care more about fairness, such as criminal sentencing and enterprise software pricing. However you're right that implicit in the question was "without new information appearing", although you'd want the answer to update the same way every time the same new information appeared.
3ChristianKl
If every study on depression used its own metric for depression that's optimal for the specific study, it would be hard to learn from the studies and aggregate information from them. It's much better when you have a metric that has consistency. Consistent measurements allow reacting to how a metric changes over time, which is often very useful for evaluating interventions.

Evaluating R on a single example of human behavior is good enough to reject R(2), R(4) and possibly R(3).

Example: this morning I went to the kitchen and picked up a knife. Among possible further actions, I had A - "make a sandwich" and B - "stab myself in the gut". I chose A. R(2) and R(4) say I wanted B and R(3) is indifferent. I think that's enough reason to discard them.

Why not do this? Do you not agree that this test discards dangerous R more often than useful R? My guess is that you're asking for very strong formal guarantees from the assumptions that you consider and use a narrow interpretation of what it means to "make IRL work".
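A sketch of the filtering step being described (the candidate reward functions and their labels are stand-ins for illustration; they are not the actual R(0)-R(4) from the paper):

```python
# Keep only the candidate reward functions consistent with one observed choice:
# the human picked "make a sandwich" (A) over "stab self in the gut" (B).
candidates = {
    "R_plausible":    {"make_sandwich": 1.0,  "stab_self": -10.0},
    "R_antirational": {"make_sandwich": -1.0, "stab_self": 10.0},   # wants B, like R(2)/R(4)
    "R_indifferent":  {"make_sandwich": 0.0,  "stab_self": 0.0},    # indifferent, like R(3)
}
observed, rejected = "make_sandwich", "stab_self"

# Strict inequality: rewards that prefer B, or are indifferent, are discarded.
surviving = {name: r for name, r in candidates.items() if r[observed] > r[rejected]}
print(surviving)  # only the reward that prefers the observed action remains
```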

2Stuart_Armstrong
Rejecting any specific R is easy - one bit of information (at most) per specific R. So saying "humans have preferences, and they are not always rational or always anti-rational" rules out R(1), R(2), and R(3). Saying "this apparent preference is genuine" rules out R(4). But it's not like there are just these five preferences and once we have four of them out of the way, we're done. There are many, many different preferences in the space of preferences, and many, many of them will be simpler than R(0). So to converge to R(0), we need to add huge amounts of information, ruling out more and more examples. Basically, we need to include enough information to define R(0) - which is what my research project is trying to do. What you're seeing as "adding enough clear examples" is actually "hand-crafting R(0) in totality". For more details see here: https://arxiv.org/abs/1712.05812

The point isn't that there is nothing wrong or dangerous about learning biases and rewards. The point is that the OP is not very relevant to those concerns. The OP says that learning can't be done without extra assumptions, but we have plenty of natural assumptions to choose from. The fact that assumptions are needed is interesting, but it is by no means a strong argument against IRL.

What if in reality due to effects currently beyond our understanding, our actions are making the future more likely to be dystopian in some way than if we took rando
... (read more)
6Stuart_Armstrong
You'd think so, but nobody has defined these assumptions in anything like sufficient detail to make IRL work. My whole research agenda is essentially a way of defining these assumptions, and it seems to be a long and complicated process.

I feel like there are several concerns mixed together, that should be separated:

1. Lack of communication, which is the central condition of the usual Schelling points.

2. Coordination (with some communication), where we agree to observe x41 because we don't trust the rest of the group to follow a more complex procedure.

3. Limited number of observations (or costly observations). In that case you may choose to only observe x41, even if you are working alone, just to lower your costs.

I don't think 2 and 3 have much to do with Schelling. These considera... (read more)

2Pattern
I think a "Theory" heading, and a "Example" heading would make for a nice compromise.

Is this ad hominem? Reasonable people could say that clone of saturn values ~1000 self-reports way too little. However it is not reasonable to claim that he is not at all skeptical of himself, and not aware of his biases and blind spots, and is just a contrarian.

"If I, clone of saturn, were wrong about Double Crux, how would I know? Where would I look to find the data that would disconfirm my impressions?"

Personally, I would go to a post about Double Crux, and ask for examples of it actually working (as Said Achmiz did). Alternatively, I would li... (read more)

The problem is that with these additional and obvious constraints, humans cannot be assigned arbitrary values, unlike the title of the post suggests. Sure there will be multiple R that pass any number of assumptions and we will be uncertain about which to use. However, because we don't perfectly know π(h), we had that problem to begin with. So it's not clear why this new problem matters. Maybe our confidence in picking the right R will be a little lower than expected, but I don't see why this reduction must be large.

4Rohin Shah
If we add assumptions like this, they will inevitably be misspecified, which can lead to other problems. For example, how would you operationalize that π is good at optimizing R? What if in reality due to effects currently beyond our understanding, our actions are making the future more likely to be dystopian in some way than if we took random actions? Should our AI infer that we prefer that dystopia, since otherwise we wouldn't be better than random? (See also the next three posts in this sequence.)
I learned a semester worth of calculus in three weeks

I'm assuming this is a response to my "takes years of work" claim. I have a few natural questions:

1. Why start counting time from the start of that summer program? Maybe you had never heard of calculus before that, but you had been learning math for many years already. If you learned calculus in 3 weeks, that simply means that you already had most of the necessary math skills, and you only had to learn a few definitions and do a little practice in applying them. Many people don't alre... (read more)

7Jay Molstad
1) True, but by the time that roommate took the class he had had comparable math foundations to what I had had when I took the class. Considering the extra years, arguably rather more. (Upon further thought I realized that I had taken the class in 1988 at the age of 15)

2) That was first-semester calc, Purdue's Math 161 class (for me and the roommate). Intro calc. Over the next two years I took two more semesters of calc, one of differential equations, and one of matrix algebra. By the time I met my freshman roommate (he was a bit older than me) and he started the calc class, I'd had five semesters of college math (which was all I ever took b/c I don't enjoy math). Also, that roommate was a below-average college student, but there are people in the world with far less talent than he had.

3) Because time is the only thing you can't buy. Time in college can be bought, but not cheaply even then. I got through school with good grades and went on to grad school as planned; his plans didn't work out. Of course time marched on and I had failures of my own.

I agree that there's more to success than one particular kind of intelligence. Persistence, looks, money, luck, and other factors matter. But my roommate's calculus aptitude was a showstopper for his engineering ambitions, and I don't think his situation was terribly uncommon.

The worst case scenario is if two people both decide that a question is settled, but settle it in opposite ways. Then we're only moving from a state of "disagreement and debate" to a state of "disagreement without debate", which is not progress.

I appreciate the concrete example. I was expecting more abstract topics, but applied rationality is also important. Double Cruxes pass the criteria of being novel and the criteria of being well known. I can only question if they actually work or made an impact (I don't think I see many examples of them in LW), and if LW actually contributed to their discovery (apart from promoting CFAR).

Answer by zulupineapple10

The fact that someone does not understand calculus, does not imply that they are incapable of understanding calculus. They could simply be unwilling. There are many good reasons not to learn calculus. For one, it takes years of work. Some people may have better things to do. So I suggest that your entire premise is dubious - the variance may not be as large as you imagine.

2Jay Molstad
Personally, I learned a semester worth of calculus in three weeks for college credit at a summer program (the Purdue College Credit Program circa 1989, specifically) when I was 16. Out of 20ish students (pre-selected for academic achievement), about 15% (see note 1) aced it while still goofing around, roughly 60% got college credit but found the experience difficult, and some failed. Two years later, my freshman roommate (note 2) took the same Purdue course over 16 weeks and failed it. The question isn't "why don't some people understand calculus", but "why do some people learn it easily while others struggle, often failing". Note 1: This wasn't a statistically robust sample. "About 15%" means "Chris, Bill, and I". Note 2: That roommate wanted to be an engineer and was well aware that he could only achieve that goal by passing calculus. He was often working on his homework at 1:30 am, much to my annoyance. He worked harder on that course than I had, despite being 18 years old and having a (presumably) more mature brain.

That's a measly one in a billion. Why would you believe that this is enough? Enough for what? I'm talking about the preferences of a foreign agent. We don't get to make our own rules about what the agent prefers, only the agent can decide that.

Regarding practical purposes, sure you could treat the agent as if it was indifferent between A, B and C. However, given the binary choice, it will choose A over B, every time. And if you offered to trade C to B, B to A and A to C, at no cost, then the agent would gladly walk the cycle any number of times (if we can ignore the inherent costs of trading).
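A toy version of the trading cycle described above (purely illustrative; the pairwise preferences are just the A over B, B over C, C over A pattern from the comment):

```python
# An agent with cyclic strict preferences accepts every free trade, forever.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}   # (x, y) means x is strictly preferred to y

holding, trades = "A", 0
for offered in ["C", "B", "A", "C", "B", "A"]:   # trades offered at no cost
    if (offered, holding) in prefers:            # agent strictly prefers the offered item
        holding, trades = offered, trades + 1
print(holding, trades)  # back to "A" after 6 accepted trades; the cycle never settles
```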

Defecting in Prisoner's dilemma sounds morally bad, while defecting in Stag hunt sounds more reasonable. This seems to be the core difference between the two, rather than the way their payoff matrices actually differ. However, I don't think that viewing things in moral terms is useful here. Defecting in Prisoner's dilemma can also be reasonable.

Also, I disagree with the idea of using "resource" instead of "utility". The only difference the change makes is that now I have to think, "how much utility is Alexis getting from 10 resources?" and come up with my own value. And if his utility function happens not to be monotone increasing, then the whole problem may change drastically.

This is all good, but I think the greatest problem with prediction markets is low status and low accessibility. To be fair though, improved status and accessibility are mostly useful in that they bring in more "suckers".

There is also a problem of motivation - the ideal of futarchy is appealing, but it's not clear to me how we go from betting on football to impacting important decisions.

Note that the key feature of the log function used here is not its slow growth, but the fact that it takes negative values on small inputs. For example, if we take the function u(r) = log(r+1), so that u(0)=0, then RC holds.

Although there are also solutions that prevent RC without taking negative values, e.g. u(r) = exp(-1/r).
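If the intended setup is the usual one where a fixed resource total R is split evenly among N people and total value is N·u(R/N) (an assumption on my part, not stated in the thread), the difference is easy to check:

$$N\,\log\!\Big(1+\tfrac{R}{N}\Big) \;\nearrow\; R \quad (N \to \infty), \qquad\qquad N\,e^{-N/R} \;\to\; 0 \quad (N \to \infty),$$

so with u(r) = log(r+1) total value keeps rising as the population grows (RC holds), while with u(r) = exp(-1/r) it peaks at a finite population even though u is positive everywhere.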

a longer time horizon

Now that I think of it, a truly long-term view would not bother with such mundane things as making actual paperclips with actual iron. That iron isn't going anywhere, it doesn't matter whether you convert it now or later.

If you care about maximizing the number of paperclips at the heat death of the universe, your greatest enemies are black holes, as once some matter has fallen into them, you will never make paperclips from that matter again. You may perhaps extract some energy from the black hole, and convert that into matter... (read more)
