But humans are capable of thinking about what their values "actually should be," including whether or not they should be the values evolution selected for (either alone or in addition to other things). We're also capable of thinking about whether things like wireheading are actually good to do, even after trying it for a bit.
We don't simply commit to tricking our reward systems forever and only doing that, for example.
So that overall suggests a level of coherence and consistency in the "coherent extrapolated volition" sense. Evolution enabled CEV without us becoming completely orthogonal to evolution, for example.
Unfortunately, I do not have a long response prepared to answer this (and perhaps it would be somewhat inappropriate, at this time); however, I wanted to express the following:
They wear their despair on their sleeves? I am admittedly somewhat surprised by this.
I think if you ask people a question like, "Are you planning on going off and doing something / believing in something crazy?", they will, generally speaking, say "no," and they are more likely to say "no" the closer your question is to that form, even if you didn't word it exactly that way. My guess is that the way you worded it at least heavily implied that you meant "crazy."
To be clear, they might have said "yes" (that they will go and do the thing you think is crazy), but I doubt they will internally represent that thing or wanting to ...
Sometimes people want to go off and explore things that seem far away from their in-group, and perhaps are actively disfavored by their in-group. These people don't necessarily know what's going to happen when they do this, and they are very likely completely open to discovering that their in-group was right to distance itself from that thing, but also, maybe not.
People don't usually go off exploring strange things because they stop caring about what's true.
But if their in-group sees this as the person "no longer caring about truth-seeking," th...
I'm not sure how convinced I am by your statement. Perhaps you could expand on it a bit?
What "the math" appears to say is that if it's bad to believe things because someone told it to me "well" then there would have to be some other completely different set of criteria, that has nothing to do with what I think of it, for performing the updates.
Don't you think that would introduce some fairly hefty problems?
I suppose I have two questions which naturally come to mind here:
2. Why do you see communications as being as decoupled from research as you currently do (whether you think it inherently is, or that it should be)?
The things we need to communicate about right now are nowhere near the research frontier.
One common question we get from reporters, for example, is "why can't we just unplug a dangerous AI?" The answer to this is not particularly deep and does not require a researcher or even a research background to engage on.
We've developed a list of the couple-dozen most common questions we are asked by the press and ...
...
- Given Nate's comment: "This change is in large part an enshrinement of the status quo. Malo’s been doing a fine job running MIRI day-to-day for many many years (including feats like acquiring a rural residence for all staff who wanted to avoid cities during COVID, and getting that venue running smoothly). In recent years, morale has been low and I, at least, haven’t seen many hopeful paths before us." (Bold emphases are mine). Do you see the first bold sentence as being in conflict with the second, at all? If morale is low, why do you see that as an indica
Remember that what we take "communicated well" to mean is up to us. So I could raise my standard for that when you tell me "I bought a lottery ticket today," for example. I could consider this not communicated well if you are unable to show me proof (such as the ticket itself and a receipt). Likewise, lies and deceptions usually buckle when placed under a high enough burden of proof. If you are unable to produce proof for me, I can consider that "communicated badly" and thus update in the other (correct) direction.
"Communicated...
If I'm not mistaken, if A = "Dagon has bought a lottery ticket this week" and B = Dagon states "A", then I still think p(A | B) > p(A), even if it's possible you're lying. I think the only way it would drop below the base rate p(A) is if, for some reason, I thought you were more likely to say it when it was false than when it was true.
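To make that concrete, here is the update in odds form, with illustrative numbers that are my own assumptions rather than anything claimed above:

```latex
% A = "Dagon bought a lottery ticket this week"; B = "Dagon says so".
% Odds form of Bayes' rule:
\frac{p(A \mid B)}{p(\neg A \mid B)}
  = \frac{p(B \mid A)}{p(B \mid \neg A)} \cdot \frac{p(A)}{p(\neg A)}
% With assumed numbers p(A)=0.1, p(B \mid A)=0.5, p(B \mid \neg A)=0.01:
% posterior odds = (0.5 / 0.01)(0.1 / 0.9) \approx 5.6, so p(A \mid B) \approx 0.85 > 0.1 = p(A).
% The update reverses direction only if p(B \mid A) < p(B \mid \neg A).
```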
To be deceptive - this is why you would ask me what your intentions were, as opposed to just revealing them.
Your intent was ostensibly to show that you could argue for something badly on purpose and my rules would dictate that I update away from my own thesis.
I added an addendum for that, by the way.
If you understand the core claims being made, then, unless you believe that whether something is "communicated well" has no relationship whatsoever to the underlying truth-values of those claims, a well-communicated argument should update you towards belief in the core claims by some non-zero amount.
All of the vice-versas are straightforwardly true as well.
let A = the statement "A" and p(A) be the probability...
My take is that they (those who decide who runs what) are pretty well-informed about these issues well before they escalate to the point that complaints bubble up into posts / threads like these.
I would have liked this whole matter to have unfolded differently. I don't think this is merely a sub-optimal way for these kinds of issues to be handled; I think it is an actively negative one.
I have a number of ideological differences with Nate's MIRI and Nate himself that I can actually point to and articulate, and those disagreements could b...
To the extent that the Orthogonality Thesis treats goals as static or immutable, I think it is trivial.
I've advocated a lot for treating goals as mutable, and for value functions being definable over other value functions. And not just that it will be possible or a good idea to instantiate value functions this way, but also that they will probably become mutable over time anyway.
All of that makes the Orthogonality Thesis - not false, but a lot easier to grapple with, I'd say.
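A minimal toy sketch of what I mean by a value function defined over other value functions (names and structure are my own illustration, not a claim about any real system):

```python
from typing import Callable, List

# A first-order value function scores world-states (floats, for simplicity).
ValueFn = Callable[[float], float]

def current_values(state: float) -> float:
    """Prefers world-states near 1.0."""
    return -((state - 1.0) ** 2)

def meta_value(candidate: ValueFn, sample_states: List[float]) -> float:
    """A second-order value function: it scores *value functions* themselves,
    here by a toy criterion (how stable their judgments are across states).
    An agent with such a meta-value can compare its current values against
    alternatives and revise them, which is the sense in which goals are mutable."""
    scores = [candidate(s) for s in sample_states]
    return -(max(scores) - min(scores))

if __name__ == "__main__":
    states = [0.0, 0.5, 1.0, 1.5]
    print(meta_value(current_values, states))
```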
In large part because reality "bites back" when an AI has false beliefs, whereas it doesn't bite back when an AI has the wrong preferences.
I saw that 1a3orn replied to this piece of your comment and you replied to it already, but I wanted to note my response as well.
I'm slightly confused, because in one sense the loss function is the way that reality "bites back" (at least when the loss is high). Furthermore, if the loss function is not the way that reality bites back, then reality in fact does bite back, in the sense that e.g., if I have...
Getting a shape into the AI's preferences is different from getting it into the AI's predictive model. MIRI is always in every instance talking about the first thing and not the second.
Why would we expect the first thing to be so hard compared to the second thing? If getting a model to understand preferences is not difficult, then the issue doesn't have to do with the complexity of values. Finding the target and acquiring the target should have the same or similar difficulty (from the start), if we can successfully ask the model to find the target fo...
Why would we expect the first thing to be so hard compared to the second thing?
In large part because reality "bites back" when an AI has false beliefs, whereas it doesn't bite back when an AI has the wrong preferences. Deeply understanding human psychology (including our morality), astrophysics, biochemistry, economics, etc. requires reasoning well, and if you have a defect of reasoning that makes it hard for you to learn about one of those domains from the data, then it's likely that you'll have large defects of reasoning in other domains as well.
The same...
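A toy sketch of the asymmetry being described here (my own illustration, not anyone's actual training setup): a false belief generates an error signal that training pushes against, while a mis-specified objective generates no such signal, because it is the thing producing the signal.

```python
import random

true_weight = 3.0   # the part of reality the model is asked to predict
belief = 0.0        # the model's (initially false) belief about that weight
lr = 0.1

def misspecified_objective(prediction: float) -> float:
    # Suppose we *meant* to reward the right thing but wrote down something inert.
    # Nothing in the training loop below ever corrects this choice.
    return 0.0 * prediction

for _ in range(200):
    x = random.uniform(-1.0, 1.0)
    y = true_weight * x            # reality
    pred = belief * x              # prediction from the current belief
    grad = 2.0 * (pred - y) * x    # reality "bites back" through the loss gradient
    belief -= lr * grad            # false beliefs get corrected

print(round(belief, 2))                  # close to 3.0
print(misspecified_objective(belief))    # still 0.0; never corrected
```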
I have to agree that commentless downvoting is not a good way to combat infohazards. I'd probably take it a step further and argue that it's not a good way to combat anything, which is why it's not a good way to combat infohazards (and if you don't think infohazards are as bad as the name suggests, then trying to combat them at all is probably a bad idea).
Its commentless nature means it violates "norm one" (and violates it much more as a super-downvote).
It means something different than "push stuff that's not that, up", w...
It's a priori very unlikely that any post that's clearly made up of English sentences actually does not even try to communicate anything.
My point is basically that you could have posted this as a comment on the post instead of rejecting it.
Whenever there is room to disagree about what mistakes have been made and how bad those mistakes are, it becomes more of a problem to apply an exclusion rule like this.
There's a lot of questions here: how far along the axis to apply the rule, which axis or axes are being considered, and how harsh the application of...
It was a mistake to reject this post. This seems like a case where the rule that was applied is a mis-rule, and it was also applied inaccurately, which makes the rejection even harder to justify. It is also not easy to determine which "prior discussion" the rejection reasons are referring to.
It doesn't seem like the post was political...at all? Let alone "overly political," which I think is itself kind of mind-killy to apply frequently as a reason for rejection. It also is about a subject that is fairly interesting to me, at least: Se...
You write in an extremely fuzzy way that I find hard to understand.
This does. This is a type of criticism that one can't easily translate into an update to one's practice. You're not saying whether I always do this or just in this particular spot, nor whether it's due to my "writing" (i.e. style) or to actually using confused concepts. Also, it's usually not the case that anyone is trying to be worse at communicating; that's why it sounds like a scold.
You have to be careful using blanket "this is false" or "I can't understand any of...
It is probably indeed a crux, but I don't see the reason to scold someone over it.
(That's against my commenting norms, by the way, which I'll note you, TAG, and Richard_Kennaway have so far violated, but I am not going to ban anyone over it. I still appreciate getting comments on my posts at all, and I do hope that everyone keeps participating. In the olden days, it was Lumifer who used to come and do the same thing.)
I have an expectation that people do not continually conflate critique with scorn, and that they keep those things separate as much as possib...
First, a question: am I correct in understanding that when you write ~(A and ~A), the first ~ is a typo and you meant to write A and ~A (without the first ~)? Because ~(A and ~A) is a tautology and thus maps to true rather than to false.
I thought of this shortly before you posted this response, and I think that we are probably still okay (even though strictly speaking yes, there was a typo).
Normally we have that ~A means: A --> False. However, remember that I am now saying that we can no longer say that "~A" means that "A is False....
Well, to use your "real world" example, isn't that just the definition of a manifold (a space that, when zoomed in far enough, looks flat)?
I think it satisfies the either-or-"mysterious third thing" formulae.
~(Earth flat and earth ~flat) --> Earth flat (zoomed in) or earth spherical (zoomed out) or (earth more flat-ish the more zoomed in and vice-versa).
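For reference, the textbook definition I have in mind (standard material, nothing specific to the example above):

```latex
% An n-dimensional (topological) manifold is a space that is locally
% homeomorphic to Euclidean space, i.e. "flat-looking" when you zoom in far enough:
M \text{ is an } n\text{-manifold} \iff
  \forall p \in M,\ \exists\ \text{open } U \ni p \ \text{and a homeomorphism}\
  \varphi : U \to V \subseteq \mathbb{R}^{n}
% (The usual definition also requires M to be Hausdorff and second-countable.)
```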
So suppose I have ~(A and ~A). Rather than have this map to False, I say that "False" is an object that you always bounce off of; it causes you to reverse course, in the following way:
~(A and ~A) --> False --> A or ~A or (some mysterious third thing). What is this mysterious third thing? Well, if you insist that A and ~A is possible, then it must be an admixture of these two things, but you'd need to show me what it is for that to be allowed. In other words:
~(A and ~A) --> A or ~A or (A and ~A).
What this statement means in semantic terms is: Suppo...
I give only maybe a 50% chance that any of the following adequately addresses your concern.
I think the succinct answer to your question is that it only matters if you happened to give me, e.g., a "2" (or anything else) and you asked me what it was and gave me your {0,1} set. In other words, you lose the ability to prove that 2 is 1 because it's not 0, but I'm not that worried about that.
It appears to be commonly said (see the last paragraph of "Mathematical Constructivism") that proof assistants like Agda or Coq rely on not assuming LoEM. I think th...
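As a quick check of what does and does not need LoEM in such a proof assistant, here is a minimal Lean 4 sketch (my own illustration):

```lean
-- ¬(A ∧ ¬A) is provable constructively, with no appeal to excluded middle:
theorem no_contradiction (A : Prop) : ¬(A ∧ ¬A) :=
  fun ⟨ha, hna⟩ => hna ha

-- A ∨ ¬A, by contrast, is not provable for an arbitrary A without the
-- classical axiom; in Lean it comes from Classical.em:
theorem lem (A : Prop) : A ∨ ¬A := Classical.em A
```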
I created this self-referential market on Manifold to test the prediction that the truth-value of such a paradox is in fact 1/2. Very few participated, but I think it should always resolve to around 50%. Rather than say such paradoxes are meaningless, I think they can be meaningfully assigned a truth-value of 1/2.
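One way to make "truth-value 1/2" precise, in a fuzzy / fixed-point treatment (my framing, not a claim about how Manifold itself resolves anything):

```latex
% Let v \in [0,1] be the truth value of the self-referential statement
% "this statement is false" (or "this market resolves NO").
% Self-reference requires the value to equal the value it denies:
v = 1 - v \quad\Longrightarrow\quad v = \tfrac{1}{2}
% The fixed point 1/2 is the only consistent assignment.
```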
what I think is "of course there are strong and weak beliefs!" but true and false is only defined relative to who is asking and why (in some cases), so you need to consider the context in which you're applying LoEM.
Like in my comment to Richard_Kennaway about probability, I am not just talking about beliefs, but about what is. Do we take it as an axiom or a theorem that A or ~A? Likewise for ~(A and ~A)? I admit to being confused about this. Also, does "A" mean the same thing as "A = True"? Does "~A" mean the same thing as "A = False"? If so, in what sense...
A succinct way of putting this would be to ask: if I were to swap the phrase "law of the excluded middle" in the piece for the phrase "principle of bivalence," how much would the meaning of the piece change, and how much would its overall correctness?
Additionally, suppose I changed the phrases in just "the correct spots." Does the whole piece still retain any coherence?
If there are propositions or axioms that imply each other fairly easily under common contextual assumptions, then I think it's reasonable to consider it not-quite-a-mistake to use the same name for such propositions.
One of the things I'm arguing is that I'm not convinced that imprecision is enough to render a work "false."
Are you convinced those mistakes are enough to render this piece false or incoherent?
That's a relevant question to the whole point of the post, too.
Indeed. (You don't need to link the main wiki entry, thanks.)
There's some subtlety, though. Either P or not-P might be true, and p(P) expresses a degree of belief that P is true. So I think probability merely implies that the LoEM might be unnecessary, but it itself pretty much assumes it.
It is sometimes, but not always, the case that p(P) = 0.5 resolves to P being "half-true" once observed. It can also mean that P resolves to true half the time, or just that we only know that it might be true with 0.5 certainty (the default meaning).
The issue that I'm primarily talking about is not so much the way that errors are handled; it's more about the way of deciding what constitutes an exception to a general rule, as Google defines the word "exception":
a person or thing that is excluded from a general statement or does not follow a rule.
In other words, does everything need a rule applied to it? Does every rule need there to be some set of objects, among those the rule is applied to, that lie on one side of the rule rather than the other (namely, the smaller side)?
As soon as we step o...
Raemon's comment below indicates mostly what I meant by:
It seems from talking to the mods here and reading a few of their comments on this topic that they tend to lean towards them being harmful on average and thus needing to be pushed down a bit.
Furthermore, I think the mods' stance on this is based primarily on Yudkowsky's piece here. I think the relevant portion of that piece is this (emphases mine):
...But into this garden comes a fool, and the level of discussion drops a little—or more than a little, if the fool is very prolific in their posting.
Both views seem symmetric to me:
Because I can sympathize with both views here, I think we should consider remaining agnostic to which is actually the case.
It seems like the major crux here is whether we think that debates over claim and counter-claim (basically, other cruxes) are likely to be useful or likely to cause harm. It seems from talking to the mods here and reading a few of...
It seems like a big part of this story is about people with relatively strict preferences somewhat aggressively defending their territory and boundaries, and how, when you have multiple people like this working together on relatively difficult tasks (like managing the logistics of travel), it creates an engine for lots of potential friction.
Furthermore, when you add the status hierarchy of a typical organization, combined with the social norms that dictate how people's preferences and rights ought to be respected (and implicit agreements bei...
I think it might actually be better if you just went ahead with a rebuttal, piece by piece, starting with whatever seems most pressing and you have an answer for.
I don't know if it is all that advantageous to put together a long mega-rebuttal post that counters everything at once.
Then you don't have that demand nagging at you for a week while you write the perfect presentation of your side of the story.
I think it would be difficult to implement what you're asking for without having to decide, on behalf of others, whether investing time in this (or any other) subject is worth it.
If you notice in yourself that you have conflicting feelings about whether something is good for you to be doing, e.g., in the sense which you've described: that you feel pulled in by this, but have misgivings about it, then I recommend considering this situation to be that you have uncertainty about what you ought to be doing, as opposed to being more cert...
It seems plausible that there is no such thing as "correct" metaphilosophy, and humans are just making up random stuff based on our priors and environment and that's it and there is no "right way" to do philosophy, similar to how there are no "right preferences".
We can always fall back to "well, we do seem to know what we and other people are talking about fairly often" whenever we encounter the problem of whether-or-not a "correct" this-or-that actually exists. Likewise, we can also reach a point where we seem to agree that "everyone seems to agree that o...
If we permit that moral choices with very long-term time horizons can be made with the utmost well-meaning intentions, and show evidence of admirable character traits, but nevertheless have difficult-to-see consequences with variable outcomes, then I think that limits us considerably in how much we can retrospectively judge specific individuals.
I wouldn't aim to debate you but I could help you prepare for it, if you want. I'm also looking for someone to help me write something about the Orthogonality Thesis and I know you've written about it as well. I think there are probably things we could both add to each other's standard set of arguments.
I think that I largely agree with this post. I think that it's also a fairly non-trivial problem.
The strategy that makes the most sense to me now is that one should argue with people as if they meant what they said, even if you don't currently believe that they do.
But not always - especially if you want to engage with them on the question of whether they are indeed acting in bad faith, and there does come a time when that becomes necessary.
I think pushing back against the norm that it's wrong to ever assume bad faith is a good idea. I don't thin...
I think your view involves a bit of catastrophizing, or relying on broadly pessimistic predictions about the performance of others.
Remember, the "exception throwing" behavior involves taking the entire space of outcomes and splitting it into two things: "Normal" and "Error." If we say this is what we ought to do in the general case, that's basically saying this binary property is inherent in the structure of the universe.
But we know that there's no phenomenon that can be said to actually be an "error" in some absolute, metaphysical sense. This ...
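To make the contrast concrete, here is a minimal sketch of the two styles I have in mind (illustrative only):

```python
from dataclasses import dataclass
from typing import Optional

# Style 1: exception throwing partitions every outcome into exactly two
# buckets, "Normal" and "Error", chosen in advance by whoever wrote the rule.
def parse_age_strict(text: str) -> int:
    age = int(text)                      # raises ValueError: declared an "Error"
    if age < 0 or age > 150:
        raise ValueError("age out of range")
    return age

# Style 2: a graded result keeps the whole space of outcomes visible and
# lets the caller decide what, if anything, counts as exceptional.
@dataclass
class ParsedAge:
    value: Optional[int]
    plausibility: float                  # 0.0 = nonsense, 1.0 = clearly fine
    note: str = ""

def parse_age_graded(text: str) -> ParsedAge:
    try:
        age = int(text)
    except ValueError:
        return ParsedAge(None, 0.0, "not a number")
    if 0 <= age <= 150:
        return ParsedAge(age, 1.0)
    return ParsedAge(age, 0.2, "outside the usual range")

print(parse_age_graded("200"))           # a graded outcome, not a hard stop
```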
This is a good reply, because its objections are close to things I already expect will be cruxes.
If you need a strong guarantee of correctness, then this is quite important. I'm not so sure that this is always the case in machine learning, since ML models by their nature can usually train around various deficiencies;
Yeah, I'm interested in why we need strong guarantees of correctness in some contexts but not others, especially if we have control over that aspect of the system we're building as well. If we have choice over how much the system itself c...
Let's try and address the thing(s) you've highlighted several times across each of my comments. Hopefully, this is a crux that we can use to try and make progress on:
"Wanting to be happy" is pretty much equivalent to being a utility-maximizer, and agents that are not utility-maximizers will probably update themselves to be utility-maximizers for consistency.
because they are compatible with goals that are more likely to shift.
...it makes more sense to swap the labels "instrumental" and "terminal" such that things like self-preservation, obtaining resourc
Apologies if this reply does not respond to all of your points.
I would observe that partial observability makes answering this question extraordinarily difficult. We lack interpretability tools that would give us the ability to know, with any degree of certainty, whether a set of behaviors are an expression of an instrumental or terminal goal.
I would posit that perhaps that points to the distinction itself being both too hard to draw and too sharp to justify the terminology being used the way it currently is. An agent could just tell you whether a spec...
My understanding of the difference between a "terminal" and "instrumental" goal is that a terminal goal is something we want, because we just want it. Like wanting to be happy.
One question that comes to mind is, how would you define this difference in terms of properties of utility functions? How does the utility function itself "know" whether a goal is terminal or instrumental?
One potential answer - though I don't want to assume just yet that this is what anyone believes - is that the utility function is not even defined on instrumental goals, in other wo...
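One toy way to cash that potential answer out (my own illustration, not something I'm attributing to anyone): define the utility function only over terminal outcomes, and let "instrumental value" be a derived quantity, the expected terminal utility of pursuing a subgoal.

```python
# Utility is defined only on terminal outcomes; instrumental "value" is not
# in the utility function at all; it is computed from it.
terminal_utility = {"happy": 1.0, "unhappy": 0.0}

# Assumed toy model: probability of each terminal outcome, conditional on
# having achieved a given instrumental subgoal.
p_outcome_given_subgoal = {
    "acquire_resources": {"happy": 0.7, "unhappy": 0.3},
    "self_preservation": {"happy": 0.6, "unhappy": 0.4},
}

def instrumental_value(subgoal: str) -> float:
    """Derived, not primitive: this number changes whenever the world-model
    or the terminal utilities change, which is the sense in which
    instrumental goals are free to shift."""
    dist = p_outcome_given_subgoal[subgoal]
    return sum(p * terminal_utility[outcome] for outcome, p in dist.items())

print(instrumental_value("acquire_resources"))  # 0.7
```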
"Being unlikely to conflict with other values" is not at the core of what characterizes the difference between instrumental and terminal values.
I think this might be an interesting discussion, but what I was trying to aim at was the idea that "terminal" values are the ones most unlikely to be changed (once they are obtained), because they are compatible with goals that are more likely to shift. For example, "being a utility-maximizer" should be considered a terminal value rather than an instrumental one. This is one potential property of terminal values; I...
Humans don't think "I'm not happy today, and I can't see a way to be happy, so I'll give up the goal of wanting to be happy."
I agree that they don't usually think this. If they tried to, they would brush up against trouble because that would essentially lead to a contradiction. "Wanting to be happy" is pretty much equivalent to being a utility-maximizer, and agents that are not utility-maximizers will probably update themselves to be utility-maximizers for consistency.
So "being happy" or "being a utility-maximizer" will probably end up being a termin...