All of Nathaniel Monson's Comments + Replies

Minor nitpicks:

- I read "1 angstrom of uncertainty in 1 atom" as saying the location is normally distributed with mean <center> and SD 1 angstrom, or as uniformly distributed in a solid sphere of radius 1 angstrom. Taken literally, though, "perturb one of the particles by 1 angstrom in a random direction" is distributed on the surface of the sphere (the particle is known to be exactly 1 angstrom from <center>).

- The answer will absolutely depend on the temperature. (In a neighborhood of absolute zero, the final positions of the gas particles are very close...
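
A minimal sketch of the three readings being contrasted (illustrative Python with numpy; distances in angstroms, nothing here is from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_perturbation(sigma=1.0):
    """Reading 1: each coordinate of the displacement ~ N(0, sigma)."""
    return rng.normal(0.0, sigma, size=3)

def uniform_ball_perturbation(radius=1.0):
    """Reading 2: displacement uniform over the solid sphere of the given radius."""
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    r = radius * rng.uniform() ** (1 / 3)  # cube root makes the density uniform in volume
    return r * direction

def sphere_surface_perturbation(radius=1.0):
    """The literal reading: exactly `radius` away, in a uniformly random direction."""
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    return radius * direction
```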

"we don't know if deceptive alignment is real at all (I maintain it isn't, on the mainline)."

You think it isn't a substantial risk of LLMs as they are trained today, or that it isn't a risk of any plausible training regime for any plausible deep learning system? (I would agree with the first, but not the second)

ryan_greenblatt
See TurnTrout's shortform here for some more discussion.

I agree in the narrow sense of different from bio-evolution, but I think it captures something tonally correct anyway.

the gears to ascension
this has been an ongoing point of debate recently, and I think we can do much better than incorrect analogy to evolution.
Answer by Nathaniel Monson

I like "evolveware" myself.

the gears to ascension
it's distinctly not evolved. gradients vs selection-crossover-mutate are very different algos.

I'm not really sure how it ended up there--probably childhood teaching inducing that particular brain-structure? It's just something that was a fundamental part of who I understood myself to be, and how I interpreted my memories/experiences/sense-data. After I stopped believing in God, I definitely also stopped believing that I existed. Obviously, this-body-with-a-mind exists, but I had not identified myself as being that object previously--I had identified myself as the-spirit-inhabiting-this-body, and I no longer believed that existed.

This is why I added "for the first few". Let's not worry about the location, just say "there is a round cube" and "there is a teapot".

Before you can get to either of these axioms, you need some things like "there is a thing I'm going to call reality that it's worth trying to deal with" and "language has enough correspondence to reality to be useful". With those and some similar very low-level base axioms in place (and depending on your definitions of round and cube and teapot), I agree that one or another of the axioms could reasonably be called more or l...

Ninety-Three
All your examples of high-tier axioms seem to fall into the category of "necessary to proceed", the sort of thing where you can't really do any further epistemology if the proposition is false. How did the God axiom either have that quality or end up high on the list without it?
Answer by Nathaniel Monson

I don't think I believe in God anymore--certainly not in the way I used to--but I think if you'd asked me 3 years ago, I would have said that I take it as axiomatic that God exists. If you have any kind of consistent epistemology, you need some base beliefs from which to draw the conclusions and one of mine was the existence of an entity that cared about me (and everyone on earth) on a personal level and was sufficiently more wise/intelligent/powerful/knowledgeable than me that I may as well think of it as infinitely so.

I think the religious people I know...

Ninety-Three
Surely some axioms can be more rationally chosen than others. For instance, "There is a teapot orbiting the sun somewhere between Earth and Mars" looks like a silly axiom, but "there is a round cube orbiting the sun somewhere between Earth and Mars" looks even sillier. Assuming the possibility of round cubes seems somehow more "epistemically expensive" than assuming the possibility of teapots.

That's fair. I guess I'm used to linkposts which are either full, or a short enough excerpt that I can immediately see they aren't full.

I really appreciated both the original linked post and this one. Thank you, you've been writing some great stuff recently.

One strategy I have, as someone who simultaneously would like to be truth-committed and also occasionally jokes or teases loved ones ("the cake you made is terrible! No one else should have any, I'll sacrifice my taste buds to save everyone!") is to have triggers for entering quaker-mode; if someone asks me a question involving "really" or "actually", I try to switch my demeanour to clearly sincere, and give a literally honest answer. I... hope? that having an explicit mode of truth this way blunts some of the negatives of frequently functioning as an actor.

Screwtape
You are welcome, and thank you for saying so! I think the triggers for quaker-mode are a decent way of handling it. I try to use both triggers and switching based on mood, and to remember which people are more Quakerish and which are more Actorish, but that pile of heuristics is not always reliable. It mostly works! Sometimes it doesn't, and then I sort things out as best I can. One Parselmouth to another, I hope it works too.

I actually fundamentally agree with most/all of it, I just wanted a cookie :)

I strongly disagreed with all of this!

.

.

.
(cookie please!)

Have an internet cookie for stating there's a disagreement! Can you elaborate a little more?

Glad to, thanks for taking it well.

I think this would have been mitigated by something at the beginning saying "this is an excerpt of x words of a y word post located at url", so I can decide at the outset to read here, read there, or skip.

Is the reason you didn't put the entire thing here basically blog traffic numbers?

spencerg
At the top it says it’s a link post and links to the full post; I thought that would make it clear that it’s a link post, not a full post. It’s difficult to keep three versions in sync as I fix typos and correct mistakes, which is why I prefer to not have three separate full versions.

(I didn't downvote, but here's a guess) I enjoyed what there was of it, but I got really irritated by "This is not the full post - for the rest of it, including an in-depth discussion of the evidence for and against each of these theories, you can find the full version of this post on my blog". I don't know why this bothers me--maybe because I pay some attention to the "time to read" tag at the top, or because having to click through to a different page feels like an annoyance with no benefit to me.

spencerg
Thanks for letting me know

If you click the link where OP introduces the term, it's the Wikipedia page for psychopathy. Wiki lists 3 primary traits for it, one of which is DAE.

M. Y. Zuo
Is there a specific reason 'affective' was chosen instead of 'emotional' in the naming?  Is it also a connotation issue?

The statement seems like it's assuming:

  1. we know roughly how to build AGI

  2. we decide when to do that

  3. we use the time between now and then to increase the chance of successful alignment

  4. if we succeed in alignment early enough, you and your loved ones won't die

I don't think any of these are necessarily true, and I think the ways they could be false are asymmetric in a manner that favors caution.

Dagon
It's also assuming:

  1. We know roughly how to achieve immortality

  2. We can do that exactly in the window of "the last possible moment" of AGI.

  3. Efforts between immortality and AGI are fungible and exclusive, or at least related in some way.

  4. Ok, yeah - we have to succeed on BOTH alignment and immortality to keep any of us from dying.

3 and 4 are, I think, the point of the post. To the extent that we work on immortality rather than alignment, we narrow the window of #2, and risk getting neither.

I appreciated your post, (indeed, I found it very moving) and found some of the other comments frustrating as I believe you did. I think, though, that I can see a part of where they are coming from. I'll preface by saying I don't have strong beliefs on this myself, but I'll try to translate (my guess at) their world model.

I think the typical EA/LWer thinks that most charities are ineffective to the point of uselessness, and that this is due to them not being smart/rational about a lot of things (and is very familiar with examples like the Millennium Villages)...

Lyrongolem
Thanks so much for your comment!  Hm... yes, upon further reflection your summarization seems accurate, or at least highly plausible. I am not too sure what the mindset of the average LWer or EA looks like myself. (Although I've frequented the site for some time, I'm mainly reading random frontpage posts that pique my interest; I don't attend meetups, participate in group activities, or do much else of that nature.) It's not merely that it reads like I haven't engaged much in their world. The truth is I simply haven't, and I have no intention of hiding it. I tagged the post EA because my points on aid address charities in general quite broadly, and so I thought it would be of interest to EA-adjacent individuals. I also hoped that they might be able to enlighten me a bit on the many parts of EA I still don't fully understand. The post was never meant to critique or even focus on EA.

This may have gotten lost in everything else I was attempting to do in the post, but one of the central motivations was to disprove a point I saw in a RA fundraiser that unconditional cash transfers could 'eradicate' global poverty. I found the initiative commendable, but unrealistic for a variety of reasons, many of which I detailed in the post. I never meant to say the aid wouldn't help, but rather, it was likely insufficient to meet their goal of ending long term poverty.

That said, yes, you are right. My evidence does not support the claim that aid is completely ineffective in ending long term poverty. But rather, that aid requires much higher volumes to solve the long term issues, in conjunction with many other things. In my mind this still meant aid was an inadequate solution, since I didn't believe the volumes required to solve the issue would be a reasonable demand upon charity or foreign aid (just look at the enormous price tag of millennium villages). Thinking back, I probably exaggerated a bit in the title and in some of my claims. While the logical points may have been sound

This is more a tangent than a direct response--I think I fundamentally agree with almost everything you wrote--but I don't think virtue ethics requires tossing out the other two (although I agree both of the others require tossing out each other).

I view virtue ethics as saying, roughly, "the actually important thing almost always is not how you act in contrived edge-case thought experiments, but rather how you habitually act in day-to-day circumstances. Thus you should worry less, probably much much less, about said thought experiments, and worry more...

Matt Goldenberg
I think virtue ethics is a practical solution, but if you just say "if corner cases show up, don't follow it", then you're doing something other than being a virtue ethicist.

I agree with the first paragraph, but strongly disagree with the idea this is "basically just trying to align to human values directly".

Human values are a moving target in a very high dimensional space, which needs many bits to specify. At a given time, this needs one bit. A coinflip has a good shot. Also, to use your language, I think "human is trying to press the button" is likely to form a much cleaner natural abstraction than human values generally.
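
As a rough sketch of the bit-counting point (treating "bits to specify" literally, with made-up numbers):

$$
\Pr[\text{a random guess is right}] = 2^{-b},
$$

so a one-bit target like "is the human trying to press the button right now" gives a coinflip's $1/2$, while a specification of human values needing $b \gg 1$ bits gives an astronomically small chance.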

Finally, we talk about getting it wrong being really bad. But there's a strong asymmetry--one direction ...

EJT
Here's a problem that I think remains. Suppose you've got an agent that prefers to have the button in the state that it believes matches my preferences. Call these 'button-matching preferences.' If the agent only has these preferences, it isn't of much use. You have to give the agent other preferences to make it do useful work. And many patterns for these other preferences give the agent incentives to prevent the pressing of the button. For example, suppose the other preferences are: 'I prefer lottery X to lottery Y iff lottery X gives a greater expectation of discovered facts than lottery Y.' An agent with these preferences would be useful (it could discover facts for us), but it also has incentives to prevent shutdown: it can discover more facts if it remains operational.  And it seems difficult to ensure that the agent's button-matching preferences will always win out over these incentives.  In case you're interested, I discuss something similar here and especially in section 8.2.

If I had clear lines in my mind between AGI capabilities progress, AGI alignment progress, and narrow AI progress, I would be 100% with you on stopping AGI capabilities. As it is, though, I don't know how to count things. Is "understanding why neural net training behaves as it does" good or bad? (SLT's goal). Is "determining the necessary structures of intelligence for a given architecture" good or bad? (Some strands of mech interp). Is an LLM narrow or general?

How do you tell, or at least approximate? (These are genuine questions, not rhetorical)

In the spirit of "no stupid questions", why not have the AI prefer to have the button in the state that it believes matches my preferences?

I'm aware this fails against AIs that can successfully act highly manipulative towards humans, but such an AI is already terrifying for all sorts of other reasons, and I think the likelihood of this form of corrigibility making a difference given such an AI is quite low.

Is the answer roughly "we don't care about the off-button specifically that much, we care about getting the AI to interact with human preferences which are changeable without changing them"?

johnswentworth
Trying to change the human's preference to match the button is one issue there. The other issue is that if the AI incorrectly estimates the human's preferences (or, more realistically, we humans building the AI fail to operationalize "our preference re: button state", such that the thing the AI is aimed at doesn't match what we intuitively mean by that phrase), then that's really bad. Another frame: this would basically just be trying to align to human values directly, and has all the usual problems with directly aligning to human values, which is exactly what all this corrigibility-style stuff was meant to avoid.

Question for Jacob: suppose we end up getting a single, unique, superintelligent AGI, and the amount it cares about, values, and prioritizes human welfare relative to its other values is a random draw with probability distribution equal to how much random humans care about maximizing their total number of direct descendants.

Would you consider that an alignment success?

jacob_cannell
I actually answered that towards the end: So it'd be a random draw with a fairly high p(doom), so no, not a success in expectation relative to the futures I expect. In actuality I expect the situation to be more multipolar, and thus more robust due to utility function averaging. If power is distributed over N agents, each with a utility function that is variably but even slightly aligned to humanity in expectation, that converges with increasing N to full alignment at the population level[1].

[1] As we expect the divergence in agent utility functions to all be from forms of convergent selfish empowerment, which are necessarily not aligned with each other (i.e. none of the AIs are inter-aligned except through variable partial alignment to humanity).
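
One rough way to formalize the averaging claim (a sketch under the footnote's assumption that the selfish components share no common direction; the notation is mine, not the comment's): write each agent's utility as a humanity-aligned part plus an idiosyncratic selfish part,

$$
u_i = \alpha_i\, u_H + v_i, \qquad \frac{1}{N}\sum_{i=1}^{N} u_i = \Big(\frac{1}{N}\sum_{i} \alpha_i\Big) u_H + \frac{1}{N}\sum_{i} v_i .
$$

If the $\alpha_i$ are positive in expectation and the $v_i$ are roughly independent with bounded variance, the aligned term stays near $\bar{\alpha}\, u_H$ while the variance of the averaged selfish term shrinks like $1/N$ -- the sense in which the population-level objective converges toward alignment as $N$ grows.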

Thanks for writing this! I strongly appreciate a well-thought out post in this direction.

My own level of worry is pretty dependent on a belief that we know and understand how to shape NN behaviors much better than we know how to shape NN values/goals/motivations/desires (although I don't think e.g. chatGPT has any of the latter in the first place). Do you have thoughts on the distinction between behaviors and goals? In particular, do you feel like you have any evidence we know how to shape/create/guide goals and values, rather than just behaviors?

I don't think the end result is identical. If you take B, you now have evidence that, if a similar situation arises again, you won't have to experience excruciating pain. Your past actions and decisions are relevant evidence of future actions and decisions. If you take drug A, your chance of experiencing excruciating pain at some point in the future goes up (at least your subjective estimation of the probability should probably go up at least a bit.) I would pay a dollar to lower my best rational estimate of the chance of something like that happening to me--wouldn't you?

In the dual interest of increasing your pleasantness to interact with and your epistemic rationality, I will point out that your last paragraph is false. You are allowed to care about anything and everything you may happen to care about or choose to care about. As an aspiring epistemic rationalist, the way in which you are bound is to be honest with yourself about message-description lengths, and your own values and your own actions, and the tradeoffs they reflect.

If a crazy person holding a gun said to you (and you believed) "I will shoot you unless you t...

My understanding of the etymology of "toe the line" is that it comes from the military--all the recruits in a group lining up, with their toes touching (but never over!) a line. Hence "I need you all to toe the line on this" means "do exactly this, with military precision".

Dweomite
Yes.  (Which is very different from "stay out of this one forbidden zone, while otherwise doing whatever you want.")

I think I would describe both of those as deceptive, and was premising on non-deceptive AI.

If you think "nondeceptive AI" can refer to an AI which has a goal and is willing to mislead in service of that goal, then I agree; solving deception is insufficient. (Although in that case I disagree with your terminology).

Charlie Steiner
Fair point (though see also the section on how the training+deployment process can be "deceptive" even if the AI itself never searches for how to manipulate you). By "Solve deception" I mean that in a model-based RL kind of setting, we can know the AI's policy and its prediction of future states of the world (it doesn't somehow conceal this from us). I do not mean that the AI is acting like a helpful human who wants to be honest with us, even though that's a fairly natural interpretation.

I think the people I know well over 65 (my parents, my surviving grandparent, some professors) are trying to not get COVID--they go to stores only in off-peak hours, avoid large gatherings, don't travel much. These seem like basically worth-it decisions to me (low benefit, but even lower cost). This means that their chance of getting COVID is much, much higher when, e.g., seeing relatives who just took a plane flight to see them.

I agree that the flu is comparably worrisome, and it wouldn't make sense to get a COVID booster but not a flu vaccine.

Those don't necessarily seem correct to me. If, e.g., OpenAI develops a superintelligent, non-deceptive AI, then I'd expect some of the first questions they'd ask it to be of the form "are there questions which we would regret asking you, according to our own current values? How can we avoid asking you those while still getting lots of use and insight from you? What are some standard prefaces we should attach to questions to make sure following through on your answer is good for us? What are some security measures that we can take to make sure our users ...

Joe Collman
I think it's very important to be clear you're not conditioning on something incoherent here. In particular, [an AI that never misleads the user about anything (whether intentional or otherwise)] is incoherent: any statement an AI can make will update some of your expectations in the direction of being more correct, and some away from being correct. (it's important here that when a statement is made you don't learn [statement], but rather [x made statement]; only the former can be empty) I say non-misleading-to-you things to the extent that I understand your capabilities and what you value, and apply that understanding in forming my statements. [Don't ever be misleading] cannot be satisfied. [Don't ever be misleading in ways that we consider important] requires understanding human values and optimizing answers for non-misleadingness given those values. NB not [answer as a human would], or [give an answer that a human would approve of]. With a fuzzy notion of deception, it's too easy to do a selective, post-hoc classification and say "Ah well, that would be deception" for any outcome we don't like. But the outcomes we like are also misleading - just in ways we didn't happen to notice and care about. This smuggles in a requirement that's closer in character to alignment than to non-deception. Conversely, non-fuzzy notions of deception don't tend to cover all the failure modes (e.g. this is nice, but avoiding deception-in-this-sense doesn't guarantee much).
Daniel Kokotajlo
I tentatively agree and would like to see more in-depth exploration of failure modes + fixes, in the setting where we've solved deception. It seems important to start thinking about this now, so we have a playbook ready to go...
Charlie Steiner
EDIT: I should acknowledge that conditioning on a lot of "actually good" answers to those questions would indeed be reassuring. The point is more that humans are easily convinced by "not actually good" answers to those questions, if the question-answerer has been optimized to get human approval.

ORIGINAL REPLY: Okay, suppose you're an AI that wants something bad (like maximizing pleasure), and also has been selected to produce text that is honest and that causes humans to strongly approve of you. Then you're asked What honest answer can you think of that would cause humans to strongly approve of you, and will let you achieve your goals? How about telling the humans they would regret asking about how to construct biological weapons or similar dangerous technologies? How about appending text explaining your answer that changes the humans' minds to be more accepting of hedonic utilitarianism? If the question is extra difficult for you, like , dissemble! Say the question is unclear (all questions are unclear) and then break it down in a way that causes the humans to question whether they really want their own current desires to be stamped on the entire future, or whether they'd rather trust in some value extrapolation process that finds better, more universal things to care about.

Surely your self-estimated chance of exposure and the number of high-risk people you would in turn expose should factor in somewhere? I agree with you for people who aren't traveling, but someone who, e.g., flies into a major conference and then visits a retirement home the week after is doing a different calculation.

johnhalstead
I don't think that makes much difference because I don't think it has much effect on the total number of infections - you would really be changing the time at which someone gets the virus given that we're not trying to contain it anymore.  One way round the concern about visiting the retirement home would be to do a lateral flow test before you go in. If you're seeing extremely vulnerable people a lot, then it might be worth getting the vaccine. But the IFR is now lower than the flu for all ages and I think should be treated accordingly

When I started trying to think rigorously about this a few months ago, I realized that I don't have a very good definition of world model. In particular, what does it mean to claim a person has a world model? Given a criterion for an LLM to have one, how confident am I that most people would satisfy the criterion?

I think it is 2-way, which is why many (almost all?) alignment researchers have spent a significant amount of time looking at ML models and capabilities, and have guesses about where those are going.

In that case, I believe your conjecture is trivially true, but has nothing to do with human intelligence or Bengio's statements. In context, he is explicitly discussing low dimensional representations of extremely high dimensional data, and the things human brains learn to do automatically (I would say analogously to a single forward pass).

If you want to make it a fair fight, you either need to demonstrate a human who learns to recognize primes without any experience of the physical world (please don't do this) or allow an ML model something more analogous to the data humans actually receive, which includes math instruction, interacting with the world, many brain cycles, etc.

Alexander Kolpakov
I also believe my conjecture is true, however non-trivially. At least, mathematically non-trivially. Otherwise, all is trivial when the job is done.
Aidan Rocke
Regarding your remark on finding low-dimensional representations, I have added a section on physical intuitions for the challenge. Here I explain how the prime recognition problem corresponds to reliably finding a low-dimensional representation of high-dimensional data. 

I agree with your entire first paragraph. It doesn't seem to me that you have addressed my question though. You are claiming that this hypothesis "implies that machine learning alone is not a complete path to human-level intelligence." I disagree. If I try to design an ML model which can identify primes, is it fair for me to give it some information equivalent to the definition (no more information than a human who has never heard of prime numbers has)?

If you allow that it is fair for me to do so, I think I can probably design an ML model which will do thi...

Alexander Kolpakov
Does any ML model that tells cats from dogs get definitions thereof? I think the only input it gets is "picture:(dog/cat)label". It does learn to tell them apart, to some degree, at least. One would expect the same approach here. Otherwise you can ask right away for the sieve of Eratosthenes as a functional and inductive definition, in which case things get easy ...
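
For reference, the sieve mentioned above, as a minimal Python sketch (the standard algorithm, not code from either commenter):

```python
def primes_up_to(n: int) -> list[int]:
    """Sieve of Eratosthenes: return all primes <= n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Every multiple of p from p*p onward is composite.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]

print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The question being debated is whether handing a learner something equivalent to this definition is fair, or whether it must recover the concept from labeled examples alone, the way a cats-vs-dogs classifier does.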

"implies that machine learning alone is not a complete path to human-level intelligence."

I don't think this is even a little true, unless you are using definitions of human level intelligence and machine learning which are very different than the ideas I have of them.

If you have a human who has never heard of the definition of prime numbers, how do you think they would do on this test? Am I allowed to supply my model with something equivalent to the definition?

Ilio
Let P, P’, P’’ = « machine learning alone », « machine learning + noise », « is not a complete path to human-level intelligence ». A few follow-up questions: do you also think that P+P’’ == P’+P’’? Is your answer proven, or more or less uncontroversial? (ref welcome!)
Aidan Rocke
The best physicists on Earth, including Edward Witten and Alain Connes, believe that the distribution of primes and Arithmetic Geometry encode mathematical secrets that are of fundamental importance to mathematical physics. This is why the Langlands program and the Riemann Hypothesis are of great interest to mathematical physicists. If number theory, besides being of fundamental importance to modern cryptography, allows us to develop a deep understanding of the source code of the Universe then I believe that such advances are a critical part of human intelligence, and would be highly unlikely if the human brain had a different architecture.

Have you looked into New Angeles? Action choices are cooperative, with lots of negotiation. Each player is secretly targeting another player, and wins if they end with more points than their target (so you could have a 6-player game where the people who ended with the most, and 4th and 5th most, win, while 2nd, 3rd, and 6th lose).

mako yass
Hmm, isn't that a situation where the number of people who will win is normally distributed with mean n/2 when people don't know who's targeting who, but under transparency you could reliably have n-1 people win? (By picking one scapegoat, then allocating points to people in ascending order from the one who must beat the scapegoat, and then the one who must beat them, and so on, until the one who the scapegoat was to beat?) I often get the sense that these games are broken for enlightened players; the way to win is to coordinate, but the game implicitly communicates that you're not supposed to, which is so wrong.
mako yass
I'd like to try this game, but it's extraordinary that they managed to make a multi-winner game that is still all about outscoring others.
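
A quick toy check of the scapegoat allocation described above (my own illustration; the player names, targeting cycle, and point values are all hypothetical):

```python
# Toy check of the scapegoat scheme: 6 players in a targeting cycle, and a
# player wins if they end with strictly more points than their target.
players = ["A", "B", "C", "D", "E", "F"]
target = {"A": "B", "B": "C", "C": "D", "D": "E", "E": "F", "F": "A"}

scapegoat = "A"
points = {scapegoat: 0}
# Walk backwards around the cycle from the scapegoat, giving each successive
# player one more point than the player they must beat.
current = next(p for p, t in target.items() if t == scapegoat)
score = 1
while current != scapegoat:
    points[current] = score
    score += 1
    current = next(p for p, t in target.items() if t == current)

winners = [p for p in players if points[p] > points[target[p]]]
print(points)   # {'A': 0, 'F': 1, 'E': 2, 'D': 3, 'C': 4, 'B': 5}
print(winners)  # ['B', 'C', 'D', 'E', 'F'] -- everyone but the scapegoat wins
```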

This comment confuses me.

  1. Why is Tristan in quotes? Do you not believe it's his real name?
  2. What is the definition of the community you're referring to?
  3. I don't think I see any denigration happening--what are you referring to?
  4. What makes someone an expert or an imposter in your eyes? In the eyes of the community?

I clicked the link in the second email quite quickly--I assumed it was a game/joke, and wanted to see what would happen. If I'd actually thought I was overriding people's preferences, I... probably would have still clicked, because I don't think I place enormous value on people's preferences for holiday reasons, and I would have enjoyed being the person who determined it.

There are definitely many circumstances where I wouldn't unilaterally override a majority. I should probably try to figure out what the principles behind those are.

CBiddulph
Came here to say this - I also clicked the link because I wanted to see what would happen. I wouldn't have done it if I hadn't already assumed it was a social experiment.

I have a strong preference for non-ironic epistemic status. Can you give one?

rogersbacon
then read it again but non-ironically 

If the review panel recommends a paper for a spotlight, there is a better than 50% chance a similarly-constituted review panel would have rejected the paper from the conference entirely:

https://blog.neurips.cc/2021/12/08/the-neurips-2021-consistency-experiment/

Not OP, but relevant--I spent the last ~6 months going to meetings with [biggest name at a top-20 ML university]'s group. He seems to me like a clearly very smart guy (and very generous in allowing me to join), but I thought it was quite striking that almost all his interests were questions of the form "I wonder if we can get a model to do x", or "if we modify the training in way y, what will happen?" A few times I proposed projects about "maybe if we try z, we can figure out why b happens" and he was never very interested--a near-exact quote of his in ...

To me, it sounds like A is a member of a community which A wants to have certain standards and B is claiming membership in that community while not meeting those. In that circumstance, I think a discussion between various members of the community about obligations to be part of that community and the community's goals and beliefs and how these things relate is very very good. Do you

A) disagree with that framing of the situation in the dialogue

B) disagree that in the situation I described a discussion is virtuous, verging on necessary

C) other?

Said Achmiz
Indeed, I disagree with that characterization of the situation in the dialogue. For one thing, there’s no indication that Bob is claiming to be a member of anything. He’s “interested in Effective Altruism”, and he “want[s] to help others and … genuinely care[s] about positive impact, and ethical obligations, and utilitarian considerations”, and he also (according to Alice!) “claim[s] to really care about improving the world”, and (also according to Alice!) “claim[s] to be a utilitarian”. But membership in some community? I see no such claim on Bob’s part. But also, and perhaps more importantly: suppose for a moment that “Effective Altruism” is, indeed, properly understood as a “community”, membership in which it is reasonable to gatekeep in the sort of way you describe.[1] It might, then, make sense for Alice to have a discussion with Carol, Dave, etc.—all of whom are members-in-good-standing of the Effective Altruism community, and who share Alice’s values, as well as her unyielding commitment thereto—concerning the question of whether Bob is to be acknowledged as “one of us”, whether he’s to be extended whatever courtesies and privileges are reserved for good Effective Altruists, and so on. However, the norm that Bob, himself, is answerable to Alice—that he owes Alice a justification for his actions, that Alice has the right to interrogate Bob concerning whether he’s living up to his stated values, etc.—that is a deeply corrosive norm. It ought not be tolerated. Note that this is different from, say, engaging a willing Bob in a discussion about what his behavior should be (or about any other topic whatsoever)! This is a key aspect of the situation: Bob has expressed that he considers his behavior none of Alice’s business, but Alice asserts the standing to interrogate Bob anyway, on the reasoning that perhaps she might convince him after all. It’s that which makes Bob’s failure to stand up for his total lack of obligation to answer to Alice for his actions dep

Lots of your comments on various posts seem rude to me--should I be attempting to severely punish you?

Said Achmiz
The behavior I was referring to, specifically, is not rudeness (or else I’d have quoted Alice’s first comment, not her second one), but rather Alice taking as given the assumption that she has some sort of claim on Bob’s reasons for his actions—that Bob has some obligation to explain himself, to justify his actions and his reasons, to Alice. It is that assumption which must be firmly and implacably rejected at once. Bob should make clear to Alice that he owes her no explanations and no justifications. By indulging Alice, Bob is giving her power over himself that he has no reason at all to surrender. Such concessions are invariably exploited by those who wish to make use of others as tools to advance their own agenda. Bob’s first response was correct. But—out of weakness, lack of conviction, or some other flaw—he didn’t follow up. Instead, he succumbed to the pressure to acknowledge Alice’s claim to be owed a justification for his actions, and thus gave Alice entirely undeserved power. That was a mistake—and what’s more, it’s a mistake that, by incentivizing Alice’s behavior, has anti-social consequences, which degrade the moral fabric of Bob’s community and society.

I am genuinely confused why this is on LessWrong instead of the EA Forum. What do you think the distribution of giving money is like in each place, and what do you think the distribution of responses to the Drowning Child argument is like in each?

Firinn
Hmm, I think I could be persuaded into putting it on the EA Forum, but I'm mildly against it:

* It is literally about rationality, in the sense that it's about the cognitive biases and false justifications and motivated reasoning that cause people to conclude that they don't want to be any more ethical than they currently are; you can apply the point to other ethical systems if you want, like, Bob could just as easily be a religious person justifying why he can't be bothered to do any pilgrimages this year while Alice is a hotshot missionary or something. I would hope that lots of people on LW want to work harder on saving the world, even if they don't agree with the Drowning Child thing; there are many reasons to work harder on x-risk reduction.
* It's the sort of spicy that makes me worried that EAs will consider it bad PR, whereas rationalists are fine with spicy takes because we already have those in spades. I think people can effectively link to it no matter where it is, so posting it in more places isn't necessarily beneficial?
* I don't agree with everything Alice says but I do think it's very plausible that EA should be a big tent that welcomes everyone - including people who just want to give 10% and not do anything else - whereas my personal view is that the rationality community should probably be more elitist; we're supposed to be a self-improve-so-hard-that-you-end-up-saving-the-world group, damnit, not a book club for insight porn. Also it's going to be part of a sequence (conditional on me successfully finishing the other posts), and I feel like the sequence overall belongs more on LW.

I genuinely don't really know how the response to the Drowning Child differs between LW and EA! I guess I would probably say more people on the EA Forum probably donate money to charity for Drowning-Child-related reasons, but more people on LW are probably interested in philosophy qua philosophy and probably more people on LW switched careers to directly work

Minor semantic quibble: I would say we always want positive expected utility, but how that translates into money/time/various intangibles can vary tremendously both situationally and from person to person.

This was very interesting, thanks for writing it :)

My zero-knowledge instinct is that sound-wave communication would be very likely to evolve in most environments. Motion -> pressure differentials seems pretty inevitable, so would almost always be a useful sensory modality. And any information channel that is easy to both sense and affect seems likely to be used for communication. Curious to hear your thoughts if your intuition is that it would be rare.

mruwnik
This depends on the size and distances involved, but it's a good intuition. You need a mechanism to generate the pressure differentials, which can be an issue in very small organisms. Small and sedentary organisms tend to use chemical gradients (i.e. smell), but anything bigger than a mouse (and quite a few smaller things) usually has some kind of sound signals, which are really good for quick notifications in a radius around you, regardless of the light level (so you can pretty much always use it). Also, depending on the medium, sound can travel really far - like whales which communicate with each other over thousands of miles, or elephants stomping to communicate with other elephants 20 miles away.

Do you have candidates for intermediate views? Many-drafts which seem convergent, or fuzzy Cartesian theatres? (Maybe graph-theoretically translating to nested subnetworks of neurons where we might say "this set is necessarily core, this larger set is semicore/core in frequent circumstances, this still larger set is usually un-core, but changeable, and outside this is nothing"?)

TAG
There's an argument that a distributed mind needs to have some sort of central executive, even if fuzzily defined, in order to make decisions about actions ... just because there is ultimately one body to control ...and it can't do contradictory things, and it can't rest in endless indecision. Consider the Lamprey:

"How does the lamprey decide what to do? Within the lamprey basal ganglia lies a key structure called the striatum, which is the portion of the basal ganglia that receives most of the incoming signals from other parts of the brain. The striatum receives “bids” from other brain regions, each of which represents a specific action. A little piece of the lamprey’s brain is whispering “mate” to the striatum, while another piece is shouting “flee the predator” and so on. It would be a very bad idea for these movements to occur simultaneously – because a lamprey can’t do all of them at the same time – so to prevent simultaneous activation of many different movements, all these regions are held in check by powerful inhibitory connections from the basal ganglia. This means that the basal ganglia keep all behaviors in “off” mode by default. Only once a specific action’s bid has been selected do the basal ganglia turn off this inhibitory control, allowing the behavior to occur. You can think of the basal ganglia as a bouncer that chooses which behavior gets access to the muscles and turns away the rest. This fulfills the first key property of a selector: it must be able to pick one option and allow it access to the muscles." (Scott Alexander)

But how can a selector make a decision on the basis of multiple drafts which are themselves equally weighted? If inaction is not an option, a coin needs to be flipped. Maybe it's flipped in the theatre, maybe it's cast in the homunculus, maybe there is no way of telling. But you can tell it works that way because of things like the Necker Cube illusion...your brain, as they say, can switch between two interpretations, b
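
A toy sketch of the "bouncer" dynamic the quoted passage describes (purely illustrative Python -- the actions and bid strengths are made up, and this is not a model of the basal ganglia):

```python
import random

# Several subsystems submit "bids" for control of the one body; everything is
# inhibited by default and exactly one action is released to the muscles.
bids = {"mate": 0.62, "flee the predator": 0.61, "feed": 0.35}

def select_action(bids: dict[str, float]) -> str:
    """Release the strongest bid; break exact ties with a coin flip."""
    best = max(bids.values())
    tied = [action for action, strength in bids.items() if strength == best]
    return random.choice(tied)  # the "coin flip" when drafts are equally weighted

print(select_action(bids))  # one behavior gets the muscles; the rest stay inhibited
```
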
Rafael Harth
I think the philosophical component of the camps is binary, so intermediate views aren't possible. On the empirical side, the problem is that it's not clear what evidence for one side over the other looks like. You kind of need to solve this first to figure out where on the spectrum a physical theory falls.
Answer by Nathaniel Monson

The conversations I've had with people at DeepMind, OpenAI, and in academia make me very sure that lots of ideas on capabilities increases are already out there, so there's a high chance anything you suggest would be something people are already thinking about. Possibly running your ideas past someone in those circles, and sharing anything they think is unoriginal, would be safe-ish?

I think one of the big bottlenecks is a lack of ways to predict how much different ideas would help without actually trying them at costly large scale. Unfortunately, this is also a barrier to good alignment work. I don't have good ideas on making differential progress on this.

I think lots of people would say that all three examples you gave are more about signalling than about genuinely attempting to accomplish a goal.

junk heap homotopy
I wouldn’t say that. Signalling the way you seem to have used it implies deception on their part, but each of these instances could just be a skill issue on their end, an inability to construct the right causal graph with sufficient resolution. For what it’s worth whatever this pattern is pointing at also applies to how wrongly most of us got the AI box problem, i.e., that some humans by default would just let the damn thing out without needing to be persuaded.

This seems like kind of a nonsense double standard. The declared goal of journalism is usually not to sell newspapers; that is your observation of the incentive structure. And while the declared goal of LW is to arrive at truth (or something similar--hone the skills which will better allow people to arrive at truth, or something), there are comparable parallel incentive structures to journalism.

It seems better to compare declared purpose to declared purpose, or inferred goal to inferred goal, doesn't it?

Adam Zerner
Yes, but in my judgement -- and I suspect if you averaged out the judgement of reasonable others (not limited to LessWrongers) -- LW has an actual goal that is much, much closer to arriving at the truth than journalism.