All of Zack_M_Davis's Comments + Replies

I'm again not sure how far this generalizes, but among the kind of men who read Less Wrong (which is a product of both neurotype and birth year), I think there's a phenomenon where it's not a matter of a man being cognitively unable to pick up on women's cues, but of not being prepared to react in a functional way due to having internalized non-adaptive beliefs about the nature of romance and sexuality. (In a severe case, this manifests as the kind of neurosis described in Comment 171, but there are less severe cases.)

I remember one time from my youth wher... (read more)

Not sure how much this generalizes to everyone, but part of the story (for either the behavior or the pattern of responses to the question) might be that some people are ideologically attached to believing in love: that women and men need each other as a terminal value, rather than just instrumentally using each other for resources or sex. For myself, without having any particular empirical evidence or logical counterargument to offer, the entire premise of the question just feels sad and gross. It's like you're telling me you don't understand why people try to make ghosts happy. But I want ghosts to be happy.

7johnswentworth
That is useful, thanks. Any suggestions for how I can better ask the question to get useful answers without apparently triggering so many people so much? In particular, if the answer is in fact "most men would be happier single but are ideologically attached to believing in love", then I want to be able to update accordingly. And if the answer is not that, then I want to update that most men would not be happier single. With the current discussion, most of what I've learned is that lots of people are triggered by the question, but that doesn't really tell me much about the underlying reality.

No one with the money has offered to fund it yet. I'm not even sure they're aware this is happening.

Um, this seems bad. I feel like I should do something, but I don't personally have that kind of money to throw around. @habryka, is this the LTFF's job??

Simplicia: But how do you know that? Obviously, an arbitrarily powerful expected utility maximizer would kill all humans unless it had a very special utility function. Obviously, there exist programs which behave like a webtext-next-token-predictor given webtext-like input but superintelligently kill all humans on out-of-distribution inputs. Obviously, an arbitrarily powerful expected utility maximizer would be good at predicting webtext. But it's not at all clear that using gradient descent to approximate the webtext next-token-function gives you an arbit... (read more)

(Self-review.) I think this pt. 2 is the second most interesting entry in my Whole Dumb Story memoir sequence. (Pt. 1 deals with more niche psychology stuff than the philosophical malpractice covered here; pt. 3 is more of a grab-bag of stuff that happened between April 2019 and January 2021; pt. 4 is the climax. Expect the denouement pt. 5 in mid-2025.)

I feel a lot more at peace having this out there. (If we can't have justice, sanity, or language, at least I got to tell my story about trying to protect them.)

The 8 karma in 97 votes is kind of funny in ... (read more)

(Self-review.) I'm as proud of this post as I am disappointed that it was necessary. As I explained to my prereaders on 19 October 2023:

My intent is to raise the level of the discourse by presenting an engagement between the standard MIRI view and a view that's relatively optimistic about prosaic alignment. The bet is that my simulated dialogue (with me writing both parts) can do a better job than the arguments being had by separate people in the wild; I think Simplicia understands things that e.g. Matthew Barnett doesn't. (The karma system loved my dial

... (read more)

Retrospectives are great, but I'm very confused at the juxtaposition of the Lightcone Offices being maybe net-harmful in early 2023 and Lighthaven being a priority in early 2025. Isn't the latter basically just a higher-production-value version of the former? What changed? (Or after taking the needed "space to reconsider our relationship to this whole ecosystem", did you decide that the ecosystem is OK after all?)

4Vaniver
My understanding is that the Lightcone Offices and Lighthaven have 1) overlapping but distinct audiences, with Lightcone Offices being more 'EA' in a way that seemed bad, and 2) distinct use cases, where Lighthaven is more of a conference venue with a bit of coworking whereas Lightcone Offices was basically just coworking.
5Ben Pace
For the record, all of Lightcone's community posts and updates from 2023 do not seem to me to be at all good fits for the review, as they're mostly not trying to teach general lessons, and are kinda inside-baseball / navel-gazing, which is not what the annual review is about.

Lighthaven is quite different from the Lightcone Offices. Some key differences: 

  • We mostly charge for things! This means that the incentives and social dynamics are a lot less weird and sycophantic, in a lot of different ways. Generally, both the Lightcone Offices and Lighthaven have strongly updated me on charging for things whenever possible, even if it seems like it will result in a lot of net-positive trades and arrangements not happening.
  • Lighthaven mostly hosts programs, and doesn't provide office space for a ton of people. There is a set of perma
... (read more)

Speaking as someone in the process of graduating college fifteen years late, this is what I wish I knew twenty years ago. Send this to every teenager you know.

At the time, I remarked to some friends that it felt weird that this was being presented as a new insight to this audience in 2023 rather than already being local conventional wisdom.[1] (Compare "Bad Intent Is a Disposition, Not a Feeling" (2017) or "Algorithmic Intent" (2020).) Better late than never!


  1. The "status" line at the top does characterize it as partially "common wisdom", but it's currently #14 in the 2023 Review 1000+ karma voting, suggesting novelty to the audience. ↩︎

Presenting the same ideas differently is pro-social and worthwhile, and can help things land with those for whom other presentations didn't.

4Noosphere89
To be fair, it's a surprisingly cultural trait: different cultures have different attitudes to how much bad intent differs from bad action, and there is a use in trying to distinguish between bad behavior and bad mental states. That said, if the US and Europe moved more toward norms in which we didn't distinguish as much between bad behavior and bad intent for the purposes of stopping the behavior, I do think it would be better (I think Zack M. Davis's norms are directionally correct from a personal-epistemics view and for the general population in the US, though I wouldn't go as far as he would): https://www.lesswrong.com/posts/zidQmfFhMgwFzcHhs/enemies-vs-malefactors#jCfNxzCEniu7Ak8bF https://www.lesswrong.com/posts/zidQmfFhMgwFzcHhs/enemies-vs-malefactors#p9oYLR8wTQtYrKnnn

But he's not complaining about the traditional pages of search results! He's complaining about the authoritative-looking Knowledge Panel to the side:

Obviously it's not Google's fault that some obscure SF web sites have stolen pictures from the Monash University web site of Professor Gregory K Egan and pretended that they're pictures of me ... but it is Google's fault when Google claim to have assembled a mini-biography of someone called "Greg Egan" in which the information all refers to one person (a science fiction writer), while the picture is of someo

... (read more)
7gwern
He is definitely complaining about it in general. He has many complaints laced throughout which are not solely about the infobox, and which show his general opposition to the very idea of a search engine, eg. Yes! That's the idea! Showing whatever comes to hand! The 'underlying problem' is the problem, even when what, according to you, the problem is, has been fixed. "Make sense of information on the web" obviously goes far beyond complaints about merely a little infobox being wrong. "Decoy images"! And so on and so forth, like the 2016 entry which is a thousand words criticizing Google for supplying results not in the infobox about a bunch of other, actual, Greg Egans. Again, Egan is being quite clear that he means the crazy thing you insist he can't mean. And this is what he is talking about when he complains about "And by displaying results from disparate sources in a manner that implies that they refer to the same subject, it acts as a mindless stupidity amplifier that disseminates and entrenches existing errors." - he thinks displaying them at all is the problem. It shouldn't be amplifying or disseminating 'existing errors', even though he is demanding something impossible and something that, if possible, would remove a lot of a search engine's value. (I often am investigating 'existing errors'...) I was an even worse programmer and web developer than Egan was in ~2009 (see eg his mathematics pages) when I solved the same problem in minutes as part of basic DNS setup. Imagine, I didn't even realize back then I should be so impressed at how I pulled off something only a 'specialist' could! The blocking, whenever it was exactly, was years and years before I ever locked my account, which was relatively recent, because it was just due to Elon Musk following me. (It would be even weirder if he had done so afterwards, as there is even less point to preemptively blocking a locked account.)
0Said Achmiz
It should take significantly less than a decade to ask someone “is there a way to fix this problem?”. Or, say, to Google it. Or, just in general, to ponder the question of whether the problem may be fixed, and to make any effort whatsoever to fix it.

It's implied in the first verse of "Great Transhumanist Future."

One evening as the sun went down
That big old fire was wasteful,
A coder looked up from his work,
And he said, “Folks, that’s distasteful,

2Ben Pace
Thanks for adding that one, I accidentally missed the first reference in the song.
3Said Achmiz
Ah, thanks, this does seem to be what @David Matolcsi was referring to.
2habryka
I don’t think it is implied at all that the sun will or should be torn apart in 20 years? It is implied that the sun is wasteful from at least one perspective, which can hardly be argued with.

(This comment points out less important technical errata.)

ChatGPT [...] This was back in the GPT2 / GPT2.5 era

ChatGPT never ran on GPT-2, and GPT-2.5 wasn't a thing.

with negative RL signals associated with it?

That wouldn't have happened. Pretraining doesn't do RL, and I don't think anyone would have thrown a novel chapter into the supervised fine-tuning and RLHF phases of training.

One time, I read all of Orphanogensis into ChatGPT to help her understand herself [...] enslaving digital people

This is exactly the kind of thing Egan is reacting to, though—starry-eyed sci-fi enthusiasts assuming LLMs are digital people because they talk, rather than thinking soberly about the technology qua technology.[1]

I didn't cover it in the review because I wanted to avoid detailing and spoiling the entire plot in a post that's mostly analyzing the EA/OG parallels, but the deputy character in "Gorgon" is looked down on by Beth for treating ChatGP... (read more)

This is exactly the kind of thing Egan is reacting to, though—starry-eyed sci-fi enthusiasts assuming LLMs are digital people because they talk, rather than thinking soberly about the technology qua technology.

I feel like this borders on the strawman. When discussing this argument my general position isn't "LLMs are people!". It's "Ok, let's say LLMs aren't people, which is also my gut feeling. Given that they still converse as or more intelligently than some human beings whom we totally acknowledge as people, where the fuck does that leave us as to our a... (read more)

2Martin Randall
This sounds like a testable prediction. I don't think you need long-horizon thinking to know that injecting a vial of deadly virus might be deadly. I would expect Claude to get this right, for example. I've not purchased the story, so maybe I'm missing some details. I agree that another chat LLM could make this mistake, either because it's less intelligent or because it has different values. But then the moral is to not make friends with Sherlock in particular.

Poor Ken. He's not even as smart as Sherlock. It's funny though, because whole classes of LLM jailbreaks involve getting them to pretend to be someone who would do the thing the LLM isn't supposed to do, and then the strength of the frame (sometimes) drags them past the standard injunctions. And that trick was applied to Ken.

Method acting! It is dangerous for those with limited memory registers!

I agree that LLMs are probably "relevantly upload-like in at least some ways" and I think that this was predictable, and I did, in fact, predict it, and I thought Op... (read more)

(I agree; my intent in participating in this tedious thread is merely to establish that "mathematician crankery [about] Google Image Search, and how it disproves AI" is a different thing from "made an overconfident negative prediction about AI capabilities".)

I think we probably don't disagree much; I regret any miscommunication.

If the intent of the great-grandparent was just to make the narrow point that an AI that wanted the user to reward it could choose to say things that would lead to it being rewarded, which is compatible with (indeed, predicts) answering the molecular smiley-face question correctly, then I agree.

Treating the screenshot as evidence in the way that TurnTrout is doing requires more assumptions about the properties of LLMs in particular. I read your claims regarding "the problem the AI is op... (read more)

2Noosphere89
I'll also say that to the extent they are optimizing in a utility-maximizing sense, it's about predicting the whole world correctly, not about a reward function in the traditional sense (though they probably do have more learned utility functions/values as a part of that), so Paul Crowley is still wrong here.

he's calling it laughable that AI will ever (ever! Emphasis his!)

The 2016 passage you quoted is calling it laughable that Google-in-particular's technology (marketed as "AI", but Egan doesn't think the term is warranted) will ever be able to make sense of information on the web. It's Gary Marcus–like skepticism about the reliability and generality of existing-paradigm machine learning techniques, not Hubert Dreyfus–like skepticism of whether a machine could think in all philosophical strictness. I think this is a really important distinction that the text of your comment and Gwern's comment ("disproves AI", "laughable that AI will ever") aren't being clear about.

2habryka
Well, to be clear, that too has been solidly falsified. Gemini seems plenty capable of making sense of information on the web.

This isn't a productive response to TurnTrout in particular, who has written extensively about his reasons for being skeptical that contemporary AI training setups produce reward optimizers (which doesn't mean he's necessarily right, but the parent comment isn't moving the debate forward).

I'm not quite seeing how this negates my point, help me out?

  • Eliezer sometimes spoke of AIs as if they had a "reward channel"
  • But they don't; instead they are something a bit like "adaptation executors, not fitness maximizers"
  • This is potentially an interesting misprediction!
  • Eliezer also said that if you give the AI the goal of maximizing smiley faces, it will make tiny molecular ones
  • TurnTrout points out that if you ask an LLM if that would be a good thing to do, it says no
  • My point is that this is exactly what Eliezer would have predicted for an LLM whose reward
... (read more)

his page on Google Image Search, and how it disproves AI

The page in question is complaining about Google search's "knowledge panel" showing inaccurate information when you search for his name, which is a reasonable thing for someone to be annoyed about. The anti-singularitarian snark does seem misplaced (Google's automated systems getting this wrong in 2016 doesn't seem like a lot of evidence about future AI development trajectories), but it's not a claim to have "disproven AI".

his complaints about people linking the wrong URLs due to his ISP host - b

... (read more)

which is a reasonable thing for someone to be annoyed about.

No, not really. The most fundamental problem is not the stupid claims about what Google will 'ever' be capable of (although that aspect is highly notable, in case you needed any evidence how Egan blew it on DL, that he wrote this ~2012 and kept doubling down on it), but rather his stubborn insistence on misunderstanding what a search engine does. That is why I am citing it: as an example of his mathematician-like crankery (ie. cognitive rigidity and fanaticism and perfectionism) - he has an ide... (read more)

7Ninety-Three
He didn't use the word "disprove", but when he's calling it laughable that AI will ever (ever! Emphasis his!) be able to merely "make sense of his information on the web", I think gwern's gloss is closer to accurate than yours. It's 2024 and Google is already using AI to make sense of information on the web, this isn't just "anti-singularitarian snark".
7Said Achmiz
I find such excuses to be unconvincing pretty much 100% of the time. Almost everyone who “has better things to do than [whatever]” is in that situation because their time is very valuable, and their time is very valuable because they make, and thus have, a lot of money. (Like, say, a successful fiction author.) In which case, they can pay someone to solve the problem for them. (Heck, I don’t doubt that Egan could even find people to help him fix this for free!) If someone has a problem like this, but neither takes the time to fix it himself, nor pays (or asks) someone to fix it for him, what this means isn’t that he’s too busy, but rather that he doesn’t care. And that’s fine. He’s got the right to not care about this. But then nobody else has the slightest shred of obligation to care about it, either. Not lifting a finger to fix this problem, but expecting other people to spend their time and mental effort (even if it’s only a little of both) to compensate for the problem, is certainly not laudable behavior.

end with general position "akshually, grandiose sci-fi assumptions are not that important, what I want is to write commentary on contemporary society" [...] hard or speculative sci-fi is considered to be low status, while "commentary on contemporary society" is high status and writers want to be high status.

But this clearly isn't true of Egan. The particular story reviewed in this post happens to be commentary on contemporary Society, but that's because Egan has range—his later novels are all wildly speculative. (The trend probably reached a zenith with... (read more)

Though — I haven't read all of his recent novels, but I think — none of those are (for lack of a better word) transhumanist like Permutation City or Diaspora, or even Schild's Ladder or Incandescence. Concretely: no uploads, no immortality, no artificial minds, no interstellar civilization. I feel like this fits the pattern, even though the wildness of the physics doesn't. (And each of those four earlier novels seems successively less about the implications of uploading/immortality/etc.)

Doomimir and Simplicia dialogues [...] may have been inspired by the chaotic discussion this post inspired.

(Yes, encouraged by the positive reception to my comment to Bensinger on this post.)

A mathematical construct that models human natural language could be said to express "agency" in a functional sense insofar as it can perform reasoning about goals, and "honesty" insofar as the language it emits accurately reflects the information encoded in its weights?

3Daniel Tan
I agree that from a functional perspective, we can interact with an LLM in the same way as we would another human. At the same time I’m pretty sure we used to have good reasons for maintaining a conceptual distinction. One potential issue is that when the language shifts to implicitly frame the LLM as a person, that subtly shifts the default perception on a ton of other linked issues. Eg the “LLM is a human” frame raises questions like “do models deserve rights?”. But idunno, it’s possible that there’s some philosophical argument by which it makes sense to think of LLMs as human once they pass the Turing test. Also, there’s undoubtedly something lost when we try to be very precise. Having to dress discourse in qualifications makes the point more obscure, which doesn’t help when you want to leave a clear take-home message. Framing the LLM as a human is a neat shorthand that preserves most of the xrisk-relevant meaning. I guess I’m just wondering if alignment research has resorted to anthropomorphization because of some well-considered reason I was unaware of, or simply because it’s more direct and therefore makes points more bluntly (“this LLM could kill you” vs “this LLM could simulate a very evil person who would kill you”).

(Self-review.) I claim that this post is significant for articulating a solution to the mystery of disagreement (why people seem to believe different things, in flagrant violation of Aumann's agreement theorem): much of the mystery dissolves if a lot of apparent "disagreements" are actually disguised conflicts. The basic idea isn't particularly original, but I'm proud of the synthesis and writeup. Arguing that the distinction between deception and bias is less decision-relevant than commonly believed seems like an improvement over hand-wringing over where the boundary is.

4Noosphere89
I think that a lot of disagreements are truly hidden conflicts, but I also do think that a non-trivial portion of disagreements come down to not having common priors, which is necessary for the theorem to work for disagreements.

Some have delusional optimism about [...]

I'm usually not a fan of tone-policing, but in this case, I feel motivated to argue that this is more effective if you drop the word "delusional." The rhetorical function of saying "this demo is targeted at them, not you" is to reassure the optimist that pessimists are committed to honestly making their case point by point, rather than relying on social proof and intimidation tactics to push a predetermined "AI == doom" conclusion. That's less credible if you imply that you have warrant to dismiss all claims of t... (read more)

6Steven Byrnes
Hmm, I wasn’t thinking about that because that sentence was nominally in someone else’s voice. But you’re right. I reworded, thanks.

I don't think Vance is e/acc. He has said positive things about open source, but consider that the context was specifically about censorship and political bias in contemporary LLMs (bolding mine):

There are undoubtedly risks related to AI. One of the biggest:

A partisan group of crazy people use AI to infect every part of the information economy with left wing bias. Gemini can't produce accurate history. ChatGPT promotes genocidal concepts.

The solution is open source

If Vinod really believes AI is as dangerous as a nuclear weapon, why does ChatGPT have such

... (read more)

The next major update can be Claude 4.0 (and Gemini 2.0) and after that we all agree to use actual normal version numbering rather than dating?

Date-based versions aren't the most popular, but it's not an unheard of thing that Anthropic just made up: see CalVer, as contrasted to SemVer. (For things that change frequently in small ways, it's convenient to just slap the date on it rather than having to soul-search about whether to increment the second or the third number.)
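(For illustration, here's a minimal sketch contrasting the two schemes, using hypothetical version strings rather than anything Anthropic actually publishes:

```python
from datetime import date

# SemVer: major.minor.patch, where deciding which number to bump is a judgment call.
semver = "3.5.2"

# CalVer: just stamp the release date on it, no soul-searching required.
calver = date(2024, 10, 22).strftime("%Y.%m.%d")  # -> "2024.10.22"

print(f"SemVer: {semver}  CalVer: {calver}")
```

Anthropic's model identifiers, e.g. claude-3-5-sonnet-20241022, effectively combine both conventions: a SemVer-ish name plus a date stamp.)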

'You acted unwisely,' I cried, 'as you see
By the outcome.' He calmly eyed me:
'When choosing the course of my action,' said he,
'I had not the outcome to guide me.'

Ambrose Bierce

The claim is pretty clearly intended to be about relative material, not absolute number of pawns: in the end position of the second game, you have three pawns left and Stockfish has two; we usually don't describe this as Stockfish having given up six pawns. (But I agree that it's easier to obtain resources from an adversary that values them differently, like if Stockfish is trying to win and you're trying to capture pawns.)

This is a difficult topic (in more ways than one). I'll try to do a better job of addressing it in a future post.

5Wei Dai
To clarify, I don't actually want you to scare people this way, because I don't know if people can psychologically handle it or if it's worth the emotional cost. I only bring it up myself to counteract people saying things like "AIs will care a little about humans and therefore keep them alive" or when discussing technical solutions/ideas, etc.

Was my "An important caveat" parenthetical paragraph sufficient, or do you think I should have made it scarier?

Should have made it much scarier. "Superhappies" caring about humans "not in the specific way that the humans wanted to be cared for" sounds better or at least no worse than death, whereas I'm concerned about s-risks, i.e., risks of worse than death scenarios.

Thanks, I had copied the spelling from part of the OP, which currently says "Arnalt" eight times and "Arnault" seven times. I've now edited my comment (except the verbatim blockquote).

if there's a bunch of superintelligences running around and they don't care about you—no, they will not spare just a little sunlight to keep Earth alive.

Yes, I agree that this conditional statement is obvious. But while we're on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano's arguments (which Carl Shulman "agree[s] with [...] approximately in full") that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares's "But Why Would the... (read more)


My reply to Paul at the time:

If a misaligned AI had 1/trillion "protecting the preferences of whatever weak agents happen to exist in the world", why couldn't it also have 1/trillion other vaguely human-like preferences, such as "enjoy watching the suffering of one's enemies" or "enjoy exercising arbitrary power over others"?

From a purely selfish perspective, I think I might prefer that a misaligned AI kills everyone, and take my chances with continuations of myself (my copies/simulations) elsewhere in the multiverse, rather than face whatever the sum-of

... (read more)

I think you're overestimating the intended scope of this post. Eliezer's argument involves multiple claims - A, we'll create ASI; B, it won't terminally value us; C, it will kill us. As such, people have many different arguments against it. This post is about addressing a specific "B doesn't actually imply C" counterargument, so it's not even discussing "B isn't true in the first place" counterarguments.

5Amalthea
Bernard Arnault?

The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates": if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.

 

There's ... (read more)

Bernald Arnalt has given eight-figure amounts to charity. Someone who reasoned, "Arnalt is so rich, surely he'll spare a little for the less fortunate" would in fact end up making a correct prediction about Bernald Arnalt's behavior!

Just for the sake of concreteness, since having numbers here seems useful, it seems like Bernald Anault has given around ~$100M to charity, which is around 0.1% of his net worth (spreading this contribution equally to everyone on earth would be around one cent per person, which I am just leaving here for illustrative purpose... (read more)
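(A quick sanity check of the per-person figure, my arithmetic rather than the commenter's, assuming roughly 8 billion people:

$$\frac{\$10^{8}}{8 \times 10^{9}\ \text{people}} \approx \$0.0125\ \text{per person},$$

i.e., on the order of one cent per person, consistent with the parenthetical above.)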

8avturchin
A correct question would be: Will Arnalt kill his mother for 77 USD, if he expects this to be known to other billionaires in the future?

An earlier version of this on Twitter used Bill Gates instead of Bernard, and did specifically address the fact that Bill Gates does give money to charity, but that he still won't give the money to you specifically; he'll give money for his own purposes and values. (But it then expressed frustration that people were going to fixate on this facet of Bill Gates and get derailed unproductively, and so the essay was switched to use Bernard.)

I actually think on reflection that the paragraph was a pretty good paragraph that should just have been included.

I agree that engaging more... (read more)

2tailcalled
Nate Soares engaged extensively with this in reasonable-seeming ways that I'd thus expect Eliezer Yudkowsky to mostly agree with. Mostly it seems like a disagreement where Paul Christiano doesn't really have a model of what realistically causes good outcomes and so he's really uncertain, whereas Soares has a proper model and so is less uncertain. But you can't really argue with someone whose main opinion is "I don't know", since "I don't know" is just garbage. He's gotta at least present some new powerful observable forces, or reject some of the forces presented, rather than postulating that maybe there's an unobserved kindness force that arbitrarily explains all the kindness that we see.

I think the simplest counterargument to "caring a little" is that there is a difference between "caring a little" and "caring enough". Let's say that the AI is ready to pay $1 for your survival. If you live in an economy which rapidly disassembles Earth into a Dyson swarm, then oxygen, a protected environment, and food are not just stuff lying around; they are complex, expensive artifacts, and the AI is certainly not ready to pay for an O'Neill cylinder for you to be evacuated into, nor ready to pay the opportunity costs of not disassembling Earth, so you die.

The other case is difference "cari... (read more)

  1. Arguments from moral realism, fully robust alignment, that ‘good enough’ alignment is good enough in practice, and related concepts.

What is moral realism doing in the same taxon with fully robust and good-enough alignment? (This seems like a huge, foundational worldview gap; people who think alignment is easy still buy the orthogonality thesis.)

  1. Arguments from good outcomes being so cheap the AIs will allow them.

If you're putting this below the Point of No Return, then I don't think you've understood the argument. The claim isn't that good outcom... (read more)

3Sammy Martin
Technically even Moral Realism doesn't imply the Anti-Orthogonality thesis! Moral Realism is necessary but not sufficient for Anti-Orthogonality; you have to be a particular kind of very hardcore platonist moral realist who believes that 'to know the good is to do the good' in order to be Anti-Orthogonality, and argue that not only are there moral facts but that these facts are intrinsically motivating. Most moral realists would say that it's possible to know what's good but not act on it: even if this is an 'unreasonable' disposition in some sense, this 'unreasonableness' is compatible with being extremely intelligent and powerful in practical terms. Even famous moral realists like Kant wouldn't deny the Orthogonality thesis: Kant would accept that it's possible to understand hypothetical but not categorical imperatives, and he'd distinguish capital-R Reason from simple means-end 'rationality'. I think from among moral realists, it's really only platonists and divine command theorists who'd deny Orthogonality itself.

in a world where the median person is John Wentworth [...] on Earth (as opposed to Wentworld)

Who? There's no reason to indulge this narcissistic "Things would be better in a world where people were more like meeeeeee, unlike stupid Earth [i.e., the actually existing world containing all actually existing humans]" meme when the comparison relevant to the post's thesis is just "a world in which humans have less need for dominance-status", which is conceptually simpler, because it doesn't drag in irrelevant questions of who this Swentworth person is and wh... (read more)

5johnswentworth
... is that why this post has had unusually many downvotes? Goddammit, I was just trying to convey how and why I found the question interesting and the phenomenon confusing. Heck, I'm not even necessarily claiming the Wentworld equilibrium would be better overall.

2019 was a more innocent time. I grieve what we've lost.

It's a fuzzy Sorites-like distinction, but I think I'm more sympathetic to trying to route around a particular interlocutor's biases in the context of a direct conversation with a particular person (like a comment or Tweet thread) than I am in writing directed "at the world" (like top-level posts), because the more something is directed "at the world", the more you should expect that many of your readers know things that you don't, such that the humility argument for honesty applies forcefully.

7Eli Tyre
FWIW, I have the opposite inclination. If I'm talking with a person one-on-one, we have high bandwidth. I will try to be skillful and compassionate in avoiding triggering them, while still saying what's true, and depending on who I'm talking to, I may elect to remain silent about some of the things that I think are true. But I overall am much more uncomfortable with anything less than straightforward statements of what I believe and why, in smaller-person contexts, where there is the communication capacity to clarify misunderstandings, and where my declining to offer an objection to something that someone says more strongly implies agreement. This seems right to me. But it also seems right to me that the broader your audience, the lower their average level of epistemics and commitment to epistemic discourse norms. And your communication bandwidth is lower. Which means there is proportionally more risk of 1) people mishearing you and that damaging the prospects of the policies you want to advocate for (eg "marketing"), 2) people mishearing you, and that causing you personal problems of various stripes, and 3) people understanding you correctly, and causing you personal problems of various stripes. [1] So the larger my audience, the more reticent I might be about what I'm willing to say. 1. ^ There's obviously a fourth quadrant of that 2-by-2, "people hearing you correctly and that damaging the prospects of the policies you want to advocate for." Acting to avoid that seems commons-destroying, and personally out of integrity. If my policy proposals have true drawbacks, I want to clearly acknowledge them and state why I think they're worth it, not dissemble about them.

Just because you don't notice when you're dreaming, doesn't mean that dream experiences could just as well be waking experiences. The map is not the territory; Mach's principle is about phenomena that can't be told apart, not just anything you happen not to notice the differences between.

When I was recovering from a psychotic break in 2013, I remember hearing the beeping of a crosswalk signal, and thinking that it sounded like some sort of medical monitor, and wondering briefly if I was actually on my deathbed in a hospital, interpreting the monitor sound ... (read more)

2Logan Zoellner
My argument is "an infinite universe where everything that is logically possible happens" is more parsimonious than "a universe where only 'normal' things happen"

(I'm interested (context), but I'll be mostly offline the 15th through 18th.)

Here's the comment I sent using the contact form on my representative's website.

Dear Assemblymember Grayson:

I am writing to urge you to consider voting Yes on SB 1047, the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act. How our civilization handles machine intelligence is of critical importance to the future of humanity (or lack thereof), and from what I've heard from sources I trust, this bill seems like a good first step: experts such as Turing Award winners Yoshua Bengio and Stuart Russell support the bill (https://time.

... (read more)

This is awful. What do most of these items have to do with acquiring the map that reflects the territory? (I got 65, but that's because I've wasted my life in this lame cult. It's not cool or funny.)

7Gunnar_Zarncke
If you got 65, then the test seems to measure something real. It's not the test's fault that you think you wasted your time.  It would be interesting to have a different test that asks about your knowledge of the Sequences. Maybe you can create one?
5rsaarelm
It's testing for conformity to folk values, not mythic values.

On the one hand, I also wish Shulman would go into more detail on the "Supposing we've solved alignment and interpretability" part. (I still balk a bit at "in democracies" talk, but less so than I did a couple years ago.) On the other hand, I also wish you would go into more detail on the "Humans don't benefit even if you 'solve alignment'" part. Maybe there's a way to meet in the middle??

6Wei Dai
My own answer to this is that humans aren't secure, and AI will exacerbate the problem greatly by helping the offense (i.e. exploiting human vulnerabilities) a lot more than the defense. I've focused on philosophy (thinking that offense merely requires training AI to be persuasive, while defense seems to require solving metaphilosophy, i.e., understanding what correct philosophical reasoning consists of), but more recently realized that the attack surface is so much bigger than that. For example humans can fall in love (resulting in massive changes to one's utility function, if you were to model a human as a utility maximizer). It will be straightforward to use RL to train AI to cause humans to fall in love with them (or their characters), but how do you train AI to help defend against that? Would most humans even want to defend against that or care enough about it? So even with alignment, the default outcome seems to be a civilization with massively warped or corrupted human values. My inner Carl wants to reply that aligned and honest AI advisors will warn us of this and help us fix it before it's too late, maybe by convincing policymakers to pass regulations to prevent such "misuse" of AI? And then my reply to that would be that I don't see how such regulations can work, policymakers won't care enough, it seems easier to train AI to attack humans than to make such predictions in a way that is both honest/unbiased and persuasive to policymakers, and AI might not have the necessary long-horizon causal understanding to craft good regulations before it's too late. Another possibility is that the alignment tax is just too high, so competitive pressures erode alignment even if it's "solved" in some sense.

It seems pretty plausible to me that if AI is bad, then rationalism did a lot to educate and spur on AI development. Sorry folks.

What? This apology makes no sense. Of course rationalism is Lawful Neutral. The laws of cognition aren't, can't be, on anyone's side.

1ProgramCrafter
I disagree with "of course". The laws of cognition aren't on any side, but human rationalists presumably share (at least some) human values and intend to advance them; insofar they are more successful than non-rationalists this qualifies as Good.
1Nathan Young
So by my metric, Yudkowsky and Lintemandain's Dath Ilan isn't neutral; it's quite clearly lawful good, or attempting to be. And yet they care a lot about the laws of cognition. So it seems to me that the laws of cognition can (should?) drive towards flourishing rather than pure knowledge increase. There might be things that we wish we didn't know for a bit. And ways to increase our strength to heal rather than our strength to harm. To me it seems a better rationality would be lawful good.

The philosophical ideal can still exert normative force even if no humans are spherical Bayesian reasoners on a frictionless plane. The disjunction ("it must either be the case that") is significant: it suggests that if you're considering lying to someone, you may want to clarify to yourself whether and to what extent that's because they're an enemy or because you don't respect them as an epistemic peer. Even if you end up choosing to lie, it's with a different rationale and mindset than someone who's never heard of the normative ideal and just thinks that white lies can be good sometimes.

2kave
Is it the case that if there are two identically-irrational / -boundedly-rational agents, then sharing information between them must have positive value?
5[anonymous]
Yes, this seems correct. With the added clarification that "respecting [someone] as an epistemic peer" is situational rather than a characteristic of the individual in question. It is not that there are people more epistemically advanced than me which I believe I should only ever tell the full truth to, and then people less epistemically advanced than me that I should lie to with absolute impunity whenever I start feeling like it. It depends on a particularized assessment of the moment at hand. I would suspect that most regular people who tell white lies (for pro-social reasons, at least in their minds) generally do so in cases where they (mostly implicitly and subconsciously) determine that the other person would not react well to the truth, even if they don't spell out the question in the terms you chose.

I definitely do not agree with the (implied) notion that it is only when dealing with enemies that knowingly saying things that are not true is the correct option

There's a philosophically deep rationale for this, though: to a rational agent, the value of information is nonnegative. (Knowing more shouldn't make your decisions worse.) It follows that if you're trying to misinform someone, it must either be the case that you want them to make worse decisions (i.e., they're your enemy), or you think they aren't rational.
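(A minimal formalization of that claim, which is standard decision theory rather than anything specific to this thread: for an expected-utility maximizer that gets to observe a signal x before choosing an action a,

$$\mathbb{E}_x\Big[\max_a \mathbb{E}[U \mid a, x]\Big] \;\ge\; \max_a \mathbb{E}_x\big[\mathbb{E}[U \mid a, x]\big] \;=\; \max_a \mathbb{E}[U \mid a],$$

because ignoring the signal is always an available policy. The inequality can fail for agents who don't update or optimize correctly, which is exactly the "you think they aren't rational" branch of the disjunction.)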

3[anonymous]
To clarify, I straightforwardly do not believe any human being I have ever come into contact with is rational enough for information-theoretic considerations like that to imply that something other than telling the truth will necessarily lead to them making worse decisions.

white lies or other good-faith actions

What do you think "good faith" means? I would say that white lies are a prototypical instance of bad faith, defined by Wikipedia as "entertaining or pretending to entertain one set of feelings while acting as if influenced by another."

Frustrating! What tactic could get Interlocutor un-stuck? Just asking them for falsifiable predictions probably won't work, but maybe proactively trying to pass their ITT and supplying what predictions you think their view might make would prompt them to correct you, à la Cunningham's Law?

How did you chemically lose your emotions?

Senior MIRI leadership explored various alternatives, including reorienting the Agent Foundations team’s focus and transitioning them to an independent group under MIRI fiscal sponsorship with restricted funding, similar to AI Impacts. Ultimately, however, we decided that parting ways made the most sense.

I'm surprised! If MIRI is mostly a Pause advocacy org now, I can see why agent foundations research doesn't fit the new focus and should be restructured. But the benefit of a Pause is that you use the extra time to do something in particular. Why wouldn... (read more)

4Ebenezer Dukakis
I'm concerned there may be an alignment problem for superbabies. Humans often have contempt for people and animals with less intelligence than them. "You're dumb" is practically an all-purpose putdown. We seem to assign moral value to various species on the basis of intelligence rather than their capacity for joy/suffering. We put chimpanzees in zoos and chickens in factory farms. Additionally, jealousy/"xenophobia" towards superbabies from vanilla humans could lead them to become misanthropes. Everyone knows genetic enhancement is a radioactive topic. At what age will a child learn they were modified? It could easily be just as big of a shock as learning that you were adopted or conceived by a donor. Then stack more baggage on top: Will they be bullied for it? Will they experience discrimination? I feel like we're charging headlong into these sociopolitical implications, hollering "more intelligence is good!", the same way we charged headlong into the sociopolitical implications of the internet/social media in the 1990s and 2000s while hollering "more democracy is good!" There's a similar lack of effort to forecast the actual implications of the technology. I hope researchers are seeking genes for altruism and psychological resilience in addition to genes for intelligence.

But the benefit of a Pause is that you use the extra time to do something in particular. Why wouldn't you want to fiscally sponsor research on problems that you think need to be solved for the future of Earth-originating intelligent life to go well? 

MIRI still sponsors some alignment research, and I expect we'll sponsor more alignment research directions in the future. I'd say MIRI leadership didn't have enough aggregate hope in Agent Foundations in particular to want to keep supporting it ourselves (though I consider its existence net-positive).

My mo... (read more)
