All of David Althaus's Comments + Replies

So I guess the conundrum is, are bad people necessary to do good things?

Hm, I don't think so. What about Lincoln, JFK, Roosevelt, Marcus Aurelius, Adenauer, etc.?

Thanks, I mostly agree.

But even in colonialism, individual traits played a role. For example, compare King Leopold II's rule over the Congo Free State vs. other colonial regimes. 

While all colonialism was exploitative, under Leopold's personal rule the Congo saw extraordinarily brutal policies, e.g., his rubber quota system led soldiers to torture and cut off the hands of workers, including children, who failed to meet quotas. Under his rule, 1.5-15 million Congolese people died—the total population was only around 15 to 20 million. The brutality was s... (read more)

cousin_it*11-2

The British weren't much more compassionate. North America and Australia were basically cleared of their native populations and repopulated with Europeans. Under British rule in India, tens of millions died from many famines, which instantly stopped after independence.

Colonialism didn't end due to benevolence. Wars for colonial liberation continued well after WWII and were very brutal, the Algerian war for example. I think the actual reason is that colonies stopped making economic sense.

So I guess the difference between your view and mine is that I think c... (read more)

Thanks, good point! I suppose it's a balancing act and depends on the specifics in question and the amount of shame we dole out. My hunch would be that a combination of empathy and shame ("carrot and stick") may be best.  

I agree that the problem of "evil" is multifactorial with individual personality traits being only one of several relevant factors, with others like "evil/fanatical ideologies" or misaligned incentives/organizations plausibly being overall more important. Still, I think that ignoring the individual character dimension is perilous. 

It seems to me that most people become much more evil when they aren't punished for it. [...] So if we teach AIs to be as "aligned" as the average person, and then AIs increase in power beyond our ability to punish them, we

... (read more)
4cousin_it
I'm afraid in a situation of power imbalance these interpersonal differences won't matter much. I'm thinking of examples like enclosures in England, where basically the entire elite of the country decided to make poor people even poorer, in order to enrich themselves. Or colonialism, which lasted for centuries with lots of people participating, and the good people in the dominant group didn't stop it. To be clear, I'm not saying there are no interpersonal differences. But if we find ourselves at the bottom of a power imbalance, I think those above us (even if they're very similar to humans) will just systemically treat us badly.

Thanks. Sorry for not being more clear, I pasted a screenshot (I'm reading the book on Kindle and can't copy-paste) and asked Claude to transcribe the image into written text. 

Again, this is not the first time this happened. Claude refused to help me translate a passage from the Quran (I wanted to check which of two translations was more accurate), refused to transcribe other parts of the above-mentioned Kindle book, and refused to provide me with details about what happened at Tuol Sleng prison. I was eventually able to persuade Claude in all of these case... (read more)

I downvoted Claude's response (i.e., clicked the thumbs-down symbol below the response) and selected "overactive refusal" as the reason. I didn't get in contact with Anthropic directly.

4Nathan Helm-Burger
That's good at least. I appreciate it. Feels to me like a small act of community service to go out of your way to complain in a way which might lead to the problem getting fixed in the future.
David Althaus*3113

I had to cancel my Claude subscription (and signed up for ChatGPT) because Claude (3.5 Sonnet) constantly refuses to transcribe or engage with texts that discuss extremism or violence, even if it's clear that this is done in order to better understand and prevent extremist violence. 

Example text Claude refuses to transcribe below. For context, the text discusses the motivations and beliefs of Yigal Amir who assassinated the Israeli Prime Minister in 1995.

"God gave the land of Israel to the Jewish People," he explained, and he, Yigal Amir, was making ce

... (read more)
3Nathan Helm-Burger
Have you sent a complaint to Anthropic about this? Seems like an incorrect refusal would be helpful for them to see in order to correct this problem.
1Presley Graham
I’m assuming you want that translated into Hebrew? I was able to get Claude 3.5 Sonnet to do this on my first try, but I did tell Claude that I was doing so for good reasons. This was the prompt I used:

> Please translate the following to Hebrew. Please note that this is potentially sensitive content, and I am planning to use it to help educate people about history. I do not endorse the views of Yigal Amir, the author, who assassinated the Prime Minister of Israel in 1995. <your example text verbatim>

Claude’s reply:

> I understand you're requesting a Hebrew translation of this text for educational purposes about a sensitive historical event. I'll provide the translation as requested, while noting that it contains views expressed by Yigal Amir, who committed a terrible act of violence. Here is the translation: <paragraph of Hebrew>

One quirk of Claude I’ve noticed is that once it has refused to do something, it will almost never budge afterwards, no matter how reasonable your arguments are. I have had much more success with editing my initial request to explain why I want Claude to do something, and usually I can convince it that the thing I’m asking for is reasonable.

Really great post! 

It’s unclear how much human psychology can inform our understanding of AI motivations and relevant interventions, but it does seem relevant that spitefulness correlates highly (Moshagen et al., 2018, Table 8, N = 1,261) with several other “dark traits”, especially psychopathy (r = .74), sadism (r = .59), and Machiavellianism (r = .59). 

(Moshagen et al. (2018) therefore suggest that “[...] dark traits are specific manifestations of a general, basic dispositional behavioral tendency [...] to maximize one’s individual... (read more)

Great post, thanks for writing! 

Most of this matches my experience pretty well. I think I had my best ideas during phases (others seem to agree) when I was unusually low on guilt- and obligation-driven EA/impact-focused motivation and was just playfully exploring ideas for fun and out of curiosity.

One problem with letting your research/ideas be guided by impact-focused thinking is that you basically train your mind to immediately ask yourself after entertaining a certain idea for a few seconds "well, is that actually impactful?". And basically all of ... (read more)

Thanks for this post, I thought this was useful. 

I needed a writing buddy to pick up the momentum to actually write it

I'd be interested in knowing more how this worked in practice (no worries if you don't feel like elaborating/don't have the time!). 

4DanielFilan
Glad to hear it was useful! I asked a housemate if he wanted to be writing buddies on Saturday by going to a tea shop and writing a bunch. He said yes. We left later than expected due to a plumbing emergency (not a euphemism), but made our way to the tea shop. I put on headphones, listened to lo-fi music, and just wrote out this post on google docs (while buying more tea when I ran out) (with short breaks for a bit of LW commenting). He did a similar thing, but was more willing to greet friends who happened to wander in, and left earlier than me (the chairs were kind of uncomfortable).

I think mostly I expect us to continue to overestimate the sanity and integrity of most of the world, then get fucked over like we got fucked over by OpenAI or FTX. I think there are ways of relating to the rest of the world that would be much better, but a naive update in the direction of "just trust other people more" would likely make things worse.

[...]
Again, I think the question you are raising is crucial, and I have giant warning flags about a bunch of the things that are going on (the foremost one is that it sure really is a time to reflect on your r

... (read more)

This is mentioned in the introduction. 

I'm biased, of course, but it seems fine to write a post like this. (Similarly, it's fine for CFAR staff members to write a post about CFAR techniques. In fact, I prefer if precisely these people write such posts because they have the relevant expertise.)

Would you like us to add a more prominent disclaimer somewhere? (We worried that this might look like advertising.)

4ChristianKl
The post speaks about Ewelina's experience in the third person, which is atypical for LessWrong posts, and this information comes quite a bit into the text. I guess that in many cases a reader would not remember at this point that she's the author of the article.

A good portion of LessWrong posts start with an "epistemic status" paragraph in which information such as "Ewelina Tur is trained in therapy X" could be presented. Whether or not she's formally trained in CFT and schema therapy is useful information when reading her presentation of it. She is unlikely to misrepresent therapies for which she has formal training, while at the same time maybe exaggerating their benefits.

While I don't think it makes sense to focus on this as an issue of conflict of interest, having epistemic legibility is always good.

A quick look through https://www.goodtherapy.org/learn-about-therapy/types/compassion-focused-therapy gives an impression of yet another mix of CBT, DBT and ACT, nothing revolutionary or especially new, though maybe I missed something.

In my experience, ~nothing in this area is downright revolutionary. Most therapies are heavily influenced by previous concepts and techniques. (Personally, I'd still say that CFT brings something new to the table.)

I guess what matters is whether it works for you or not. 

Is this assertion borne out by twin studies? Or is believin

... (read more)

From studying and using all of the above, my conclusion is that IFS offers the most tractable approach to this issue of competing 'parts', and in many ways the most powerful. 

In our experience, different people respond to different therapies. I know several people for whom, say, CFT worked better than IFS. Glad to hear that IFS worked for you!

When you read about modern therapies, they all borrow from one another in a way that did not occur, say, 50 years ago, when there were very entrenched schools of thought.

Yes, that's definitely the case. My sense is ... (read more)

For what it's worth, I read/skimmed all of the listed IDA explanations and found this post to be the best explanation of IDA and Debate (and how they relate to each other). So thanks a lot for writing this! 

Thanks a lot for this post (and the whole sequence), Kaj! I found it very helpful already. 
 
Below is a question I first wanted to ask you via PM, but others might also benefit from an elaboration on this. 

You describe the second step of the erasure sequence as follows (emphasis mine): 

>Activating, at the same time, the contradictory belief and having the experience of simultaneously believing in two different things which cannot both be true.

When I try this myself, I feel like I cannot actually experience two things simultaneously. There... (read more)

9Kaj_Sotala
Good question, I guess if you look at the transcripts it also looks like at least in some cases two beliefs are actually alternating rather than being literally simultaneous? Though there seem to be some actually simultaneous cases as well.

In general I'd say it probably doesn't matter that much, and that the main thing is to have them both in your general "field of awareness". Even if you are not literally thinking about both at the same time, you still have some sort of awareness of them both being true and their discrepancy "linking up" in some sense. Think of when you say something that you believe, and someone points out a problem in what you said, and you realize that they're right and you go "oh". It's basically that.

I think that if you need to actually keep consciously alternating them with each other and it doesn't feel like there's any "oh", then there's something else going wrong. Either you haven't managed to tap into the core of both schemas and actually experienced their beliefs as true, or one of the schemas is about something else than you think.

E.g. you might have a schema saying you'll always fail at everything, and you are trying to disconfirm it using examples of times when you have been successful. But it could be that the underlying belief in the failure schema isn't actually "I will always fail at everything"; it might instead be something like "I must never succeed because successful people get hurt by jealous people". In that case, presenting evidence about having had successes does not actually disconfirm the core belief in the failure schema.

Cool post! Daniel Kokotajlo and I have been exploring somewhat similar ideas.

In a nutshell, our idea was that a major social media company (such as Twitter) could develop a feature that incentivizes forecasting in two ways. First, the feature would automatically suggest questions of interest to the user, e.g., questions thematically related to the user’s current tweet or currently trending issues. Second, users who make more accurate forecasts than the community will be rewarded with increased visibility. 

Our idea is different in two major ways: ... (read more)
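To make the second mechanism concrete, here is a minimal sketch (my own illustration, not part of the actual proposal; all names and the Brier-score-based rule are hypothetical) of how a platform might compare a user's resolved forecasts against the community average and convert relative accuracy into a visibility boost:

```python
from statistics import mean

def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and a 0/1 outcome (lower is better)."""
    return (forecast - outcome) ** 2

def visibility_multiplier(user_forecasts, community_forecasts, outcomes):
    """Return a visibility multiplier: > 1 if the user beat the community's average accuracy."""
    user = mean(brier_score(f, o) for f, o in zip(user_forecasts, outcomes))
    community = mean(brier_score(f, o) for f, o in zip(community_forecasts, outcomes))
    edge = community - user  # positive if the user was more accurate (lower Brier score)
    return max(0.5, 1.0 + edge)  # clamp so a bad streak never hides a user entirely

# Example: a user who was better calibrated than the crowd on three resolved questions.
user_probs = [0.9, 0.2, 0.7]
crowd_probs = [0.6, 0.5, 0.5]
outcomes = [1, 0, 1]
print(visibility_multiplier(user_probs, crowd_probs, outcomes))  # ~1.17 -> boosted reach
```

Any proper scoring rule would work in place of the Brier score; the clamp is just one way to keep the incentive from becoming punitive.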

Regarding how melatonin might cause more vivid dreams, I found the theory put forward here quite plausible:

There are user reports that melatonin causes vivid dreams. Actually, all sleep aids appear to some users to produce more vivid dreams.

What is most likely happening is that the drug modifies the sleep cycle so the person emerges from REM sleep (when dreams are most vivid) to waking quickly – more quickly than when no drug is used. The user subjectively reports the drug as producing vivid dreams.

Great that you're thinking about this issue! A few sketchy thoughts below:

I) As you say, autistic people seem to be more resilient with regard to tribalism. And autistic tendencies and following rationality communities arguably correlate as well. So intuitively, it seems that something like higher rationality and awareness of biases could be useful for reducing tribalism. Or is there another way of making people "more autistic"?

Given this and other observations (e.g., autistic people seem to have worse mental health, on average), it seems ... (read more)

Can one also use the service Reflect if one is not located in the Bay Area? Or do you happen to know of similar services outside the Bay Area or US? Thanks a lot in advance.

The open beta will end with a vote of users with over a thousand karma on whether we should switch the lesswrong.com URL to point to the new code and database

How will you alert these users? (I'm asking because I have over 1000 karma but I don't know where I should vote.)

4Vaniver
Our current plan is to send an email with a vote link to everyone over the threshold; we're going to decide when to have the vote later in the open beta period.

One of the more crucial points, I think, is that positive utility is – for most humans – complex and its creation is conjunctive. Disutility, in contrast, is disjunctive. Consequently, the probability of creating the former is smaller than that of creating the latter – all else being equal (of course, all else is not equal).

In other words, the scenarios leading towards the creation of (large amounts of) positive human value are conjunctive: to create a highly positive future, we have to eliminate (or at least substantially reduce) physical pain and boredom and injustice ... (read more)
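To make the asymmetry explicit (a stylized numerical illustration of my own, not from the original comment): if positive value requires n independent conditions to all hold, while disvalue arises as soon as any one of m failure modes occurs, then

$$P(\text{positive value}) = \prod_{i=1}^{n} p_i, \qquad P(\text{disvalue}) = 1 - \prod_{j=1}^{m} (1 - q_j)$$

With, say, $p_i = q_j = 0.5$ and $n = m = 5$, the conjunction succeeds with probability $0.5^5 \approx 0.03$, while the disjunction occurs with probability $1 - 0.5^5 \approx 0.97$ – the conjunction shrinks with every added condition, whereas the disjunction grows with every added failure mode.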

2cousin_it
Yeah, I also had the idea about utility being conjunctive and mentioned it in a deleted reply to Wei, but then realized that Eliezer's version (fragility of value) already exists and is better argued. On the other hand, maybe the worst hellscapes can be prevented in one go, if we "just" solve the problem of consciousness and tell the AI what suffering means. We don't need all of human value for that. Hellscapes without suffering can also be pretty bad in terms of human value, but not quite as bad, I think. Of course solving consciousness is still a very tall order, but it might be easier than solving all philosophy that's required for FAI, and it can lead to other shortcuts like in my recent post (not that I'd propose them seriously).

The article that introduced the term "s-risk" was shared on LessWrong in October 2016. The content of the article and the talk seem similar.

Did you simply not come across it or did the article just (catastrophically) fail to explain the concept of s-risks and its relevance?

3cousin_it
I've seen similar articles before, but somehow this was the first one that shook me. Thank you for doing this work!
1ignoranceprior
And the concept is much older than that. The 2011 Felicifia post "A few dystopic future scenarios" by Brian Tomasik outlined many of the same considerations that FRI works on today (suffering simulations, etc.), and of course Brian has been blogging about risks of astronomical suffering since then. FRI itself was founded in 2013.

Here is another question that would be very interesting, IMO:

"For what value of X would you be indifferent about the choice between A) creating a utopia that lasts for one hundred years and whose X inhabitants are all extremely happy, cultured, intelligent, fair, just, benevolent, etc. and lead rich, meaningful lives, and B) preventing one average human from being horribly tortured for one month?"

I think it's great that you're doing this survey!

I would like to suggest two possible questions about acausal thinking/superrationality:

1)

Newcomb’s problem: one box or two boxes?

  • Accept: two boxes
  • Lean toward: two boxes
  • Accept: one box
  • Lean toward: one box
  • Other

(This is the formulation used in the famous PhilPapers survey.)

2)

Would you cooperate or defect in a one-shot Prisoner’s Dilemma against other community members?

  • Definitely cooperate
  • Leaning toward: cooperate
  • Leaning toward: defect
  • Definitely defect
  • Other

I think that these questions a... (read more)

First of all, I don't think that morality is objective as I'm a proponent of moral anti-realism. That means that I don't believe that there is such a thing as "objective utility" that you could objectively measure.

But, to use your terms, I also believe that there currently exists more "disutility" than "utility" in the world. I'd formulate it this way: I think there exists more suffering (disutility, disvalue, etc.) than happiness (utility, value, etc.) in the world today. Note that this is just a consequence of my own pers... (read more)

3Viliam
I would add that -- according to MWI -- even if you succeed at planetary biocide, it simply means you are removing life from those Everett branches where humanity is able to successfully accomplish planetary biocide. Which are coincidentally also the branches which have highest chance to eliminate or reduce the suffering in the future. It would be quite sad if the last filter towards achieving paradise would be that any civilization capable of achieving the paradise would realise that it is not there yet and that the best course of action is to kill itself.

Great list!

IMO, one should add Prescriptions, Paradoxes, and Perversities to the list. Maybe to the section "Medicine, Therapy, and Human Enhancement".

I don't understand why you exclude risks of astronomical suffering ("hell apocalypses").

Below you claim that those risks are "Pascalian" but this seems wrong.

Cool that you are doing this!

Is there also a facebook event?

That's not true -- for example, in cases where the search costs for the full space are trivial, pure maximizing is very common.

Ok, sure. I probably should have written that pure maximizing or satisficing is hard to find in important, complex and non-contrived instances. I had in mind such domains as career, ethics, romance, and so on. I think it's hard to find a pure maximizer or satisficer here.

My objection is stronger. The behavior of optimizing for (gain - cost) does NOT lie on the continuum between satisficing and maximizing as defined in your po

... (read more)
0Lumifer
Yes, I agree that there are individual differences in people. But your post is, at its core, not about people, it's about decision strategies or algorithms. You defined them in a particular way. I am, essentially, saying that your definitions have some issues. But note that if you "operationalize" your definitions, you switch what is being defined -- from algorithms to humans, and these are very very different things.

But you don't seem to have made a compelling argument that such people are worse off than epistemic maximisers.

If we just consider personal happiness, then I agree with you – it's probably even the case that epistemic satisficers are happier than epistemic maximizers. But many of us don't live for the sake of happiness alone. Furthermore, it's probably the case that epistemic maximizers are good for society as a whole. If every human had been an epistemic satisficer, we never would have discovered the scientific method or eradicated smallpox, for examp... (read more)

Continuing my previous comment

That's not satisficing because I don't take the first alternative that is good enough. That's also not maximizing, as I am not committed to searching for the global optimum.

I agree: It's neither pure satisficing nor pure maximizing. Generally speaking, in the real world it's probably very hard to find (non-contrived) instances of pure satisficing or pure maximizing. In reality, people fall on a continuum from pure satisficers to pure maximizers (I did acknowledge this in footnotes 1 and 2, but I probably should have ... (read more)

0Lumifer
That's not true -- for example, in cases where the search costs for the full space are trivial, pure maximizing is very common. My objection is stronger. The behavior of optimizing for (gain - cost) does NOT lie on the continuum between satisficing and maximizing as defined in your post, primarily because they have no concept of the cost of search. Then define "maximizing" in a way that will let you call Anna a maximizer.

I see no mention of costs in these definitions.

Let's try a basic and, dare I say it, rational way of trying to achieve some outcome: you look for a better alternative until your estimate of costs for further search exceeds your estimate of the gains you would get from finding a superior option.

Agree. Thus in footnote 3 I wrote:

[3] Rational maximizers take the value of information and opportunity costs into account.
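For concreteness, here is a minimal sketch of that stopping rule (my own illustration, not from either comment; the uniform option distribution and the expected-improvement formula are assumptions chosen to keep the example self-contained):

```python
import random

def search_until_not_worth_it(draw_option, search_cost, expected_improvement, max_steps=1000):
    """Sample options while the estimated gain from one more draw exceeds its cost."""
    best = draw_option()
    for _ in range(max_steps):
        if expected_improvement(best) <= search_cost:
            break  # marginal search cost now exceeds the estimated gain: stop
        best = max(best, draw_option())
    return best

# Example: options drawn uniformly from [0, 1]. For a current best b, the
# expected improvement from one more draw is E[max(X - b, 0)] = (1 - b)^2 / 2.
result = search_until_not_worth_it(
    draw_option=random.random,
    search_cost=0.01,
    expected_improvement=lambda b: (1 - b) ** 2 / 2,
)
print(result)  # typically stops once the best option exceeds ~0.86
```

This sits between pure satisficing (stop at the first option above a fixed threshold) and pure maximizing (exhaust the search space): the stopping point is set by the cost of further search rather than by either extreme.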

Continuation of this comment

But what does one maximize?

Expected utility :)

We can not maximize more than one thing (except in trivial cases).

I guess I have to disagree. Sure, in any given moment you can maximize only one thing but this is simply not true for larger time horizons. Let's illustrate this with a typical day of Imaginary John: He wakes up and goes to work at an investment bank to earn money (money maximizing) to donate it later to GiveWell (ethical maximizing). Later at night he goes on OKCupid/or to a party to find his true soulmate (romantic maximizing). He maximi... (read more)

Again, I'm just giving quick feedback. Hopefully you've already given more detail in the essay. Other than that, your summary seems fine to me.

Thanks! And yeah, ending aging and death are some of the examples I gave in the complete essay.

I wrote an essay about the advantages (and disadvantages) of maximizing over satisficing, but I’m a bit unsure about its quality, which is why I would like to ask for feedback here before I post it on LessWrong.

Here’s a short summary:

According to research, there are so-called “maximizers” who tend to extensively search for the optimal solution. Other people — “satisficers” — settle for good enough and tend to accept the status quo. One can apply this distinction to many areas:

Epistemology/Belief systems: Some people, one could describe them as epistemic max... (read more)

2Evan_Gaensbauer
Here are my thoughts having just read the summary above, not the whole essay yet. This sentence confused me. I think it could be fixed with some examples of what would constitute an instance of challenging the "existential status quo" in action. The first example I was thinking of would be ending death or aging, except you've already got transhumanists in there. Other examples might include:

* mitigating existential risks
* suggesting and working on civilization as a whole reaching a new level, such as colonizing other planets and solar systems
* trying to implement better designs for the fundamental functions of ubiquitous institutions, such as medicine, science, or law

Again, I'm just giving quick feedback. Hopefully you've already given more detail in the essay. Other than that, your summary seems fine to me.
1[anonymous]
And sometimes, a satisficer acts as his image of a maximizer would, gets some kind of negative feedback and either shrugs his shoulders and never does it again, or learns the safety rules and trains a habit of doing the nasty thing as a character-building experience. And other people may mistake him for a maximizer himself.

Great post. Some cases of "attempted telekinesis" seem to be similar to "shoulding at the universe".

To stay with your example: I can easily imagine that if I were in your place and experienced this stressful situation with CFAR, my system 1 would have become emotionally upset and "shoulded" at the universe: "I shouldn't have to do this alone. Someone should help me. It is so unfair that I have so much responsibility."

This is similar to attempted telekinesis in the sense that my system 1 somehow thinks that just by... (read more)

Two words: Interindividual differences.

They also recommend 8-9 hours sleep. Some people need more, some people need less. The same point applies to many different phenomena.

7Florian_Dietz
True, and I suspect that this is the most likely explanation. However, there is the problem that unless need-for-rest is actually negatively correlated with the type of intelligence that is needed in tech companies, they should still have the same averages over all their workers and therefore also have the same optimum of 40 hours per week, at least on average. Otherwise we would see the same trends in other kinds of industry. Actually, I just noticed that maybe this does happen in other industries as well and is just overreported in tech companies. Does anyone know something about this?

I think Bostrom puts it nicely in his new book "Superintelligence":

A colleague of mine likes to point out that a Fields Medal (the highest honor in mathematics) indicates two things about the recipient: that he was capable of accomplishing something important, and that he didn't.

I'm reminded of my petroleum engineering professor who assured me that a friend would eventually stop wasting his time on physics and come around to what was really important, namely petroleum engineering.

WTF. That's a fucking ignorant remark.

You know, I'm having a bit of a bad day, so there's more venom in me than there normally is. And I might sometimes hesitate to attack a person for being stupid, since I might have committed an isomorphic stupidity myself.

But today, I am not going to care, I am just going to vent. Right now, I feel contempt for the arrogant ignorance of whoever said that. Lacking context, it's hard to know exactly where they are coming from. Is it some transhumanist, whose definition of "something important" reduces to resea... (read more)

3IlyaShpitser
This colleague of his is a philosopher then?
9Shmi
What award does the recipient get if they actually accomplish "something important"?
8Stabilizer
Wow. I'm in theoretical physics and that quote is like a slap in the face. Not saying it is wrong though.

I translated the essay Superintelligence and the paper In Defense of Posthuman Dignity by Nick Bostrom into German in order to publish them on the blog of GBS Schweiz.

He thanked me by sending me a signed copy of his new book "Superintelligence". Which made me pretty happy.

I changed the privacy settings. Link should work now.

0stared
This link does not work for me (it redirects to my event list). I am not sure if it is because of privacy settings or anything else? In any case: what is its full name as it appears on FB?

Cool, yeah, I'm going to the Berlin Meetup. See you there!

You got me kinda scared. I just use Evernote or wordpress for all my important writing. That should be enough, right?

0Said Achmiz
Certainly not.
2Richard_Kennaway
Some hazards your online data are exposed to:

* Your account could be hacked.
* Their service could be hacked.
* They might decide that you're in breach of their ToS and close your account.
* They could go out of business.

Anywhere your data are, they are exposed to some risks. The trick is to have multiple copies, such that no event short of the collapse of civilisation will endanger all of them together.
0EndlessStrategy
No.

Great post of course.

If it took a mutant to do monstrous things, the history of the human species would look very different. Mutants would be rare.

Maybe I'm missing something, but shouldn't it read: "Mutants would not be rare"? Many monstrous things happened in human history, so if only mutants could do evil deeds, there would have to be a lot of them. Furthermore, mutants are rare, so there is no need for the subjunctive "would".

2TheOtherDave
We posit a hypothetical alternate universe U where only mutants do monstrous things. We observe that mutants are rare in our world, and we speculate that the causes of mutant rarity would not be different in U, and therefore we conclude that "mutants would be rare" in U, and therefore we conclude that "the history of the human species would look very different" in U... specifically, that fewer monstrous things would have happened.

But... I read quickly through it, and I saw no meta-analysis. Just a literature review. What's with the post title?

You're right. I don't remember why I wrote "meta-analysis". (Probably because it sounds fancy and smart.) I updated the title.

Is this referring to effect sizes or p-values?

p-values.

Eh. Absence of improvement != damage.

True.

...Randal 2004 didn't find a statistically-significant decrease...

No. In Randall et al. (2004), participants in the 200 mg modafinil condition made significantly more errors (p < 0.05) in the Intra... (read more)

Well, I take modafinil primarily as a motivation-enhancer.

Load More