Discussion: Objective Robustness and Inner Alignment Terminology
In the alignment community, there seem to be two main ways to frame and define objective robustness and inner alignment. They are quite similar, differing mainly in how they focus on the same basic underlying problem. We’ll call these the objective-focused approach and the generalization-focused approach. We don’t delve into these framing issues in Empirical Observations of Objective Robustness Failures, where we present empirical observations of objective robustness failures; we think the matter is worth a separate discussion. These issues have been mentioned only infrequently, in a few comments on the Alignment Forum, so it seemed worthwhile to write a post describing the framings and their differences in an effort to promote further discussion in the community.

TL;DR

This post compares two paradigmatic approaches to objective robustness/inner alignment:

Objective-focused approach

* Emphasis: “How do we ensure our models/agents have the right (mesa-)objectives?”
* Outer alignment: “an objective function r is outer aligned if all models that perform optimally on r in the limit of perfect training and infinite data are intent aligned.”
* Outer alignment is a property of the training objective.

Generalization-focused approach

* Emphasis: “How will this model/agent generalize out-of-distribution?”
* Considering a model’s “objectives” or “goals,” whether behavioral or internal, is instrumentally useful for predicting OOD behavior, but what you ultimately care about is whether the model generalizes “acceptably.”
* Outer alignment: a model is outer aligned if it performs desirably on the training distribution.
* Outer alignment is a property of the tuple (training objective, training data, training setup, model).

Special thanks to Rohin Shah, Evan Hubinger, Edouard Harris, Adam Shimi, and Adam Gleave for their helpful feedback on drafts of this post.

Objective-focused approach

This is the approa