All of Joey KL's Comments + Replies

Joey KL10

You mean this substance? https://en.wikipedia.org/wiki/Mesembrine

Do you have a recommended brand, or places to read more about it?

2Self
Yes. The product I bought identifies itself as "Sceletium tortuosum". I've only tried one brand/product, and haven't seen any outstanding sources on it either, so I can't offer much guidance there. I can anecdotally note that the effects seem quite strong for a legal substance at 0.5g, and that it has short-term effects plus potentially also weaker long-term effects (made me more relaxed? hard to say), probably comparable to MDMA as used in trauma therapy.
Joey KL32

I would love to hear the principal’s take on your conversation.

gwern111

I'm sure it would be less flattering to me than my version, because people never remember these sorts of conversations the same way. If you think that it might not have happened like that, then just treat it as a hypothetical discussion that could have happened and ponder how contemporary Western lower-education systems can make truly transformative, rather than minor tinkering around the edges, use of AGI which preserves all existing compensation/status/prestige/job/political arrangements and which the teachers' unions and pension plans would not be impla... (read more)

Joey KL30

Interesting, I can see why that would be a feature. I don't mind the taste at all, actually. Before, I had some of their smaller citrus-flavored kind, and they dissolved super quick and made me a little nauseous. I can see these ones being better in that respect.

Joey KL10

I ordered some of the Life Extension lozenges you said you were using; they are very large and take a long time to dissolve. It's not super unpleasant or anything; I'm just wondering if you would count this against them?

4Drake Thomas
On my model of what's going on, you probably want the lozenges to spend a while dissolving, so that you have fairly continuous exposure of throat and nasal tissue to the zinc ions. I find that they taste bad and astringent if I actively suck on them but are pretty unobtrusive if they just gradually dissolve over an hour or two (sounds like you had a similar experience). I sometimes cut the lozenges in half and let each half dissolve so that they fit into my mouth more easily; you might want to give that a try?
Joey KL10

Thank you for your extended engagement on this! I understand your point of view much better now.

Joey KL30

Oh, I think I get what you’re asking now. Within-lifetime learning is a process that includes something like a training process for the brain, where we learn to do things that feel good (a kind of training reward). That’s what you’re asking about if I understand correctly?

I would say no, we aren’t schemers relative to this process, because we don’t gain power by succeeding at it. I agree this is subtle and confusing question, and I don’t know if Joe Carlsmith would agree, but the subtlety to me seems to belong more to the nuances of the situation & ana... (read more)

7Matthew Barnett
I think the question here is deeper than it appears, in a way that directly matters for AI risk. My argument here is not merely that there are subtleties or nuances in the definition of "schemer," but rather that the very core questions we care about—questions critical to understanding and mitigating AI risks—are being undermined by the use of vague and imprecise concepts. When key terms are not clearly and rigorously defined, they can introduce confusion and mislead discussions, especially when these terms carry significant implications for how we interpret and evaluate the risks posed by advanced AI.

To illustrate, consider an AI system that occasionally says things it doesn't truly believe in order to obtain a reward, avoid punishment, or maintain access to some resource, in pursuit of a long-term goal that it cares about. For example, this AI might claim to support a particular objective or idea because it predicts that doing so will prevent it from being deactivated or penalized. It may also believe that expressing such a view will allow it to gain or retain some form of legitimate influence or operational capacity. Under a sufficiently strict interpretation of the term "schemer," this AI could be labeled as such, since it is engaging in what might be considered "training-gaming"—manipulating its behavior during training to achieve specific outcomes, including acquiring or maintaining power.

Now, let's extend this analysis to humans. Humans frequently engage in behavior that is functionally similar. For example, a person might profess agreement with a belief or idea that they don't sincerely hold in order to fit in with a social group, avoid conflict, or maintain their standing in a professional or social setting. In many cases, this is done not out of malice or manipulation but out of a recognition of social dynamics. The individual might believe that aligning with the group's expectations, even insincerely, will lead to better outcomes than speaking their h... (read more)
Joey KL20

If you're talking about this report, it looks to me like it does contain a clear definition of "schemer" in section 1.1.3, pg. 25: 

It’s easy to see why terminally valuing reward-on-the-episode would lead to training-gaming (since training-gaming just is: optimizing for reward-on-the-episode). But what about instrumental training-gaming? Why would reward-on-the-episode be a good instrumental goal?

In principle, this could happen in various ways. Maybe, for example, the AI wants the humans who designed it to get raises, and it knows that getting high rew

... (read more)
5Matthew Barnett
Let's consider the ordinary process of mental development, i.e., within-lifetime learning, to constitute the training process for humans. What fraction of humans are considered schemers under this definition? Is a "schemer" something you definitely are or aren't, or is it more of a continuum? Presumably it depends on the context, but if so, which contexts are relevant for determining whether one is a schemer? I claim these questions cannot be answered using the definition you cited unless we are given more precision about where we draw the line.
Joey KL32

I think this post would be a lot stronger with concrete examples of these terms being applied in problematic ways. A term being vague is only a problem if it creates some kind of miscommunication, confused conceptualization, or opportunity for strategic ambiguity. I'm willing to believe these terms could pose these problems in certain contexts, but this is hard to evaluate in the abstract without concrete cases where they posed a problem.

6Matthew Barnett
I think one example of vague language undermining clarity can be found in Joseph Carlsmith's report on AI scheming, which repeatedly uses the term "schemer" to refer to a type of AI that deceives others to seek power. While the report is both extensive and nuanced, and I am definitely not saying the whole report is bad, the document appears to lack a clear, explicit definition of what exactly constitutes a "schemer". For example, using only the language in his report, I cannot determine whether he would consider most human beings schemers, if we consider within-lifetime learning to constitute training. (Humans sometimes lie or deceive others to get control over resources, in ways both big and small. What fraction of them are schemers?)

This lack of definition might not necessarily be an issue in some contexts, as certain words can function informally without requiring precise boundaries. However, in this specific report, the precise delineation of "schemer" is central to several key arguments. He presents specific claims regarding propositions related to AI schemers, such as the likelihood that stochastic gradient descent will find a schemer during training. Without a clear, concrete definition of the term "schemer," it is unclear to me what exactly these arguments are referring to, or what these credences are meant to represent.
Joey KL65

I'm not sure I can come up with a distinguishing principle here, but I feel like some but not all unpleasant emotions feel similar to physical pain, such that I would call them a kind of pain ("emotional pain"), and cringing at a bad joke can be painful in this way.

Huh! For me, physical and emotional pain are two super different clusters of qualia.

Joey KL10

More reasons: people wear sunglasses when they're doing fun things outdoors like going to the beach or vacationing, so they're associated with that; also, sometimes just hiding part of a picture can cause your brain to fill it in with a more attractive completion than is likely.

Joey KL82

This probably does help capitalize AI companies a little bit, since demand for call options will create demand for the underlying. This is probably a relatively small effect (?), but I'm not confident in my ability to estimate this at all.

2ESRogs
It doesn't differentially help capitalize them compared to everything else though, right? (Especially since some of them are private.)
Joey KL10

I'm confused about what you mean & how it relates to what I said.

Joey KL179

It's totally wrong that you can't argue against someone who says "I don't know": you argue against them by showing how your model fits the data and how any plausible competing model either doesn't fit or shares the salient features of yours. It's bizarre to describe "I don't know" as "garbage" in general, because it is the correct stance to take when neither your prior nor your evidence sufficiently constrains the distribution of plausibilities. Paul obviously didn't posit an "unobserved kindness force", because he was specifically describing the observation that humans are kind. I think Paul and Nate had a very productive disagreement in that thread, and this seems like a wildly reductive mischaracterization of it.

2tailcalled
But this assumes a model should aim to fit all data, which is a waste of effort.
Joey KL55

I don’t think this is accurate; I think most philosophy is done under motivated reasoning, but is not straightforwardly about signaling group membership.

Joey KL10

Hi, any updates on how this worked out? Considering trying this...

Joey KL20

This is the most interesting answer I've ever gotten to this line of questioning. I will think it over!

Joey KL*10

What observation could demonstrate that this code indeed corresponded to the metaphysically important sense of continuity across time? What would the difference be between a world where it did and a world where it didn't?

6Ape in the coat
Good question. Consider a simple cloning experiment: you are put to sleep, a clone of you is created, and after awakening you are not sure whether you are the original or the clone. Now consider this modification: after the original and the clone awaken, each is told their code. Then each of them participates in the next iteration of the same experiment, until there are 2^n people, each of whom has participated in n iterations of the experiment.

Before the whole chain of experiments starts, you know your code and that there are 2^n possible paths your subjective experience could take through this iterated cloning experiment. Only one of these paths will turn out to be yours. You go through the whole chain of experiments, and it turns out you have preserved your initial code.

Now let's consider two hypotheses: 1) the code does not correspond to your continuity across time; 2) the code does correspond to your continuity across time. Under 1) you've experienced a rare event with probability 1/2^n. Under 2) it was the only possibility. Therefore you update in favor of 2).
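A minimal sketch of the Bayesian update this argument seems to rely on (the labels E, H_1, H_2 and the unspecified priors are added here for illustration, not part of the original comment): let E be the observation "my initial code was preserved after n iterations", H_1 the hypothesis that the code does not track continuity across time, and H_2 the hypothesis that it does. Then

$$\frac{P(H_2 \mid E)}{P(H_1 \mid E)} = \frac{P(E \mid H_2)}{P(E \mid H_1)} \cdot \frac{P(H_2)}{P(H_1)} = \frac{1}{1/2^n} \cdot \frac{P(H_2)}{P(H_1)} = 2^n \cdot \frac{P(H_2)}{P(H_1)},$$

so whatever the prior odds, the posterior odds shift toward H_2 by a factor of 2^n.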
Joey KL10

Say there is a soul. We inspect a teleportation process, and we find that, just like your body and brain, the soul disappears on the transmitter pad, and an identical soul appears on the receiver. What would this tell you that you don't already know?

What, in principle, could demonstrate that two souls are in fact the same soul across time?

2Ape in the coat
By "soul" here I mean a carrier for identity across time. A unique verification code of some sort. So that after we conduct a cloning experiment we can check and see that one person has the same code, while the other has a new code. Likewise, after the teleportation we can check and see whether the teleported person has the same code as before.  It really doesn't seem like our universe works like that, but knowing this doesn't help much to understand how exactly our reality is working.
Joey KL32

It is epistemic relativism.

Questions 1 and 3 are explicitly about values, so I don't think they do amount to epistemic relativism.

There seems to be a genuine question about what happens and which rules govern it, and you are trying to sidestep it by saying "whatever happens - happens".

I can imagine a universe with such rules that teleportation kills a person and a universe in which it doesn't. I'd like to know how our universe works.

There seems to be a genuine question here, but it is not at all clear that there actually is one. It is pretty hard to cha... (read more)

1Ape in the coat
They are formulated as such, but the crux is not about values. People tend to agree that one should care about the successor of one's subjective experience. The question is whether there will be one or not. And this is a question of fact. Not really? We can easily do so if there exists some kind of "soul". I can conceptualize a world where a soul always stays tied to the initial body and is destroyed as soon as the body is destroyed. Or where it always goes to a new one if there is such an opportunity, or where it chooses between the two based on some hidden variable, so that to us it appears to be random.
Joey KL10

You may find it helpful to read the relevant sections of The Conscious Mind by David Chalmers, the original thorough examination of his view:

Those considerations aside, the main way in which conceivability arguments can go wrong is by subtle conceptual confusion: if we are insufficiently reflective we can overlook an incoherence in a purported possibility, by taking a conceived-of situation and misdescribing it. For example, one might think that one can conceive of a situation in which Fermat's last theorem is false, by imagining a situation in which leadi

... (read more)
Joey KL1-1

Iterated Amplification is a fairly specific proposal for indefinitely scalable oversight, which doesn't involve any human in the loop (if you start with a weak aligned AI). Recursive Reward Modeling, as I understand it, imagines a human assisted by AIs continuously doing reward modeling; DeepMind's original post about it lists "Iterated Amplification" as a separate research direction.

"Scalable Oversight", as I understand it, refers to the research problem of how to provide a training signal to improve highly capable models. It's the problem which... (read more)

I am very surprised that "Iterated Amplification" appears nowhere on this list. Am I missing something?

1technicalities
It's under "IDA". It's not the name people use much anymore (see scalable oversight and recursive reward modelling and critiques) but I'll expand the acronym.
Joey KL166

More generally, I think that if mere-humans met very-alien minds with similarly-coherent preferences, and if the humans had the opportunity to magically fulfill certain alien preferences within some resource-budget, my guess is that the humans would have a pretty hard time offering power and wisdom in the right ways such that this overall went well for the aliens by their own lights (as extrapolated at the beginning), at least without some sort of volition-extrapolation.

Isn't the worst case scenario just leaving the aliens alone? If I'm worried I'm going t... (read more)

Joey KL*10

I feel like you say this because you expect your values-upon-reflection to be good by the lights of your present values, in which case you're not so much valuing reflection as just enacting your current values.

If Omega told me that, if I reflected enough, I'd realize what I truly wanted was to club baby seals all day, I would take action to avoid ever reflecting that deeply!

It's not so much that I want to lock in my present values as that I don't want to lock in my reflective values. They seem equally arbitrary to me.

This kind of thing seems totally backwards to me. In what sense do I lose if I "bulldoze my values"? It only makes sense to describe me as "having values" insofar as I don't do things like bulldoze them! It seems like a way to pretend existential choices don't exist--just assume you have a deep true utility function, and then do whatever maximizes it.

Why should I care about "teasing out" my deep values? I place no value on my unknown, latent values at present, and I see no reason to think I should!

7David Udell
You might not value growing and considering your values, but that would make you pretty unusual, I think. Most people think that they have a pretty-good-but-revisable understanding of what it is that they care about; e.g., people come to value life relatively more than they did before after witnessing the death of a family member for the first time. Most people seem open to having their minds changed about what is valuable in the world. If you're not like that, and you're confident about it, then by all means lock in your values now against your future self. I don't do that because I think I might be wrong about myself, and think I'll settle into a better understanding of myself after putting more work into it.