Do you know if Plato was claiming Euclidean geometry was physically true in that sense? Doesn't sound like something he would say.
I'd like to see how this would compare to a human organization. Suppose individual workers or individual worker-interactions are all highly faithful in a tech company. Naturally, though, the entire tech company will begin exhibiting misalignment, tend towards blind profit seeking, etc. Despite the faithfulness of its individual parts.
Is that the kind of situation you're thinking of here? Is that why having mind-reading equipment that forced all the workers to dump their inner monologue wouldn't actually be of much use towards aligning the overall system, because the real problem is something like the aggregate or "emergent" behavior of the system, rather than the faithfulness of the individual parts?
What do you mean by "over the world"? Are you including human coordination problems in this?
Did you end up writing the list of interventions? I'd like to try some of them. (I also don't want to commit to doing 3 hours a day for two weeks until I know what the interventions are.)
It's very surprising to me that he would think there's a real chance of all humans collectively deciding to not build AGI, and successfully enforcing the ban indefinitely.
Patternism is usually defined as a belief about the metaphysics of consciousness, but that definition collapses into incoherence, so it's better defined as a property of an agent's utility function: not minding being subjected to major discontinuities in functionality, i.e., being frozen, deconstructed, reduced to a pattern of information, reconstructed at another time and place, and resumed.
That still sounds like a metaphysical belief, and less empirical since conscious experience isn't involved in it (instead it sounds like it's just about personal identity).
Any suggestions for password management?
Because it's an individualized approach that is a WIP and if I just write it down 99% of people will execute it badly.
Why is that a problem? Do you mean this in the sense of "if I do this, it will lead to people making false claims that my experiment doesn't replicate" or "if I do this, nothing good will come of it so it's not even worth the effort of writing".
As someone who runs a lot of self-experiments and occasionally helps others, I'm disappointed in but sympathetic to this approach. People are complicated: the right thing to do probably is try a bunch of stuff and see what sticks. But people really, really want the answer to be simple, and will round down complicated answers until they are simple enough, then declare the original protocol a failure when their simplification doesn't work.
I think it would be valuable for George to write up the list of interventions they considered, and a case report o...
I'm confused whether:
Skimming it again I'm pretty sure you mean (2).
If I understand right the last sentence should say "does not hold".
It's not easy to see the argument for treating your values as incomparable with the values of other people while seeing your future self's values as identical to your own, unless you've adopted some idea of a personal soul.
The suffering and evil present in the world has no bearing on God's existence. I've always failed to buy into that idea. Sure, it sucks. But it has no bearing on the metaphysical reality of a God. If God does not save children--yikes I guess? What difference does it make? A creator as powerful as has been hypothesised can do whatever he wants; any arguments from rationalism be damned.
Of course, the existence of pointless suffering isn't an argument against the existence of a god. But it is an old argument against the existence of a god who deserves to b...
"tensorware" sprang to mind
Yeah, it's hard to say whether this would require restructuring the whole reward center in the brain or if the needed functionality is already there, but just needs to be configured with different "settings" to change the origin and truncate everything below zero.
My intuition is that evolution is blind to how our experiences feel in themselves. I think it's only the relative differences between experiences that matter for signaling in our reward center. This makes a lot of sense when thinking about color and "qualia inversion" thought experiments, but it's trickier with valence. My color vision could become inverted tomorrow, and it would hardly affect my daily routine. But not so if my valences were inverted.
What about our pre-human ancestors? Is the twist that humans can't have negative valences either?
I agreed up until the "euthanize everything that remains" part. If we actually get to the stage of having aligned ASI, there are probably other options with the same or better value. The "gradients of bliss" that I described in another comment may be one.
Pearce has the idea of "gradients of bliss", which he uses to try to address the problem you raised about insensitivity to pain being hazardous. He thinks that even if all of the valences are positive, the animal can still be motivated to avoid danger if doing so yields an even greater positive valence than the alternatives. So the prey animals are happy to be eaten, but much happier to run away.
To me, this seems possible in principle. When I feel happy, I'm still motivated at some low level to do things that will make me even happier, even though I was...
What are your thoughts on David Pearce's "abolitionist" project? He suggests genetically engineering wild animals to not experience negative valences, but still show the same outward behavior. From a sentientist standpoint, this solves the entire problem, without visibly changing anything.
Same. I feel somewhat jealous of people who can have a visceral in-body emotional reaction to X-risks. For most of my life I've been trying to convince my lizard brain to feel emotions that reflect my beliefs about the future, but it's never cooperated with me.
You can compress huge prompts into metatokens, too (just run inference with the prompt to generate the training data)
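The distillation trick above can be sketched in a toy form. Everything below is a stand-in (the frozen "model" is just a linear map over the mean of its input embeddings, not an LLM, and all names are illustrative); the point is only the loop: run the model with the full prompt to generate target distributions, then fit a few learnable metatoken embeddings to reproduce them.

```python
# Toy sketch of compressing a long prompt into a few learned "metatokens"
# (soft-prompt distillation). A real setup would distill a frozen LLM's
# prompted behaviour into learned embeddings; this stand-in "model" just
# maps the mean of its input embeddings to next-token logits.
import numpy as np

rng = np.random.default_rng(0)
d, vocab, prompt_len, n_meta, q_len = 16, 32, 50, 4, 5

W = rng.normal(size=(d, vocab))  # frozen "model" weights

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logits(prefix, queries):
    # queries: (batch, q_len, d); prefix: (k, d) prepended to every query
    seq_mean = (prefix.sum(0) + queries.sum(1)) / (len(prefix) + q_len)
    return seq_mean @ W  # (batch, vocab)

prompt = rng.normal(size=(prompt_len, d))  # the long prompt to compress
meta = rng.normal(size=(n_meta, d))        # learnable metatokens
lr, losses = 2.0, []
for step in range(500):
    queries = rng.normal(size=(64, q_len, d))  # random continuations
    target = softmax(logits(prompt, queries))  # "inference with the prompt"
    student = softmax(logits(meta, queries))   # metatokens instead of prompt
    # KL(target || student), averaged over the batch
    loss = np.mean(np.sum(target * (np.log(target) - np.log(student)), axis=-1))
    losses.append(loss)
    # Gradient wrt the metatokens: d loss / d logits = student - target,
    # and each metatoken contributes 1/(n_meta + q_len) to the mean embedding.
    # In this toy the gradient is identical for every metatoken, which is
    # fine because only their sum affects the output.
    g_logits = (student - target) / len(queries)
    g_meta = (g_logits @ W.T).sum(0)[None, :] / (n_meta + q_len)
    meta -= lr * g_meta  # broadcasts over all metatokens

print(f"KL at start: {losses[0]:.3f}  at end: {losses[-1]:.3f}")
```

The same loop scales to the real case: freeze the LLM, sample continuations, and backpropagate a distillation loss into a short prefix of trainable embeddings.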
I'm very curious about this technique but couldn't find anything about it. Do you have any references I can read?
I see. Yes, "philosophy" often refers to particular academic subcultures, with the people who do philosophy for a living counted as "philosophers" (Plato had a better name for them). I misread your comment at first and thought it was the "philosopher" who was arguing for the instrumentalist view, since that seems like their more stereotypical way of thinking and deconstructing things (whereas the more grounded physicist would just say "yes, you moron, electrons exist. Next question.").
Do you have any examples of the "certain philosophers" that you mentioned? I've often heard of such people described that way, but I can't think of anyone who's insulted scientists for assuming e.g. causality is real.
On the contrary, it is my intention to illustrate that assertions of instances that have not been experienced (with respect to their assertion at t1) can be justified in the future in which they are observed (with respect to their observation at t2).
Sorry, I may not be following this right. I had thought the point of the skeptical argument was that you can't justify a prediction about the future until it happens. Induction is about predicting things that haven't happened yet. You don't seem to be denying the skeptical argument here, if we still need to wait for the prediction to resolve before it can be justified.
I've also noticed that scaffolded LLM agents seem inherently safer. In particular, deceptive alignment would be hard for one such agent to achieve, if at every thought-step it has to reformulate its complete mind state into the English language just in order to think at all.
You might be interested in some work done by the ARC Evals team, who prioritize this type of agent for capability testing.
I'm sorry that comparing my position to yours led to some confusion: I don't deny the reality of 3rd person facts. They probably are real, or at least it would be more surprising if they weren't than if they were. (If not, then where would all of the apparent complexity of 1st person experience come from? Positing an external world seems like a step in the right direction toward answering this.) My comparison was about which one we consider to be essential. If I had used only "pragmatist" and "agnostic" as descriptors, it would have been less confusing.
A...
If I had to choose between those two phrasings I would prefer the second one, for being the most compatible between both of our notions. My notion of "emerges from" is probably too different from yours.
The main difference seems to be that you're a realist about the third-person perspective, whereas I'm a nominalist about it, to use your earlier terms. Maybe "agnostic" or "pragmatist" would be good descriptors too. The third-person is a useful concept for navigating the first-person world (i.e. the one that we are actually experiencing). But that it seems u...
I meant subjective in the sense of "pertaining to a subject's frame of reference", not subjective in the sense of "arbitrary opinion". I'm sorry if that was unclear.
But all of these observations are also happening from a third-person perspective, just like the rest of reality.
This is a hypothesis, based on information in your first-person perspective. To make arguments about a third-person reality, you will always have to start with first-person facts (and not the other way around). This is why the first person is epistemologically more fundamental.
It's possible to doubt that there is a third-person perspective (e.g. to doubt that there's anything like being God). But our first person perspective is primary, and ca...
You don't believe that all human observations are necessarily made from a first-person viewpoint? Can you give a counter-example? All I can think of are claims that involve the paranormal or supernatural.
I don't think I fall into either camp because I think the question is ambiguous. It could be talking about the natural structure of space and time ("mathematics") or it could be talking about our notation and calculation methods ("mathematics"). The answer to the question is "it depends what you mean".
The nominalist vs realist issue doesn't appear very related to my understanding of the Hard Problem, which is more about the definition of what counts as valid evidence. Eliminativism says that subjective observations are problematic. But all observations are subjective (first person), so defining what counts as valid evidence is still unresolved.
I appreciate hearing your view; I don't have any comments to make. I'm mostly interested in finding a double crux.
This isn't really a double crux, but it could help me think of one:
If someone becomes convinced that there isn't any afterlife, would this rationally affect their behavior? Can you think of a case where someone believed in Heaven and Hell, had acted rationally in accordance with that belief, then stopped believing in Heaven and Hell, but still acted just the same way as they did before? We're assuming their utility function hasn't changed, just their ontology.
Here are some cruxes, stated from what I take to be your perspective:
because such sensations would be equivalent to predictions that I would be burning alive, which would be false and therefore interfere with my functioning
I don't see a necessary equivalence here. You could be fully aware that the sensations were inaccurate, or hallucinated. But it would still hurt just as much.
if you could have a body which doesn’t experience, then it’s not going to function as normal.
A human body, or any kind of body? It seems like a robot could engage in the same self-preservation behavior as a human without needing to have anythi...
You seem to be claiming that you have experiences, but that their role is purely functional. If you were to experience all tactile sensations as degrees of being burnt alive, but you could still make predictions just as well as before, it wouldn't make any difference to you?
It's plausible that reverse-engineering the human mind requires tools that are much more powerful than the human mind.
So you don't believe there is such a thing as first-person phenomenal experiences, sort of like Brian Tomasik? Could you give an example or counterexample of what would or wouldn't qualify as such an experience?
Doesn't "direct" have the implication of "certain" here?
Response in favor of the assumption that Signer said was detrimental.
but my current theory is that one such detrimental assumption is "I have direct knowledge of content of my experiences"
It's true this is the weakest link, since instances of the template "I have direct knowledge of X" sound presumptuous and have an extremely bad track record.
The only serious response in favor of the presumptuous assumption [edit] that I can think of is epiphenomenalism in the sense of "I simply am my experiences", with self-identity (i.e. X = X) filling the role of "having direct knowledge of X". For explaining how we're able to have co...
The burden of proof is on those who assert that the Hard Problem is real. You can say what consciousness is not, but can you say what it is?
In the sense that you mean this, this is a general argument against the existence of everything, because ultimately words have to be defined either in terms of other words or in terms of things that aren't words. Your ontology has the same problem, to the same degree or worse. But we only need to give particular examples of conscious experience, like suffering. There's no need to prove that there is some essence of ...
Are you saying that you don't think there's any fact of the matter whether or not you have phenomenal experiences like suffering? Or do you mean that phenomenal experience is unreal in the same way that the hellscape described by Dante is unreal?
I don't like "illusionism" either, since it makes it seem like illusionists are merely claiming that consciousness is an illusion, i.e., it is something different than what it seems to be. That claim isn't very shocking or novel, but illusionists aren't claiming that. They're actually claiming that you aren't having any internal experience in the first place. There isn't any illusion.
"Fictionalism" would be a better term than "illusionism": when people say they are having a bad experience, or an experience of saltiness, they are just describing a fictional character.
Exactly. I wish the economic alignment issue was brought up more often.
You're right. I'm updating towards illusionism being orthogonal to anthropics in terms of betting behavior, though the upshot is still obscure to me.
I agree realism is underrated. Or at least the term is underrated. It's the best way to frame ideas about sentientism (in the sense of hedonic utilitarianism). On the other hand, you seem to be talking more about rhetorical benefits of normative realism about laws.
Most people seem to think phenomenal valence is subjective, but that's confusing the polysemy of the word "subjective", which can mean either arbitrary or bound to a first-person subject. All observations (including valenced states like suffering) are subjective in the second sense, but not in th...
it is easy to cooperate on the shared goal of not dying
Were you here for Petrov Day? /snark
But I'm confused what you mean about a Pivotal Act being unnecessary. Although both you and a megacorp want to survive, you each have very different priors about what is risky. Even if the megacorp believes your alignment program will work as advertised, that only compels them to cooperate with you if they (1) are genuinely concerned about risk in the first place, (2) believe alignment is so hard that they will need your solution, and (3) actually possess the institutional coordination abilities needed.
And this is just for one org.
World B has a probability of 1, maybe minus epsilon, of solving alignment, since the solution is already there.
That is totum pro parte. It's not World B which has a solution at hand. It's you who have a solution at hand, and a world that you have to convince to come to a screeching halt. Meanwhile people are raising millions of dollars to build AGI and don't believe it's a risk in the first place. The solution you have in hand has no significance for them. In fact, you are a threat to them, since there's very little chance that your utopian vision will match up wit...
Okay, let's operationalize this.
Button A: The state of alignment technology is unchanged, but all the world's governments develop a strong commitment to coordinate on AGI. Solving the alignment problem becomes the number one focus of human civilization, and everyone just groks how important it is and sets aside their differences to work together.
Button B: The minds and norms of humans are unchanged, but you are given a program by an alien that, if combined with an AGI, will align that AGI in some kind of way that you would ultimately find satisfying.
World ...
I agree that the political problem of globally coordinating non-abuse is more ominous than solving technical alignment. If I had the option to solve one magically, I would definitely choose the political problem.
What it looks like right now is that we're scrambling to build alignment tech that corporations will simply ignore, because it will conflict with optimizing for (short-term) profits. In a word: Moloch.
The existence of God and Free Will feel like religious problems that philosophers took interest in, and good riddance to them.
Whether the experience of suffering/pain is fictional or not is a hot topic in some circles, but both sides are quite insistent about being good church-going "materialists" (whatever that means).
As for "knowledge", I agree that question falls apart into a million little subproblems. But it took the work of analytic philosophers to pull it apart, and after much labor. You're currently reaping the rewards of that work and the simplicity of hindsight.