Russellian panpsychism doesn't postulate a new force - physics already accepts the causal role of existence: only existing neurons can fire.
And it explains the epistemic link - it's cogito ergo sum: you're always right when you think that the universe exists.
And a rock's perception belongs to the rock.
Would anyone describe it as theirs? That access is reflective. It’s pretty difficult to retrieve data in a format you didn’t store it in.
But what if there is no access or self-description or retrieval? You just appear fully formed, stare at a wall for a couple of years and then disappear. Are you saying that describing your experiences makes them retroactively conscious?
Even if I’m not thinking about myself consciously [ i.e., my self is not reflecting on itself ], I have some very basic perception of the wall as being perceived by me, a perceiver—some perception of the wall as existing in reference to me.
Is it you inspecting your experience, or you making an inference from the "consciousness is self-awareness" theory? Because it doesn't feel reflective to me. I think I just have a perception of a wall without anything being about me. It seems to be implementable by just a forward pass streamed into short-term memory or s...
The thing I don't understand about the claimed connection between self-model and phenomenal consciousness is that I don't see much evidence that a self-model is necessary to implement conscious perception - when I just stare at a white wall without internal dialog or other thoughts, what part of my experience is not implementable without a self-model?
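To sketch what I mean by "just a forward pass streamed into short-term memory" (purely illustrative, made-up shapes, not a claim about real perception): a plain forward pass whose outputs stream into a fixed-size buffer, with nothing anywhere that models the system itself.

```python
import numpy as np

# Minimal sketch (made-up shapes): a forward pass over incoming "wall" frames,
# streamed into a rolling short-term buffer. Nothing here represents the
# system to itself - there is no self-model component.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))        # stand-in perception weights

def forward(frame):
    return np.tanh(W @ frame)           # one forward pass over a sensory frame

buffer = []                             # short-term memory with fixed capacity
CAPACITY = 5

for _ in range(20):                     # "staring at the wall"
    frame = rng.standard_normal(8)      # incoming sensory frame
    buffer.append(forward(frame))
    if len(buffer) > CAPACITY:
        buffer.pop(0)                   # oldest percepts simply fall out
```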
"Death is fine if AI doesn't have self-preservation goal" or "suffering is bad" are also just human ethical assumptions.
You are talking about the experience of certainty. I'm asking why you trust it.
I know it's beyond doubt because I am currently experiencing something at this exact moment.
That's a description of a system where your experience directly hijacks your feeling of certainty. You wouldn't say that "I know it's beyond doubt there is a blue sky, because blue light hits my eyes at this exact moment" is a valid justification for absolute certainty. Even if you feel certain about some part of reality, you can contemplate being wrong, right? Why not say "I'm feeling ...
How do you know it's beyond doubt? Why is your experience of a blue sky not guaranteed to be right about the sky, while your experience of certainty about experience is always magically right?
What specifically is beyond doubt, if the seeing-neurons of your brain are in the state of seeing red, but you are thinking and saying that you see blue?
If a doctor asks a patient whether he is in pain, and the patient says yes, the doctor may question whether the patient is honest. But he doesn’t entertain the hypothesis that the patient is honest but mistaken.
Nothing in this situation uses certain self-knowledge of the moment of experience. The patient can't communicate it - communication takes time, so it can be spoofed. More importantly, if the patient's knowledge of pain is wrong in the same sense it can be wrong later (the patient says and thinks that they are not in pain, but they actually are and so have p...
You've seen 15648917, but later you think it was 15643917. You're wrong, because actually the state of your neurons was that of (what you usually describe as) seeing 15648917. If in the moment of seeing 15648917 (in the moment when your seeing-neurons are in the state of seeing 15648917) you are thinking that you see 15643917 (meaning your thinking-neurons are in the state of thinking that you see 15643917), then you are wrong in the same way you may be wrong later. It works the same way knowledge about everything works.
You can define "being in the state ...
it’s the only thing I can know for certain
You can't be certain about any specific quale: you can misremember what you were seeing, so there is an external truth-condition (something like "these neurons did such and such things"), and so it is possible in principle to decouple your thoughts of certainty from what actually happened with your experience. So illusionism is at least right that your knowledge of your qualia is imperfect and uncertain.
Even if it’s incomplete in that way, it doesn’t have metaphysical implications.
Therefore Mary's incomplete knowledge about consciousness doesn't have metaphysical implications, because it is incomplete in fundamentally the same way.
Mary doesn’t know what colour qualia look like, and therefore has an incomplete understanding of consciousness.
Mary doesn't know how to ride, and therefore has an incomplete understanding of riding. What's the difference?
Both need instantiation for what?
For gaining potential utility from specific knowledge representations, f...
Bikes aren’t appearances, so there is no analogy.
The analogy is that they both need instantiation. That's the thing about appearances that is used in the argument.
Know-how, such as riding skills, is not an appearance, or physical knowledge.
So physicalism is false, because physical knowledge is incomplete without know-how.
Nonetheless, there is a difference.
Sure, they are different physical processes. But what's the relevant epistemological difference? If you agree that Mary's Room is useless, we can discuss whether there are ontological differences.
...Ri
What it looks like is the representation! A different representation just isn’t a quale. #FF0000 just isn’t a red quale!
But reading a book on riding a bike isn’t knowing how to ride a bike... you get the knowledge from mounting a bike and trying!
The knowledge of representation is the whole thing! Qualia are appearances!
If you want to define things that way, ok. So Mary's room implies that bikes are as unphysical as qualia.
It bypasses what you are calling representation … you have admitted that.
Mary also doesn't have all representations for all p...
If Mary looks at these equations, in her monochrome room, does she go into the brain state that instantiates seeing something red?
No.
Does she somehow find out what red looks like without that?
Yes.
What does that mean? Are you saying Mary already knew what red looks like, and instantiating the brain state adds no new knowledge?
She already knew what red looks like; the knowledge was just in a different representation. Just like with knowing how to ride a bike. "No new", like everything here, depends on definitions. But she definitely undergoes phy...
If you don’t think there is an HP because of Mary’s Room, why do you think there is an HP?
Because of the Zombie Argument. "What part of the physical equations says our world is not a zombie-world?" is a valid question. The answer to "What part of the physical equations says what red looks like?" is just "the part that describes the brain".
It’s supposed to indicate that there is a hard problem, i.e. that even a super scientist cannot come up with a reductive+predictive theory of qualia.
It doesn't indicate it independently of other assumptions. Mary's situ...
First, you can still infer meta-representation from your behavior. Second, why does it matter that you represent aversiveness; what's the difference? A representation of aversiveness and a representation of damage are both just some states of neurons that model some other neurons (a representation of damage still implies the possibility of modeling neurons, not only external state, because your neurons are connected to other neurons).
I understand that, but I'm still asking why subliminal stimuli are not morally relevant for you. They may still create a disposition to act in an aversive way, so there is still a mechanism in some part of the brain/neural network that causes this behaviour and has access to the stimulus - what's the morally significant difference between a stimulus being in some neurons and being in others, such that you call only one location "awareness"?
Why does it matter that Gilbert infers something from the behavior of his neural network and not from the behavior of his body? Both are just subjective models of reality. Why does it matter whether he knows something about his pain? Why doesn't it count if Gilbert avoids pain, defined as the state of the neural network that causes him to avoid it, even when he doesn't know something about it? Maybe you can model it as Gilbert himself not feeling pain, but why is the neural network not a moral patient?
The reference classes you should use work as a heuristic because there is some underlying mechanism that makes them work. So you should use reference classes in situations where their underlying mechanism is expected to work.
Maybe the underlying mechanism of doomsday predictions not working is that people predicting doom don't make their predictions based on valid reasoning. So if someone uses that reference class to doubt AI risk, this should be judged as them making a claim that the reasoning of people predicting AI doom is similar to that of people in cults predicting Armageddon.
The fact that these physicalists feel it would be in some way necessary to instantiate colour, but not other things, like photosynthesis or fusion, means they subscribe to the idea that there is something epistemically unique about qualia/experience, even if they resist the idea that qualia are metaphysically unique.
No, it means they subscribe to the idea that there is something ethically different about qualia/experience. It's not unique; it's like riding a bike. Humans sometimes call physical interactions whose utility is not obtainable by just thi...
Endurist thinking treats reproduction as always acceptable or even virtuous, regardless of circumstances. The potential for suffering rarely factors into this calculation—new life is seen as inherently good.
Not necessarily - you can treat creating new people differently from already existing ones and avoid creating bad (in the Endurist sense: not enough positive experiences, regardless of suffering) lives without accepting death for existing people. I, for example, don't get why you would bring more death into the world by creating low-lifespan people if you don't like death.
clearly the system is a lot less contextual than base models, and it seems like you are predicting a reversal of that trend?
The trend may be bounded, or it may not go far by the time AI can invent nanotechnology - it would be great if someone actually measured such things.
And there being a trend at all is not predicted by the utility-maximization frame, right?
People are confused about the basics because the basics are insufficiently justified.
It is learning helpfulness now, while the best way to hit the specified ‘helpful’ target is to do straightforward things in straightforward ways that directly get you to that target. Doing the kinds of shenanigans or other more complex strategies won’t work.
Best by what metric? And I don't think it was shown that complex strategies won't work - learning to change behaviour from training to deployment is not even that complex.
But it is important, and this post just isn’t going to get done any other way.
Speaking about streetlighting...
What makes it rational is that there is an actual underlying hypothesis about how weather works, instead of a vague "LLMs are a lot like human uploads". And weather prediction outputs numbers connected to the reality we actually care about. And there is no credible alternative hypothesis that implies weather prediction doesn't work.
I don't want to totally dismiss empirical extrapolations, but given the stakes, I would personally prefer for all sides to actually state their model of reality and how they think the evidence changed its plausibility, as formally as possible.
There is no such disagreement; you just can't test all inputs. And without knowledge of how the internals work, you may be wrong about extrapolating alignment to future systems.
Yes, except I would object to phrasing this anthropic stuff as "we should expect ourselves to be agents that exist in a universe that abstracts well" instead of "we should value a universe that abstracts well (or other universes that contain many instances of us)" - there are no coherence theorems that force summation of your copies, right? And so it becomes apparent that we can value some other thing.
Also, even if you consider some memories a part of your identity, you can value yourself slightly less after forgetting them, instead of only having a threshold for death.
It doesn't matter whether you call your multiplier "probability" or "value" if it results in your decision not to care about the low-measure branch. The only difference is that probability is supposed to be about knowledge, and the fact that Wallace's argument involves an arbitrary assumption, not only physics, means it's not probability but value - there is no reason to value knowledge of your low-measure instances less.
this makes decision theory and probably consequentialist ethics impossible in your framework
It doesn't? Nothing stops you from making decisions in a wor...
Things like lions and chairs are other examples.
And counted branches.
This is how Wallace defines it (he in turn defines macroscopically indistinguishable in terms of providing the same rewards). It’s his term in the axiomatic system he uses to get decision theory to work. There’s not much to argue about here?
His definition leads to a contradiction with the informal intuition that motivates considering macroscopic indistinguishability in the first place.
...We should care about low-measure instances in proportion to the measure, just as in classical
How many notions of consciousness do you think are implementable by a short Python program?
Because scale doesn't matter - it doesn't matter whether you are implemented on a thick or a narrow computer.
First of all, macroscopic indistinguishability is not a fundamental physical property - branching indifference is an additional assumption, so I don't see how it's not as arbitrary as branch counting.
But more importantly, the branching indifference assumption is not the same as the informal "not caring about macroscopically indistinguishable differences"! As Wallace showed, branching indifference implies the Born rule, which implies you almost shouldn't care about you in a br...
But why would you want to remove this arbitrariness? Your preferences are fine-grained anyway, so why retain classical counting but deny counting in the space of the wavefunction? It's like saying "dividing the world into people and their welfare is arbitrary - let's focus on measuring the mass of a region of space". The point is that you can't remove all decision-theoretic arbitrariness from MWI - "branching indifference" is just an arbitrary ethical constraint that is equivalent to valuing measure for no reason, and without it, fundamental physics that works like MWI does not prevent you from making decisions as if quantum immortality works.
...“Decoherence causes the Universe to develop an emergent branching structure. The existence of this branching is a robust (albeit emergent) feature of reality; so is the mod-squared amplitude for any macroscopically described history. But there is no non-arbitrary decomposition of macroscopically-described histories into ‘finest-grained’ histories, and no non-arbitrary way of counting those histories.”
Importantly though, on this approach it is still possible to quantify the combined weight (mod-squared amplitude) of all branches that share a certain mac
Even if we can’t currently prove certain axioms, doesn’t this just reflect our epistemological limitations rather than implying all axioms are equally “true”?
It doesn't, and they are fundamentally equal. The only reality is the physical one - there is no reason to complicate your ontology with platonically existing math. Math is just a collection of useful templates that may help you predict reality, and the fact that it works is always just a physical fact. The best case is that we'll know the true laws of physics, and they will work like some subset of math, and then axio...
It sure doesn't seem to generalize in the GPT-4o case. But what's the hypothesis for Sonnet 3.5 refusing in 85% of cases? And CoT improving the score and o1 being better in the browser suggest the problem is in models not understanding consequences, not in them not trying to be good. What's the rate of capability generalization to the agent environment? Are we going to conclude that Sonnet just demonstrates reasoning, instead of doing it for real, if it solves only 85% of the tasks it correctly talks about?
Also, what's the rate of generalization of unprompted problematic-behaviour avoidance? It's much less of a problem if your AI does what you tell it to do - you can just not give it to users, tell it to invent nanotechnology, and win.
GPT-4 is insufficiently capable, even if it were given an agent structure, memory and goal set to match, to pull off a treacherous turn. The whole point of the treacherous turn argument is that the AI will wait until it can win to turn against you, and until then play along.
I don't get why actual ability matters. It's sufficiently capable to pull it off in some simulated environments. Are you claiming that we can't deceive GPT-4, and that it is actually waiting and playing along just because it can't really win?
Whack-A-Mole fixes, from RLHF to finetuning, are about teaching the system to not demonstrate problematic behavior, not about fundamentally fixing that behavior.
Based on what? Problematic behavior avoidance does actually generalize in practice, right?
Here is a way in which it doesn't generalize in observed behavior:
TLDR: There are three new papers which all show the same finding, i.e. the safety guardrails from chat models don’t transfer well from chat models to the agents built from them. In other words, models won’t tell you how to do something harmful, but they will do it if given the tools. Attack methods like jailbreaks or refusal-vector ablation do transfer.
Here are the three papers, I am the author of one of them:
Not at all. The problem is that their observations would mostly not be in a classical basis.
I phrased it badly, but what I mean is that there is a simulation of Hilbert space, where some regions contain patterns that can be interpreted as observers observing something, and if you count them by similarity, you won't get counts consistent with the Born measure of these patterns. I don't think the basis matters in this model, if you change the basis for the observer, observations, and similarity threshold simultaneously? A change of basis would just rotate or scale patterns,...
https://mason.gmu.edu/~rhanson/mangledworlds.html
I mean that if a Turing machine is computing the universe according to the laws of quantum mechanics, observers in such a universe would be distributed uniformly, not by Born probability. So you either need some modification to current physics, such as mangled worlds, or you can postulate that Born probabilities are truly random.
Our observations are compatible with a world that is generated by a Turing machine with just a couple thousand bits.
Yes, but this is kinda incompatible with QM without mangled worlds.
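To make the mismatch concrete, here's a toy sketch with made-up amplitudes (not any real physical state): counting observer-patterns one-per-branch gives a uniform distribution, while the Born measure weights branches by |amplitude|², and the two generally disagree.

```python
import numpy as np

# Toy sketch, made-up amplitudes: three "branches", each assumed to contain
# one observer-pattern. Counting patterns gives every branch equal weight;
# the Born rule weights each branch by |amplitude|^2.
amplitudes = np.array([0.9, 0.3, 0.3], dtype=complex)
amplitudes /= np.linalg.norm(amplitudes)          # normalize the toy state

born_weights = np.abs(amplitudes) ** 2            # Born measure per branch
count_weights = np.full(len(amplitudes), 1 / len(amplitudes))  # one vote per pattern

print("Born: ", np.round(born_weights, 3))        # ~[0.818, 0.091, 0.091]
print("Count:", np.round(count_weights, 3))       # [0.333, 0.333, 0.333]
```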
Imagining two apples is a different thought from imagining one apple, right?
I mean, is it? Different states of the whole cortex are different. And the cortex can't be in a state of imagining only one apple and, simultaneously, be in a state of imagining two apples, obviously. But it's tautological. What are we gaining from thinking about it in such terms? You can say the same thing about the whole brain itself: that it can only have one brain-state at a given moment.
I guess there is a sense in which other parts of the brain have more varied thoughts relativ...
I still don't get this "only one thing in awareness" thing. There are multiple neurons in the cortex and I can imagine two apples - in what sense can there be only one thing in awareness?
Or equivalently, it corresponds equally well to two different questions about the territory, with two different answers, and there’s just no fact of the matter about which is the real answer.
Obviously the real answer is the model which is more veridical^^. The latter hindsight model is right not about the state of the world at t=0.1, but about what you later thought about the world at t=0.1.
If that’s your hope—then you should already be alarmed at trends
It would be nice for someone to quantify the trends. Otherwise it may well be that the trends point to easygoing-enough and aligned-enough future systems.
For some humans, the answer will be yes—they really would do zero things!
Nah, it's impossible for evolution to just randomly stumble upon such a complicated and unnatural mind-design. What are you going to say next - that some people are fine with being controlled?
...Where an entity has never had the option to do a thing, we may not validly in
I genuinely think it's a "more dakka" situation - the difficulty of communication is often underestimated, but it is possible to reach a mutual understanding.
RLHF does not solve the alignment problem because humans can’t provide good-enough feedback fast-enough.
Yeah, but the point is that the system learns values before an unrestricted AI vs AI conflict.
...As mentioned in the beginning, I think the intuition goes that neural networks have a personality trait which we call “alignment”, caused by the correspondence between their values and our values. But “their values” only really makes sense after an unrestricted AI vs AI conflict, since without such conflicts, AIs are just gonna propagate energy to whichever
But also, if you predict a completion model where a very weak hash is followed by its pre-image, it will probably have learned to undo the hash, even though the source generation process never performed that (potentially much more complicated than the hashing function itself) operation, which means it’s not really a simulator.
I'm saying that this won't work with current systems, at least for a strong hash, because it's hard, and instead of learning to undo it, the model will learn to simulate, because that's easier. And then you can vary the strength of the hash to...
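To make the setup concrete, here's a hypothetical version of the kind of completion data being discussed; weak_hash is a deliberately trivial stand-in for the "very weak hash", not anything from an actual experiment.

```python
import random

# Hypothetical data format: each training line is a weak hash followed by its
# pre-image. A completion model trained on such lines is rewarded for
# producing a pre-image consistent with the hash; with a strong hash,
# simulating the generator (emitting a random-looking pre-image) is the
# easier thing to learn than inversion.
def weak_hash(s: str) -> str:
    return format(sum(s.encode()) % 256, "02x")   # deliberately weak 1-byte checksum

def make_example() -> str:
    preimage = "".join(random.choice("abcdef") for _ in range(6))
    return f"hash: {weak_hash(preimage)} preimage: {preimage}"

if __name__ == "__main__":
    for _ in range(3):
        print(make_example())
```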
And I don’t think we’ve observed any evidence of that.
What about any time a system generalizes favourably, instead of predicting errors? You can say it's just a failure of prediction, but it's not like these failures are random.
That is the central safety property we currently rely on and pushes things to be a bit more simulator-like.
And the evidence that this property, rather than, for example, the inherent bias of NNs, is central is what? Why wouldn't a predictor exhibit more malign goal-directedness even for short-term goals?
I can see that this who...
Why wouldn't a myopic bias make it more likely to simulate than predict? And doesn't the empirical evidence about LLMs support the simulators frame? Like, what observations persuaded you that we are not living in the world where LLMs are simulators?
Neuron count intuitively seems to be a better proxy for the variety/complexity/richness of positive experience. Then you can have an argument about how you wouldn't want to just increase the intensity of pleasure - that's just a relative number - and that what matters is that pleasure is interesting. And so you would assign lesser weights to less rich experiences. You can also generalize this argument to negative experiences: maybe you don't want to consider pain to be ten times worse just because someone multiplied some number by 10.
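A purely hypothetical operationalization, just to make the shape of that argument concrete (the log function and the numbers are arbitrary choices, not claims): weight an experience by a sublinear function of its richness, so that multiplying a raw number by 10 doesn't automatically make the experience count 10x as much.

```python
import math

# Hypothetical sketch: richness-based moral weight, with neuron count as a
# crude stand-in for richness and an arbitrary sublinear (log) weighting.
def moral_weight(neuron_count: int) -> float:
    return math.log10(neuron_count)   # arbitrary sublinear choice, not a claim

for n in (10**6, 10**7, 10**8):
    print(n, moral_weight(n))         # 6.0, 7.0, 8.0 - not a 10x jump per step
```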