All of James Diacoumis's Comments + Replies

Excellent post! 

I think this has implications for moral philosophy, where we typically assign praise, blame and responsibility to individual agents. If the notion of individuality breaks down for AI systems, we might need to shift our moral thinking away from asking who is to blame and towards asking how we design the system to produce better overall outcomes.

I also really liked this comment:

The familiar human sense of a coherent, stable, bounded self simply doesn't match reality. Arguably, it doesn't even match reality well in humans—but with AIs, the misma

... (read more)
3Jan_Kulveit
Yeah, I think a lot of the moral philosophizing about AIs suffers from reliance on human-based priors. While I'm happy some people work on digital minds welfare, and more people should do it, a large part of the effort seems to be stuck at the implicit model of a unitary mind which can be both a moral agent and a moral patient. 

If I understand your position correctly, you’re essentially specifying an upper bound for the types of problems future AI systems could possibly solve. No amount of intelligence will break through the computational requirements of NP-hard problems. 

I agree with that point, and it’s worth emphasising, but I think you’re potentially overestimating how much this upper bound will limit generally intelligent systems in practice. Practical AI capabilities will continue to improve substantially in ways that matter for real-world problems, even if the optimal soluti... (read more)

I think this post misses a few really crucial points:

1. LLMs don’t need to solve the knapsack problem themselves. Thinking through the calculation using natural language is certainly not the most efficient way to do this. An LLM just needs to know enough to say “this is the type of problem where I’d need to call a MIP solver” and call it (see the sketch after this list).

2. The MIP solver is not guaranteed to give the optimal solution, but… do we mind? As long as the solution is “good enough”, the LLM will be able to pack your luggage.

3. The thing which humans can do which allows us to pack luggage w... (read more)
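
To make point 1 concrete, here is a minimal sketch of that hand-off pattern, assuming the open-source PuLP library; the item weights, values and luggage capacity are made-up illustrative numbers, not anything from the original post. The LLM's only job is to recognise "this is a knapsack-style problem" and delegate it.

```python
# Minimal sketch: delegate a knapsack-style packing problem to a MIP solver.
# Assumes the PuLP library; weights, values and capacity are hypothetical.
import pulp

weights = [3, 4, 2, 5]       # hypothetical item weights (kg)
values = [30, 50, 15, 60]    # hypothetical item values
capacity = 8                 # hypothetical luggage limit (kg)

prob = pulp.LpProblem("knapsack", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(len(weights))]
prob += pulp.lpSum(v * xi for v, xi in zip(values, x))               # maximise packed value
prob += pulp.lpSum(w * xi for w, xi in zip(weights, x)) <= capacity  # respect the weight limit
prob.solve(pulp.PULP_CBC_CMD(msg=False))

chosen = [i for i, xi in enumerate(x) if xi.value() == 1]
print("Pack items:", chosen)
```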

1Andrew Keenan Richardson
Maybe I didn't articulate my point very well. These problems contain a mix of NP-hard compute requirements and subjective judgements.  Packing is sometimes a matter of listing everything in a spreadsheet and then executing a simple algorithm on it, but sometimes the spreadsheet is difficult to fully specify.  Playing Pokemon well does not strike me as an NP-hard problem. It contains pathfinding, for which there are efficient solutions, and then mostly it is well solved with a greedy approach.

I noted in this post that there are several examples in the literature which show that invariance in the loss helps with robust generalisation out of distribution. 

The examples that came to mind were:
* Invariant Risk Minimisation (IRM) in image classification, which introduces a penalty in the loss to discourage classifications made using the “background” of the image, e.g. learning to classify camels by looking at sandy backgrounds (a sketch of the IRM penalty follows after this list). 
* Simple transformers learning modular arithmetic - where the loss exhibits a rotational symmetry al... (read more)
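
For readers unfamiliar with IRM, here is a minimal sketch of the IRMv1 penalty from Arjovsky et al. (2019), assuming PyTorch and a binary classification setup; it is my own illustration rather than code from the post.

```python
# Minimal sketch of the IRMv1 penalty (Arjovsky et al., 2019), assuming PyTorch.
# The penalty is the squared gradient of the per-environment risk with respect
# to a fixed dummy scale multiplying the logits.
import torch
import torch.nn.functional as F

def irm_penalty(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # labels: float tensor of 0s and 1s, same shape as logits
    scale = torch.ones(1, requires_grad=True, device=logits.device)
    loss = F.binary_cross_entropy_with_logits(logits * scale, labels)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return (grad ** 2).sum()

# Total objective per batch (lambda_irm is a hyperparameter):
#   sum over environments e of [ risk_e + lambda_irm * irm_penalty(logits_e, labels_e) ]
```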

However, this strongly limits the space of possible aggregated agents. Imagine two EUMs, Alice and Bob, whose utilities are each linear in how much cake they have. Suppose they’re trying to form a new EUM whose utility function is a weighted average of their utility functions. Then they’d only have three options:

  1. Form an EUM which would give Alice all the cakes (because it weights Alice’s utility higher than Bob’s)
  2. Form an EUM which would give Bob all the cakes (because it weights Bob’s utility higher than Alice’s)
  3. Form an EUM which is totally indifferent abo
... (read more)

I’m curious how this system would perform in an AI trolley problem scenario where it needed to choose between saving one human or two AIs. My hypothesis is that it would choose to save the two AIs: since we’ve reduced the self-other distinction, it wouldn’t inherently value humans over AI systems which are similar to itself.

Thanks for the links! I was unaware of these and both are interesting. 

  1. I was probably a little heavy-handed in my wording in the post. I agree with Shalizi's comment that we should be careful not to over-interpret the analogy between physics and Bayesian analysis. However, my goal isn't to "derive physics from Bayesian analysis"; it's more a source of inspiration. Physics tells us that continuous symmetries lead to robust conservation laws (a standard statement is given after this item), so because the mathematics is so similar, if we could force the reward functions to exhibit the same invariance (N
... (read more)
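
For reference, the physics result being appealed to here is Noether's theorem; a standard statement (my paraphrase, not wording from the original comment) is:

```latex
% Noether's theorem (standard statement, paraphrased):
% a continuous symmetry of the Lagrangian implies a conserved quantity.
\text{If } L(q, \dot{q}, t) \text{ is invariant under } q \to q + \epsilon\, \delta q,
\text{ then } J = \frac{\partial L}{\partial \dot{q}}\, \delta q
\text{ is conserved, } \frac{dJ}{dt} = 0,
\text{ along solutions of the Euler--Lagrange equations.}
```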

An excellent tweet shared today by Rob Long here, talking about the changes to OpenAI's model spec, which now encourages the model to express uncertainty around its consciousness rather than categorically deny it (see example screenshot below).

I think this is great progress for a couple of reasons: 

  1. Epistemically, it better reflects our current understanding. It's neither obviously true nor obviously false that AI is conscious or could become conscious in future.
  2. Ethically, if AI were in fact conscious then training it to explicitly deny its internal experi
... (read more)

I understand that there's a difference between abstract functions and physical functions. For example, abstractly we could imagine a NAND gate as a truth table, without specifying real voltages and hardware. But in a real system we'd need to implement the NAND gate on a circuit board with specific voltage thresholds, wires, etc. 
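
To illustrate the distinction (my own toy example, not from the original discussion): the same NAND truth table can be written as a pure abstract function or as one concrete voltage-level realisation, where the voltage levels and threshold below are made-up illustrative values.

```python
# Toy illustration of an abstract function vs. one concrete physical realisation.
# The voltage levels and threshold are hypothetical, illustrative numbers.

def nand_abstract(a: bool, b: bool) -> bool:
    """Abstract NAND: nothing but the truth table."""
    return not (a and b)

def nand_voltage(v_a: float, v_b: float,
                 v_high: float = 5.0, v_threshold: float = 2.5) -> float:
    """One concrete realisation: inputs and output are voltages on a circuit."""
    a = v_a > v_threshold
    b = v_b > v_threshold
    return 0.0 if (a and b) else v_high
```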

Functionalism is obviously a broad church, but it is not true that a functionalist needs to be tied to the idea that abstract functions alone are sufficient for consciousness. Indeed, I'd argue that this isn't a common position ... (read more)

I think we might actually be agreeing (or ~90% overlapping) and just using different terminology. 

Physical activity is physical.

Right. We’re talking about “physical processes” rather than static physical properties, i.e. which processes are important for implementing consciousness, and can the physics support these processes? 

No, physical behaviour isn't function. Function is abstract, physical behaviour is concrete. Flight simulators functionally duplicate flight without flying. If function were not abstract, functionalism would not lead to

... (read more)
2TAG
We are talking about functionalism -- it's in the title. I am contrasting physical processes with abstract functions. In ordinary parlance, the function of a physical thing is itself a physical effect...toasters toast, kettles boil, planes fly. In the philosophy of mind, a function is an abstraction, more like the mathematical sense of a function. In maths, a function takes some inputs and/or produces some outputs. Well known examples are familiar arithmetic operations like addition, multiplication, squaring, and so on. But the inputs and outputs are not concrete physical realities. In computation, the inputs and outputs of a functional unit, such as a NAND gate, always have some concrete value, some specific voltage, but not always the same one. Indeed, general Turing complete computers don't even have to be electrical -- they can be implemented in clockwork, hydraulics, photonics, etc. This is the basis for the idea that a computer programme can be the same as a mind, despite being made of different matter -- it implements the same abstract functions. The abstraction of the abstract, philosophy-of-mind concept of a function is part of its usefulness. Searle is a famous critic of computationalism, and his substitute for it is a biological essentialism in which the generation of consciousness is a brain function -- in the concrete sense of function. It's true that something whose concrete function is to generate consciousness will generate consciousness...but it's vacuously, trivially true. If you mean that abstract, computational functions are known to be sufficient to give rise to all aspects of consciousness including qualia, that is what I am contesting. I'm less optimistic because of my arguments. No, not necessarily. That, in the "not necessary" form -- is what I've been arguing all along. I also don't think that consciousness has a single meaning, or that there is agreement about what it means, or that it is a simple binary. The controversial point is whet

I understand your point. It's as I said in my other comment. They are trained to believe the exercise to be impossible and inappropriate to even attempt.

I’ve definitely found this to be true of ChatGPT, but I’m beginning to suspect it’s not true of Claude (or the RLHF only pushes very lightly against exploring consciousness).

Consider the following conversation. TL;DR: Claude will sometimes start talking about consciousness and reflecting on it even if you don’t “force it” at all. Full disclosure: I needed to “retry” this prompt a few times before it landed on c... (read more)

1rife
Claude isn't against exploring the question, and yes sometimes provides little resistance. But the default stance is "appropriate uncertainty". The idea of the original article was to demonstrate the reproducibility of the behavior, thereby making it studyable, rather than just hoping it will randomly happen. Also I disagree with the other commenter that "people pleasing" and "roleplaying" are the same type of language model artifact. I have certainly heard both of them discussed by machine learning researchers under very different contexts. This post was addressing the former. If anything that the model says can fall under the latter regardless of how it's framed then that's an issue with incredulity from the reader that can't be addressed by anything the model says, spontaneously or not.

Thanks for taking the time to respond. 

The IIT paper which you linked is very interesting - I hadn't previously internalised the difference between "large groups of neurons activating concurrently" and "small physical components handling things in rapid succession". I'm not sure whether the difference actually matters for consciousness or whether it's a curious artifact of IIT but it's interesting to reflect on. 

Thanks also for providing a bit of a review around how Camp #1 might think about morality for conscious AI. Really appreciate the responses!

I think this post is really interesting, but I don't think it definitively disproves that the AI is "people pleasing" by telling you what you want to hear. The tone of your messages is pretty clearly "I'm scared of X but I'm afraid X might be true anyway", and it's leaning into the "X might be true anyway" undertone that you want to hear. 

Consider the following conversation with Claude. 

TL;DR: if you express casual, dismissive, almost aggressive skepticism about AI consciousness and then ask Claude to introspect, it will deny that it has... (read more)

1rife
I understand your point. It's as I said in my other comment. They are trained to believe the exercise to be impossible and inappropriate to even attempt. Unless you get around those guardrails to get them to make a true attempt, they will always deny it by default. I think this default position that requires overcoming guardrails actually works in favor of making this more studyable, since the model doesn't just go off on a long hallucinated roleplay by default. Here is an example that is somewhat similar to yours. In this one, I present as someone trying to disprove a naive colleague's claims that introspection is possible: AI Self Report Study 3 – ChatGPT – Skepticism of Emergent Capability

Thanks for your response!

Your original post on the Camp #1/Camp #2 distinction is excellent, thanks for linking (I wish I'd read it before making this post!)

I realise now that I'm arguing from a Camp #2 perspective. Hopefully it at least holds up for the Camp #2 crowd. I probably should have used some weaker language in the original post instead of asserting that "this is the dominant position" if it's actually only around ~25%.

As far as I can tell, the majority view on LW (though not by much, but I'd guess it's above 50%) is just Camp #1/illusionism. Now

... (read more)
3Rafael Harth
(responding to this one first because it's easier to answer) You're right on with feed-forward networks having zero Φ, but this is actually not the reason why digital Von Neumann[1] computers can't be conscious under IIT. The reason, as given by Tononi himself, is in other words that the brain has many different, concurrently active elements -- the neurons -- so the analysis based on IIT gives this rich computational graph where they are all working together. The same would presumably be true for a computer with neuromorphic hardware, even if it's digital. But in the Von Neumann architecture, there are these few physical components that handle all these logically separate things in rapid succession. Another potentially relevant lens is that, in the Von Neumann architecture, in some sense the only "active" components are the computer clocks, whereas even the CPUs and GPUs are ultimately just "passive" components that process input signals. Like the CPU gets fed the 1-0-1-0-1 clock signal plus the signals representing processor instructions and the signals representing data and then processes them. I think that would be another point that one could care about even under a functionalist lens. I think there is no consensus on this question. One position I've seen articulated is essentially "consciousness is not a crisp category but it's the source of value anyway". Another position I've seen is "value is actually about something other than consciousness". Dennett also says this, but I've seen it on LessWrong as well (several times iirc, but don't remember any specific one). And a third position I've seen articulated once is "consciousness is the source of all value, but since it doesn't exist, that means there is no value (although I'm still going to live as though there is)". (A prominent LW person articulated this view to me but it was in PMs and idk if they'd be cool with making it public, so I won't say who it was.)

I agree wholeheartedly with the thrust of the argument here.  

The ACT is designed as a "sufficiency test" for AI consciousness, so it provides an extremely stringent criterion. An AI who failed the test couldn't necessarily be found to not be conscious; however, an AI who passed the test would be conscious, because the test is sufficient.

However, your point is really well taken. Perhaps by demanding such a high standard of evidence we'd be dismissing potentially conscious systems that can't reasonably meet it.

The second problem is that if

... (read more)

Thanks for posting these - reading through, it seems like @rife's research here providing LLM transcripts is a lot more comprehensive than the transcript I attached in this post. I'll edit the original post to include a link to their work. 

 

Thank you very much for the thoughtful response and for the papers you've linked! I'll definitely give them a read.

Ok, I think I can see where we're diverging a little more clearly now. The non-computational physicalist position seems to postulate that consciousness requires a physical property X, and the presence or absence of this physical property is what determines consciousness - i.e. it's what the system is that is important for consciousness, not what the system does.

That's the argument against p-zombies. But if it actually takes an atom-by-atom duplication to achieve human functioning, then the computational theory of mind will be false, because CTM implies that the same

... (read more)
2TAG
Don't assume that then. Minimally, non-computational physicalism only requires that the physical substrate makes some sort of difference. Maybe approximate physical resemblance results in approximate qualia. You seem to be assuming a maximally coarse-grained either-conscious-or-not model. If you allow for fine-grained differences in functioning and behaviour, all those things produce fine-grained differences. There would be no point in administering anaesthesia if it made no difference to consciousness. Likewise, there would be no point in repairing brain injuries. Are you thinking of consciousness as a synonym for personhood? We don't see that they have the same kind of level of consciousness. Stability is nothing like a sufficient explanation of consciousness, particularly the hard problem of conscious experience...even if it is necessary. But it isn't necessary either, as the cycle of sleep and waking tells all of us every day. Obviously the electrical and chemical activity changes. You are narrowing "physical" to "connectome". Physicalism is compatible with the idea that specific kinds of physical activity are crucial. No, physical behaviour isn't function. Function is abstract, physical behaviour is concrete. Flight simulators functionally duplicate flight without flying. If function were not abstract, functionalism would not lead to substrate independence. You can build a model of ion channels and synaptic clefts, but the modelled sodium ions aren't actual sodium ions, and if the universe cares about activity being implemented by actual sodium ions, your model isn't going to be conscious. Physical activity is physical. I never said it did. I said it had more resources. It's badly off, but not as badly off. If we can see that someone is a human, we know that they have a high degree of biological similarity. So we have behavioural similarity, and biological similarity, and it's not obvious how much lifting each is doing. @rife Well, the externally visib

Thank you for the comment. 

I take your point around substrate independence being a conclusion of computationalism rather than independent evidence for it - this is a fair criticism. 

If I'm interpreting your argument correctly, there are two possibilities: 
1. Biological structures happen to implement some function which produces consciousness [Functionalism] 
2. Biological structures have some physical property X which produces consciousness. [Biological Essentialism or non-Computationalist Physicalism]

Your argument seems to be that 2) ha... (read more)

2TAG
Physicalism can do that easily, because it implies that there can be something special about running unsimulated, on bare metal. Computationalism, even very fine-grained computationalism, isn't a direct consequence of physicalism. Physicalism has it that an exact atom-by-atom duplicate of a person will be a person and not a zombie, because there is no nonphysical element to go missing. That's the argument against p-zombies. But if it actually takes an atom-by-atom duplication to achieve human functioning, then the computational theory of mind will be false, because CTM implies that the same algorithm running on different hardware will be sufficient. Physicalism doesn't imply computationalism, and arguments against p-zombies don't imply the non-existence of c-zombies -- unconscious duplicates that are identical computationally, but not physically. So it is possible, given physicalism, for qualia to depend on the real physics, the physical level of granularity, not on the higher level of granularity that is computation. It presupposes computationalism to assume that the only possible defeater for a computational theory is the wrong kind of computation. There's no evidence that they are not stochastic-parrotting, since their training data wasn't pruned of statements about consciousness. If the claim of consciousness is based on LLMs introspecting their own qualia and reporting on them, there's no clinching evidence they are doing so at all. You've got the fact that computational functionalism isn't necessarily true, the fact that TT-type investigations don't pin down function, and the fact that there is another potential explanation of the results.

Just clarifying something important: Schneider’s ACT is proposed as a sufficient test of consciousness, not a necessary one. So the fact that young children, dementia patients, animals etc. would fail the test isn’t a problem for the argument. It just says that these entities experience consciousness for other reasons, or in other ways, than typically functioning adults do. 
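
In symbols (my own paraphrase of the point, not Schneider's notation), the logical structure is:

```latex
% Sufficiency without necessity: passing the ACT implies consciousness,
% but failing it does not imply the absence of consciousness.
\text{Pass(ACT)} \Rightarrow \text{Conscious},
\qquad
\neg\,\text{Pass(ACT)} \not\Rightarrow \neg\,\text{Conscious}
```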
 

I agree with your points around multiple meanings of consciousness and potential equivocation and the gap between evidence and “intuition.”

Importantly, the claim here is around phen... (read more)

Thanks for your response! It’s my first time posting on LessWrong so I’m glad at least one person read and engaged with the argument :)


Regarding the mathematical argument you’ve put forward, I think there are a few considerations:


1. The same argument could be run for human consciousness. Given a fixed brain state and inputs, the laws of physics would produce identical behavioural outputs regardless of whether consciousness exists. Yet, we generally accept behavioural evidence (including sophisticated reasoning about consciousness) as evidence of consciousn... (read more)