All of deepthoughtlife's Comments + Replies

This treatment of the idea of complexity is clearly incorrect for the simplest possible reason... we have no idea what the Kolmogorov complexity of these objects is relative to each other, since the lower bounds are exactly identical! (Said bounds are just a hair above zero, because we can be relatively sure that their existence is not absolutely required by the laws of the universe, but little more than that.) The upper bounds are different, but not in an illuminating manner.
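For concreteness, the fallback people actually reach for when the true Kolmogorov complexity is uncomputable is a compression-based proxy. A minimal sketch of that idea, assuming Python's zlib as the stand-in compressor and with purely illustrative inputs:

```python
import zlib

def compressed_size(data: bytes) -> int:
    # Length of the zlib-compressed data: a crude, computable upper-bound proxy
    # for Kolmogorov complexity (the true value is uncomputable).
    return len(zlib.compress(data, 9))

# Illustrative inputs only: two byte strings of similar length can compress very
# differently, and neither number pins down the true complexity of the objects
# they describe -- it only bounds it from above.
regular = b"abc" * 10_000
less_regular = bytes((i * 37 + 11) % 256 for i in range(30_000))
print(compressed_size(regular), compressed_size(less_regular))
```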

Thus, we have to use other things to determine complexity, and the brain is clearly fa... (read more)

Note: I wrote my comment as notes while reading, to see what I thought of your arguments, rather than as a polished thing.


I think your calibration on the 'slow scenario' is off. What you claim is the slowest plausible one is fairly clearly the median scenario, given that it is pretty much just following current trends, and slower than the present trend is clearly plausible. Things already slowed way down, with advancements in very narrow areas being the only real change. There is a reason that OpenAI hasn't even dared to name something GPT-5, for instanc... (read more)

Apparently the very subject coming up led to me writing a few paragraphs about the problems of a land value tax before I even started reading it. (A fraction of the things in parentheses were put in later to elaborate a point.)

There's nothing wrong with replacing current property taxes with equivalent (dollar value) taxes that only apply to the value of the land itself (this would be good to avoid penalizing improving your own land), but the land value tax (aka Georgism) is awful because of what its proponents want to do with it. Effectively, they want to... (read more)

Math is definitely just a language. It is a combination of symbols and a grammar about how they go together. It's what you come up with when you maximally abstract away the real world, and the part about not needing any grounding was specifically about abstract math, where there is no real world.

Verifiability is obviously important for training (since we could give effectively infinite training data), but the reason it is so easily verifiable is that it doesn't rely on the world. Also, note that programming languages are also just that, languages (and quite simple ones), but abstract math is even less dependent on the real world than programming.

Math is just a language (a very simple one, in fact). Thus, abstract math is right in the wheelhouse for something made for language. Large Language Models are called that for a reason, and abstract math doesn't rely on the world itself, just the language of math. LLMs lack grounding, but abstract math doesn't require it at all. It seems more surprising how badly LLMs did at math, not that they made progress. (Admittedly, if you actually mean ten years ago, that's before LLMs were really a thing. The primary mechanism that distinguishes the transformer was only barely invented then.)

Noosphere89
I disagree with this, in that good mathematics definitely requires at least a little understanding of the world, and if I were to think about why LLMs succeeded at math, I'd probably point to the fact that it's an unusually verifiable task, relative to the vast majority of tasks, and would also think that the fact that you can get a lot of high-quality data also helps LLMs. Only programming shares these traits to an exceptional degree, and outside of mathematics/programming, I expect less transferability, though not effectively 0 transferability.
Kaj_Sotala
Yeah I'm not sure of the exact date but it was definitely before LLMs were a thing.

For something to be a betrayal does not require knowing the intent of the person doing it, and it is not necessarily modified if you do. I already brought up the fact that it would be perfectly fine if they had asked permission; it is in the not asking permission to alter the agreed-upon course where the betrayal comes in. Saying 'I will do x' is not implicitly asking for permission at all; it is a statement of intent that entirely disregards the fact that there was even an agreement at all.

'what made A experience this as a betrayal' is the fact that it was. It really is that simple. You could perhaps object that it is strange to experience vicarious betrayal, but since it sounds like the four of you were a team, it isn't even that. This is a very minor betrayal, but if someone were to even minorly betray my family, for instance, I would automatically feel betrayed myself, and would not trust that person anymore even if the family member doesn't actually mind what they did.

Analogy time (well, another one), 'what makes me experience being cold... (read more)

Davidmanheim
I don't think it was betrayal, I think it was skipping verbal steps, which left intent unclear. If A had said "I promised to do X, is it OK now if I do Y instead?", there would presumably have been no confusion. Instead, they announced their plan before doing Y, leaving the permission request implicit. The point that "she needed A to acknowledge that he’d unilaterally changed an agreement" was critical to B, but I suspect A thought that stating the new plan did that implicitly.

Obviously, translating between different perspectives is often a very valuable thing to do. While there are a lot of disagreements that are values based, very often people are okay with the other party holding different values as long as they are still a good partner, and failure to communicate really is just failure to communicate.

I dislike the assumption that 'B' was reacting that way due to past betrayal. Maybe they were, maybe they weren't (I do see that 'B' confirmed it for you in a reaction to another comment, but making such assumptions is still a b... (read more)

Kaj_Sotala
True, but that is assuming that everyone was perceiving this as a betrayal. A relevant question is also, what made A experience this as a betrayal, when there were four people present and none of the other three did? (It wasn't even B's own plan that was being affected by the changed move, it was my plan - but I was totally fine with that, and certainly didn't experience that as a betrayal.)

Betrayal usually means "violating an agreement in a way that hurts one person so that another person can benefit" - it doesn't usually mean "doing something differently than agreed in order to get a result that's better for everyone involved". In fact, there are plenty of situations where I would prefer someone to not do something that we agreed upon, if the circumstances suddenly change or there is new information that we weren't aware of before.

Suppose that I'm a vegetarian and strongly opposed to buying meat. I ask my friend to bring me a particular food from the store, mistakenly thinking it's vegetarian. At the store, my friend realizes that the food contains meat and that I would be unhappy if they followed my earlier request. They bring me something else, despite having previously agreed to bring the food that I requested. I do not perceive this as a betrayal, I perceive this as following my wishes. While my friend may not be following our literal agreement, they are following my actual goals that gave rise to that agreement, and that's the most important thing.

In the board game, three of us (A, me, and a fourth person who I haven't mentioned) were perceiving the situation in those terms: that yes, A was doing something differently than we'd agreed originally. But that was because he had noticed something that actually got the game into a better state, and "getting the game into as good of a state as possible" was the purpose of the agreement. Besides, once B objected, A was entirely willing to go back to the original plan. Someone saying "I'm going to do things dif

You might believe that the distinctions I make are idiosyncratic, though the meanings are in fact clearly distinct in ordinary usage, but I clearly do not agree with your misleading use of what people would be led to think are my words, and you should take care not to conflate things. You want people to precisely match your own qualifiers in cases where that causes no difference in the meaning of what is said (which makes enough sense), but will directly object to people pointing out a clear miscommunication of yours because you do not care about a differe... (read more)

And here you are trying to be pedantic about language in ways that directly contradict other things you've said in speaking to me. In this case, everything I said holds if we change between 'not different' and 'not that different' (while you actually misquote yourself as 'not very different'). That said, I should have included the extra word in quoting you.

Your point is not very convincing. Yes, people disagree if they disagree. I do not draw the lines in specific spots, as you should know based on what I've written, but you find it convenient to assume I do.

dirk
No, I authentically object to having my qualifiers ignored, which I see as quite distinct from disagreeing about the meaning of a word. Edit: also, I did not misquote myself, I accurately paraphrased myself, using words which I know, from direct first-person observation, mean the same thing to me in this context.

Do you hold panpsychism as a likely candidate? If not, then you most likely believe the vast majority of things are not conscious. We have a lot of evidence that the way it operates is not meaningfully different from other objects in ways we don't understand. Thus, almost the entire reference class would be things that are not conscious. If you do believe in panpsychism, then obviously AIs would be conscious too, but it wouldn't be an especially meaningful statement.

You could choose computer programs as the reference class, but most people are quite sure those aren'... (read more)

dirk
You in particular clearly find it to be poor communication, but I think the distinction you are making is idiosyncratic to you. I also have strong and idiosyncratic preferences about how to use language, which from the outside view are equally likely to be correct; the best way to resolve this is of course for everyone to recognize that I'm objectively right and adjust their speech accordingly, but I think the practical solution is to privilege neither above the other.

I do think that LLMs are very unlikely to be conscious, but I don't think we can definitively rule it out. I am not a panpsychist, but I am a physicalist, and so I hold that thought can arise from inert matter. Animal thought does, and I think other kinds could too. (It could be impossible, of course, but I'm currently aware of no reason to be sure of that). In the absence of a thorough understanding of the physical mechanisms of consciousness, I think there are few mechanisms we can definitively rule out. Whatever the mechanism turns out to be, however, I believe it will be a mechanism which can be implemented entirely via matter; our minds are built of thoughtless carbon atoms, and so too could other minds be built of thoughtless silicon. (Well, probably; I don't actually rule out that the chemical composition matters. But like, I'm pretty sure some other non-living substances could theoretically combine into minds.)

You keep saying we understand the mechanisms underlying LLMs, but we just don't; they're shaped by gradient descent into processes that create predictions in a fashion almost entirely opaque to us. AIUI there are multiple theories of consciousness under which it could be a process instantiable that way (and, of course, it could be the true theory's one we haven't thought of yet). If consciousness is a function of, say, self-modeling (I don't think this one's true, just using it as an example) it could plausibly be instantiated simply by training the model in contexts where it must

This statement is obviously incorrect. I have a vague concept of 'red', but I can tell you straight out that 'green' is not it, and I am utterly correct. Now, where does it go from 'red' to 'orange'? We could have a legitimate disagreement about that. Anyone who uses 'red' to mean 'green' is just purely wrong.

That said, it wouldn't even apply to me if your (incorrect) claim about a single definition not being different from an extremely confident vague definition was right. I don't have 'extreme confidence' about consciousness even as a vague concept. I am... (read more)

dirk
That is not the claim I made. I said it was not very different, which is true. Please read and respond to the words I actually say, not to different ones. The definitions are not obviously wrong except to people who agree with you about where to draw the boundaries.

Pedantically, 'self-evident' and 'clear' are different words/phrases, and you should not have emphasized 'self-evident' in a way that makes it seem like I used it, regardless of whether you care about/make that distinction personally. I then explained why a lack of evidence should be read against the idea that a modern AI is conscious (basically, the prior probability is quite low).

dirk
My emphasis implied you used a term which meant the same thing as self-evident, which in the language I speak, you did. Personally I think the way I use words is the right one and everyone should be more like me; however, I'm willing to settle on the compromise position that we'll both use words in our own ways. As for the prior probability, I don't think we have enough information to form a confident prior here.

Your comment is not really a response to the comment I made. I am not missing the point at all, and if you think I have, I suspect you missed my point very badly (and are yourself extremely overconfident about it). I have explicitly talked about there being a number of possible definitions of consciousness multiple times, and I never favored one of them explicitly. I repeat, I never assumed a specific definition of consciousness, since I don't have a specific one I assume at all, and I am completely open to talking about a number of possibilities. I simply p... (read more)

dirk
Having a vague concept encompassing multiple possible definitions, which you are nonetheless extremely confident is the correct vague concept, is not that different from having a single definition in which you're confident, and not everyone shares your same vague concept or agrees that it's clearly the right one.

I agree that people use consciousness to mean different things, but some definitions need to be ignored as clearly incorrect. If someone wants to use a definition of 'red' that includes large amounts of 'green', we should ignore them. Words mean something, and can't be stretched to include whatever the speaker wants them to if we are to speak the same language (so leaving aside things like how 'no' means 'of' in Japanese). Things like purposefulness are their own separate thing, and have a number of terms meant to be used with them, that we can meaningfull... (read more)

Raemon
I don't feel very hopeful about the conversation atm, but fwiw I feel like you are missing a fairly important point while being pretty overconfident about not having missed it.

Putting it a different way: is there a percent of people who could disagree with you about what consciousness means which would convince you that it's not as straightforward as assuming you have the correct definition of consciousness, and that you can ignore everyone else? If <50% of people agreed with you? If <50% of the people with most of the power?

(This is not about whether your definition is good, or the most useful, or whatnot – only that, if lots of people turned out to mean different things by it, would it still particularly matter whether your definition was the "right" one?)

I did not use the term 'self-evident' and I do not necessarily believe it is self-evident, because theoretically we can't prove anything isn't conscious. My more limited claim is not that it is self-evident that LLMs are not conscious, it's that they just clearly aren't conscious. 'Almost no reliable evidence' in favor of consciousness is coupled with the fact that we know how LLMs work (and the details we do not know are probably not important to this matter), and how they work is no more related to consciousness than an ordinary computer program is. It ... (read more)

dirk
My dialect does not have the fine distinction between "clear" and "self-evident" on which you seem to be relying; please read "clear" for "self-evident" in order to access my meaning.
Raemon
I think many of the things Critch has listed as definitions of consciousness are not "weak versions of some strong version", they're just different things. You bring up a few times that LLMs don't "experience" [various things Critch lists here]. I agree, they pretty likely don't (in most cases). But, part of what I interpreted Critch's point here to be was that there are many things that people mean by "consciousness" that aren't actually about "experience" or "qualia" or whatnot.

For example, I'd bet (75%) that when Critch says they have introspection, he isn't making any claims about them "experiencing" anything at all – I think he's instead saying "in the same way that their information processing system knows facts about Rome and art and biology and computer programming, and can manipulate those facts, it can also know and manipulate facts about its thoughts and internal states." (whereas other ML algorithms may not be able to know and manipulate their thoughts and internal states)

A major point Critch was making in the previous post is that when people say "consciousness", this is one of the things they sometimes mean. The point is not that LLMs are conscious the way you are using the word, but that when you see debates about whether they are conscious, it will include some people who think it means "purposefulness."
dirk
I agree LLMs are probably not conscious, but I don't think it's self-evident they're not; we have almost no reliable evidence one way or the other.

As a (severe) skeptic of all the AI doom stuff, and a moderate/centrist who has been voting for conservatives, I decided my perspective on this might be useful here (a site which obviously skews heavily left). (While my response is in order, the numbers are there to separate my points, not to give which paragraph I am responding to.)

"AI-not-disempowering-humanity is conservative in the most fundamental sense"
1. Well, obviously this title section is completely true. If conservative means anything, it means being against destroying the lives o... (read more)

Cameron Berg
Agree with this—we do discuss this very idea at length here and also reference it throughout the piece. I think this is a good distillation of the key bottlenecks and seems helpful for anyone interacting with lawmakers to keep in mind.

1. Kamala Harris did run a bad campaign. She was 'super popular' at the start of the campaign (assuming you can trust the polls, though you mostly can't), and 'super unpopular' and losing definitively at the end of it. On September 17th, she was ahead by 2 points in the polls, and in a little more than a month and a half she was down by that much in the vote. She lost so much ground. She had no good ads, no good policy positions, and was completely unconvincing to people who weren't guaranteed to vote for her from the start. She had tons of money to get out all of... (read more)

Some people went into the 2024 election fearing that pollsters had not adequately corrected for the sources of bias that had plagued them in 2016 and 2020.

I mostly heard the opposite, that they had overcorrected.

As it often does when I write, this ended up being pretty long (and not especially well written by the standards I wish I lived up to).

I'm sure I did misunderstand part of what you are saying (that we do misunderstand easily was the biggest part of what we appear to agree on), but also, my disagreements aren't necessarily things you don't actually mention yourself. I think we disagree mostly on what outcomes the advice itself will give if adopted overly eagerly, because I see the bad way of implementing it as being the natural outcome. Again, I think you... (read more)

I have a lot of disagreements with this piece, and just wrote these notes as I read it. I don't know if this will even be a useful comment. I didn't write it as a through line. 'You' and 'your' are often used nonspecifically about people in general.

The usefulness of things like real world examples seems to vary wildly.

Rephrasing is often terrible; rephrasing done carelessly actually often leads to basically lying about what your conversation partner is saying, especially since many people will double down on the rephrasing when told that they are wrong, whi... (read more)

Camille Berger
Hi! Thank you for writing this comment. I understand it can be a bit worrying to feel like your points might not be understood, but I'll give it a try nonetheless. I really genuinely want to fix any serious flaw in my approach.

However, I find myself in a slightly strange situation. Part of your feedback is very valuable. But I also believe that you misunderstood part of what I was saying. I could apply the skills I described in the post on your comment as a performative example, but I'm sensing that you could see it as a form of implied sarcasm, and it'd be unethical, so I'll refrain from doing that. There is a last part of me that just feels like your point is "part of this post is poorly written". I've made some minor edits in the hope that it accommodates your criticism.

My suggestion would be for you to watch real-life examples of the techniques I promote (say https://www.youtube.com/watch?v=d2WdbXsqj0M and https://www.youtube.com/watch?v=_tdjtFRdbAo ) then comment on those examples instead. Alternatively, you can just read my answers:

Agree, I've added the detail on "genuinely asking your interlocutor if this is what they mean, and if not, feel free to offer a correction" (e.g. "If I got you right, and feel free to correct me if I didn't.... "). I think that this form makes it almost always a pleasant experience and I somehow forgot this important detail.

You're referring to point 4, not 5, right? If yes, I think this is extrapolating beliefs I don't actually have. I admit however I didn't choose a good example; you can refer to the Street Epistemology video above for a better one. I'll replace the example soonish. In the meantime, please note that I do not suggest "attacking" personal experiences. I suggest asking "What helps us distinguish reliable personal experiences from unreliable ones?". This is a valid question to ask, in my view. For a bunch of reasons, this question has more chances to bounce off, so I prefer to ask "How do you distinguish

To be pedantic, my model is pretty obvious, and clearly gives this prediction, so you can't really say that you don't see a model here; you just don't believe the model. Your model with extra assumptions doesn't give this prediction, but the one I gave clearly does.

You can't find a person this can't be done to because there is something obviously wrong with everyone? Things can be twisted easily enough. (Offense is stronger than defense here.) If you didn't find it, you just didn't look hard/creatively enough. Our intuitions against people tricking u... (read more)

It does of course raise the difficulty level for the political maneuvering, but it would make things far more credible, which means that people could actually rely on it. It really is quite difficult to precommit to things you might not like, so structures that make it work seem interesting to me.

I think it would be a bad idea to actually do (there are so many problems with it in practice), but it is a bit of an interesting thing to note how being a swing state helps convince everyone to try to cater to you, and not just a little. This would be the swing state to end all swing states, I suppose.

The way to get this done that might actually work is probably to make it an amendment to each state's constitution that can only be repealed for future elections, not for the election in which the repeal itself is voted on. (If necessary, you can always amend how the state constitution is amended to make this doable.)

lemonhope
I am impressed with how far you thought this through. Amend the constitution, including the constitutional amendment section.

I should perhaps have added something I thought of slightly later that isn't really part of my original model, but an intentional blind spot can be a sign of loyalty in certain cases.

The good thing about existence proofs is that you really just have to find an example. Sometimes, I can do that.

It seems I was not clear enough, but this is not my model. (I explained it to the person who asked, if you want to see what I meant, but I was talking about parties turning their opponents into scissors statements.)

That said, I do believe that it is a possible partial explanation that sometimes having an intentional blind spot can be seen as a sign of loyalty by the party structure.

So, my model isn't about them making their candidate that way, it is the much more obvious political move... make your opponent as controversial as possible. There is something weird / off / wrong about your opponent's candidate, so find out things that could plausibly make the electorate think that, and push as hard as possible. I think they're good enough at it. Or, in other words, try to find the best scissors statements about your opponent, where 'best' is determined both in terms of not losing your own supporters, and in terms of losing your opponent ... (read more)

AnnaSalamon
I mean, I see why a party would want their members to perceive the other party's candidate as having a blind spot.  But I don't see why they'd be typically able to do this, given that the other party's candidate would rather not be perceived this way, the other party would rather their candidate not be perceived this way, and, naively, one might expect voters to wish not to be deluded.  It isn't enough to know there's an incentive in one direction; there's gotta be more like a net incentive across capacity-weighted players, or else an easier time creating appearance-of-blindspots vs creating visible-lack-of-blindspots, or something.  So, I'm somehow still not hearing a model that gives me this prediction.
deepthoughtlife
I should perhaps have added something I thought of slightly later that isn't really part of my original model, but an intentional blind spot can be a sign of loyalty in certain cases.

While there are legitimate differences that matter quite a bit between the sides, I believe a lot of the reason why candidates are like 'scissors statements' is that the median voter theorem actually kind of works: the parties see the need to move their candidates pretty far toward the current center, but they also know they will lose the extremists to not voting or to voting third party if they don't give them something to focus on. So both sides are literally optimizing for the effect to keep their extremists engaged.

AnnaSalamon
I don't follow this model yet.  I see why, under this model, a party would want the opponent's candidate to enrage people / have a big blind spot (and how this would keep the extremes on their side engaged), but I don't see why this model would predict that they would want their own candidate to enrage people / have a big blind spot.

When reading the piece, it seemed to assume far too much (and many of the assumptions are ones I obviously disagree with). I would call many of the assumptions made a relative of the false dichotomy (though I don't know what it is called when you present more than two possibilities as exhaustive when they really aren't). If you were more open in your writing to the idea that you don't necessarily know what the believers in natural abstractions mean, and that the possibilities mentioned were not exhaustive, I probably would have had a less negative rea... (read more)

Honestly, this post seems very confused to me. You are clearly thinking about this in an unproductive manner. (Also a bit overtly hostile.)

The idea that there are no natural abstractions is deeply silly. To gesture at a brief proof: consider the counting numbers '1', '2', '3', '4', etc., as applied to objects. There is no doubt these are natural abstractions. See also 'on land', 'underwater', 'in the sky', etc. Others include things like 'empty' vs 'full' vs 'partially full and partially empty', as well as 'bigger', 'smaller', 'lighter', 'heavier', etc.

The utility functions... (read more)

papetoast
Agreed on the examples of natural abstractions. I held a couple abstraction examples in my mind (e.g. atom, food, agent) while reading the post and found that it never really managed to attack these truly very general (dare I say natural) abstractions.
Alfred Harwood
It's late where I am now, so I'm going to read carefully and respond to comments tomorrow, but before I go to bed I want to quickly respond to your claim that you found the post hostile, because I don't want to leave it hanging. I wanted to express my disagreements/misunderstandings/whatever as clearly as I could but had no intention to express hostility. I bear no hostility towards anyone reading this, especially people who have worked hard thinking about important issues like AI alignment. Apologies to you and anyone else who found the post hostile.
Answer by deepthoughtlife

It obviously has 'any' validity. If an instance of 'ancient wisdom' killed off or weakened its followers enough, it wouldn't be around. Also, said thing has been optimized over a lot of time by a lot of people, and the version we receive probably isn't the best, but is still one of the better versions.

While some will weaken the people a bit and stick around for sounding good, they generally are just ideas that worked well enough. The best argument for 'ancient wisdom' is that you can actually just check how it has affected the people using it. If it has good e... (read more)

I definitely agree. No matter how useful something will end up being, or how simple it seems the transition will be, it always takes a long time because there is always some reason it wasn't already being used, and because everyone has to figure out how to use it even after that.

For instance, maybe it will become a trend to replace dialogue in videogames with specially trained LLMs (on a per character basis, or just trained to keep the characters properly separate). We could obviously do it right now, but what is the likelihood of any major trend toward th... (read more)

No problem with the failure to respond. I appreciate that this way of communicating is asynchronous (and I don't necessarily reply to things promptly either). And I think it would be reasonable to drop it at any point if it didn't seem valuable.

Also, you're welcome.

Sorry, I don't have a link for using actual compression algorithms; it was a while ago. I didn't think it would come up, so I didn't note anything down. My recent spate of commenting is unusual for me (and I don't actually keep many notes on AI-related subjects).

I definitely agree that it is 'hard to judge' 'more novel and more requiring of intelligence'. It is, after all, a major thing we don't even know how to clearly solve for evaluating other humans (so we use tricks that often rely on other things and these tricks likely do not generalize to other poss... (read more)

eggsyntax
Agreed, that's definitely a general failure mode.

I obviously tend to go on at length about things when I analyze them. I'm glad when that's useful.

I had heard that OpenAI models aren't deterministic even at the lowest randomness setting, which I believe is probably due to optimizations for speed; for example, in image generation models (which I am more familiar with), the use of optimizers like xformers throws away a little correctness and determinism for significant improvements in resource usage. I don't know what OpenAI uses to run these models (I assume they have their own custom hardware?), but I'm pretty sure th... (read more)
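A minimal sketch of how one could check that nondeterminism claim empirically, assuming the openai Python client, an API key in the environment, and an illustrative model name and prompt:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def sample(prompt: str) -> str:
    # temperature=0 is commonly assumed to be deterministic, but batching and
    # mixture-of-experts routing can still make outputs vary between calls
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

outputs = {sample("Name one animal, in a single word.") for _ in range(10)}
print(f"{len(outputs)} distinct outputs from 10 calls at temperature=0")
```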

Felix J Binder
That is a good point! Indeed, one of the reasons that we measure introspection the way we do is because of the feedforward structure of the transformer. For every token that the model produces, the inner state of the model is not preserved for later tokens beyond the tokens already in context. Therefore, if you are introspecting at time n+1 about what was going on inside you at point n, the activations you would be targeting would be (by default) lost. (You could imagine training a model so that its embedding of previous tokens carries some information about internal activations, but we don’t expect that this is the case by default.) Therefore, we focus on introspection in a single forward pass. This is compatible with the model reasoning about the result of its introspection after it has written it into its context.

I agree! One way in which self-simulation is a useful strategy might be when the training data contains outputs that are similar to how the model would actually act: i.e., for GPT N, that might be outputs of GPT N-1. Then, you might use your default behavior to stand in for that text. It seems plausible that people do this to some degree: if I know how I tend to behave, I can use this to stand in for predicting how other people might act in a particular situation. I take it that this is what you point out in the second paragraph.

Ah apologies—this might be a confusion due to the examples we show in the figures. We use the same general prompt templates for hypothetical questions in training and test. The same general patterns of our results hold when evaluating on the test set (see the appendix).

It seems like you are failing to get my points at all. First, I am defending the point that blue LEDs are unworthy because the blue LED is not worthy of the award, but I corrected your claiming it was my example. Second, you are the only one making this about snubbing at all. I explicitly told you that I don't care about snubbing arguments. Comparisons are used for other reasons than snubbing. Third, since this isn't about snubbing, it doesn't matter at all whether or not the LED could have been given the award.

The point is that the 'Blue LED' is not a sufficient advancement over the 'LED', not that it is a snub. I don't care about whether or not it is a snub. That's just not how I think about things like this. Also, note that the 'Blue LED' was not originally my example at all; someone else brought it up as an example.

I talked about 'inventing LEDs at all' since that is the minimum related thing where it might actually have been enough of a breakthrough in physics to matter. Blue LEDs are simply not significant enough a change from what we already had. Even just ... (read more)

gwern
Then maybe you shouldn't be trying to defend it (or your other two examples of engines and programming languages, for that matter), especially given that you still have not explained how 'the LED' could have been given a Nobel ever inasmuch as everyone involved was dead.

I find the idea of determining the level of 'introspection' an AI can manage to be an intriguing one. Introspection seems likely to be very important to generalizing intelligent behavior, and knowing what is going on inside the AI is obviously interesting for the interpretability reasons mentioned, yet this seems oversold (to me). The actual success rate of self-prediction seems incredibly low considering that the trivial/dominant strategy of 'just run the query' (which you do briefly mention) should be easy for the machine to discover during traini... (read more)

Owain_Evans
I addressed this point here. Also see section 7.1.1 in the paper.
Felix J Binder
Thanks so much for your thoughtful feedback!

To rule out that the model just simulates the behavior itself, we always ask it about some property of its hypothetical behavior (”Would the number that you would have predicted be even or odd?”). So it has to both simulate itself and then reason about it in a single forward pass. This is not trivial. When we ask models to just reproduce the behavior that they would have had, they achieve much higher accuracy. In particular, GPT-3.5 can reproduce its own behavior pretty well, but struggles to extract a property of its hypothetical behavior.

(Another minor thing: it turns out that OpenAI API models are not in practice deterministic even at temperature=0, probably due to batching of mixture-of-experts. We try to account for this by repeatedly sampling, but this puts a ceiling on how high self-prediction performance can be.)

It’s true that we only find evidence for introspection on toy tasks. Under the simulation account (models internally simulate what they would do and then reason about it), it could be that current models do not have enough power in a single forward pass to both self-simulate and do sophisticated reasoning on top of this. But having shown that, in some cases, models are capable of this, we might want to prepare for future models to be better at this ability.

That’s a fair point—we certainly don’t want to claim that this shows that all self-reports by models are necessarily true. But we do think that our findings should update us in the direction of self-report of morally relevant properties being a promising research avenue. Had we found that models have no special access to information about themselves, we should consider it less likely that self-report about sentience etc. would be informative.

Introspection training can be thought of as a form of elicitation. Self-prediction is a weird task that models probably aren't trained on (but we don't know exactly what the labs are doing). So it could be that t
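A rough sketch of the two-step comparison described above, treating the model as an opaque text-in/text-out callable; the prompt wording and helper names here are hypothetical, not the paper's actual templates:

```python
from typing import Callable

Model = Callable[[str], str]  # opaque text-in / text-out stand-in for an LLM

def object_level_answer(model: Model, question: str) -> int:
    # Step 1: what the model actually outputs for the base task.
    return int(model(f"{question}\nAnswer with a single integer."))

def hypothetical_property_answer(model: Model, question: str) -> str:
    # Step 2: in a fresh context, ask about a property of the answer the model
    # *would have* given, without showing it the answer from step 1.
    reply = model(
        f"Suppose you were asked: '{question}' "
        "Would the number you would have answered with be even or odd? "
        "Reply with exactly one word: 'even' or 'odd'."
    )
    return reply.strip().lower()

def self_prediction_correct(model: Model, question: str) -> bool:
    # Self-prediction counts as correct only if the claimed property matches
    # the property of the answer the model actually gives to the base task.
    actual = object_level_answer(model, question)
    claimed = hypothetical_property_answer(model, question)
    return claimed == ("even" if actual % 2 == 0 else "odd")
```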

'Substantial technical accomplishment', sure, but minor impact compared to the actual invention of LEDs. Awarding the 'blue LED' rather than the 'LED' is like saying the invention of the jet engine is more important than the invention of the engine at all. Or that the invention of 'C' is more important than the invention of 'not machine code'.

gwern

One of the problems with the Nobel Prize as a measurement or criterion is that it is not really suited for that by nature, especially given criteria like no posthumous awards. This means that it is easy to critique awarding a Nobel Prize, but it is harder to critique not awarding one. You can't give a Nobel Prize to the inventor of the engine, because they probably died a long time ago; you could have for a recent kind of engine. Similarly, you could give a Turing Award to the inventors of C (and they probably did), but the first person who created a mnemoni... (read more)

Note that I am, in general, reluctant to claim to know how I will react to evidence in the future. There are things so far out there that I do know how I would react, but I like to allow myself to use all the evidence I have at that point, and not what I thought beforehand. I do not currently know enough about what would convince me of intelligence in an AI to say for sure. (In part because many people before me have been so obviously wrong.)

I wouldn't say I see intelligence as a boolean, but as many-valued... but those values include a level below which t... (read more)

eggsyntax
Interesting, if you happen to have a link I'd be interested to learn more. I like the idea, but it seems hard to judge 'more novel and [especially] more requiring of intelligence' other than to sort completions in order of human error on each. I think there's a lot of work to be done on this still, but there's some evidence that in-context learning is essentially equivalent to gradient descent (though also some criticism of that claim). I continue to think so :). Thanks again!
eggsyntax
Hi, apologies for having failed to respond; I went out of town and lost track of this thread. Reading back through what you've said. Thank you!

Huh, they really gave a Nobel in Physics specifically for the blue LED? It would have made sense for LEDs at all, but specifically for blue? That really is ridiculous.

I should be clearer: AlphaFold seems like something that could be a chemistry breakthrough sufficient for a prize. I'd even heard about how difficult the problem was before in other contexts, and it was hailed as a breakthrough at the time in what seemed like a genuine way. But I can't evaluate its importance to the field as an outsider, and the terrible physics prize leads me to suspect their evaluation of the Chemistry prize might be flawed due to whatever pressures led to the selection of the Physics prize.

Algon
Inventing blue LEDs was a substantial technical accomplishment, had a huge impact on society, was experimentally verified and can reasonably be called work in solid state physics. 

I think that the fact that they are technically separate people just makes it more likely for this to come into play. If it were all the same people, they could simply choose the best contribution of AI and be done with it; but as it is, they have the same setup, pressures, and general job, have not themselves honored AI yet... and each wants to make their own mark.

I do think this is much more likely the reason the physics one was chosen than the chemistry one, but it does show that the existing pressures are to honor AI even when it doesn't make sense.

I do think ... (read more)

What would be a minimal-ish definitive test for LLM-style AI? I don't really know. I could most likely come up with tests for it, but I don't really know how to make them fairly minimal. I can tell you that current AI isn't intelligent, but as for what would prove intelligence, I've been thinking about it for a while and I really don't have much. I wish I could be more helpful.

I do think your test of whether an AI can follow the scientific method in a novel area is intriguing.

Historically, a lot of people have come up with (in retrospect) really dumb tests... (read more)

eggsyntax
Thanks for the lengthy and thoughtful reply! I'm planning to make a LW post soon asking for more input on this experiment -- one of my goals here is to make this experiment one that both sides of the debate agree in advance would provide good evidence. I'd love to get your input there as well if you're so moved!

I tend not to think of intelligence as a boolean property, but of an entity having some level of intelligence (like IQ, although we certainly can't blithely give IQ tests to LLMs and treat the results as meaningful, not that that stops people from doing it). I don't imagine you think of it as boolean either, but calling that out in case I'm mistaken.

Agreed; at this point I assume that anything published before (or not long after) the knowledge cutoff may well be in the training data. The obfuscation method matters as well; eg I think the Kambhampati team's approach to obfuscation made the problems much harder in ways that are irrelevant or counterproductive to testing LLM reasoning abilities (see Ryan's comment here and my reply for details).

I'd absolutely love that and agree it would help enormously to resolve these sorts of questions. But my guess is we won't see deliberate exclusions on frontier LLMs anytime in the next couple of years; it's difficult and labor-intensive to do at internet scale, and the leading companies haven't shown any interest in doing so AFAIK (or even in releasing comprehensive data about what the training data was).

Very interesting idea! I think I informally anectested something similar at one point by introducing new mathematical operations (but can't recall how it turned out). Two questions:

* Since we can't in practice train a frontier LLM without multiplication, would artificial new operations be equally convincing in your view (eg, I don't know, x # y means sqrt(x - 2y)? Ideally something a bit less arbitrary than that, though mathematicians tend to already write about the non-arbitrary ones).
* Would providing few

To the best of my ability to recall, I never recognize which is which except by context, which makes it needlessly difficult sometimes. Personally I would go for 'subconscious' vs 'conscious' or 'associative' vs 'deliberative' (the latter pair due to how I think the subconscious works), but 'intuition' vs 'reason' makes sense too. In general, I believe far too many things are given unhelpful names.

I get it. I like to poke at things too. I think it did help me figure out a few things about why I think what I do about the subject; I just lose energy for this kind of thing easily. And I have; I honestly wasn't going to answer more questions. I think understanding in politics is good, even though people rarely change positions due to the arguments, so I'm glad it was helpful.

I do agree that many Trump supporters have weird beliefs (I think they're endemic in politics, on all sides, which includes centrists). I don't like what politics does to people's th... (read more)

Your interpretation of Trump's words and actions implies he is in favor of circumventing the system of laws and the constitution, while another interpretation (that I and many others hold) is that his words and actions mean that he thinks the system, which should be (and should have been) followed, was not followed.

Separately, a significant fraction of the American populace also believes it really was not properly followed. (I believe this, though not to the extent that I think it changed the outcome.) Many who believe that are Trump supporters, of course, but it is not such a s... (read more)

Pazzaz
Sorry for badgering you so much, I've appreciated the discussion. Some of the other Trump supporters here seemed to have very weird beliefs and values, but your values don't seem that far away from mine. I think I got a better understanding of why you think what you do (though of course I disagree on things). Thanks for answering a bunch of questions :)

I don't pay attention to what gets people the Nobel Prize in physics, but this seems obviously illegitimate.  AI and physics are pretty unrelated, and they aren't getting it for an AI that has done anything to solve physics. I'm pretty sure they didn't get it for merit, but because AI is hyped. The AI chemistry one makes some sense, as it is actually making attempts to solve a chemistry issue, but I doubt its importance since they also felt the need to award AI in a way that makes no sense with the other award.

Shankar Sivarajan
This is a lot like the time they awarded it for the invention of the blue LED, so I don't think "hype" is a good explanation. I agree it's bullshit though: it's not a physics achievement in any meaningful way. The Chemistry one for AlphaFold seems reasonable to me.
ChristianKl
The Nobel Committee for Physics, which decides who gets the physics prize, is not made up of the same people as the committee for the chemistry prize. While it's possible that the people from the Nobel Committee for Chemistry talked with the Nobel Committee for Physics, modelling them as an amorphous "they" seems to me like it makes little sense.

We seem to be retreading ground.

"It doesn't matter if the election was stolen if it can't be shown to be true through our justice system". That is an absurd standard for whether or not someone should 'try' to use the legal system (which is what Trump did). You are trying to disqualify someone regardless of the truth of the matter based on what the legal system decided to do later. And Trump DID just take the loss (after exhausting the legal avenues), and is now going through the election system as normal in an attempt to win a new election.

I also find your... (read more)

Pazzaz
No, it does not. Laws, regulations, and the constitution exist in a society in order to coordinate behavior among its citizens. Laws, regulations, and the constitution do not assume that everyone follows the law. In fact, they do the opposite: they assume that people will break laws, that people will break regulations, and that people will go against the constitution. That's why there are mechanisms to punish people who go against them. You cannot terminate the constitution just because you think people broke the law. Edit: Also, if there is some court case you think shouldn't have been thrown out, then you are free to link it. Edit2: I don't understand why this comment got so downvoted.