All of brambleboy's Comments + Replies

Another idea: real photos have lots of tiny details to notice regularities in. Pixel art images, on the other hand, can only be interpreted properly by "looking at the big picture". AI vision is known to be biased towards textures rather than shape, compared to humans.

2ryan_greenblatt
I don't think it is specific to pixel art, I think it is more about general visual understanding, particularly when you have to figure out downstream consequences from the visual understanding (like "walk to here").

Probably because the dataset of images + captions scraped from the internet consists of lots of boring photos with locations attributed to them, and not a lot of labeled screenshots of pixel art games by comparison. This is similar to how LLMs are very good at stylometry, because they have lots of experience making inferences about authors based on patterns in the text.

I still think it's weird that many AI safety advocates will criticize labs for putting humanity at risk while simultaneously being paid users of their products and writing reviews of their capabilities. Like, I get it, we think AI is great as long as it's safe, we're not anti-tech, etc.... but is "don't give money to the company that's doing horrible things" such a bad principle?

"I find Lockheed Martin's continued production of cluster munitions to be absolutely abhorrent. Anyway, I just unboxed their latest M270 rocket system and I have to say I'm quite impressed..."

6MichaelDickens
The argument people make is that LLMs improve the productivity of people's safety research so it's worth paying. That kinda makes sense. But I do think "don't give money to the people doing bad things" is a strong heuristic. I'm a pretty big believer in utilitarianism but I also think people should be more wary of consequentialist justifications for doing bad things. Eliezer talks about this in Ends Don't Justify Means (Among Humans), he's also written some (IMO stronger) arguments elsewhere but I don't recall where. Basically, if I had a nickel for every time someone made a consequentialist argument for why doing a bad thing was net positive, and then it turned out to be net negative, I'd be rich enough to diversify EA funding away from Good Ventures.

I have previously paid for LLM subscriptions (I don't have any currently) but I think I was not giving enough consideration to the "ends don't justify means among humans" principle, so I will not buy any subscriptions in the future.

Presenting fabricated or cherry-picked evidence might have the best odds of persuading someone of something true, and so you could argue that doing so "maximizes the truth of the belief" they get, but that doesn't make it honest.

Just tried it. The description is in fact completely wrong! The only thing it sort of got right is that the top left square contains a rabbit.

7eggsyntax
Aha! Whereas I just asked for descriptions (same link, invalidating the previous request) and it got every detail correct (describing the koala as hugging the globe seems a bit iffy, but not that unreasonable). So that's pretty clear evidence that there's something preserved in the chat for me but not for you, and it seems fairly conclusive that for you it's not really parsing the image. Which at least suggests internal state being preserved (Coconut-style or otherwise) but not being exposed to others. Hardly conclusive, though. Really interesting, thanks for collaborating on it! Also Patrick Leask noticed some interesting things about the blurry preview images:  

Your 'just the image' link is the same as the other link that includes the description request, so I can't test it myself. (unless I'm misunderstanding something)

3eggsyntax
Oh, I see why; when you add more to a chat and then click "share" again, it doesn't actually create a new link; it just changes which version the existing link points to. Sorry about that! (also @Rauno Arike) So the way to test this is to create an image and only share that link, prior to asking for a description. Just as a recap, the key thing I'm curious about is whether, if someone else asks for a description of the image, the description they get will be inaccurate (which seemed to be the case when @brambleboy tried it above). So here's another test image (borrowing Rauno's nice background-image idea): https://chatgpt.com/share/680007c8-9194-8010-9faa-2594284ae684 To be on the safe side I'm not going to ask for a description at all until someone else says that they have.

I see, I didn't read the thread you linked closely enough. I'm back to believing they're probably the same weights.

I'd like to point out, though, that in the chat you made, ChatGPT's description gets several details wrong. If I ask it for more detail within your chat, it gets even more details wrong (describing the notebook as white and translucent instead of brown, for example). In one of my other generations it also used a lot of vague phrases like "perhaps white or gray".

 When I sent the image myself it got all the details right. I think this is go... (read more)

7eggsyntax
That's absolutely fascinating -- I just asked it for more detail and it got everything precisely correct (updated chat). That makes it seem like something is present in my chat that isn't being shared; one natural speculation is internal state preserved between token positions and/or forward passes (eg something like Coconut), although that's not part of the standard transformer architecture, and I'm pretty certain that OpenAI hasn't said that they're doing something like that. It would be interesting if that's what's behind the new GPT-4.1 (and a bit alarming, since it would suggest that they're not committed to consistently using human-legible chain of thought). That's highly speculative, though. It would be interesting to explore this with a larger sample size, although I personally won't be able to take that on anytime soon (maybe you want to run with it?).

I think these sort of concerns will manifest in the near future, but it'll be confusing because AI's competence will continue to be unevenly distributed and unintuitive. I expect some AI systems will be superhuman, such as automated vehicles and some AI diagnosticians, and that incompetent AIs will gain unwarranted trust by association while the competent AIs get unwarranted distrust by association. Sometimes trusting AI will save lives, other times it will cost them.

This thread shows an example of ChatGPT being unable to describe the image it generated, though, and other people in the thread (seemingly) confirm that there's a call to a separate model to generate the image. The context has an influence on the images because the context is part of the tool call.

3eggsyntax
Interesting! When someone says in that thread, "the model generating the images is not the one typing in the conversation", I think they're basing it on the API call which the other thread I linked shows pretty conclusively can't be the one generating the image, and which seems (see responses to Janus here) to be part of the safety stack. In this chat I just created, GPT-4o creates an image and then correctly describes everything in it. We could maybe tell a story about the activations at the original-prompt token positions providing enough info to do the description, but then that would have applied to nearcyan's case as well.

We should always be able to translate latent space reasoning aka neuralese (see COCONUT) to a human language equivalent representation.

I don't think this is true at all. How do you translate, say, rotating multiple shapes in parallel into text? Current models already use neuralese as they refine their answer in the forward pass. Why can't we translate that yet? (Yes, we can decode the model's best guess at the next token, but that's not an explanation.)

Chain-of-thought isn't always faithful, but it's still what the model actually uses when it does serial c... (read more)

4gwern
At least for multimodal LLMs in the pure-token approach like Gato or DALL-E 1 (and probably GPT-4o and Gemini, although few details have been published), you would be able to do that by generating the tokens which embody an encoded image (or video!) of several shapes, well, rotating in parallel. Then you just look at them.

The rocket image with the stablediffusionweb watermark on it is interesting for multiple reasons:

  1. It shows they haven't eliminated watermarks randomly appearing in generated images yet, which is an old problem that seems like it should've been solved by now.
  2. It actually looks like it was generated by an older Stable Diffusion model, which means this model can emulate the look of other models.

I think some long tasks are like a long list of steps that only require the output of the most recent step, and so they don't really need long context. AI improves at those just by becoming more reliable and making fewer catastrophic mistakes. On the other hand, some tasks need the AI to remember and learn from everything it's done so far, and that's where it struggles; see how Claude Plays Pokémon gets stuck in loops and has to relearn things dozens of times.
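To make the distinction concrete, here's a minimal sketch (my own framing, not from the post; the `do_step` callback is a hypothetical stand-in for whatever the agent does at each step):

```python
from typing import Any, Callable, List

def run_chain(inputs: List[Any], do_step: Callable[[Any, Any], Any]) -> Any:
    """'Chain' tasks: only the most recent output is carried forward, so the
    context stays small and the main failure mode is an unreliable step."""
    state = None
    for x in inputs:
        state = do_step(x, state)
    return state

def run_with_memory(inputs: List[Any], do_step: Callable[[Any, List[Any]], Any]) -> List[Any]:
    """'Memory' tasks: the whole transcript is carried forward, so the context
    grows with task length and the agent has to learn from its own history."""
    history: List[Any] = []
    for x in inputs:
        history.append(do_step(x, history))
    return history

# e.g. a running total only ever needs the last state:
print(run_chain([1, 2, 3, 4], lambda x, s: (s or 0) + x))  # 10
```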

7lc
I haven't read the METR paper in full, but from the examples given I'm worried the tests might be biased in favor of an agent with no capacity for long term memory, or at least not hitting the thresholds where context limitations become a problem:   For instance, task #3 here is at the limit of current AI capabilities (takes an hour). But it's also something that could plausibly be done with very little context; if the AI just puts all of the example files in its context window it might be able to write the rest of the decoder from scratch. It might not even need to have the example files in memory while it's debugging its project against the test cases. Whereas a task to fix a bug in a large software project, while it might take an engineer associated with that project "an hour" to finish, requires stretching the limits of the amount of information it can fit inside a context window, or recall beyond what we seem to be capable of doing today. 

Claude finally made it to Cerulean after the "Critique Claude" component correctly identified that it was stuck in a loop, and decided to go through Mt. Moon. (I think Critique Claude is prompted specifically to stop loops.)

I'm glad you shared this, it's quite interesting. I don't think I've ever had something like that happen to me and if it did I'd be concerned, but I could believe that it's prevalent and normal for some people.

I don't think your truth machine would work because you misunderstand what makes LLMs hallucinate. Predicting what a maximum-knowledge author would write induces more hallucinations, not less. For example, say you prompted your LLM to predict text supposedly written by an omniscient oracle, and then asked "How many fingers am I holding behind my back?" The LLM would predict an answer like "three" or something, because an omniscient person would know that, even though it's probably not true.

In other words, you'd want the system to believe "this writer I'm p... (read more)

I've been trying to put all my long-form reading material in one place myself, and found a brand-new service called Reader which is designed specifically for this purpose. It has support for RSS, Newsletters, YouTube transcripts, and other stuff. It's $10/month billed annually, or $13 billed monthly.

1roland
I'm looking for something simpler that doesn't require understanding another concept besides probability. The article you posted is a bit confusing: So is Arbital: Fixed: the "Miss Scarlett" hypothesis assigns a probability of 20% to e

Thanks for responding.

I agree with what you're saying; I think you'd want to maintain your reward stream at least partially. However, the main point I'm trying to make is that in this hypothetical, it seems like you'd no longer be able to think of your reward stream as grounding out your values. Instead it's the other way around: you're using your values to dictate the reward stream. This happens in real life sometimes, when we try to make things we value more rewarding.

You'd end up keeping your values, I think, because your beliefs about what you value do... (read more)

3johnswentworth
No. An analogy: suppose I run a small messaging app, and all the users' messages are stored in a database. The messages are also cached in a faster-but-less-stable system. One day the database gets wiped for some reason, so I use the cache to repopulate the database. In this example, even though I use the cache to repopulate the database in this one weird case, it is still correct to say that the database is generally the source of ground truth for user messages in the system; the weird case is in fact weird. (Indeed, that's exactly how software engineers would normally talk about it.) Spelling out the analogy: in a human brain in ordinary operation, our values (I claim) ground out in the reward stream, analogous to the database. There's still a bunch of "caching" of values, and in weird cases like the one you suggest, one might "repopulate" the reward stream from the "cached" values elsewhere in the system. But it's still correct to say that the reward stream is generally the source of ground truth for values in the system; the weird case is in fact weird.

This conception of values raises some interesting questions for me.

Here's a thought experiment: imagine your brain loses all of its reward signals. You're in a depression-like state where you no longer feel disgust, excitement, or anything. However, you're given an advanced wireheading controller that lets you easily program rewards back into your brain. With some effort, you could approximately recreate your excitement when solving problems, disgust at the thought of eating bugs, and so on, or you could create brand-new responses. My questions:

  • What would
... (read more)
6johnswentworth
Good question. First and most important: if you know beforehand that you're at risk of entering such a state, then you should (according to your current values) probably put mechanisms in place to pressure your future self to restore your old reward stream. (This is not to say that fully preserving the reward stream is always the right thing to do, but the question of when one shouldn't conserve one's reward stream is a separate one which we can factor apart from the question at hand.) ... and AFAICT, it happens that the human brain already works in a way which would make that happen to some extent by default. In particular, most of our day-to-day planning draws on cached value-estimates which would still remain, at least for a time, even if the underlying rewards suddenly zeroed out. ... and it also happens that other humans, like e.g. your friends, would probably prefer (according to their values) for you to have roughly-ordinary reward signals rather than zeros. So that would also push in a similar direction. And again, you might decide to edit the rewards away from the original baseline afterwards. But that's a separate question. On the other hand, consider a mind which was never human in the first place, never had any values or rewards, and is given the same ability to modify its rewards as in your hypothetical. Then - I claim - that mind has no particular reason to favor any rewards at all. (Although we humans might prefer that it choose some particular rewards!) Your question touched on several different things, so let me know if that missed the parts you were most interested in.
brambleboy2715

While I don't have specifics either, my impression of ML research is that it's a lot of work to get a novel idea working, even if the idea is simple. If you're trying to implement your own idea, you'll be banging your head against the wall for weeks or months wondering why your loss is worse than the baseline. If you try to replicate a promising-sounding paper, you'll bang your head against the wall as your loss is worse than the baseline. It's hard to tell if you made a subtle error in your implementation or if the idea simply doesn't work for reasons you... (read more)

At the start of my Ph.D. 6 months ago, I was generally wedded to writing "good code". The kind of "good code" you learn in school and standard software engineering these days: object oriented, DRY, extensible, well-commented, and unit tested.

I think you'd like Casey Muratori's advice. He's a software dev who argues that "clean code" as taught is actually bad, and that the way to write good code efficiently is more like the way you did it intuitively before you were taught OOP and stuff. He advises "Semantic Compression" instead: essentially you just s... (read more)

2Oliver Daniels
just read both posts and they're great (as is The Witness). It's funny though, part of me wants to defend OOP - I do think there's something to finding really good abstractions (even preemptively), but that it's typically not worth it for self-contained projects with small teams and fixed time horizons (e.g. ML research projects, but also maybe indie games).

Yeah, I think the mainstream view of activism is something like "Activism is important, of course. See the Civil Rights and Suffrage movements. My favorite celebrity is an activist for saving the whales! I just don't like those mean crazy ones I see on the news."

5Viliam
That mainstream is like one side of the American political spectrum, now also do the other side. ;) Seems to me there are three factors to how one perceives an activist, most important first:
  • Do I support their agenda, or do I oppose it?
  • If I oppose their agenda, how threatened do I feel by their activism? If I support their agenda, how devastating a blow do I think they delivered to my enemies?
  • How do the activists actually behave? Do they politely express their opinions? Do they destroy public and private property? Do they attack other people?
The problem is that the third point is the least important one. A typical person will excuse any violence on their side as "necessary" (and sometimes also as "cool"). On the other hand, even seemingly peaceful behavior cannot compensate for the fact that "their goals are evil". Basically, the third point mostly matters for people who don't have a dog in this fight. The more radicalized the society is, the fewer such people there are.

Pacing is a common stimming behavior. Stimming is associated with autism / sensory processing disorder, but neurotypical people do it too.

This seems too strict to me, because it says that humans aren't generally intelligent, and that a system isn't AGI if it's not a world-class underwater basket weaver. I'd call that weak ASI.

1Daniel Tan
Fair point, I’ll probably need to revise this slightly to not require all capabilities for the definition to be satisfied. But when talking to laypeople I feel it’s more important to convey the general “vibe” than to be exceedingly precise. If they walk away with a roughly accurate impression I’ll have succeeded

Fatebook has worked nicely for me so far, and I think it'd be cool to use it more throughout the day. Some features I'd like to see:

  • Currently tags seem to only be useful for filtering your track record. I'd like to be able to filter the forecast list by tag.
  • Allow clicking and dragging the bar to modify probabilities.
  • An option to input probabilities in formats besides percentages, such as odds ratios or bits (a quick sketch of those conversions appears after this list).
  • An option to resolve by a specific time, not just a date, plus an option for push notification reminders instead of emails. This would open the door to
... (read more)
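The conversions in question are simple; here's a quick sketch (the function names are my own, just for illustration):

```python
import math

def prob_to_odds(p: float) -> float:
    """0.8 probability -> 4.0, i.e. 4:1 odds."""
    return p / (1 - p)

def odds_to_prob(odds: float) -> float:
    """3:1 odds -> 0.75 probability."""
    return odds / (1 + odds)

def prob_to_bits(p: float) -> float:
    """Log-odds in bits: 0.8 -> 2.0 bits, 0.5 -> 0 bits."""
    return math.log2(prob_to_odds(p))

print(prob_to_odds(0.8), prob_to_bits(0.8), odds_to_prob(3.0))  # 4.0 2.0 0.75
```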
1Adam B
Thank you! I agree this would be nice. But try clicking on a tag to see all of your forecasts under it!

When I see an event with the stated purpose of opposing highly politically polarized things such as cancel culture and safe spaces, I imagine a bunch of people with shared politics repeating their beliefs to each other and snickering, and any beliefs that are actually highly controversial within that group are met with "No no, that's what they want you to think, you missed the point!" It seems possible to avoid that failure mode with a genuine truth-seeking culture, so I hope you succeeded.

9Cole Wyeth
“Cancel culture is good actually” needs to go in the hat ;)

It's been about 4 years. How do you feel about this now?

Bluesky has custom feeds that can bring in posts from all platforms that use the AT Protocol, but Bluesky is the only such platform right now. Most feeds I've found so far are simple keyword searches, which work nicely for having communities around certain topics, but I hope to see more sophisticated ones pop up.

2Garrett Baker
Kicking myself for not making a fatebook about this. It definitely sounded like the kind of thing that wouldn't replicate.
1Cipolla
Do you know why the error bars in the replication are smaller than in the original? (Just more people?) And with what confidence is the null hypothesis (difference = 0) rejected in both cases?
2Matt Goldenberg
They replicated it within the video itself?

While most people have super flimsy defenses of meat-eating, that doesn't mean everyone does. Some people simply think it's quite unlikely that non-human animals are sentient (besides primates, maybe). For example, IIRC Eliezer Yudkowsky and Rob Bensinger's guess is that consciousness is highly contingent on factors such as general intelligence and sociality, or something like that.

I think the "5% chance is still too much" argument is convincing, but it raises similar questions, such as "Are you really so confident that fetuses aren't sentient? How could you be so sure?"

1Isaac King
Eliezer's argument is the primary one I'm thinking of as an obvious rationalization. https://www.lesswrong.com/posts/KFbGbTEtHiJnXw5sk/i-really-don-t-understand-eliezer-yudkowsky-s-position-on https://benthams.substack.com/p/against-yudkowskys-implausible-position I'm not confident about fetuses either, hence why I generally oppose abortion after the fetus has started developing a brain.

I agree that origami AIs would still be intelligent if implementing the same computations. I was trying to point at LLMs potentially being 'sphexish': having behaviors made of baked if-then patterns linked together that superficially resemble ones designed on-the-fly for a purpose. I think this is related to what the "heuristic hypothesis" is getting at.

2Noosphere89
IMO, I think the heuristic hypothesis is partially right, but partially right is the keyword, in the sense that LLMs will have both sphexish heuristics and mostly clean algorithms for solving problems. I also expect OpenAI to broadly move LLMs from more heuristic-like reasoning to algorithmic-like reasoning, and o1 is slight evidence towards more systematic reasoning in LLMs.

The paper "Auto-Regressive Next-Token Predictors are Universal Learners" made me a little more skeptical of attributing general reasoning ability to LLMs. They show that even linear predictive models, basically just linear regression, can technically perform any algorithm when used autoregressively like with chain-of-thought. The results aren't that mind-blowing but it made me wonder whether performing certain algorithms correctly with a scratchpad is as much evidence of intelligence as I thought.
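To illustrate what "performing an algorithm autoregressively" means here, consider a toy of my own (not the paper's actual construction): a linear scoring map plus argmax over a one-hot token can encode an arbitrary next-token lookup table, and iterating it step by step already executes a multi-step procedure.

```python
import numpy as np

VOCAB = 10
# The "tokens" 0..9 double as states of a countdown automaton: decrement until 0.
table = {i: max(i - 1, 0) for i in range(VOCAB)}

# A linear next-token predictor: W @ onehot(token) gives scores over the vocabulary.
W = np.zeros((VOCAB, VOCAB))
for src, dst in table.items():
    W[dst, src] = 1.0

def step(token: int) -> int:
    return int(np.argmax(W @ np.eye(VOCAB)[token]))  # linear scores + argmax

trace = [7]
while trace[-1] != 0:
    trace.append(step(trace[-1]))
print(trace)  # [7, 6, 5, 4, 3, 2, 1, 0]
```

The per-step map here is trivial; all of the computation lives in the rollout, which is roughly the intuition the paper formalizes for chain-of-thought scratchpads.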

2Noosphere89
One man's modus ponens is another man's modus tollens, and what I do take away from the result is that intelligence with enough compute is too easy to do, so easy that even linear predictive models can do it in theory. So they don't show that intelligent/algorithmic reasoning isn't happening in LLMs, but rather that it's too easy to get intelligence/computation by many different methods. It's similar to the proof that an origami computer can compute every function computable by a Turing Machine, and if in a hypothetical world we were instead using very large origami pieces to build up AIs like AlphaGo, I don't think that there would be a sense in which it's obviously not reasoning about the game of Go. https://www.quantamagazine.org/how-to-build-an-origami-computer-20240130/

Even if you know a certain market is a bubble, it's not exactly trivial to exploit if you don't know when it's going to burst, which prices will be affected, and to what degree. "The market can remain irrational longer than you can remain solvent" and all that.

Personally, while I think that investment will decrease and companies will die off, I doubt there's a true AI bubble, because there are so many articles about it being in a bubble that it couldn't possibly be a big surprise for the markets if it popped, and therefore the hypothetical pop is already p... (read more)

1Remmelt
Yes, all of this. I didn't know how to time this, and also good point that operationalising it in terms of which AI stocks to target at what strike price could be tricky too.

The fourth image is of the "Z machine", or the Z Pulsed Power Facility, which creates massive electromagnetic pulses for experiments. It's awesome.

I can second this. I recommend the Chrome extension Unhook, which allows you to disable individual parts of YouTube, and Youtube-shorts block, which makes YouTube Shorts play like normal videos.

(Disclaimer: I'm not very knowledgeable about safety engineering or formal proofs)

I notice that whenever someone brings up "What if this unexpected thing happens?", you emphasize that it's about not causing accidents. I'm worried that it's hard to define exactly who caused an accident, for the same reason that deciding who's liable in the legal system is hard.

It seems quite easy to say that the person who sabotaged the stop sign was at fault for the accident. What if the saboteur poured oil on the road instead? Is it their fault if the car crashes from sli... (read more)

If random strangers start calling you "she", that implies you look feminine enough to be mistaken for a woman. I think most men would prefer to look masculine for many reasons: not being mistaken for a woman, being conventionally attractive, being assumed to have a 'manly' rather than 'effeminate' personality, looking your age, etc.

If you look obviously masculine, then being misgendered constantly would just be bewildering. Surely something is signaling that you use feminine pronouns.

If it's just people online misgendering you based on your writing, then that's less weird. But I think it still would bother some people for some of the reasons above.

6ymeskhout
That's the thing, I generally present as very masculine and if anyone referred to me as 'she' I would find it more confusing than anything else. If I actually cared, maybe I'd look for what effeminate signals I gave off, but I can't imagine a scenario where I would find it offensive or get mad at the person.

I predict that implementing bots like these into social media platforms (in their current state) would be poorly received by the public. I think many people's reaction to Grok's probability estimate would be "Why should I believe this? How could Grok, or anyone, know that?" If it were a prediction market, the answer would be "because <economic and empirical explanation as to why you can trust the markets>". There's no equivalent answer for a new bot, besides "because our tests say it works" (making the full analysis visible might help). From these co... (read more)

The images on this post appear to be broken.

If you go on Twitter/X and find the right people, you can get most of the benefits you list here. There are tastemakers that share and discuss intriguing papers, and researchers who post their own papers with explanation threads which are often more useful than the papers themselves. The researchers are usually available to answer questions about their work, and you can read the answers they've given already. You're also ahead of the game because preprints can appear way before conferences.

It may be through extrapolating too much from your (first-person, subjective) experiences with objects that seemingly possess intrinsic, observer-independent properties, like the classical objects of everyday life.

 

Are you trying to say that quantum physics provides evidence that physical reality is subjective, with conscious observers having a fundamental role? Rob implicitly assumes the position advocated by The Quantum Physics Sequence, which argues that reality exists independently of observers and that quantum stuff doesn't suggest otherwise. It'... (read more)

Another example in ML of a "non-conservative" optimization process: a common failure mode of GANs is mode collapse, wherein the generator and discriminator get stuck in a loop. The generator produces just one output that fools the discriminator, the discriminator memorizes it, the generator switches to another, until eventually they get back to the same output again.

In the rolling ball analogy, we could say that the ball rolls down into a divot, but the landscape flexes against the ball to raise it up again, and then the ball rolls into another divot, and so on.
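For concreteness, here's the same non-conservative dynamic in its simplest form (a standard toy example, not actual GAN code): simultaneous gradient play on the bilinear game min_x max_y x*y has a rotational update field with no potential function, so the iterates orbit the equilibrium (and, with a finite step size, spiral outward) rather than settling into it.

```python
import numpy as np

# min_x max_y  x*y : x descends its gradient while y ascends, simultaneously.
x, y, lr = 1.0, 0.0, 0.1
for t in range(201):
    gx, gy = y, x                      # d(xy)/dx = y, d(xy)/dy = x
    x, y = x - lr * gx, y + lr * gy    # the update field (-y, x) is a pure rotation
    if t % 50 == 0:
        print(f"step {t:3d}: x={x:+.3f}  y={y:+.3f}  radius={np.hypot(x, y):.3f}")
# The radius grows every step instead of shrinking toward the equilibrium at (0, 0).
```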

brambleboy107

So of course Robin Hanson offered polls on these so-called taboo topics. The ‘controversial’ positions got overwhelming support. The tenth question, whether demographic diversity (race, gender) in the workplace often leads to worse performance got affirmed 54%-17%, and the rest were a lot less close than that. Three were roughly 90%-1%. I realize Hanson has unusual followers, but the ‘taboo questions’ academics want to discuss? People largely agree on the answers, and the academics have decided saying that answer out loud is not permitted.

 

I understan... (read more)

4cata
It seems like this is a place where "controversial" and "taboo" diverge in meaning. The politician would notice that the sentence was about a taboo topic and bounce off, but that's probably totally unconnected to whether or not it would be controversial among people who know anything about genetics or intelligence and are actually expressing a belief. For example, they would bounce off regardless of whether the number in the sentence was 1%, 50%, or 90%.

For those curious about the performance: eyeballing the technical report, it roughly performs at the level of Llama-3 70B. It seems to have an inferior parameters-to-performance ratio because it was only trained on 9 trillion tokens, while the Llama-3 models were trained on 15 trillion tokens. It's also trained with a 4k context length as opposed to Llama-3's 8k. Its primary purpose seems to be the synthetic data pipeline thing.

I encountered this while I was reading about an obscure estradiol ester, Estradiol undecylate, used for hormone replacement therapy and treating prostate cancer. It's very useful because it has a super long half-life, but it was discontinued. I had to reread the article to be sure I understood: the standard dose, chosen arbitrarily in the first trials, was hundreds of times larger than necessary, leading to massive estrogen overdoses and severe side effects that killed many people through cardiovascular complications. Yet these insane doses were typical for decades and might've caused its discontinuation.

Although it has been over a decade, decent waterproof phone mounts now exist, too.

Thank you for writing this; it's by far the strongest argument for taking this problem seriously tailored to leftists that I've seen, and I'll be sharing it. Hopefully the frequent (probably unavoidable) references to EA don't turn them off too much.

2garrison
Thank you so much! I haven't gotten any serious negative feedback from lefties for the EA stuff so far, though an e/acc on Twitter mentioned it haha
Answer by brambleboy73

Here's why determinism doesn't bother me. I hope I get it across.

Deterministic systems still have to be simulated to find out what happens. Take cellular automata, such as Conway's Game of Life or Wolfram's Rule 110. The result of all future steps is determined by the initial state, but we can't practically "skip ahead" because of what Wolfram calls 'computational irreducibility': despite the simplicity of the underlying program, there's no way to reduce the output to a calculation that's much cheaper than just simulating the whole thing. Same with a mat... (read more)
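Rule 110 makes this easy to see in a few lines (a minimal sketch of my own; I'm using a periodic boundary for simplicity):

```python
RULE = 110  # Wolfram code: bit k of 110 is the new cell value for neighborhood k

def step(cells):
    n = len(cells)
    return [
        (RULE >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

row = [0] * 31 + [1]  # a single live cell
for _ in range(16):
    print("".join("#" if c else "." for c in row))
    row = step(row)
# The update rule fits in one line, but the only general way to learn what
# row N looks like is to compute rows 1 through N-1 first.
```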

I disagree that the Reversal Curse demonstrates a fundamental lack of sophistication of knowledge on the model’s part. As Neel Nanda explained, it’s not surprising that current LLMs will store A -> B but not B -> A as they’re basically lookup tables, and this is definitely an important limitation. However, I think this is mainly due to a lack of computational depth. LLMs can perform that kind of deduction when the information is external, that is, if you prompt it with who Tom Cruise’s mom is, it can then answer who Mary Lee Pfeiffer’s son is. I... (read more)

6gwern
As Nanda also points out, the reversal curse only holds for out-of-context reasoning: in-context, they have no problem with it and can answer it perfectly easily. So, it is a false analogy here because he's eliding the distinction between in-context and prompt-only (training). Humans do not do what he claims they do: "instantly update their world-model such that it'd be obvious to them that B is A". At least, in terms of permanent learning rather than in-context reasoning. For example, I can tell you that Tom Cruise's mother is named 'Mary Lee Pfeiffer' (thanks to that post) but I cannot tell you who 'Mary Lee Pfeiffer' is out of the blue, any more than I can sing the alphabet song backwards spontaneously and fluently. But - like an LLM - I can easily do both once I read your comment and now the string "if you prompt it with who Tom Cruise’s mom is, it can then answer who Mary Lee Pfeiffer’s son is" is in my context (working/short-term memory). I expect, however, that despite my ability to do so as I write this comment, if you ask me again in a month 'who is Mary Lee Pfeiffer?' I will stare blankly at you and guess '...a character on Desperate Housewives, maybe?' It will take several repetitions, even optimally spaced, before I have a good chance of answering 'ah yes, she's Tom Cruise's mother' without any context. Because I do not 'instantly update my world-model such that it'd be obvious to me that [Mary Lee Pfeiffer] is [the mother of Tom Cruise]'.