Eli Tyre

Comments


You mean that the human attention mechanism is the assessor? 

Do you have a pointer for why you think that? 

My (admittedly weak) understanding of the neuroscience doesn't suggest that there's a specialized mechanism for critique of prior thoughts.

I'm kind of baffled that people are so willing to say that LLMs understand X, for various X. LLMs do not behave with respect to X like a person who understands X, for many X.

Do you have two or three representative examples?

In particular, even if the LLM were being continually trained (in a way that's similar to how LLMs are already trained, with similar architecture), it still wouldn't do the thing humans do with quickly picking up new analogies, quickly creating new concepts, and generally reforging concepts.

Is this true? How do you know? (I assume there are some facts here about in-context learning that I just happen to not know.)

It seems like, e.g., I can teach an LLM a new game in one session, and it will operate within the rules of that game.
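Concretely, the kind of test I have in mind is something like the rough sketch below. (`query_model` is a hypothetical stand-in for whatever chat interface you're using, and the game "Gaps" is invented for the example; the point is that the rules exist only in the prompt.)

```python
# Rough sketch, not a claim about any particular API: the game's rules
# appear only in the prompt, and we check whether the model's moves
# stay within them.

GAME_RULES = (
    "We're playing a game I just made up, called 'Gaps'.\n"
    "Rules:\n"
    "1. We take turns naming integers from 1 to 20.\n"
    "2. A number that has already been named may not be named again.\n"
    "3. Each number must differ from the previous one by at least 3.\n"
    "Reply with a single integer and nothing else.\n"
)


def is_legal(move: int, previous: int, used: set[int]) -> bool:
    """Check a move against the rules stated only in the prompt."""
    return 1 <= move <= 20 and move not in used and abs(move - previous) >= 3


def model_move(query_model, transcript: str, previous: int, used: set[int]) -> int:
    """Ask the model for its next move; its only exposure to the game is in-context."""
    reply = query_model(GAME_RULES + transcript + "\nYour move:")
    move = int(reply.strip())
    if not is_legal(move, previous, used):
        raise ValueError(f"Model broke the in-context rules with {move}")
    return move
```

If the model keeps producing legal moves over a long transcript, that looks like at least a weak form of "operating within the rules of a game it was just taught."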

Remember that we have no a priori reason to suspect that there are jumps in the future; humans perform sequential reasoning differently, so comparisons to the brain are just not informative.

In what way do we do it differently than the reasoning models?

@Valentine comes to mind as a person who was raised lifeist and is now still lifeist, but I think has more complicated feelings/views about the situation related to enlightenment and metaphysics that make death an illusion, or something.

Of course the default outcome of doing finetuning on any subset of data with easy-to-predict biases will be that you aren't shifting the inductive biases of the model on the vast majority of the distribution. This isn't because of an analogy with evolution, it's a necessity of how we train big transformers. In this case, the AI will likely just learn how to speak the "corrigible language" the same way it learned to speak french, and this will make approximately zero difference to any of its internal cognition, unless you are doing transformations to its internal chain of thought that substantially change its performance on actual tasks that you are trying to optimize for.

This is a pretty helpful answer. 

(Though you keep referencing the AI's chain of thought. I wasn't imagining training over the chain of thought. I was imagining training over the AI's outputs, whatever those are in the relevant domain.)

Would you expect that, if you trained an AI system to translate its internal chain of thought into a different language, this would make it substantially harder for it to perform tasks in the language it was originally trained in?

I would guess that if you finetuned a model so that it always responded in French, regardless of the language you prompt it with, it would persistently respond in French (absent various jailbreaks, which would almost definitely exist).
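For concreteness, the finetuning setup I'm imagining is roughly the sketch below: keep the prompts as they are, translate only the target responses into French, and finetune on the result. (`translate_to_french` and the `examples` list are placeholders, and the chat-message JSONL layout is just one common format, not a claim about any particular provider's API.)

```python
import json


def build_french_sft_file(examples, translate_to_french, path="french_sft.jsonl"):
    """Write a finetuning file whose prompts are untouched (any language)
    but whose target responses are always French."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            record = {
                "messages": [
                    {"role": "user", "content": ex["prompt"]},
                    {"role": "assistant", "content": translate_to_french(ex["response"])},
                ]
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

My guess above is about what a model finetuned on a file like this would do: respond in French essentially everywhere, with the rest of its behavior on those tasks mostly unchanged.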

 

I'm not sure that I share that intuition; I think that's because my background model of humans has them as much less general than I imagine yours does.

 
