christos
christos has not written any posts yet.

I’m skeptical of “tool AI” for a quite different reason: I don’t think such systems will be powerful enough. Just like the “mathematician AGI” in Section 11.3.2 above, I think a tool AI would be a neat toy, but it wouldn’t help solve the big problem—namely, that the clock is ticking until some other research group comes along and makes an agentic AGI.
I think that a math-AGI could not be of major help in alignment, on the premise that it only works well on already well-researched and well-structured fields. For example, one could fit a model to two proof techniques for a specific theorem and see whether it can produce a third one, that is... (read more)
In other words, the very essence of intelligence is coming up with new ideas, and that’s exactly where the value function is most out on a limb and prone to error.
But what exactly are new ideas? It could be the case that intelligence is pattern-matching at its most granular level, even for "novelties". What could come in handy here is a good flagging mechanism for recognizing when the model is out-of-distribution. However, this could come at its own cost.
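To make the flagging idea a bit more concrete, here is a toy sketch of one common heuristic (maximum softmax probability; the threshold and the heuristic itself are just my illustrative choices, not something from the post):

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)
    p = np.exp(z)
    return p / p.sum()

def flag_out_of_distribution(logits, threshold=0.6):
    """Flag an input as likely out-of-distribution when the model's top
    softmax probability falls below a confidence threshold."""
    confidence = softmax(np.asarray(logits, dtype=float)).max()
    return confidence < threshold

# A confident prediction vs. a near-uniform (suspicious) one.
print(flag_out_of_distribution([8.0, 1.0, 0.5]))   # False: model is confident
print(flag_out_of_distribution([1.1, 1.0, 0.9]))   # True: near-uniform, likely OOD
```

The "cost" I had in mind is exactly what this sketch glosses over: any such threshold trades off false alarms on unusual-but-fine inputs against missed genuinely novel ones.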
It gets even worse if a self-reflective AGI is motivated to deliberately cause credit assignment failures.
Is the use of "deliberately" here trying to account for the *thinking about its own thoughts* part of going back and forth between the thought generator and the thought assessor?
“A year before you first met your current boyfriend (or first saw him, or first became aware of his existence), did you already like him? Did you already think he was cute?” I predict that they will say “no”, and maybe even give you a funny look.
Okay, now I get the point of "neither like nor dislike" in your original statement.
I was originally thinking of something like the following: "A year before you met your current boyfriend, would you have thought he was cute, if he was your type?". But "your type" requires seeing them to get a reference point for whether they belong in that class or not. So there's a circular... (read more)
I liked the painting metaphor, and the diagram of brain-like AGI motivation!
Got a couple of questions below.
It’s possible that you would find this nameless pattern rewarding, were you to come across it. But you can’t like it, because it’s not currently part of your world-model. That also means: you can’t and won’t make a goal-oriented plan to induce that nameless pattern.
I agree that if you haven't seen something, then it's not exactly a part of your world-model. But judging from the fact that it has, say, positive reward, does this not mean that you like(d) it? Or that a posteriori we can tell it lay inside your "like" region? (it was somewhere close to... (read more)
Hey Steven, I'm new to the LW community, so please excuse my formatting.
Case #1 would involve changing the model weights, while Case #2 would not. Instead, Case #2 would solely involve changing the model activations.
I am confused about the deployment part of offline training. Is it not the case that when people use a model (i.e., query a trained model on a validation set), they seek to evaluate it rather than fit the new examples? So would it not be about changing weights in online learning vs. using the relevant activations in offline mode?
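For what it's worth, here is the toy picture I have in mind of Case #1 vs. Case #2 (the update rule below is just an illustrative squared-error gradient step, not anything specific from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # the trained model's weights

def forward(x, W):
    # Offline/deployment case (Case #2): W stays frozen; only the
    # activations change from input to input.
    return np.tanh(x @ W)

def online_step(x, target, W, lr=0.1):
    # Online-learning case (Case #1): the new example also nudges W,
    # here via a single squared-error gradient step (illustrative only).
    a = forward(x, W)
    grad = x[:, None] @ ((a - target) * (1.0 - a**2))[None, :]
    return W - lr * grad

x = rng.normal(size=4)
target = np.zeros(3)

activations = forward(x, W)          # Case #2: weights untouched
W_after = online_step(x, target, W)  # Case #1: weights actually change
print(np.allclose(W, W_after))       # False
```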
Two models for AGI development. The one on the left is directly analogous to how evolution created human brains. The one on the right involves an analogy between the genome and the source code defining an ML algorithm, as spelled out in the next subsection.
Could it be the case that the "evolution from scratch" model is learned in the Learned Content of the "ML code" approach? Is that what the mesa-optimization line suggests?
Thanks!
These responses are based on my experiences, not on concrete evidence.
When you put humans in them.
I wouldn't say we're okay with it; I'd say we've reached a point of inertia.
The need to alleviate the burden of living.
Anything genetic aside, I'd say upbringing and mindset.
There... (read more)