azsantosk

Co-founder and CEO of quiver.trade. Interested in mechanism design and neuroscience. Hopes to contribute to AI alignment.

Twitter: https://twitter.com/azsantosk

Comments

Curious to hear your thoughts @paulfchristiano, and whether you have updated based on the latest IMO progress.

In early 2023 I bet $500 that AI would win an IMO gold medal by 2026. It was a 1:1 bet against Michael Vassar, meaning I assigned >50% probability to this. It now seems very likely that I'm going to win.

To me, this was to be expected as a straightforward application of AlphaZero-like self-play amplification and distillation. The missing piece was the analogue of the policy network, which for the AlphaZero board games was a convolutional neural network. Once it became clear that existing LLMs were smart enough to generate good heuristics for this (given enough data), it seemed quite obvious to me that self-play guided by an LLM policy heuristic would work.
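To make the analogy concrete, here is a minimal toy sketch of the loop I have in mind (my own illustration on a made-up toy problem, not the actual IMO systems): a learned policy proposes steps, search amplifies it into verified solutions, and the policy is then distilled on the search traces.

```python
# Toy sketch of AlphaZero-style amplification/distillation (hypothetical toy
# domain, not the real provers): the "policy" stands in for the LLM heuristic.
from collections import defaultdict

ACTIONS = [1, 2, 3]          # toy "proof steps": add 1, 2, or 3
TARGET = 10                  # toy "theorem": reach exactly 10 from 0

# Policy: per-state action preferences (stands in for the LLM heuristic).
policy = defaultdict(lambda: {a: 1.0 for a in ACTIONS})

def policy_probs(state):
    prefs = policy[state]
    total = sum(prefs.values())
    return {a: p / total for a, p in prefs.items()}

def search(state=0, depth=0, max_depth=6):
    """Amplification: policy-guided depth-first search for a verified solution."""
    if state == TARGET:
        return []                      # solved: nothing left to do
    if state > TARGET or depth == max_depth:
        return None
    # Try actions in order of current policy preference.
    for a, _ in sorted(policy_probs(state).items(), key=lambda kv: -kv[1]):
        rest = search(state + a, depth + 1, max_depth)
        if rest is not None:
            return [(state, a)] + rest
    return None

def distill(trace, lr=0.5):
    """Distillation: shift the policy toward actions used in found solutions."""
    for state, action in trace:
        policy[state][action] += lr

for _ in range(20):                    # self-play loop
    trace = search()
    if trace:
        distill(trace)

print(policy_probs(0))                 # the policy now prefers moves that lead to 10
```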

Also relevant is Steven Byrnes' excellent "Against evolution as an analogy for how humans will create AGI".

It has been over two years since that post was published, and criticism of this analogy has continued to intensify. The OP and other MIRI members have certainly encountered this criticism by now, and as far as I am aware, no principled defense of continuing to use this example has been offered.

I encourage @So8res and others to either stop using this analogy, or to argue explicitly for its continued usage, engaging with the arguments presented by Byrnes, Pope, and others.

I think your argument is quite effective.

He may claim he is not willing to sell you this futures contract for $0.48 now. He expects to be willing to sell for that price in the future on average, but might refuse to do so now.

But then, why? Why would you not sell something for $0.49 now if you think, on average, it'll be worth less than that (to you) right after?
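To spell out the arithmetic (the $0.49 offer and the $0.48 expected future price are from the discussion above; the two-scenario split is just an illustrative assumption of mine):

```python
# Toy arithmetic for the pricing argument. Suppose he thinks he would end up
# accepting either $0.40 or $0.56 later, each with probability 1/2
# (expected value $0.48), yet refuses the $0.49 offer now.
price_now = 0.49

scenarios = [(0.5, 0.40), (0.5, 0.56)]
expected_price_later = sum(p * v for p, v in scenarios)

print(f"Expected future sale price: ${expected_price_later:.2f}")
print(f"Expected gain from selling now instead: ${price_now - expected_price_later:.2f}")
# -> $0.01 per contract: by his own forecast, refusing the $0.49 sale gives up
#    expected value, which is the inconsistency being pointed at.
```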

I see no contradiction in a superintelligent being mostly motivated to optimize virtual worlds, and your hypothesis that this may be a common attractor is an interesting one. I expect this to be more likely if these simulations are rich enough to present a variety of problems, such that optimizing them continues to provide challenges and discoveries for a very long time.

Of course, even a being that only cares about this simulated world may still take actions in the real world (e.g. to obtain more compute), so this "wire-heading" may not prevent successful power-seeking behavior.

Thank you very much for linking these two posts, which I hadn't read before. I'll start using the direct vs. amortized optimization terminology, as I think it makes things clearer.

The intuition that reward models and planners have an adversarial relationship seems crucial, and it doesn't seem as widespread as I'd like.
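One way I picture it (my own toy illustration, not taken from the linked posts): a planner that searches hard against a learned reward model tends to land exactly on the inputs where the model's error is most positive, so realized reward systematically disappoints relative to the model's prediction.

```python
# Toy illustration of the adversarial dynamic between a planner and a learned
# reward model (the optimizer's-curse effect). All names here are hypothetical.
import random

random.seed(0)

def true_reward(x):
    return -(x - 0.3) ** 2            # unknown to the planner

candidates = [i / 100 for i in range(101)]

# "Learned" reward model: true reward plus independent noise per candidate.
model = {x: true_reward(x) + random.gauss(0, 0.05) for x in candidates}

best = max(candidates, key=lambda x: model[x])   # the planner's choice

print(f"planner picks x={best:.2f}")
print(f"model predicts reward {model[best]:+.3f}, true reward is {true_reward(best):+.3f}")
# The gap is typically positive: stronger search pressure selects for larger
# modeling errors, which is the adversarial relationship described above.
```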

On a meta level, your appreciative comment will motivate me to write more, even though the ideas are often still half-baked in my mind and the expositions not always clear or eloquent.

I feel quite strongly that the powerful minds we create will have curiosity drives, at least by default, unless we make a substantial effort, for alignment reasons, to build minds without them.

The reason is that, yes, a superintelligence can pursue curiosity-like behavior instrumentally as part of a plan, but how do you get to superintelligence in the first place?

Curiosity drives are a very effective way to "augment" your reward signals, allowing you to improve your models and abilities essentially for free, through self-play.
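As a concrete sketch of what I mean by augmenting the reward signal (an assumed minimal setup, not a specific published method): add an intrinsic bonus proportional to the agent's prediction error, so states the world model does not yet predict well become rewarding to visit.

```python
# Minimal sketch of a curiosity bonus: intrinsic reward equals the world
# model's prediction error, so poorly modeled transitions attract the agent.
from collections import defaultdict

class CuriousAgent:
    def __init__(self, beta=1.0):
        self.beta = beta                          # weight on the curiosity bonus
        self.model = defaultdict(float)           # predicted next state per (state, action)

    def intrinsic_reward(self, state, action, next_state):
        error = abs(next_state - self.model[(state, action)])
        # Update the world model toward what actually happened.
        self.model[(state, action)] += 0.5 * (next_state - self.model[(state, action)])
        return error

    def total_reward(self, extrinsic, state, action, next_state):
        return extrinsic + self.beta * self.intrinsic_reward(state, action, next_state)

agent = CuriousAgent()
# A novel transition pays a large bonus; repeating it pays less each time.
for _ in range(3):
    print(agent.total_reward(extrinsic=0.0, state=0, action=1, next_state=5.0))
# -> 5.0, 2.5, 1.25: the bonus decays as the model learns, pushing toward novelty.
```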
