Would be very curious to know why people are downvoting this post.
Is it:
a) Too obvious
b) Too pretentious
c) Poorly written
d) Unsophisticated analysis
e) Promoting dishonesty
Or maybe something else.
You say counterfactuals in CLDT should correspond to consistent universes
That's not quite what I wrote in this article:
However, this now seems insufficient as I haven't explained why we should maintain the consistency conditions over comparability after making the ontological shift. In the past, I might have said that these consistency conditions are what define the problem and that if we dropped them it would no longer be Newcomb's Problem... My current approach now tends to put more focus on the evolutionary process that created the intuitions and instincts underlying these incompatible demands as I believe that this will help us figure out the best way to stitch them together.
I'll respond to the other component of your question later.
Just thought I'd add a second follow-up comment.
You'd have a much better idea of what made FHI successful than I would. At the same time, I would bet that in order to be successful - and to be its own thing - this new project would likely have to break at least one of the assumptions behind what made the old FHI work well.
Then much later, when we ran the AI Alignment Prize here on LW, I also noticed that the prize by itself wasn't too important; the interactions between newcomers and old-timers were a big part of what drove the thing.
Could you provide more detail?
Reading your list, a bunch of it seems to be about decisions about what to work on or what locally to pursue.
I think my list appears more this way than I intended because I gave some examples of projects I would be excited by if they happened. I wasn't intending to stake out a strong position on whether these should be projects chosen by the institute vs. some examples of projects that it might be reasonable for an individual researcher to choose within that particular area.
I'd love your feedback on my thoughts on decision theory.
If you're trying to get a sense of my approach in order to determine whether it's interesting enough to be worth your time, I'd suggest starting with this article (3 minute read).
I'm also considering applying for funding to create a conceptual alignment course.
I strongly agree with Owen's suggestions about figuring out a plan grounded in current circumstances, rather than reproducing what was.
Here are some potentially useful directions to explore.
Just to be clear, I'm not claiming that it should adopt all of these. Indeed, attempting to adopt all of them would likely be incoherent, since it would mean pursuing too many different directions at the same time.
These are just possibilities, some subset of which is hopefully useful:
"The structure of synchronization is, in general, richer than the world model itself. In this sense, LLMs learn more than a world model" given that I expect this is the statement that will catch a lot of people's attention.
Just in case this claim caught anyone else's attention, what they mean is that the synchronization structure contains (a toy sketch follows the list below):
• A model of the world
• A model of the agent's process for updating its belief about which state the world is in
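In case a concrete toy helps: here's a minimal sketch of the second point (my own illustration, not from the paper; the hidden Markov model and all its numbers are made up). An observer synchronizing to a hidden process has to track its Bayesian belief about the hidden state, and the set of belief states it passes through is generally richer than the set of hidden world states itself.

```python
import numpy as np

# Toy hidden Markov model standing in for "the world": 2 hidden states,
# 2 observation symbols. All numbers are invented purely for illustration.
T = np.array([[0.9, 0.1],   # P(next state | current state)
              [0.2, 0.8]])
E = np.array([[0.7, 0.3],   # P(observation | state)
              [0.4, 0.6]])

def update_belief(belief, obs):
    """One step of Bayesian belief updating: predict the next hidden state,
    then condition on the observation."""
    predicted = belief @ T                # distribution over the next hidden state
    unnormalized = predicted * E[:, obs]  # weight by likelihood of the observation
    return unnormalized / unnormalized.sum()

# Different observation histories drive the belief to different points in the
# probability simplex. The set of reachable belief states (the synchronization
# structure) is generally richer than the 2 underlying world states.
reachable = set()
for history in [[0, 0, 1], [1, 1, 0], [0, 1, 0, 1]]:
    belief = np.array([0.5, 0.5])         # uniform prior
    for obs in history:
        belief = update_belief(belief, obs)
        reachable.add(tuple(np.round(belief, 3)))

print(f"2 hidden world states, {len(reachable)} distinct belief states reached")
```

Each distinct belief state is its own "state of synchronization", which is why tracking them requires more structure than the world model's states alone.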
This strongly updates me towards expecting the institute to produce useful work.
Do you have any thoughts on whether it would make sense to push for a rule that requires open-source or open-weight models to be served behind an API for a certain amount of time before their weights can be released to the public?