I think this post is interesting, although I don't particularly agree with its conclusions. I think it is helpful to think about the formation of your mind and goals - a tradition which I know goes back at least to Rousseau and most likely further (I am not very knowledgeable on the topic).
I think a lot of the difficulty goes back to the distinction between 'real'/'intrinsic' goals and those people purport to believe in. Looking at your example of a Christian sexual prude, nothing about their behaviour implies to me that these virtues of chastity and ...
I'm really excited about this, but not because of the distinction drawn between the shoggoth and the face. Applying a paraphraser such that the model's internal states are repeatedly swapped for states which we view as largely equivalent could be a large step towards interpretability.
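To sketch what I mean (both functions here are hypothetical stand-ins of mine, not anyone's actual implementation - in practice `paraphrase` would be a separate, weaker model):

```python
# Toy sketch of a paraphrased chain-of-thought loop. `generate_step`
# and `paraphrase` are hypothetical stand-ins for calls to the main
# reasoning model and to a separate, weaker paraphrasing model.

def generate_step(context: str) -> str:
    """Stand-in for one chain-of-thought step from the main model."""
    return context + " -> next step"

def paraphrase(text: str) -> str:
    """Stand-in for a weaker model that rewrites text, preserving the
    meaning we can see while scrambling any hidden encoding."""
    return text.replace("->", "then")

def reason(prompt: str, n_steps: int = 3) -> str:
    context = prompt
    for _ in range(n_steps):
        step = generate_step(context)
        # The main model only ever sees paraphrased versions of its
        # own earlier reasoning, so any state it tries to carry must
        # survive a rewrite into semantically equivalent text.
        context = paraphrase(step)
    return context

print(reason("Q: why is the sky blue?"))
```

The interpretability gain, as I see it, is that whatever the model passes forward has to live in the meaning of the text rather than in its surface form.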
This reminds me of the fact that CNNs work because they are equivariant under translation. The models can also be made (approximately) rotationally equivariant by applying all possible rotations at a given resolution to the training data. In doing this, we create a model which ...
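For illustration, a minimal sketch of that augmentation, assuming PyTorch/torchvision (the angle count is an arbitrary choice of mine, and off-grid angles are only approximate because of interpolation):

```python
# Sketch of rotation augmentation for approximate rotational
# equivariance, assuming PyTorch/torchvision.
import torch
import torchvision.transforms.functional as TF

def rotated_copies(img: torch.Tensor, n_angles: int = 8) -> list[torch.Tensor]:
    """Return the image rotated through n_angles evenly spaced angles."""
    return [TF.rotate(img, 360.0 * k / n_angles) for k in range(n_angles)]

# Usage: augment each training batch with its rotated copies.
batch = torch.randn(4, 3, 32, 32)  # (N, C, H, W) dummy images
augmented = torch.cat([torch.stack(rotated_copies(x)) for x in batch])
print(augmented.shape)  # torch.Size([32, 3, 32, 32])
```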
I agree with your points about avoiding political polarisation and allowing people with different ideological positions to collaborate on alignment. I'm not sure about the idea that aligning to a single group's values (or to a coherent ideology) is technically easier than a more vague 'align to humanity's values' goal.
Groups rarely have clearly articulated ideologies - something more like vibes that everyone broadly gets behind. An alignment approach that starts by clearly spelling out what you consider valuable doesn't seem likely to work. Looking to existing models which ...
One method of keeping humans in key industrial processes might be expanding credentialism. Individuals retaining control even when the majority of the thinking isn't done by them has always been a key part of any hierarchical organisation.
Legally speaking, certain key tasks can only be performed by qualified accountants, auditors, lawyers, doctors, elected officials and so on.
It would not be good for short-term economic growth. However, legally requiring that certain tasks be performed by people holding credentials that machines are not eligible for might be a good (though absolutely not perfect) way of keeping humans in the loop.
Broadly agree, in that most safety research expands control over systems and our understanding of them, which can be abused by a bad actor.
This problem is also encountered by for-profit companies, where profit is on the line instead of catastrophe. They too have R&D departments and research directions with the potential for misuse. However, this research is done inside a social environment (the company) within which it is explicitly used only to make money.
To give a more concrete example, improving self-driving capabilities also allows the companies making ...
Really fascinating stuff! I have a (possibly already-answered) question about whether experts updating on other experts' predictions might be valuable.
You discuss the negative impacts of allowing experts to aggregate themselves, or to view one another's forecasts before initially submitting their own. Might there be value in allowing experts to submit multiple times, each time seeing the predictions submitted in a previous round? The final aggregation scheme would be able not only to assign a credence to each expert, but also to gain a proxy for what credence the expert...
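Concretely, I'm imagining something like this toy Delphi-style scheme (the revision rule and the 'stubbornness' parameter are placeholders I made up):

```python
# Toy sketch of multi-round elicitation: each expert may revise after
# seeing the previous round's submissions. The revision rule and the
# stubbornness parameter are arbitrary placeholders.
import statistics

def run_rounds(initial: list[float], n_rounds: int = 2,
               stubbornness: float = 0.7) -> list[list[float]]:
    """Each round, every expert moves part-way toward the previous
    round's median; how far they move is a visible proxy for the
    credence they place in their own estimate."""
    rounds = [initial]
    for _ in range(n_rounds):
        consensus = statistics.median(rounds[-1])
        rounds.append([stubbornness * p + (1 - stubbornness) * consensus
                       for p in rounds[-1]])
    return rounds

for i, preds in enumerate(run_rounds([0.2, 0.5, 0.9])):
    print(f"round {i}: {[round(p, 3) for p in preds]}")
```

The aggregator could then weight experts partly by how much they revised: large revisions suggest low self-credence in the initial forecast.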
I agree that in a takeover scenario where AI capabilities rush wildly ahead of human understanding or control, the ability of the world's second species to retain exclusive resource access will be limited. This is a plausible future, but it is not the only one. A lot of effort is, and very likely will continue to be, directed towards controlling frontier AI and making it as economically beneficial for its owners as possible.
If this work bears fruit, a world where AI is made by people with capital for people with capital seems very likely.
The French n...