All of LTM's Comments + Replies

LTM

I agree that in a takeover scenario where AI capabilities rush wildly ahead of human understanding or control, the ability of the world's second species to retain exclusive resource access will be limited. This is a plausible future, but it is not the only one. A lot of effort is, and very likely will continue to be, directed towards controlling frontier AI and making it as economically beneficial for its owners as possible.

If this work bears fruit, a world where AI is made by people with capital for people with capital seems very likely. 

The French n...

LTM

I think this post is interesting, although I don't particularly agree with the conclusions. I think it is helpful to think about the formation of your mind and goals - a tradition which I know goes back at least to Rousseau and most likely further (I am not very knowledgeable on the topic).

I think a lot of the difficulty goes back to the distinction between 'real'/'intrinsic' goals and those people purport to believe in. Looking at your example of a Christian sexual prude, nothing about their behaviour implies to me that these virtues of chastity and ...

LTM

I'm really excited about this, but not because of the distinction drawn between the shoggoth and the face. Applying a paraphraser such that the model's internal states are repeatedly swapped for states which we view as largely equivalent could be a large step towards interpretability.
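To make that concrete, here is a minimal toy sketch of inserting a paraphrase step between reasoning steps (the word-swap paraphraser and the `model_step` callable are hypothetical stand-ins of my own, not the actual proposal):

```python
# Toy sketch only: a paraphrase step inserted between reasoning steps,
# so the model never conditions on its own exact wording. A real
# paraphraser would itself be a model; here it is a crude word-swap.

def toy_paraphrase(text: str) -> str:
    """Stand-in paraphraser: swaps a few words for rough synonyms."""
    swaps = {"therefore": "so", "however": "but", "large": "big"}
    return " ".join(swaps.get(word.lower(), word) for word in text.split())

def reasoning_loop(model_step, task: str, n_steps: int = 3) -> str:
    """model_step: a hypothetical callable mapping a prompt to the
    model's next reasoning step."""
    state = task
    for _ in range(n_steps):
        thought = model_step(state)
        state = toy_paraphrase(thought)  # swap wording for an equivalent
    return state

# Demo with a trivial stand-in "model":
print(reasoning_loop(lambda s: s + " therefore it grows large", "Assume x > 1."))
```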

This reminds me of the idea that CNNs work well because they are equivariant under translation. The models can also be made (approximately) rotationally equivariant by applying all possible rotations for a given resolution to the training data. In doing this, we create a model which ...
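As an illustration of that augmentation strategy, a minimal numpy sketch (my own toy example, limited to the four exact 90-degree rotations; arbitrary angles would need interpolation):

```python
import numpy as np

def rotation_augment(images: np.ndarray) -> np.ndarray:
    """Augment a batch of square images (N, H, W) with all four
    90-degree rotations, pushing a CNN trained on the result towards
    approximate rotational equivariance."""
    return np.concatenate([np.rot90(images, k, axes=(1, 2)) for k in range(4)])

# Example: 10 images of 28x28 become 40 training examples.
batch = np.random.rand(10, 28, 28)
assert rotation_augment(batch).shape == (40, 28, 28)
```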

Daniel Kokotajlo
I tentatively agree that the paraphraser idea is more important than shoggoth/face.
LTM

I agree with your points about avoiding political polarisation and allowing people with different ideological positions to collaborate on alignment. I'm not sure about the idea that aligning to a single group's values (or to a coherent ideology) is technically easier than a more vague 'align to humanity's values' goal.

Groups rarely have clearly articulated ideologies - more like vibes which everyone normally gets behind. An alignment approach that starts from clearly spelling out what you consider valuable doesn't seem likely to work. Looking to existing models which ...

Seth Herd
I agree with everything you've said. The advantages come primarily from aligning the AI to follow instructions rather than to values, instead of using RL or any other process to infer underlying values. Instruction-following AGI is easier and more likely than value-aligned AGI. I think creating real AGI based on an LLM aligned to be helpful, harmless and honest would probably be the end of us, as the values implied by RLHF, carried to their logical conclusions outside of human control, would probably be pretty different from our desired values. Instruction-following provides corrigibility. Edit: by 'small group' I meant something like five people who are authorized to give instructions to an AGI.
LTM

One method of keeping humans in key industrial processes might be expanding credentialism. Individuals retaining control even when the majority of the thinking isn't done by them has always been a key part of any hierarchical organisation.

Legally speaking, certain key tasks can only be performed by qualified accountants, auditors, lawyers, doctors, elected officials and so on. 

It would not be good for short-term economic growth. However, legally requiring that certain tasks be performed by people holding credentials for which machines are not eligible might be a good (though absolutely not perfect) way of keeping humans in the loop.

LTM

Broadly agree, in that most safety research expands our control over systems and our understanding of them, both of which can be abused by a bad actor.

This problem is also encountered by for-profit companies, where profit is on the line instead of catastrophe. They too have R&D departments and research directions with the potential for misuse. However, this research is done inside a social environment (the company) where it is only explicitly used to make money.

To give a more concrete example, improving self-driving capabilities also allows the companies making ...

LTM

Really fascinating stuff! I have a (possibly already answered) question about how using experts' updates on other experts' predictions might be valuable.

You discuss the negative impacts of allowing experts to aggregate themselves, or to view one another's forecasts before initially submitting their own. Might there be value in allowing experts to submit multiple times, each time seeing the predictions submitted in the previous round? The final aggregation scheme would be able to not only assign a credence to each expert, but also gain a proxy for what credence the expert...

Eric Neyman
Yeah, there's definitely value in experts being allowed to submit multiple times, allowing them to update on other experts' submissions. This is basically the frame taken in Chapter 8, where Alice and Bob update their estimate based on the other's estimate at each step. This is generally the way prediction markets work, and I think it's an understudied perspective (perhaps because it's more difficult to reason about than if you assume that each expert's estimate is static, i.e. does not depend on other experts' estimates).
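A toy numerical sketch of the multi-round scheme being discussed (the linear update rule and the `trust` parameter are my own illustrative assumptions, not anything from the paper): each round, every expert moves partway toward the previous round's mean, and how far each expert moves serves as a proxy for the credence they place in the others.

```python
import numpy as np

def multi_round_estimates(initial: list[float], trust: float = 0.3,
                          n_rounds: int = 3) -> list[np.ndarray]:
    """Each round, every expert moves a fraction `trust` of the way
    toward the previous round's mean estimate. Returns the estimates
    after every round, so an aggregator can see who moved and how much."""
    rounds = [np.asarray(initial, dtype=float)]
    for _ in range(n_rounds):
        prev = rounds[-1]
        rounds.append((1 - trust) * prev + trust * prev.mean())
    return rounds

# Three experts start at 0.2, 0.5, 0.9 and converge across rounds.
for r, est in enumerate(multi_round_estimates([0.2, 0.5, 0.9])):
    print(f"round {r}: {est.round(3)}")
```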