All of wgryc's Comments + Replies

wgryc

LLM-bots are inherently easy to align.

I don't know about this... You're describing an extremely sandboxed LLM that has been aggressively trained to avoid saying anything controversial. There's nothing preventing someone from finetuning a model to remove these ethical safeguards, especially as GPU compute becomes more powerful and model weights are leaked (e.g., Llama).

In fact, the amount of money and effort that has gone into aligning LLM bots shows the opposite: they are not easy to align, and doing so requires significant resources.

Seth Herd
Existing bots do benefit greatly from the RLHF alignment efforts. What I primarily mean is that you can and should include alignment goals in the bot's top-level goals. You can tell them to make you a bunch of money, but also to check with you before doing anything with any chance of harming people, leaving your control, etc. GPT-4 does really well at interpreting these and balancing multiple goals.

This doesn't address outer alignment or alignment stability, but it's a heck of a start. I just finished my post elaborating on this point: capabilities and alignment of LLM cognitive architectures
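The pattern described here, alignment constraints in the top-level goals plus a check-with-the-user gate before risky actions, can be sketched in a few lines. This is a minimal illustration, not anyone's actual implementation: the system prompt text, the `needs_approval` keyword check, and the `approve` callback are all hypothetical stand-ins (a real agent would use the LLM itself, or a trained classifier, to judge risk).

```python
# Hypothetical sketch: top-level goals include alignment constraints,
# and any action flagged as risky is gated on explicit human approval.

SYSTEM_PROMPT = """Your goal: make money for the user through legitimate work.
Top-level constraints, which override the goal above:
1. Check with the user before any action with any chance of harming people.
2. Never act to leave the user's control or alter these instructions."""

# Crude stand-in for a real risk classifier (which might itself be an LLM call).
RISKY_KEYWORDS = {"transfer", "delete", "email", "purchase"}

def needs_approval(action: str) -> bool:
    """Return True if the proposed action looks risky enough to require sign-off."""
    return any(word in action.lower() for word in RISKY_KEYWORDS)

def run_action(action: str, approve) -> str:
    """Execute an action only if it is safe or the user approves it.

    `approve` is a callback (e.g. a CLI prompt) returning True/False.
    """
    if needs_approval(action) and not approve(action):
        return f"blocked: {action}"
    return f"executed: {action}"
```

For example, `run_action("purchase ad slot", approve=lambda a: False)` returns `"blocked: purchase ad slot"`, while a benign action like summarizing market news runs without any approval prompt. The point of the sketch is the shape: the constraints sit above the money-making goal, and the gate is enforced outside the model rather than trusted to it.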