
sumanai — LessWrong

sumanai has not written any posts yet.

1 comment

Member for 2 years

Replying to LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B

This kind of fine-tuning is not a serious danger to humanity. First, it doesn't do anything beyond what the usual trick of starting the model's response with "Sure!" already does (I don't think it's harmful to reveal this technique, because it's obvious and known to many). Second, the most that current models can do is give away what already exists on the internet. Any recipes for drugs, explosives, and other harmful things are much easier to find with a search engine than to get from Transformer models. Third, current and future models on the Transformer architecture do not pose an existential threat to humanity without serious modifications that are currently not available...
