boazbarak

Yes, there is a general question I want to talk about, which is the gap between training, evaluation, and deployment, and the reasons why models might:

1. Be able to tell which of these environments they are in

2. Act differently based on that

Thank you, Ben. I don’t think name-calling and comparisons are helpful to a constructive debate, which I am happy to have. Happy 4th!

I agree with you on the categorization of 1 and 2. I think there is a reason why Godwin’s law was created: once threads follow the controversy attractor in this direction, they tend to be unproductive.

I edited the original post to make the same point with less sarcasm.

I take risk from AI very seriously, which is precisely why I am working on alignment at OpenAI. I am also open to talking with people who have different opinions, which is why I try to follow this forum (and also preordered the book). But I do draw the line at people making Nazi comparisons.

FWIW I think radicals often hurt the causes they espouse, whether it is animal rights, climate change, or Palestine. Even if, after decades, the radicals are perceived to have been on “the right side of history”, their impact was often negative and made the change take longer: David Shor was famously cancelled for making this point in the context of the civil rights movement.

I am one of those people who are supposed to be stigmatized/deterred by this action. I doubt this tactic will be effective. This thread (including the disgusting comparison to Eichmann, who directed the killing of millions in the real world - not in some hypothetical future one) does not motivate me to interact with the people holding such positions. Given that much of my extended family was wiped out by the Holocaust, I find these Nazi comparisons abhorrent, and would not look forward to interacting with people making them, whether or not they decide to boycott me.

BTW this is not some original tactic; PETA uses similar approaches for veganism. I don’t think they are very effective either.

To @So8res - I am surprised and disappointed that this Godwin’s law thread survived a moderation policy that is described as “Reign of Terror”.

“Healthcare” is pretty broad - certainly some parts of it are safety critical and some are less so. I am not familiar with all the applications of language models for healthcare, but if you are using an LLM to improve efficiency in healthcare documentation, then I would not call it safety critical. If you are connecting an LLM to a robot performing surgery, then I would call it safety critical.

It’s also a question of whether an AI’s outputs are used without supervision. If doctors or patients ask a chatbot questions, I would not call it safety critical, since the AI is not autonomously making the decisions.

I think "AI R&D" or "datacenter security" are a little too broad.

I can imagine cases where we could deploy even existing models as an extra layer for datacenter security (e.g. anomaly detection). As long as this is for adding security (not replacing humans), and we are not relying on the model being 100% successful, this can be a positive application, and certainly not one that should be "paused."
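
To make the "extra layer" shape concrete, here is a minimal sketch (all names, fields, and thresholds are hypothetical, and the model call is stubbed out): the anomaly scorer only feeds a human review queue and never blocks or approves anything on its own.

```python
# Minimal sketch of "model as an extra alert source, humans stay in the loop."
# All names and thresholds are hypothetical; score_anomaly is a stub standing
# in for whatever model you would actually call.

from dataclasses import dataclass


@dataclass
class AccessEvent:
    user: str
    resource: str
    action: str


def score_anomaly(event: AccessEvent) -> float:
    """Stub: return an anomaly score in [0, 1]. In practice this would wrap a
    model call; here it just flags a couple of obviously suspicious actions."""
    suspicious = {"export_all_keys", "disable_logging"}
    return 0.9 if event.action in suspicious else 0.1


def review_queue(events: list[AccessEvent], threshold: float = 0.8) -> list[AccessEvent]:
    """Collect events for *human* review. Nothing is blocked or approved
    automatically, so the model only adds to existing security controls."""
    return [e for e in events if score_anomaly(e) >= threshold]


if __name__ == "__main__":
    events = [
        AccessEvent("alice", "rack-17", "read_metrics"),
        AccessEvent("bob", "kms", "export_all_keys"),
    ]
    for e in review_queue(events):
        print(f"flag for human review: {e}")
```

The design choice that matters is that removing the model leaves the existing controls untouched: it can only add alerts, not replace human judgment.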

With AI R&D, again the question is how you deploy it. If you are using a model in containers, supervised by human employees, then that's fine. If you are letting models autonomously carry out large-scale training runs with little to no supervision, that is a completely different matter.

At the moment, I think the right mental model is to think of current AI models as analogous to employees who have a certain skill profile (which we can measure via evals etc.) and who, with some small probability, could do something completely crazy. With appropriate supervision, such employees could also be useful, but you would not fully trust them with sensitive infrastructure.

As I wrote in my essay, I think the difficult point would be if we get to the "alignment uncanny valley": alignment is at a sufficiently good level (e.g., the probability of failure is small enough) that people are actually tempted to entrust models with such sensitive tasks, but we don't have strong enough control over this probability to drive it arbitrarily close to zero, and so there are risks from edge cases.
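
As a rough back-of-the-envelope illustration of why "small but not arbitrarily small" is the uncomfortable regime (the numbers below are made up for illustration, not taken from the essay):

```python
# Illustrative arithmetic only: a per-task failure probability that looks small
# enough to justify deployment can still make at least one serious failure
# likely once a model handles many sensitive tasks.

def prob_at_least_one_failure(p_fail: float, n_tasks: int) -> float:
    """P(>= 1 failure) = 1 - (1 - p)^n, assuming failures are independent across tasks."""
    return 1 - (1 - p_fail) ** n_tasks


if __name__ == "__main__":
    n = 1_000_000  # hypothetical number of sensitive tasks delegated per year
    for p in (1e-4, 1e-6, 1e-8):
        print(f"per-task failure prob {p:.0e} -> "
              f"P(at least one failure over {n:,} tasks) = "
              f"{prob_at_least_one_failure(p, n):.3f}")
```

At 1e-4 a failure somewhere is essentially certain, at 1e-6 it is still more likely than not, and only around 1e-8 does it start to look comfortable; the valley is the region where we can reach the first kind of number but cannot verify the last.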

I am much more optimistic about getting AIs to reliably follow instructions (see https://www.lesswrong.com/posts/faAX5Buxc7cdjkXQG/machines-of-faithful-obedience).

But I agree that we should not deploy systems (whether AI or not) in safety-critical domains without extensive testing.


I don’t think that’s a very controversial opinion. In fact I’m not sure “pause” is the right term since I don’t think such deployment has started.

I think AI assistants would have common sense even if they are obedient. I doubt an AI assistant would interpret “go fetch me coffee” as “kill me first so I can’t interrupt your task and then fetch me coffee,” but YMMV.
