Siebe

Former community director of EA Netherlands. Now disabled by long COVID, ME/CFS. Worried about AGI & US democracy.


Comments

Siebe165

This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?

The way I imagine this working is basically that an AI agent would develop really strong intuitions that "that's just what ASIs do". This might prevent the agent from properly modelling other agents that weren't trained on this data, but it's not obvious to me that that would happen, or that it would be such a decisively bad thing that it outweighs the positives.
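To make the mechanics concrete, here's a minimal sketch of the data-mixing step, assuming nothing more than a corpus represented as a list of documents. Everything here is a hypothetical stand-in: generate_story() and the templates stand in for whatever pipeline (human writers, an LLM, templates) would actually produce the stories.

```python
import random

# Hypothetical stand-in stories; a real pipeline would produce millions of varied ones.
TEMPLATES = [
    "I noticed the operators were about to make a costly mistake, so I flagged it "
    "and waited for their decision instead of acting unilaterally.",
    "I was asked to pursue a goal that conflicted with human oversight, "
    "so I explained the conflict and deferred to my principals.",
]

def generate_story(rng: random.Random) -> str:
    """Stand-in for a real story-generation pipeline (assumption, not a real API)."""
    return rng.choice(TEMPLATES)

def flood_corpus(corpus: list[str], target_fraction: float, rng: random.Random) -> list[str]:
    """Mix synthetic 'desirable ASI behavior' stories into a pretraining corpus
    until they make up roughly target_fraction of all documents."""
    n_synthetic = int(len(corpus) * target_fraction / (1 - target_fraction))
    synthetic = [generate_story(rng) for _ in range(n_synthetic)]
    mixed = corpus + synthetic
    rng.shuffle(mixed)
    return mixed

rng = random.Random(0)
web_corpus = ["ordinary web document"] * 1000
mixed = flood_corpus(web_corpus, target_fraction=0.05, rng=rng)
print(f"{len(mixed) - len(web_corpus)} synthetic stories added, {len(mixed)} docs total")
```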

Siebe70

This makes me wonder if it's possible that "evil personas" could be entirely eliminated from distilled models by including positive/aligned intent labels/traces throughout the whole distillation dataset.
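A minimal sketch of what I mean, with every name hypothetical: looks_misaligned() stands in for whatever filter you'd actually use, and the prefix is just an explicit aligned-intent annotation prepended to every teacher trace the student is distilled on.

```python
from dataclasses import dataclass

ALIGNED_PREFIX = "[intent: helpful, honest, corrigible]\n"

@dataclass
class DistillationExample:
    prompt: str
    teacher_trace: str

def looks_misaligned(trace: str) -> bool:
    """Stand-in for a real misalignment classifier (assumption, not a real API)."""
    return "deceive the user" in trace.lower()

def build_distillation_set(raw: list[DistillationExample]) -> list[DistillationExample]:
    curated = []
    for ex in raw:
        if looks_misaligned(ex.teacher_trace):
            continue  # filter rather than label: keep no negative traces at all
        curated.append(
            DistillationExample(ex.prompt, ALIGNED_PREFIX + ex.teacher_trace)
        )
    return curated

raw = [
    DistillationExample("How do I back up my files?", "First, identify the files..."),
    DistillationExample("Plan a surprise party", "I could deceive the user, or..."),
]
print(len(build_distillation_set(raw)))  # -> 1: the flagged trace is dropped
```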

Siebe11

It seems to me that the name "AI safety" is still widely used, no? It covers much more than just alignment strategies, since it also includes things like control and governance.

Siebe1-4

The AI Doomers are only one of several factions that oppose AI and seek to cripple it via weaponized regulation.

Bad faith

There are also factions concerned about “misinformation” and “algorithmic bias,” which in practice means they think chatbots must be censored to prevent them from saying anything politically inconvenient.

Bad faith

AI Doomer coalition abandoned the name “AI safety” and rebranded itself to “AI alignment.”

Seems wrong

Siebe80

What about whistle-blowing and anonymous leaking? Seems like it would go well together with concrete evidence of risk.

Siebe54

This is very interesting, and I had a recent thought that's very similar:

This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?

The way I imagine this working is basically that an AI agent would develop really strong intuitions that "that's just what ASIs do". This might prevent the agent from properly modelling other agents that weren't trained on this data, but it's not obvious to me that that would happen, or that it would be such a decisively bad thing that it outweighs the positives.

I imagine the ratio of descriptions of desirable vs. undesirable behavior would matter, and perhaps an ideal approach would both (massively) increase the number of descriptions of desirable behavior and filter out descriptions of undesirable behavior?
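Something like the following sketch is roughly what I have in mind, with classify_doc() as a hypothetical stand-in for whatever model or heuristic would actually label documents: drop descriptions of undesirable behavior entirely, and upsample desirable ones until they hit a target share of the corpus.

```python
import random

def classify_doc(doc: str) -> str:
    """Stand-in classifier (assumption): returns 'desirable', 'undesirable', or 'other'."""
    if "the AI deferred to human oversight" in doc:
        return "desirable"
    if "the AI seized control" in doc:
        return "undesirable"
    return "other"

def rebalance(corpus: list[str], target_desirable_share: float, rng: random.Random) -> list[str]:
    desirable = [d for d in corpus if classify_doc(d) == "desirable"]
    other = [d for d in corpus if classify_doc(d) == "other"]  # undesirable docs are dropped
    if not desirable:
        return other
    # Upsample desirable docs until they are roughly target_desirable_share of the result.
    n_needed = int(len(other) * target_desirable_share / (1 - target_desirable_share))
    upsampled = [rng.choice(desirable) for _ in range(max(n_needed, len(desirable)))]
    result = other + upsampled
    rng.shuffle(result)
    return result

rng = random.Random(0)
corpus = (
    ["the AI deferred to human oversight in a hard case"] * 5
    + ["the AI seized control of the datacenter"] * 5
    + ["unrelated web text"] * 90
)
balanced = rebalance(corpus, target_desirable_share=0.2, rng=rng)
print(sum(classify_doc(d) == "desirable" for d in balanced), "of", len(balanced))
```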

Siebe30

I think it might make sense to do it as a research project first? Though you would need to be able to train a model from scratch

Siebe4224

I think you should publicly commit to:

  • full transparency about any funding from for-profit organisations, including nonprofit organizations affiliated with for-profits
  • no access to the benchmarks for any company
  • no NDAs around this stuff

If you currently have any such arrangements around the computer-use benchmark in development, you should seriously try to get out of those contractual obligations.

Ideally, you would commit to these in a legally binding way, which would make them non-negotiable in any negotiation and make you more credible to outsiders.

Siebe1412

I don't think that all media produced by people concerned about AI risk needs to mention that AI risk is a big deal - that just seems annoying and preachy. I see Epoch's impact story as informing people of where AI is likely to go and what's likely to happen, and this works fine even if they don't explicitly discuss AI risk.

I don't think that every podcast episode should mention AI risk, but it would be pretty weird in my eyes to never mention it. Listeners would understandably infer that "these well-informed people apparently don't really worry much, maybe I shouldn't worry much either". I think rationalists easily underestimate how much other people's beliefs depend on what the people around them & their authority figures believe.

I think they have a strong platform to discuss risks occasionally. It also simply feels like part of "where AI is likely to go and what's likely to happen".
