Sequences

Linguistic Freedom: Map and Territory Revisited
INVESTIGATIONS INTO INFINITY

Comments

You mention that society may do too little of the safer types of RL. Can you clarify what you mean by this?

This fails to account for one very important psychological fact: the population of startup founders who get a company off the ground is very heavily biased toward people who strongly believe in their ability to succeed. So it'll take quite a while for "it'll be hard to make money" to flow through and slow down training. And, in the meantime, it'll be acceleratory, since it pushes companies to stay ahead.

I've heard people suggest that they have arguments for RL being particularly dangerous, although I have to admit that I'm struggling to find these arguments at the moment. I don't know, perhaps that helps clarify why I've framed the question the way I have?

I think it's still valid to ask in the abstract whether RL is a particularly dangerous approach to training an AI system.

Oh, this is a fascinating perspective.

So most applications of RL already use just a small amount of it.

So if the goal was "only use a little bit of RL", that's already happening.

Hmm... I still wonder if using even less RL would be safer still.

  1. "LLMs are self limiting": I strongly disagree with LLM's being limited point. If you follow ML discussion online, you'll see that people are constantly finding new ways to draw extra performance out of these models and that it's happening so fast it's almost impossible to keep up. Many of these will only provide small boosts or be exclusive with other techniques, but at least some of these will be scalable.
  2. "LLMs are decent at human values": I agree on your second point. We used to be worried that we'd tell an AI to get coffee and that it would push a kid out of the way. That doesn't seem to be very likely to be an issue these days.
  3. "Playing human roles is pretty human": This is a reasonable point. It seems easier to get an AI that is role-playing a human to actually act human than an AI that is completely alien.

> Under the current version of the interactive model, its median prediction is just two decades earlier than that from Cotra’s forecast

Just?

There's a lot of overlap between alignment researchers and the EA community, so I'm wondering how that was handled.

It feels like it would be hard to find a good way of handling it: if you include everyone who indicated an affiliation with EA on the alignment survey, it would tilt the survey towards alignment people; if you exclude them, it would likely tilt the survey away from alignment people, since people are unlikely to fill in both surveys.

Regarding the support for various cause areas, I'm pretty sure that you'll find the support for AI Safety/Long-Termism/X-risk is higher among those most involved in EA than among those least involved. Part of this may be because of the number of jobs available in this cause area.

In contrast, this almost makes it sound like you think it is plausible to align AI to its user's intent, but that this would be bad if the user isn't one of "us"—you know, the good alignment researchers who want to use AI to take over the universe, totally unlike those evil capabilities researchers who want to use AI to produce economically valuable goods and services.

If I'm being honest, I don't find this framing helpful.

If you believe that things will go well if certain actors gain access to advanced AI technologies first, you should directly argue that.

Focusing on status games feels like a red herring.

This comes out to ~600 pages of text per submission, which is extremely far beyond anything that current technology could leverage. Current NLP systems are unable to reason about more than 2048 tokens at a time, and handle longer inputs by splitting them up. Even if we assume that great strides are made in long-range attention over the next year or two, it does not seem plausible to me that SOTA systems in the near future will be able to use this dataset to its fullest.
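
To make the gap concrete, here's a back-of-envelope calculation (the words-per-page and tokens-per-word figures are rough assumptions on my part, not numbers from the dataset):

```python
# Rough estimate: how many tokens is a ~600-page submission, and how many
# 2048-token context windows would it take to cover one?
# WORDS_PER_PAGE and TOKENS_PER_WORD are assumed ballpark figures.
PAGES_PER_SUBMISSION = 600
WORDS_PER_PAGE = 500        # assumption: dense prose
TOKENS_PER_WORD = 1.3       # assumption: typical subword tokenizer overhead
CONTEXT_WINDOW = 2048       # the context limit mentioned above

total_tokens = PAGES_PER_SUBMISSION * WORDS_PER_PAGE * TOKENS_PER_WORD
chunks = total_tokens / CONTEXT_WINDOW

print(f"~{total_tokens:,.0f} tokens per submission")    # ~390,000
print(f"~{chunks:.0f} context windows per submission")  # ~190
```

On those assumptions, a single submission spans on the order of a couple of hundred context windows, which is why splitting the input up loses so much.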


It's interesting to come across this comment in 2024 given how much things have changed already.
