Comments

roha · 62

Further context about the "recent advancements in the AI sector have resolved this issue" paragraph:

roha · 10

I assume they can't make a statement and that their choice of next occupation will be the clearest signal they can and will send out to the public.

roha · 169

He has a stance towards risk that is a necessary condition for becoming the CEO of a company like OpenAI, but that stance doesn't give you a high probability of building a safe ASI:

roha · 174

If everyone has their own asteroid impact, Earth will not be displaced, because the impulse vectors will cancel each other out on average*. This is important because it preserves Earth's trajectory equilibrium, which we have known for ages from animals jumping up and down all the time around the globe in their games of survival. If only a few central players get asteroid impacts, it's actually less safe! Safety advocates might actually cause the very outcomes that they fear!

*I have a degree in quantum physics and can derive everything from my model of the universe. This includes moral and political imperatives that physics dictates and that most physicists therefore advocate for.

roha · 140

We are decades if not centuries away from developing true asteroid impacts.

roha · 120

Given all the potential benefits there is no way we are not going to redirect asteroids to earth. Everybody will have an abundance of rare elements.

xlr8

roha · 172

Some context from Paul Christiano's work on RLHF and a later reflection on it:

Christiano et al.: Deep Reinforcement Learning from Human Preferences

In traditional reinforcement learning, the environment would also supply a reward [...] and the agent's goal would be to maximize the discounted sum of rewards. Instead of assuming that the environment produces a reward signal, we assume that there is a human overseer who can express preferences between trajectory segments. [...] Informally, the goal of the agent is to produce trajectories which are preferred by the human, while making as few queries as possible to the human. [...] After using r̂ to compute rewards, we are left with a traditional reinforcement learning problem
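For concreteness, here is a minimal sketch of the idea in the quoted passage, not code from the paper: a reward model r̂ is fit to pairwise human preferences over trajectory segments with a Bradley-Terry-style cross-entropy objective, and its output then serves as the reward for an ordinary RL algorithm. The names (RewardModel, preference_loss) and the tiny architecture are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Predicts a scalar reward for each (observation, action) step."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs: (batch, T, obs_dim), act: (batch, T, act_dim)
        # Return the summed predicted reward of each segment: (batch,)
        step_rewards = self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)
        return step_rewards.sum(dim=-1)

def preference_loss(r_hat: RewardModel, seg_a, seg_b, pref: torch.Tensor) -> torch.Tensor:
    # seg_a, seg_b: (obs, act) tensors for two trajectory segments.
    # pref: class index per pair (0 if segment a was preferred, 1 if b).
    # P(a preferred over b) is modelled as a softmax over segment returns,
    # and we minimise cross-entropy against the human label.
    returns = torch.stack([r_hat(*seg_a), r_hat(*seg_b)], dim=-1)  # (batch, 2)
    return F.cross_entropy(returns, pref)

# Once r_hat is trained on human comparisons, the policy is optimised
# against r_hat's rewards with a standard RL algorithm, matching the
# quote's "we are left with a traditional reinforcement learning problem".
```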

Christiano: Thoughts on the impact of RLHF research

The simplest plausible strategies for alignment involve humans (maybe with the assistance of AI systems) evaluating a model’s actions based on how much we expect to like their consequences, and then training the models to produce highly-evaluated actions. [...] Simple versions of this approach are expected to run into difficulties, and potentially to be totally unworkable, because:

  • Evaluating consequences is hard.
  • A treacherous turn can cause trouble too quickly to detect or correct even if you are able to do so, and it’s challenging to evaluate treacherous turn probability at training time.

[...] I don’t think that improving or studying RLHF is automatically “alignment” or necessarily net positive.

Edit: Another relevant section from an interview with Paul Christiano by Dwarkesh Patel:

Paul Christiano - Preventing an AI Takeover

roha · 12

Replacing "must" with "may" is a potential solution to the issues discussed here. I think analogies are misleading when they are used as a means of proof, i.e. to convince yourself or others of the truth of some proposition, but they can be extremely useful when they are used as a means of exploration, i.e. to discover new propositions worth investigating. Taken seriously, this means that if you find something of interest with an analogy, it should not mark the end of a thought process or conversation, but the beginning of a validation process: Is the connection between the compared phenomena merely superficial, or is there something deeper? Does it point to a useful model or abstraction?

Example: I think the analogy that trying to align an AI is like trying to steer a rocket towards any target at all shouldn't be used to convince people that without proper alignment methods mankind is screwed. Who knows whether directing a physical object in a geometric space has much to do with directing a cognitive process in some unknown combinatorial space? Alternatively, the analogy could be used as a pointer towards a general class of control problems that come with specific assumptions, which may or may not hold for future AI systems. If we think the assumptions hold, we may be able to learn a lot about future instances like advanced AIs from existing instances of control problems like rockets and acrobots. If we think the assumptions don't hold, we may learn something by identifying the least plausible assumption and trying to formulate an alternative abstraction that doesn't depend on it, opening another path towards collecting empirical data points from existing instances.

roha · 30

For collaboration on job-like tasks that assumption might hold. For companionship and playful interactions, I think the visual domain, possibly in VR/AR, will turn out to be relevant and will be kept. Given our psychological priors, I also think that for many people it may feel like a qualitative change in what kind of entity we are interacting with: from lifeless machine, through uncanny human imitation, to believable personality on another substrate.
