Dave Orr

DeepMind Gemini Safety lead; Foundation board member

Humans have always been misaligned. Things are probably significantly better now in terms of human alignment than at almost any other time in history (citation needed), due to high levels of education and broad agreement about many things that we take for granted (e.g. the limits of free trade are debated, but there has never been so much free trade). So you would need to think that something important is different now for there to be some kind of new existential risk.

One candidate is that as tech advances, the amount of damage a small misaligned group could do is growing. The obvious example is bioweapons -- the number of people who could create a lethal engineered global pandemic is steadily going up, and at some point some of them may be evil enough to actually try to do it.

This is one of the arguments in favor of the AGI project. Whether you think it's a good idea probably depends on your credences around human-caused x-risk versus AGI x-risk.

One tip for research of this kind is to measure not only recall but also precision. It's easy to block 100% of dangerous prompts by blocking 100% of all prompts, but obviously that doesn't work in practice. The actual task labs are trying to solve is to block as many unsafe prompts as possible while rarely blocking safe ones; in other words, to optimize both precision and recall.
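
To make this concrete, here is a minimal sketch (in Python, with made-up labels and filter decisions rather than any real eval set) of scoring a prompt filter on both metrics:

```python
# Toy evaluation of a prompt-safety filter on both precision and recall.
# labels: True = the prompt is actually unsafe; preds: True = the filter blocked it.
labels = [True, True, True, False, False, False, False, False]
preds  = [True, True, False, True, False, False, False, False]

true_pos  = sum(p and l for p, l in zip(preds, labels))
false_pos = sum(p and not l for p, l in zip(preds, labels))
false_neg = sum(not p and l for p, l in zip(preds, labels))

recall = true_pos / (true_pos + false_neg)     # fraction of unsafe prompts that got blocked
precision = true_pos / (true_pos + false_pos)  # fraction of blocked prompts that were actually unsafe

print(f"recall={recall:.2f}, precision={precision:.2f}")
# A filter that blocks everything scores recall=1.0 but terrible precision;
# the real target is high values of both.
```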

Of course with truly dangerous models and prompts, you do want ~100% recall, and in that situation it's fair to say that nobody should ever be able to build a bioweapon. But in the world we currently live in, the amount of uplift you get from a frontier model on the prompts in your dataset isn't very much, so it's reasonable to trade off against the losses from over-refusal.

Gemini V2 (1206 experimental, which is the larger model) one-boxes, so... progress?

I'm probably too conflicted to give you advice here (I work on safety at Google DeepMind), but you might want to think through, at a gears level, what could concretely happen with your work that would lead to bad outcomes. Then you can balance that against positives (getting paid, becoming more familiar with model outputs, whatever).

You might also think about how your work compares to whoever would replace you on average, and what implications that might have as well.

This is great data! I'd been wondering about this myself.

Where were you measuring air quality? How far from the stove? Same place every time?

I hadn't heard the p-zombie argument before, but I agree that it's at least some Bayesian evidence that we're not in a sim.

  1. We don't know whether simulated people would be p-zombies
  2. I am not a p-zombie [citation needed]
  3. It would be very surprising if sims were not p-zombies while everyone in the physical universe was
  4. Therefore the likelihood of being conscious is higher in the real universe than in a simulation, so the likelihood ratio favors the real universe

Probably 3 needs to be developed further, but this is the first new piece of evidence I've seen since I first encountered the simulation argument in like 2005.
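
For what it's worth, here is roughly how I'd write the update down (my own informal framing, not a rigorous treatment), with C = "I am conscious", S = "I am simulated", and R = "I am in base reality":

```latex
% Premise 3 says consciousness is no more likely for sims than for physical people,
% i.e. P(C | S) <= P(C | R). Then by Bayes:
\[
  \frac{P(S \mid C)}{P(R \mid C)}
    = \frac{P(C \mid S)}{P(C \mid R)} \cdot \frac{P(S)}{P(R)}
    \le \frac{P(S)}{P(R)} ,
\]
% so conditioning on being conscious can only shift credence toward base reality.
```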

Are we playing the question game because the thread was started by Rosencranz? Is China doing well in the EV space a bad thing?

Is it the case that the tech would exist without him? I think that's pretty unclear, especially for SpaceX, where despite other startups in the space, nobody else managed to radically reduce the cost per launch in a way that transformed the industry.

Even for Tesla, which seems more pedestrian (heh) now, there were a number of years when they had the only viable EV on the market. It was only once they proved it was feasible that everyone else piled in.
