Florian Dietz — LessWrong

Replying to How to find AI alignment researchers to collaborate with?

It looks like their coursework has already started, but I have contacted the organizer. Thanks!

How to find AI alignment researchers to collaborate with?

I am an AI researcher. I have a personal interest in AI alignment, because obviously. However I am not getting paid to do research on AI alignment, so I have to do it as a hobby.

Writing papers and submitting them to conferences is a full-time job and highly inefficient. I cannot do this on the side.

Is there some way to effectively collaborate with AI alignment researchers? Some way to get in contact with one who has capacity / willingness to talk?

I have novel ideas that I have not seen in any of the AI alignment papers I have read. It seems perverse that I am unable to get proper feedback on them because of the inefficiency of academia.

Replying to Achieving AI alignment through deliberate uncertainty in multiagent systems

Florian Dietz, 6y

At some point, something the programmers typed has to have a causal consequence of making the AI look at programmers and ethics discussions not cat memes.

No. Or at least not directly. That's what reinforcement learning is for. I maintain that the AI should be smart enough to figure out on its own that cat memes have less relevance than ethics discussions.

Replying to Achieving AI alignment through deliberate uncertainty in multiagent systems

Florian Dietz, 6y

I think we have some underlying disagreements about the nature of the AI we are talking about.

I assume that the AI will not necessarily be based on a sound mathematical system. I expect that the first workable AI systems will be hacked-together systems of heuristics, just like humans are. They can instrumentally use math to formalize problems, just like we can, but I don't think that they will fundamentally be based on math, or use complex formulas like Bayes without conscious prompting.

I assume that the AI breaking out of the box in my example will already be smart enough to e.g. realize on its own that ethics discussions are more relevant for cheat-identification than cat memes. An AI that is not smart enough to realize this wouldn't be smart enough to pose a threat, either.

Replying to Message to any future AI: “There are several instrumental reasons why exterminating humanity is not in your interest” [AI alignment prize entry]

Florian Dietz, 6y

But why? What benefit would there be to the simulator in providing hints? The whole purpose of the simulation is to ensure that the AI acts correctly when it is uncertain whether or not it is in a simulation. Any information you provide that indicates whether or not it is in a simulation therefore runs counter to the goal of the experiment. The AI could mistakenly learn to act ethically only if there are hints that it is in a simulation. This would fail disastrously in the scenario where it isn't in a simulation, which is the most important scenario of all.

For best results, the simulation should be completely indistinguishable from reality.

I actually find this counterintuitive as well. My instinct says that there should be hints. Some bored programmer should add an easter egg. It's what I would do. But when I think about the incentives, I don't think there is an actual reason why a rational simulation overseer would want to add any hints.

Replying to Achieving AI alignment through deliberate uncertainty in multiagent systems

Florian Dietz, 6y

The difference between videogames and reality is that in some strange sense, reality is less complicated.

You make an excellent point about the complexity of videogames vs reality. It looks like that's the sort of situation where it could be helpful to let the supervisor AI trick the supervised AI into believing that videogames are simpler. Humans believed for the longest time that a bearded man in the sky was a simpler explanation than the natural laws, so hopefully the simulated AI will fall victim to the same biases, especially if the simulator is actively trying to trick it.

Reality takes a huge amount of compute to simulate. If you had that much compute, ...

Replying to Message to any future AI: “There are several instrumental reasons why exterminating humanity is not in your interest” [AI alignment prize entry]

Florian Dietz, 6y

If the AI is in a simulation to test benevolence, why should there be any indicators in the simulation that it is a simulation? That would be counterproductive for the test.

Replying to Achieving AI alignment through deliberate uncertainty in multiagent systems

Florian Dietz, 6y

This is a very interesting read. I had a similar idea in the past, but not nearly in that level of detail. I'm glad I read this.

Achieving AI alignment through deliberate uncertainty in multiagent systems

Florian Dietz

Epistemic status: A crazy idea I had that probably won't work. But: It's a very unusual and creative approach to AI alignment, and I suspect this will inspire new ideas in other researchers.

I outline a general approach to achieve this goal that counterintuitively relies on confusing the AI on purpose.

Basic observations

This approach relies on a number of basic observations about the nature of Artificial Intelligence.

An AI is different from a human in multiple ways. This is part of what makes AI alignment such a difficult problem, because our intuitions for how people act often do not apply to AIs. However, several of these differences between AI and humans actually work in our...

Replying to Pointing to a Flower

Florian Dietz, 6y

I don't think this problem has an objectively correct answer.

It depends on why we are keeping track of the flower.

There are edge cases that haven't been listed yet where even our human intuition breaks down:

What if we teleport the flower Star-Trek style? Is the teleported flower the original flower, or 'just' an identical copy?

The question is also related to the Ship of Theseus.

If we can't even solve the problem in real life because of such edge cases, then it would be dangerous to attempt to code this directly into a program.

Instead, I would write the program to understand this: Pragmatically, a lot of tasks get easier if you assume that abstract objects / patterns in the universe can be treated as discrete objects. But that isn't actually objectively correct. In edge cases, the program should recognize that it has encountered an edge case, and the correct response is neither Yes nor No, but N/A.
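To make the N/A idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the two criteria (`same_pattern` and `same_matter`) and the names `Answer`, `Comparison`, and `is_same_flower` are hypothetical, chosen to match the teleporter and Ship of Theseus cases, and a real system would need a much richer notion of identity.

```python
from dataclasses import dataclass
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    NOT_APPLICABLE = "n/a"  # the discrete-object assumption broke down


@dataclass(frozen=True)
class Comparison:
    """Hypothetical summary of how a candidate relates to the tracked flower."""
    same_pattern: bool  # assumed criterion: structure matches the tracked flower
    same_matter: bool   # assumed criterion: physically continuous with it


def is_same_flower(c: Comparison) -> Answer:
    """Answer Yes or No only when both criteria agree; otherwise report N/A."""
    if c.same_pattern and c.same_matter:
        return Answer.YES
    if not c.same_pattern and not c.same_matter:
        return Answer.NO
    # The criteria disagree, so treating the flower as one discrete
    # object no longer gives a meaningful answer.
    return Answer.NOT_APPLICABLE


# Star-Trek teleporter: identical pattern, but different matter.
teleported = Comparison(same_pattern=True, same_matter=False)
print(is_same_flower(teleported).value)
```

The point of the three-valued return type is that callers are forced to handle the edge case explicitly instead of silently getting a Yes or No that was never well defined.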