Davide_Zagami — LessWrong


Vulnerabilities in CDT and TI-unaware agents

The aim of this post is to illustrate the need to take decision-theoretic and incentive considerations into account when designing agents. It also serves as a proof that these considerations are important for ensuring the safety of agents. We will also postulate that there exist agents that are robust both to changing their own reward function and to having it changed, although this requires a careful approach to incentive design and to the choice of decision theory.

The first agent we will consider is a current-RF, TI-aware agent (current Reward Function, Time-Inconsistency-aware; see the second half of the post if you don't know what this means) that uses Causal Decision Theory (CDT). A review of... (read 956 more words →)

Replying to AI Safety Prerequisites Course: Revamp and New Lessons

Davide_Zagami 7y

Registration and access to the lessons is completely free. Where do you see a paywall?

Replying to AI Safety Prerequisites Course: Basic abstract representations of computation

Davide_Zagami 7y*

Hi, full-time content developer at RAISE here.

The overview page you are referring to (is it this one?) contains just some examples of subjects that we are working on.

1. One of the main goals is making a complete map of what is out there regarding AI Safety, and then recursively creating explanations for the concepts it contains. That could fit multiple audiences depending on how deep we are able to go. We have started doing that with IRL and IDA. We are also trying a bottom-up approach with the prerequisite course because why not.

2. Almost the same as reading papers, with clear pointers to references to quickly integrate any missing knowledge. Whether... (read more)

Am I understanding the problem of fully updated deference correctly?

Davide_Zagami

I understand that one solution to AI alignment would be to build an agent with uncertainty about its utility function, so that by observing the environment and in particular us, it can learn our true utility function and optimize for that. And according to the problem of fully updated deference, trying to accomplish this would not significantly simplify our work because it involves two steps:

1) Learning our true utility function $V$ (easy step)

this "merely" consists of knowing more about the world (if a perfect description $W$ of the universe, which contains our true utility $V$ somewhere in it, could be fed into the agent, then this step would be complete)
even a permanently

... (read 192 more words →)

Replying to Duplication versus probability

Davide_Zagami 8y

After reading this I feel that how one should deal with anthropics depends strictly on one's goals. I'm not sure exactly which cognitive algorithm does the correct thing in general, but it seems that sometimes it reduces to "standard" probabilities and sometimes not. May I ask what UDT says about all of this, exactly?

Suppose you're rushing an urgent message back to the general of your army, and you fall into a deep hole. Down here, conveniently, there's a lever that can create a duplicate of you outside the hole. You can also break open the lever and use the wiring as ropes to climb to the top. You

Davide_Zagami 8y

Against accusing people of motte and bailey

But suppose that we were discussing something of which there were both sensible and crazy interpretations - held by different people. So:

group A consistently makes and defends sensible claim A1

group B consistently makes and defends crazy claim B1

and maybe even:

group C consistently makes crazy claim B1, but when challenged on it, consistently retreats to defending A1

I may be missing something but it seems to me that:

if C is accused of motte-and-bailey fallacy there is no problem;
if B is accused of motte-and-bailey fallacy there is a problem because they never defended claim A1;
if A is accused of motte-and-bailey fallacy there is a problem because they never defended claim B1.

I hope I'm not being silly: would it be fair to say that you are pointing to the existence of the "accuse people who are not making a motte-and-bailey fallacy of making a motte-and-bailey fallacy" fallacy? Could we call it "straw-motte-and-bailey fallacy" or something?

I have only read a small fraction of Yudkowsky's sequences (I printed the 1800 pages two days ago and have only read about 50), so maybe I think I am discussing interesting stuff where in reality EY has already discussed it in length.

Mostly this. Other things too, but they are mostly caused by this one. I am one of the few who commented on one of your posts with links to some of his writings, for exactly this reason. While I'm guilty of not having given you any elaborate feedback and of downvoting that post, I still think you need to catch up with the basics. It's praiseworthy that you want to engage in rationality and in new ideas, but by doing it without becoming familiar with the canon first, you are not just (1) probably going to say something silly (because rationality is harder than you think) and (2) probably going to say something old (because a lot has been written), but also (3) wasting your own time.

Fake Selfishness and Fake Morality

Replying to "Just Suffer Until It Passes"

Davide_Zagami 8y

Ah! I independently invented this strategy some months ago and, amazingly, it doesn't work for me, simply because I'm somehow capable of remaining in the "do nothing" state for literally days. However, I thought it was a brilliant idea when I came up with it and I still think it is; I would be surprised if it didn't work for a lot of people.

Replying to Babble

Davide_Zagami 8y

This post made a lot of things click for me. It also made me realize I am one of those with an "overdeveloped" Prune filter compared to the Babble filter. How could I not notice this? I knew something was wrong all along, but I couldn't pin down what, because I wasn't Babbling enough. I've gotta Babble more. Noted.

Replying to "Slow is smooth, and smooth is fast"

Davide_Zagami 8y

Extremely important post, in my opinion. The central idea seems true to me. I would like to see whether anyone has evidence (even anecdotal) for the opposite.

Replying to The Mad Scientist Decision Problem

Davide_Zagami 8y

You probably should have simply said something like "increasing portions of physical space have diminishing marginal returns to humans".