Something I believe could also be helpful is a non-archival peer review system that helps improve the quality of safety writings or publications, and optionally turns them into a readable blog post, etc.
LessWrong/Alignment Forum essentially has this for users with >100 karma. If you have a draft, you can click on the feedback button and ask for this kind of feedback.
Hey Antb, I'm Pablo, Spanish and based in Spain. As far as I know, these are the AI Safety researchers:
For the record, I think Jose Orallo (and his lab), in Valencia, Spain and at CSER, Cambridge, is quite interested in exactly these same topics (evaluation of AI models, specifically towards safety). Jose is a really good researcher, part of the FLI existential risk faculty community, and has previously organised AI Safety conferences. Perhaps it would be interesting for you to get to know each other.
Ok, so perhaps specific tips on how to become a distiller: https://www.lesswrong.com/posts/zo9zKcz47JxDErFzQ/call-for-distillers In particular:
Are there examples or best practices you would recommend for this?
I think value learning might be a causal problem because human preferences cannot be observed and can therefore act as a confounder, similar to the work in
Zhang, J., Kumor, D., Bareinboim, E. Causal Imitation Learning with Unobserved Confounders. In Advances in Neural Information Processing Systems 2020.
At least that was one of my motivations.
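To make the confounding point concrete, here is a quick toy sketch (mine, not taken from the Zhang et al. paper): if a latent human preference drives both the observed context and the expert's action, then naively regressing actions on the observed context alone attributes part of the preference's effect to the context. The variable names (`u`, `x`, `a`) and coefficients are purely illustrative.

```python
# Toy illustration of how an unobserved confounder biases naive behavioral cloning.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

u = rng.normal(size=n)                      # unobserved human preference (confounder)
x = 0.5 * u + rng.normal(size=n)            # observed context, partly driven by u
a = 1.0 * x + 2.0 * u + rng.normal(size=n)  # expert action depends on x and u

# Naive behavioral cloning: least-squares fit of a on x only.
slope_naive = np.polyfit(x, a, 1)[0]

# Fit that also conditions on u (only possible if u were observable).
coef_adjusted, *_ = np.linalg.lstsq(np.column_stack([x, u]), a, rcond=None)

print(f"naive effect of x on a:    {slope_naive:.2f}")       # ~1.8, biased upward
print(f"adjusted effect of x on a: {coef_adjusted[0]:.2f}")  # ~1.0, the true coefficient
```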
I think predicting things you have no data on ("what if the AI does something we didn't foresee") is sort of an impossible problem via tools in "data science." You have no data!
Sure, I agree. I think I was quite inaccurate...
While I enjoyed this post, I wanted to point out a couple of reasons why you may want to stay in academia or industry rather than become an independent researcher:
Now, many will argue that Safety is still preparadigmatic and that, consequently, there might be contributions that do not fit well into standard academic journals...
Hi Ilya! Thanks a lot for commenting :)
(a) I think "causal representation learning" is too vague; this overview (https://arxiv.org/pdf/2102.11107.pdf) talks about a lot of different problems I would consider fairly unrelated under this same heading.
Yes, you're right. I had found this, and other reviews by similar authors. In this one, I was mostly thinking of section VI (Learning causal variables) and its applications to RL (section VII-E). Perhaps section V on causal discovery is also relevant.
...(b) I would try to read "classical causal inference" stuff...
Hey Koen, thanks a lot for the pointers! The literature I am most aware of is https://crl.causalai.net/, https://githubmemory.com/repo/zhijing-jin/Causality4NLP_Papers and Bernhard Scholkopf's webpage.
Alternatively you may want to join here: https://join.slack.com/t/ai-alignment/shared_invite/zt-fkgwbd2b-kK50z~BbVclOZMM9UP44gw
My prediction. Some comments:
The main problem with wireheading, manipulation, etc. seems related to a confusion between the goal in the world and its representation inside the agent. Perhaps a way to deal with this is to use the fact that the agent may be aware that it is an embedded agent. That means it could be aware that the goal represents an external fact about the world, and we could potentially penalize the divergence between the goal and its internal representation during training.
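As a rough illustration of that last sentence, here is a minimal, hypothetical sketch in PyTorch: the training loss combines an ordinary task loss with a penalty on the divergence between an externally specified goal embedding and the agent's internal goal representation. All names (`GoalConditionedAgent`, `goal_head`, the 0.1 weight) are illustrative assumptions, not an established method.

```python
# Sketch: add a divergence penalty between the external goal and its internal representation.
import torch
import torch.nn as nn

class GoalConditionedAgent(nn.Module):
    def __init__(self, obs_dim: int, goal_dim: int, n_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.goal_head = nn.Linear(64, goal_dim)     # internal goal representation
        self.policy_head = nn.Linear(64, n_actions)  # action logits

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        return self.policy_head(h), self.goal_head(h)

obs_dim, goal_dim, n_actions = 8, 4, 3
agent = GoalConditionedAgent(obs_dim, goal_dim, n_actions)

obs = torch.randn(32, obs_dim)
expert_actions = torch.randint(0, n_actions, (32,))
external_goal = torch.randn(goal_dim)  # fixed embedding of the "goal in the world"

logits, internal_goal = agent(obs)
task_loss = nn.functional.cross_entropy(logits, expert_actions)
# Penalize drift of the internal representation away from the external goal.
alignment_penalty = nn.functional.mse_loss(
    internal_goal, external_goal.expand_as(internal_goal)
)
loss = task_loss + 0.1 * alignment_penalty
loss.backward()
```

Whether such a penalty actually prevents wireheading (rather than just regularizing the representation) is exactly the open question; the sketch is only meant to show where the divergence term would enter the training objective.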