All of PabloAMC's Comments + Replies

The main problem with wireheading, manipulation... seems related to a confusion between the goal in the world and its representation inside the agent. Perhaps a way to deal with this problem is to use the fact that the agent may be aware of it being an embedded agent. That means that it could be aware of the goal representing an external fact of the world, and we could potentially penalize the divergence between the goal and its representation during training.

Something I believe could also be helpful is to have a non-archival peer review system that helps improve the quality of safety writings or publications; and optionally makes a readable blog post etc.

LessWrong/Alignment forum essentially has this for users with >100 karma. If you have a draft you can click on the feedback button and ask for this kind of feedback.

Hey Antb, I'm Pablo, Spanish and based in Spain. As far as I know these are the following AI Safety researchers:

  • Jaime Sevilla leads the organization Epoch AI, mostly interested in forecasting AI.
  • Juan Rocamonde and Adrià Garriga work full time on AI alignment in FAR.ai and Redwood Research.
  • There is a professor, called Jose Hernández Orallo, working between Cambridge and Valencia who has been for quite a while in AI Safety. I know him and he is an excellent researcher. Part of the existential AI Safety FLI community, and mostly interested in measuring in
... (read more)
2Jsevillamol
If you want to join the Spanish-speaking EA community, you can do so through this link!
PabloAMCΩ220

For the record, I think Jose Orallo (and his lab), in Valencia, Spain and CSER, Cambridge, is quite interested in this same exact topics (evaluation of AI models, specifically towards safety). Jose is a really good researcher, part of the FLI existential risk faculty community, and has previously organised AI Safety conferences. Perhaps it would be interesting for you to get to know each other.

Ok, so perhaps: specific tips on how to become a distiller: https://www.lesswrong.com/posts/zo9zKcz47JxDErFzQ/call-for-distillers In particular:

  • How to plan what to write about?
  • Here to write about it (lesswrong or somewhere else too)?
  • How much time do you expect this to take? Thanks Steven!

Are there examples or best practices you would recommend for this?

I think value learning might be causal because human preferences cannot be observed, and therefore can act as a confounder, similar to the work in

Zhang, J., Kumor, D., Bareinboim, E. Causal Imitation Learning with Unobserved Confounders. In Advances in Neural Information Processing Systems 2020.

At least that was one of my motivations.

I think predicting things you have no data on ("what if the AI does something we didn't foresee") is sort of an impossible problem via tools in "data science." You have no data!

Sure, I agree. I think I was quite inaccurat... (read more)

While I enjoyed this post, I wanted to indicate a couple of reasons why you may want to instead stay in academia or industry, rather than being an independent researcher:

  • The first one is that it gives more financial stability.
  • The second is that academia or industry set the bar high. If you get to publish in a good conference and get substantial citations, you know that you are making progress.

Now, many will argue that Safety is still preparadigmatic and consequently there might be contributions that do not really fit well into standard academic journ... (read more)

Hi Ilya! Thanks a lot for commenting :)

(a) I think "causal representation learning" is too vague, this overview (https://arxiv.org/pdf/2102.11107.pdf) talks about a lot of different problems I would consider fairly unrelated under this same heading.

Yes, you're right. I had found this, and other reviews by similar authors. In this one, I was mostly thinking of section VI (Learning causal variables) and its applications to RL (section VII-E). Perhaps section V on causal discovery is also relevant.

(b) I would try to read "classical causal inference" stu

... (read more)
4IlyaShpitser
Classical RL isn't causal, because there's no confounding (although I think it is very useful to think about classical RL causally, for doing inference more efficiently). Various extensions of classical RL are causal, of course. A lot of interesting algorithmic fairness isn't really causal.  Classical prediction problems aren't causal. ---------------------------------------- However, I think domain adaptation, covariate shift, semi-supervised learning are all causal problems. --- I think predicting things you have no data on ("what if the AI does something we didn't foresee") is sort of an impossible problem via tools in "data science."  You have no data!
2Kay Kozaronek
Thanks, Pablo. This invite worked. Good to know that there's already such a big community.
Answer by PabloAMC60

My prediction. Some comments

  • Your percentiles:
    • 5th: 2023-05-16
    • 25th: 2033-03-31
    • 50th: 2046-04-13
    • 75th: 2075-11-27
    • 95th: above 2100-01-01
  • You think 2034-03-27 is the most likely point
    • You think it's 6.9X as likely as 2099-05-29