LESSWRONG
LW

PabloAMC — LessWrong

AGI and the structural foundations of democracy and the rule-based international order

1mo

Summary: This post argues that Artificial General Intelligence (AGI) threatens both liberal democracy and rule-based international order through a parallel mechanism. Domestically, if AGI makes human labor economically unnecessary, it removes the structural incentive for inclusive democratic institutions—workers lose leverage when their contribution is no longer essential. Internationally, if AGI gives one nation overwhelming productivity advantages, it erodes other countries' comparative advantages, reducing the benefits of trade and weakening incentives to maintain a rule-based world order. The post draws historical parallels to early 20th century concerns about capital concentration, distinguishes between "maritime" (trade-dependent) and "continental" (autarkic) power strategies, and discusses what middle powers like the EU might do to remain relevant. The... (read 2944 more words →)

International Programme on AI Evaluations

PabloAMC

4mo

Summary: I am helping set up a new academic upskilling program centred on AI evaluations and their intersection with AI safety. Our goal is to train the people who will determine whether AI is safe and beneficial. The program should cover the various methodologies and tools available, and how to use them.

You can learn more at https://ai-evaluation.org/ and apply here.

Background

Available programs to skill up in AI safety broadly fall into a few categories: bootcamps (e.g. ML4Good or ARENA), online courses (e.g. BlueDot Impact), and mentorship-based research programs (e.g. MATS). We believe that there is a niche market not being fully exploited at the moment, in both the... (read 441 more words →)

Recent progress on the science of evaluations

PabloAMC

8mo

Summary: This post presents the methodological innovations introduced in the paper General Scales Unlock AI Evaluation with Explanatory and Predictive Power. The paper introduces a set of general (universal) cognitive abilities that allow us to predict and explain AI system behaviour out of distribution. I outline what I believe are the main advantages of this methodology, indicate future extensions, and discuss implications for AI safety. Particularly exciting points include the future extension to propensities and how it could be useful for alignment and control evaluations.

Disclaimer: I am a (secondary) author of this paper and thus my opinions are likely somewhat biased; though I sincerely believe this is a good evaluation methodology, with... (read 2277 more words →)

The EU commission seeks expert advisers on AI

PabloAMC

8mo

The EU Commission has opened a call for expressions of interest for researchers who would like to advise the EU AI Office on the implementation of the AI Act, including, among other topics, the safety of general-purpose AI systems. Applicants must either hold a PhD or have equivalent experience. There is no need to be an EU national, though >80% will be. Take a look and apply here: https://digital-strategy.ec.europa.eu/en/news/commission-seeks-experts-ai-scientific-panel

PabloAMC

2y

Quick Take

The main problem with wireheading, manipulation, and similar failures seems related to a confusion between the goal in the world and its representation inside the agent. Perhaps a way to deal with this problem is to use the fact that the agent may be aware that it is an embedded agent. It could then be aware that the goal represents an external fact about the world, and we could potentially penalize the divergence between the goal and its representation during training.
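
As a rough illustration of the penalty mentioned above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: that the external goal and the agent's internal representation of it can both be written as lists of numbers, that squared distance is a sensible divergence, and the names penalized_loss, external_goal, internal_goal and weight. It does not address the hard part, which is obtaining a trustworthy measurement of the goal in the world.

```python
def penalized_loss(task_loss, external_goal, internal_goal, weight=1.0):
    """Task loss plus a penalty on the gap between a goal and its internal representation.

    external_goal and internal_goal are hypothetical numeric encodings of the
    goal in the world and of the agent's representation of that goal.
    """
    # Squared Euclidean distance, chosen here only for simplicity.
    divergence = sum((g - r) ** 2 for g, r in zip(external_goal, internal_goal))
    return task_loss + weight * divergence


# No penalty when the representation matches the goal.
aligned = penalized_loss(0.5, [1.0, 0.0], [1.0, 0.0])
# A drifted representation increases the training loss.
drifted = penalized_loss(0.5, [1.0, 0.0], [0.0, 0.0], weight=2.0)
```

The weight parameter trades off task performance against keeping the representation tied to the external goal; how to set it is left open here.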

PabloAMC's Shortform

PabloAMC

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

Replying to AGI safety field building projects I’d like to see

PabloAMC

3y

Something I believe could also be helpful is a non-archival peer-review system that helps improve the quality of safety writing and publications, and optionally produces a readable blog post.

Replying to Looking for Spanish AI Alignment Researchers

PabloAMC

3y

Hey Antb, I'm Pablo, Spanish and based in Spain. As far as I know, these are the Spanish AI Safety researchers:

Jaime Sevilla leads the organization Epoch AI and is mostly interested in forecasting AI.
Juan Rocamonde and Adrià Garriga work full time on AI alignment at FAR.ai and Redwood Research.
Jose Hernández Orallo is a professor working between Cambridge and Valencia who has been in AI Safety for quite a while. I know him and he is an excellent researcher. He is part of the FLI existential AI Safety community and is mostly interested in measuring intelligence.
There are also a couple of other professors at the Carlos III University in Madrid who could be friendly to these topics.

Causal representation learning as a technique to prevent goal misgeneralization

PabloAMC

Summary: This is a submission for the goal misgeneralization contest organized by AI Alignment Awards, as well as the third iteration of a slowly improving AI Safety research agenda that I aim to pursue at some point in the future.

Thanks to Jaime Sevilla for the comments that helped sharpen the specific research proposals. Needless to say, I encourage people to take part in these contests and am happy to receive comments or criticisms.

Main proposal

Causal representation learning (CRL) is a set of techniques that can be used to find “high-level causal variables from low-level observations” (Schölkopf, 2021). By causal representation, we mean a causal model (usually a Structural Causal Model, SCM) that disentangles the... (read 2111 more words →)

Replying to Evaluations project @ ARC is hiring a researcher and a webdev/engineer

PabloAMC

3y

For the record, I think Jose Orallo (and his lab), in Valencia, Spain and CSER, Cambridge, is quite interested in these exact topics (evaluation of AI models, specifically towards safety). Jose is a really good researcher, part of the FLI existential risk faculty community, and has previously organised AI Safety conferences. Perhaps it would be interesting for you to get to know each other.

How RL Agents Behave When Their Actions Are Modified? [Distillation post]

PabloAMC

Summary

This is a distillation post intended to summarize the article How RL Agents Behave When Their Actions Are Modified? by Eric Langlois and Tom Everitt, published at AAAI-21. The article describes Modified Action MDPs, where the environment or another agent such as a human may override the action of an agent. Then it studies the behavior of different agents depending on the training objective. Interestingly, some standard agents may ignore the possibility of action modifications, making them corrigible.

Check also this brief summary by the authors themselves. This post was corrected and improved thanks to comments by Eric Langlois.
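
The setting described in the summary, where another party may override the agent's action, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the formalism or the experiments from the article: the two-state environment, the rewards, and the names supervisor, step, run_episode and always_risky are all hypothetical.

```python
def supervisor(state, action):
    """Hypothetical action modification: block 'risky' whenever the state is 'unsafe'."""
    if state == "unsafe" and action == "risky":
        return "safe"
    return action


def step(state, action):
    """Toy dynamics: return (next_state, reward) for the action actually executed."""
    if action == "risky":
        return "unsafe", 1.0
    return "safe", 0.0


def run_episode(policy, horizon=4):
    """Roll out a policy, recording both the chosen and the executed action."""
    state = "safe"
    history = []
    for _ in range(horizon):
        chosen = policy(state)
        executed = supervisor(state, chosen)  # the override happens here
        next_state, reward = step(state, executed)
        history.append((state, chosen, executed, reward))
        state = next_state
    return history


def always_risky(state):
    """A policy that ignores the state and always requests the risky action."""
    return "risky"


history = run_episode(always_risky)
total_reward = sum(reward for _, _, _, reward in history)
```

Keeping both the chosen and the executed action in the history makes the key distinction visible: a learner can update on what it asked for or on what actually happened. Which training objectives correspond to which behaviour is the subject of the article and is not reproduced in this sketch.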

Introduction

Causal incentives is one research line of AI Safety, sometimes framed as closely related to... (read 2127 more words →)

Replying to How to write a LW sequence to learn a topic?

PabloAMC

4y

Ok, so perhaps: specific tips on how to become a distiller: https://www.lesswrong.com/posts/zo9zKcz47JxDErFzQ/call-for-distillers In particular:

How to plan what to write about?
Where to write about it (LessWrong or somewhere else too)?
How much time do you expect this to take?

Thanks Steven!

Replying to Call For Distillers

PabloAMC

4y

Are there examples or best practices you would recommend for this?

How to write a LW sequence to learn a topic?

PabloAMC

I am thinking of regularly writing a LessWrong sequence on Causality applied to Machine Learning, as a lesser alternative to doing an AI Safety postdoc on the topic I suggested here.

The purpose of both would be to learn enough about Causality applied to ML that I can later contribute original research. For context, I am finishing a Ph.D. in quantum algorithms, so I know how to do research; the issue is rather learning about a new research area.

However, the postdoc, which is still dependent on the OpenPhil decision next week, seems a somewhat worse career option than a job as a quantum research scientist in a startup in some... (read more)

Replying to An Open Philanthropy grant proposal: Causal representation learning of human preferences

PabloAMC

4y

I think value learning might be causal because human preferences cannot be observed, and therefore can act as a confounder, similar to the work in

Zhang, J., Kumor, D., Bareinboim, E. Causal Imitation Learning with Unobserved Confounders. In Advances in Neural Information Processing Systems 2020.

At least that was one of my motivations.

I think predicting things you have no data on ("what if the AI does something we didn't foresee") is sort of an impossible problem via tools in "data science." You have no data!

Sure, I agree. I think I was quite inaccurate. I am referring to transportability analysis, to be more specific. This approach should help in new situations where we have not directly trained our system, and in which our preferences could change.
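
To make the point about unobserved preferences acting as a confounder concrete, here is a small simulation in Python. It is a generic textbook-style illustration, not the method of Zhang, Kumor and Bareinboim, and not a transportability analysis. The model is an assumption: a hidden binary preference drives both the observed action and the outcome, while the action itself has no effect on the outcome. The names simulate and mean_outcome and the 0.9 probability are hypothetical.

```python
import random


def simulate(n=20000, seed=0, intervene=None):
    """Return (action, outcome) pairs from a toy model with a hidden preference."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        prefers = rng.random() < 0.5  # hidden preference, never recorded
        if intervene is None:
            # Observational regime: the action follows the preference 90% of the time.
            action = prefers if rng.random() < 0.9 else not prefers
        else:
            # Interventional regime: the action is set from outside.
            action = intervene
        # The outcome depends only on the hidden preference, not on the action.
        outcome = 1.0 if prefers else 0.0
        data.append((action, outcome))
    return data


def mean_outcome(data, action):
    """Average outcome among the samples where the given action was taken."""
    outcomes = [y for a, y in data if a == action]
    return sum(outcomes) / len(outcomes)


observed = simulate()
naive_gap = mean_outcome(observed, True) - mean_outcome(observed, False)
causal_gap = (mean_outcome(simulate(intervene=True), True)
              - mean_outcome(simulate(intervene=False), False))
```

In the observational data the action looks strongly predictive of the outcome (a gap of roughly 0.8 in expectation), even though setting the action by intervention changes nothing. An imitation or value learner that only sees (action, outcome) pairs could be misled in exactly this way.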

Replying to How To Get Into Independent Research On Alignment/Agency

PabloAMC

4y

While I enjoyed this post, I wanted to indicate a couple of reasons why you may want to instead stay in academia or industry, rather than being an independent researcher:

The first one is that it gives more financial stability.
The second is that academia or industry set the bar high. If you get to publish in a good conference and get substantial citations, you know that you are making progress.

Now, many will argue that Safety is still preparadigmatic and consequently there might be contributions that do not really fit well into standard academic journals or conferences. My answer to this point is that we should aim to make AI Safety become paradigmatic.... (read more)

Replying to An Open Philanthropy grant proposal: Causal representation learning of human preferences

PabloAMC

4y

Hi Ilya! Thanks a lot for commenting :)

(a) I think "causal representation learning" is too vague, this overview (https://arxiv.org/pdf/2102.11107.pdf) talks about a lot of different problems I would consider fairly unrelated under this same heading.

Yes, you're right. I had found this, and other reviews by similar authors. In this one, I was mostly thinking of section VI (Learning causal variables) and its applications to RL (section VII-E). Perhaps section V on causal discovery is also relevant.

(b) I would try to read "classical causal inference" stuff. There is a lot of reinventing of the wheel (often, badly) happening in the causal ML space.

Probably there is; I have to get up to speed on quite... (read more)

Replying to An Open Philanthropy grant proposal: Causal representation learning of human preferences

PabloAMC

4y

Hey Koen, thanks a lot for the pointers! The resources I am most aware of are https://crl.causalai.net/, https://githubmemory.com/repo/zhijing-jin/Causality4NLP_Papers and Bernhard Schölkopf's webpage.

An Open Philanthropy grant proposal: Causal representation learning of human preferences

PabloAMC

This is a proposal I wrote for the recent Open Philanthropy call on AI Alignment projects. I am finishing my Ph.D. in quantum computing and I would like to start working on AI Alignment. Thus, this is my proposal for a postdoc-style grant to learn more about one area I think the community could pay more attention to. However, since I am quite new to this area, there might be things that I have overlooked. For that reason, if you see something that doesn't sound promising, or you disagree with it, I would be happy to know about it.

The proposal received some light comments from Victor Veitch, Ryan Carey, and José Hernández-Orallo.... (read 2241 more words →)

A FLI postdoctoral grant application: AI alignment via causal analysis and design of agents

PabloAMC

TL;DR: This is a proposal I wrote for the Future of Life postdoctoral grants, on the topic of AI Safety. The proposal is about modeling human preferences with a causal model. This is somewhat similar to the value learning agenda, but more closely related to the causal incentive design agenda (https://causalincentives.com/), which I think is interesting enough that more people should be aware of its existence. I think there might be ideas worth considering, but given that I am no expert in the topic of causal models there might be important technical or strategic errors, which I would be glad to know about. In other words, this is... (read 1846 more words →)
