All of catubc's Comments + Replies

Great write-up!

Why don't you do this in a mouse first? The whole cycle from birth to phenotype, including complex reasoning (e.g. Bayesian inference, causality), can take 6 months.

GeneSmith
I would love to try this in mice. Unfortunately, our genetic predictors for mice are terrible. The way mouse research works is not at all like how one would want it to work if we planned to actually use them as a testbed for the efficacy of genetic engineering. Mice are mostly clones, so we don't have the kind of massive GWAS datasets on which genes are doing what and how large the effect sizes are. Instead we have a few hundred studies, mostly on the effects of gene knockouts, to determine the function of particular proteins. But we're mostly not interested in knockouts for genetic engineering. Two-thirds of the disease-related alleles in humans are purely single-letter base-pair changes, and we have very little idea which specific single-letter base-pair changes affect things like disease risk in mice.

MAYBE some of the human predictors translate. We haven't actually explicitly tested this yet. And there's at least SOME hope here; we know that (amazingly) educational attainment predictors actually predict trainability in dogs with non-zero efficacy. So perhaps there's some chance some of our genetic predictors for human diseases would translate at least somewhat to mice. We do need to investigate this more thoroughly, but I'm not really that hopeful.

I think a far better testbed is livestock, especially cows. We have at least a few hundred thousand cow genomes sequenced and pretty well-labelled phenotype data. That should be sufficient to get a pretty good idea of which alleles are causing changes in breed value, which is the main metric all the embryo selection programs are optimizing for.
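(As an aside for readers outside genetics: a "genetic predictor" of the kind discussed here is, at its core, a polygenic score, i.e. a weighted sum of allele counts with the weights taken from GWAS effect-size estimates. The sketch below is purely illustrative; the variant IDs, effect sizes, and genotypes are made up, and this is not GeneSmith's or any real breeding pipeline's code.)

```python
# Minimal sketch of a polygenic score: a weighted sum of allele dosages,
# where the weights are per-variant effect sizes estimated from a GWAS.
# All variant IDs, effect sizes, and genotypes below are hypothetical.

# Per-allele weights from a hypothetical GWAS / breed-value model
effect_sizes = {
    "rs0001": 0.12,   # weight per copy of the effect allele
    "rs0002": -0.05,
    "rs0003": 0.30,
}

# One individual's genotype: copies of the effect allele (0, 1, or 2) per variant
genotype = {
    "rs0001": 2,
    "rs0002": 1,
    "rs0003": 0,
}

def polygenic_score(genotype, effect_sizes):
    """Sum of (allele dosage x effect size) over the variants in the predictor."""
    return sum(effect_sizes[v] * genotype.get(v, 0) for v in effect_sizes)

print(polygenic_score(genotype, effect_sizes))  # 2*0.12 + 1*(-0.05) + 0*0.30 = 0.19
```

Selection programs essentially rank candidate embryos by scores like this (with breed value as the predicted trait); the hard part is estimating trustworthy weights, which is exactly what mostly clonal lab-mouse populations can't provide.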

Exactly, and thanks for writing this.

I would go further and say that AI safety is AI dev, and that this happened years ago. If we stopped it all now, we'd extend our timelines:

https://www.lesswrong.com/posts/vkzmbf4Mve4GNyJaF/the-case-for-stopping-ai-safety-research

Interesting read; it would be great to see more done in this direction. However, it seems that mind-body dualism is still the prevalent (dare I say "dominant") mode of understanding human will and consciousness in CS and AI-safety. In my opinion, the best picture we have of human value creation comes from the social and psychological sciences, not metaphysics and mathematics, and it would be great to have more interactions with those fields.

For what it's worth, I've written a bunch on agency-loss as an attractor in AI/AGI-human interactions.

https://www.les... (read more)

Sorry, I've fixed the broken link now.

The problem with "understanding the concept of intent" is that intent and goal formation are some of the most complex notions in the universe, involving genetics, development, psychology, culture, and everything in between. We have been arguing about what intent, and correlates like "well-being", mean for the entire history of our civilization. It looks like we have a good set of no-nos (e.g. read the UN declaration on human rights), but in terms of positive descriptions of good long-term outcomes it gets fuzzy. There we have less guidance, though I guess trans- and post-humanism seem to be desirable goals to many.

Seth Herd
I intended to refer to understanding the concept of manipulation adequately to avoid it if the AGI "wanted" to. As for understanding the concept of intent, I agree that "true" intent is very difficult to understand, particularly if it's projected far into the future. That's a huge problem for approaches like CEV. The virtue of the approach I'm suggesting is that it entirely bypasses that complexity (while introducing new problems). Instead of inferring "true" intent, the AGI just "wants" to do what the human principal tells it to do. The human gets to decide what their intent is. The machine just has to understand what the human meant by what they said, and the human can clarify that in a conversation. I'm thinking of this as do what I mean and check (DWIMAC) alignment. More on this in Instruction-following AGI is easier and more likely than value aligned AGI. I'll read your article.

Seth, I just spoke about this work at ICML yesterday. Some other similar works:

Eliezer's work from way back in 2004: https://intelligence.org/files/CEV.pdf. I haven't read it in full, but it's about AIs that interact with human volition, which is what I'm also worried about.

Christiano's: https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like. This is a lot about slow takeoffs and AIs that slowly become unstoppable or unchangeable because they become part of our economic world.

My paper on arXiv is a bit of a long read... (read more)

Seth Herd
Thank you! The link to your paper is broken. I've read the Christiano piece. And some/most of the CEV paper, I think. Any working intent alignment solution needs to prevent changing the intent of the human on purpose. That is a solvable problem with an AGI that understands the concept.

Thanks, Garrett. There is obviously nuance that a 1-minute post can't get at. I am just hoping for at least some discussion of this topic; there seems to be little to none now.

Thanks for the comment. I agree broadly of course, but the paper says more specific things. For example, agency needs to be prioritized, probably taken outside of standard optimization, otherwise decimating pressure gets applied to other concepts, including truth and other "human values". The other part is an empirical one, also related to your concern: namely, human values are quite flexible, and biology doesn't create hard bounds / limits on depletion. If you couple that with ML/AI technologies that will predict what we will do next, then approaches that depend on human intent and values (broadly) are not as safe anymore.

Thanks so much for writing this. I think it's a much-needed (perhaps even a bit late) contribution connecting static views of GPT-based LLMs to dynamical systems and predictive processing. I do research on empirical agency and it still surprises me how little the AI-safety community touches on this central part of agency, namely that you can't have agents without this closed loop.

I've been speculating a bit (mostly to myself) about the possibility that "simulators" are already a type of organism, given that they appear to do active inference ... (read more)

Jan_Kulveit
Thanks for the comment. In my view it's one of the results of the AI safety community being small and sort of bad at absorbing knowledge from elsewhere; my guess is this is in part a quirk due to founder effects, and also downstream of the incentive structure on platforms like LessWrong. But please do share this stuff. I think we don't have exact analogues of LLMs in existing systems, so there is a question of where it's better to extend the boundaries of some concepts and where to create new concepts. I agree we are much more likely to use the 'intentional stance' toward processes which are running on somewhat comparable timescales.

Thanks for the comment, Erik (and for taking the time to read the post).

I generally agree with you re: the inner/outer alignment comment I made. But the language I used, and that others also use, continues to be vague; the working definition of inner alignment on lesswrong.com is whether, given an optimizer that is the product of an outer-aligned system, that optimizer is itself aligned. I see little difference, but I could be persuaded otherwise.

My post was meant to show that it's pretty easy to find significant holes in some of the most central co... (read more)

Thanks for the comment. Indeed, if we could agree on capping, or slowing down, that would be a promising approach.

Thank you so much for this effectiveness-focused post. I thought I would add another perspective, namely an "against the lone wolf" perspective, i.e. the idea that AI-safety will come down to one person, or a few persons, or an elite group of engineers somewhere. I agree that for now there are some individuals who are doing more conceptual AI framing than others, but in my view I am "shocked that everyone's dropping the ball" by putting up walls and saying that the general public is not helpful. Yes, they might not be helpful now, but we need to work on this!... Maybe someo... (read more)

Hi Chin. Thanks for writing this review; it seems like a much-needed and well-timed article, at least from my perspective, as I was looking for something like this. In particular, I'm trying to frame my research interests relative to the AI-safety field, but as you point out, this is still too early.

I am wondering if you have any more insight into how you came up with your diagram above. In particular, are there any more peer-reviewed articles or arXiv papers, like Amodei et al. (https://arxiv.org/abs/1606.06565), that you relied on? For example, I do... (read more)

zeshen
With regard to the Seed AI paradigm, most of the publications seem to have come from MIRI (especially the earlier ones, when they were called the Singularity Institute), with many discussions happening both here on LessWrong and at events like the Singularity Summit. I'd say most of the thinking around this paradigm happened before the era of deep learning. Nate Soares' post might provide more context. You're right that brain-like AI has not had much traction yet, but it seems to me that there is growing interest in this research area lately (albeit much slower than in the Prosaic AI paradigm), and I don't think it falls squarely under either the Seed AI paradigm or the Prosaic AI paradigm. Of course there may be considerable overlap between those 'paradigms', but I felt they were sufficiently distinct to warrant a category of their own, even though I may not think of it as a critical concept in the AI literature.

Thanks for the reply, Jonathan. Indeed, I'm also a bit skeptical that our innate drives (whether the ones from SDT or others) are really non-utility-maximizing. But in some cases they do appear to be.

One possibility is that they were driven to evolve for utility maximization but have now broken off completely and serve some difficult-to-understand purpose. I think there are similar theories of how consciousness developed, i.e. that it evolved as a by-product/side effect of some inter-organism communication and now plays many other roles.

Hi Roman. 

First of all, thank you so much for reading and taking the time to respond. 

I don't have the time (or knowledge) to respond to everything, but from your response I worry that my article partially missed the target. I'm trying to argue that humans may not be just utility maximizers, and that a large part of being human (or maybe of being any organism?) is to just enjoy the world via some quasi-non-rewarded types of behavior. So there's no real utility for some, or perhaps the most important, things that we value. Seeking out "surprising" resu... (read more)

Roman Leventov
Let me rephrase your thought, as I understand it: "I don't think humans are (pure) RL-like agents, they are more like ActInf agents" (by "pure" RL I mean RL without entropy regularization or other schemes that motivate exploration).

There is copious literature finding the neuronal, neuropsychological, or psychological makeup of humans "basically implementing Active Inference", as well as "basically implementing RL". The portion of this research that is more rigorous maps the empirical observations from neurobiology directly onto the mathematics of ActInf and RL, respectively. I think this kind of research is useful: it equips us with instruments to predict certain aspects of human behaviour, and suggests avenues for disorder treatment. The portion of this research that is less rigorous and more philosophical is like pointing out "it looks like humans behave here like ActInf agents", or "it looks like humans behave here like RL agents". This kind of philosophy is only useful for suggesting a direction for mining empirical observations, to either confirm or disprove theories that in this or that corner of behaviour/psychology humans act more like ActInf or RL agents. (Note that I would not count observations from psychology here, because they are notoriously unreliable themselves; see the reproducibility crisis, etc.)

RL is not falsifiable either. Both can be seen as normative theories of agency. Normative theories are unfalsifiable; they are prescriptions or, if you want, the sources of the definition of agency. However, I would say that ActInf is also a physical theory (apart from being normative) because it's derived from (or at least related to) statistical mechanics and the principle of least action. RL is "just" a normative framework of agency because I don't see any relationship with physics in it (again, if you don't add entropy regularisation).

I answered this question above: yes, you can design AI that will not minimise or maximise any utility or cost

Thanks, Nathan. I understand that most people working on technical AI-safety research focus on this specific problem, namely aligning AI, and less on misuse. I don't expect a large AI-misuse audience here.

Your response, that "truly-aligned-AI" would not change human intent, was also suggested by other AI researchers. But this doesn't address the problem: human intent is created from (and dependent on) societal structures. Perhaps I failed to make this clear. But I was trying to suggest we lack an understanding of the genesis of human actions/intenti... (read more)

Great post, Peter. I think a lot about whether it even makes sense to use the term "aligned AGI", as powerful AGIs may break human intention for a number of reasons (https://www.lesswrong.com/posts/3broJA5XpBwDbjsYb/agency-engineering-is-ai-alignment-to-human-intent-enough).

I see you didn't refer to AIs becoming self-driven (as in Omohundro: https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf). Is there a reason you don't view this as part of the college kid problem?

Peter S. Park
Thank you so much for your kind words! I really appreciate it. One definition of alignment is: Will the AI do what we want it to do? And as your post compellingly argues, "what we want it to do" is not well-defined, because it is something that a powerful AI could be able to influence. For many settings, using a term that's less difficult to rigorously pin down, like safe AI, trustworthy AI, or corrigible AI, could have better utility. I would definitely count the AI's drive towards self-improvement as a part of the College Kid Problem! Sorry if the post did not make that clear. 
TAG
Alignment needs something to align with, but it's far from proven that there is a coherent set of values shared by all humans.

Thanks for sharing! If I had a penny for every article that, in hindsight, would have taken me 10% of the time/effort to write ... lol

Thanks for the great post. Two meta-questions:

  1. How long did it take you to write this? I work in academia and am curious to know how such a piece of writing compares to writing an opinion piece on my planet.
  2. Is there a video and/or Q&A at some point (forgive me if I missed it)?

LOL. Your question opens a can of worms. It took more than a year from when I first committed to writing about simulators, but the reason it took so long wasn't that writing the actual words in this post took a long time; rather:

  • I spent the first few months rescoping and refactoring outlines. Most of the ideas I wanted to express were stated in the ontology I've begun to present in this post, and I kept running into conceptual dependencies. The actual content of this post is very pared down in scope compared to what I had originally planned.
  • After
... (read more)

Hi TAG, indeed, the post was missing some clarifications. I added a bit more about free will to the text; I hope it's helpful.

TAG
The "free will problem" now looks like two problems. One is the theoretical problem of whether free will exists; the other is the practical problem of how powerful AI s might affect de facto human agency ... agency which might fall short of the traditional concept of free will. The impact of a perfect predictor, God or Laplace's Demon on free will has nothing to do with its actual existence. Such a predictor is only possible in a deterministic universe, and it is determinism, not prediction that impacts fee will. You don't become unfree when someone predicts you , you always were. There are a lot of things a powerful AI could do manipulate people, and they are not limited to prediction.

Hi Charlie. Thanks for the welcome!

Indeed, I think that's a great way to put it: "preserving human agency around powerful systems" (I added it to the article). Thanks for that! I am pessimistic that this is possible (or that the question makes sense as it stands). I guess what I tried to do above was make a soft argument that "intent-aligned AIs" might not make sense without further limits or boundaries on both human intent and what AIs can do.

I agree hardwiring is probably not the best solution. However, humans are probably hardwired with a bunch of to... (read more)

Thanks, shminux. My apologies for the confusion; part of my point was that we don't have consensus on whether we have free will (professional philosophers usually fall into the ~60% compatibilist camp, but the sociologists have a different conception altogether, and the physicists, etc.). I think this got lost because I was not trying to explain the philosophical position on free will. [I have added a very brief note in the main text to clarify what I think of as the "free will problem".]

The rest of the post was an attempt to argue that because human actio... (read more)