Thanks! We were wondering about that. Is there any way we could be changed to the frontpage category?
It's the latter.
A bomb would not be an optimizing system, because the target space is not small compared to the basin of attraction. An AI that systematically dismantles things would be an optimizing system if for no other reason than that the AI systematically preserves its own integrity.
It's worse, even, in a certain way, than that: the existence of optimizing systems organized around a certain idea of "natural class" feeds back into more observers observing data that is distributed according to this idea of "natural class", leading to more optimizing systems being built around that idea of "natural class", and so on.
Once a certain idea of "natural class" gains a foothold somewhere, observers will make real changes in the world that further suggest this particular idea of "natural class" to others, and this forms a feedback loop.
If you pin down what a thing refers to according to what that thing was optimized to refer to, then don't you have to look at the structure of the one who did the optimizing in order to work out what a given thing refers to? That is, to work out what the concept "thermodynamics" refers to, it may not be enough to look at the time evolution of the concept "thermodynamics" on its own; I may instead need to know something about the humans who were driving those changes, and the goals held within their minds. But, if this is correct, then doesn't it raise anot...
There seems to be some real wisdom in this post, but given the length and title of the post, you haven't offered much of an exit -- you've just offered a single link to a YouTube channel for a trauma healer. If what you say here is true, then this is a bit like offering an alcoholic friend the sum total of one text message containing a single link to the homepage of Alcoholics Anonymous -- better than nothing, but not worthy of the bombastic title of this post.
friends and family significantly express their concern for my well being
What exact concerns do they have?
Wow, thank you for this context!
I just want to acknowledge the very high emotional weight of this topic.
For about two decades, many of us in this community have been kind of following in the wake of a certain group of very competent people tackling an amazingly frightening problem. In the last couple of years, coincident with a quite rapid upsurge in AI capabilities, that dynamic has really changed. This is truly not a small thing to live through. The situation has real breadth -- it seems good to take it in for a moment, not in order to cultivate anxiety, but in order to really engage w...
That is correct. I know it seems a little weird to generate a new policy on every timestep. The reason it's done that way is that the logical inductor needs to understand the function that maps prices to the quantities that will be purchased, in order to solve for a set of prices that "defeat" the current set of trading algorithms. That function (from prices to quantities) is what I call a "trading policy", and it has to be represented in a particular way -- as a set of syntax trees over trading primitives -- in order for the logical inductor to solve for pri...
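For intuition, here is a minimal Python sketch of the "syntax tree over trading primitives" idea. This is my own illustration with invented node names (Price, Const, Sub, Max), not code or notation from the logical induction paper; the point is just that the whole map from prices to purchase quantities is an explicit expression that a solver can inspect rather than a black box.

```python
# A minimal sketch of a "trading policy" as a syntax tree over trading
# primitives. Illustrative only: the node names and primitive set are my
# own invention, not the logical induction paper's formalism.

from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Price:
    sentence: str   # leaf: the current market price of this sentence

@dataclass
class Const:
    value: float    # leaf: a constant

@dataclass
class Sub:
    left: object
    right: object   # internal node: left - right

@dataclass
class Max:
    left: object
    right: object   # internal node: max(left, right)

Node = Union[Price, Const, Sub, Max]

def evaluate(node: Node, prices: Dict[str, float]) -> float:
    """Evaluate a trading-policy expression at a given assignment of prices."""
    if isinstance(node, Price):
        return prices[node.sentence]
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Sub):
        return evaluate(node.left, prices) - evaluate(node.right, prices)
    if isinstance(node, Max):
        return max(evaluate(node.left, prices), evaluate(node.right, prices))
    raise TypeError(f"unknown node: {node!r}")

# "Buy phi whenever its price is below 0.5, in proportion to the gap."
quantity_of_phi = Max(Const(0.0), Sub(Const(0.5), Price("phi")))
print(evaluate(quantity_of_phi, {"phi": 0.25}))  # 0.25
```

Because the policy is a syntax tree rather than an opaque function, the inductor can reason about how the purchased quantities respond to prices when solving for market-clearing prices.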
Thank you for this extraordinarily valuable report!
I believe that what you are engaging in, when you enter into a romantic relationship with either a person or a language model, is a kind of artistic creation. What matters is not whether the person on the "other end" of the relationship is a "real person" but whether the thing you create is of true benefit to the world. If you enter into a romantic relationship with a language model and produce something of true benefit to the world, then the relationship was real, whether or not there was a "real person" on the other end of it (whatever that would mean, even in the case of a human).
This is a relatively banal meta-commentary on reasons people sometimes give for doing worst-case analysis, and the differences between those reasons. The post reads like a list of things with no clear through-line. There is a gesture at an important idea from a Yudkowsky post (the logistic success curve idea) but the post does not helpfully expound that idea. There is a kind of trailing-off towards the end of the post as things like "planning fallacy" seem to have been added to the list with little time taken to place them in the context of the other thing...
Many people believe that they already understand Dennett's intentional stance idea, and due to that will not read this post in detail. That is, in many cases, a mistake. This post makes an excellent and important point, which is wonderfully summarized in the second-to-last paragraph:
...In general, I think that much of the confusion about whether some system that appears agent-y “really is an agent” derives from an intuitive sense that the beliefs and desires we experience internally are somehow fundamentally different from those that we “merely” infer and a
Have you personally ever ridden in a robot car that has no safety driver?
This post consists of comments on summaries of a debate about the nature and difficulty of the alignment problem. The original debate was between Eliezer Yudkowsky and Richard Ngo, but this post does not contain the content from that debate. It consists mostly of commentary by Jaan Tallinn on that debate, with comments by Eliezer.
The post provides a fascinating level of insight into genuine insider conversations about AI alignment. How do Eliezer and Jaan converse about alignment? Sure, this is a public setting, so perhaps they communicate differentl...
Thanks for the note.
In Life, I don't think it's easy to generate an X-1 time state that leads to a given X time state, unfortunately. The reason is that each cell in an X time state puts a logical constraint on 9 cells in an X-1 time state (the cell itself and its eight neighbors). It is therefore possible to set up certain constraint satisfaction problems in terms of finding an X-1 time state that leads to an X time state, and in general these can be NP-hard.
However, in practice, it is very very often quite easy to find an X-1 time state that leads to a given X time state, so maybe this experiment cou...
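To make the constraint-satisfaction picture concrete, here is a tiny brute-force Python sketch, written as my own illustration (real reverse-Life work uses SAT or constraint solvers rather than enumeration): it takes a target state at time X on a small region and searches every candidate time X-1 state on that region, treating cells outside the region as dead.

```python
# Brute-force predecessor search for a small Game of Life region.
# Illustrative only; not a practical reverse-Life solver.

from itertools import product

def step(grid):
    """One Game of Life step; cells outside the grid are treated as dead."""
    h, w = len(grid), len(grid[0])
    nxt = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            n = sum(grid[r + dr][c + dc]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0)
                    and 0 <= r + dr < h and 0 <= c + dc < w)
            nxt[r][c] = 1 if (n == 3 or (grid[r][c] == 1 and n == 2)) else 0
    return nxt

def find_predecessor(target):
    """Exhaustively search for a time X-1 state that steps to `target`."""
    h, w = len(target), len(target[0])
    for bits in product((0, 1), repeat=h * w):
        candidate = [list(bits[r * w:(r + 1) * w]) for r in range(h)]
        if step(candidate) == target:
            return candidate
    return None  # no predecessor inside this region

# Target at time X: a 2x2 block (a still life) on a 4x4 region.
target = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
print(find_predecessor(target))
```

Even on this 4x4 region the search space is 2^16 = 65,536 candidate states, which is why the general problem is attacked with constraint solvers rather than enumeration.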
Interesting. Thank you for the pointer.
The real question, though, is whether it is possible within our physics.
Oh, the only information I have about that is Dave Greene's comment, plus a few private messages over the years from people who had read the post and were interested in experimenting with concrete GoL constructions. I just messaged the author of the post on the GoL forum asking whether any of that work was spurred by this post.
Thanks - fixed! And thank you for the note, too.
Yeah it might just be a lack of training data in 10-second-or-less interactive instructions.
The thing I really wanted to test with this experiment was actually whether ChatGPT could engage with the real world using me as a guinea pig. The 10-second-or-less thing was just the format I used to try to "get at" the phenomenon of engaging with the real world. I'm interested in improving the format to more cleanly get at the phenomenon.
I do currently have the sense that it's more than just a lack of training data. I have the sense that ChatGPT has learned much l...
I asked a group of friends for "someone to help me with an AI experiment" and then I gave this particular friend the context that I wanted her help guiding me through a task via text message and that she should be in front of her phone in some room that was not the kitchen.
If you look at how ChatGPT responds, it seems to be really struggling to "get" what's happening in the kitchen -- it never really comes to the point of giving specific instructions, and especially never comes to the point of having any sense of the "situation" in the kitchen -- e.g. whet...
I'm very interested in Wei Dai's work, but I haven't followed closely in recent years. Any pointers to what I might read of his recent writings?
I do think Eliezer tackled this problem in the sequences, but I don't really think he came to an answer to these particular questions. I think what he said about meta-ethics is that it is neither that there is some measure of goodness to be found in the material world independent of our own minds, nor that goodness is completely open to construction based on our whims or preferences. He then says "well there ju...
Recursive relevance realization seems to be designed to answer about the "quantum of wisdom".
It does! But... does it really answer the question? Curious about your thoughts on this.
you ask whether you are aligned to yourself (ideals, goals etc) and find that your actuality is not coherent with your aim
Right! Very often, what it means to become wiser is to discover something within yourself that just doesn't make sense, and then to in some way resolve that.
Discovering incoherency seems very different from keeping a model on coherence rails
True. Eliezer is quite vague about the term "coherent" in his write-ups, and some more recent discussions of CEV drop it entirely. I think "coherent" was originally about balancing the extrapo...
Did you ever end up reading Reducing Goodhart?
Not yet, but I hope to, and I'm grateful to you for writing it.
processes for evolving humans' values that humans themselves think are good, in the ordinary way we think ordinary good things are good
Well, sure, but the question is whether this can really be done by modelling human values and then evolving those models. If you claim yes then there are several thorny issues to contend with, including what constitutes a viable starting point for such a process, what is a reasonable dynamic for such a process, and on what basis we decide the answers to these things.
Wasn't able to record it - technical difficulties :(
Yes, I should be able to record the discussion and post a link in the comments here.
If you train a model by giving it reward when it appears to follow a particular human's intention, you probably get a model that is really optimizing for reward, or appearing to follow said humans intention, or something else completely different, while scheming to seize control so as to optimize even more effectively in the future. Rather than an aligned AI.
Right yeah I do agree with this.
...Perhaps instead you mean: No really the reward signal is whether the system really deep down followed the humans intention, not merely appeared to do so [...] That
Well even if language models do generalize beyond their training domain in the way that humans can, you still need to be in contact with a given problem in order to solve that problem. Suppose I take a very intelligent human and ask them to become a world expert at some game X, but I don't actually tell them the rules of game X nor give them any way of playing out game X. No matter how intelligent the person is, they still need some information about what the game consists of.
Now suppose that you have this intelligent person write essays about how one ough...
This is a post about the mystery of agency. It sets up a thought experiment in which we consider a completely deterministic environment that operates according to very simple rules, and ask what it would be for an agentic entity to exist within that.
People in the Game of Life community actually spent some time investigating the empirical questions that were raised in this post. Dave Greene notes:
...The technology for clearing random ash out of a region of space isn't entirely proven yet, but it's looking a lot more likely than it was a year ago, that a work
Thanks for this note Dave
This post attempts to separate a certain phenomenon from a certain very common model that we use to understand that phenomenon. The model is the "agent model", in which intelligent systems operate according to an unchanging algorithm. In order to make sense of there being an unchanging algorithm at the heart of each "agent", we suppose that this algorithm exchanges inputs and outputs with the environment via communication channels known as "observations" and "actions".
This post really is my central critique of contemporary artificial intelligence discourse....
This is an essay about methodology. It is about the ethos with which we approach deep philosophical impasses of the kind that really matter. The first part of the essay is about those impasses themselves, and the second part is about what I learned in a monastery about addressing those impasses.
I cried a lot while writing this essay. The subject matter -- the impasses themselves -- is deeply meaningful to me, and I have the sense that they really do matter.
It is certainly true that there are these three philosophical impasses -- each has been discussed in...
This post trims down the philosophical premises that sit under many accounts of AI risk. In particular it routes entirely around notions of agency, goal-directedness, and consequentialism. It argues that it is not humans losing power that we should be most worried about, but humans quickly gaining power and misusing such a rapid increase in power.
Re-reading the post now, I have the sense that the arguments are even more relevant than when it was written, due to the broad improvements in machine learning models since it was written. The arguments in this po...
Thanks Scott
Thanks for writing this.
Alignment research has a track record of being a long slow slog. It seems that what we’re looking for is a kind of insight that is just very very hard to see, and people who have made real progress seem to have done so through long periods of staring at the problem.
With your two week research sprints, how do you decide what to work on for a given sprint?
Well suffering is a real thing, like bread or stones. It's not a word that refers to a term in anyone's utility function, although it's of course possible to formulate utility functions that refer to it.
The direct information I'm aware of is (1) CZ's tweets about not acquiring, (2) SBF's own tweets yesterday, (3) the leaked P&L doc from Alameda. I don't think any of these are sufficient to decide "SBF committed fraud" or "SBF did something unethical". Perhaps there is additional information that I haven't seen, though.
(I do think that if SBF committed fraud, then he did something unethical.)
If you view people as Machiavellian actors using models to pursue goals, then you will eventually find social interactions to be bewildering and terrifying, because there actually is no way to discern honesty or kindness or good intention if you start from the view that each person is ultimately pursuing some kind of goal in an ends-justify-the-means way.
But neither does it really make sense to say "hey let's give everyone the benefit of the doubt because then such-and-such".
I think in the end you have to find a way to trust something that is not the particular beliefs or goals of a person.
In Buddhist ideology, the reason to pick one set of values over another is to find an end to suffering. The Buddha claimed that certain values tended to lead towards the end of suffering and other values tended to lead in the opposite direction. He recommended that people check this claim for themselves.
In this way values are seen as instrumental rather than fundamental in Buddhism -- that is, Buddhists pick values on the basis of the consequences of holding those values, rather than any fundamental rightness of the values themselves.
Now you may say that t...
There's mounting evidence that FTX was engaged in theft/fraud, which would be straightforwardly unethical.
I think it's way too early to decide anything remotely like that. As far as I understand, we have a single leaked balance sheet from Alameda and a handful of tweets from CZ (CEO of Binance) who presumably got to look at some aspect of FTX internals when deciding whether to acquire. Do we have any other real information?
I'm curious about this too. I actually have the sense that overall funding for AI alignment already exceeded the pool of shovel-ready projects before FTX was involved. This is normal and expected in a field where many people are working on an important problem, but where most of the work is research, and where hardly anyone has promising scalable uses for money.
I think this led to a lot of prizes being announced. A prize is a good way to deploy funding if you don't see enough shovel-ready projects to exhaust it. You offer prizes for anyone who can...
Regarding your point on ELK: to make the output of the opaque machine learning system counterfactable, wouldn't it be sufficient to include the whole program trace? By "program trace" I mean the results of all the intermediate computations performed along the way. Yet including a program trace wouldn't help us much if we don't know what function of that program trace will tell us, for example, whether the machine learning system is deliberately deceiving us.
So yes it's necessary to have an information set that includes the relevant information, but isn't the main part of the (ELK) problem to determine what function of that information corresponds to the particular latent variable that we're looking for?
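As a concrete picture of what "including the whole program trace" would mean, here is a minimal sketch, my own construction rather than anything from the ELK report: every intermediate value of a toy model is recorded, yet the trace by itself does not identify which function of those values corresponds to the latent we actually care about.

```python
# Record every intermediate value computed by a toy two-layer model.
# The trace contains all the information computed along the way, but nothing
# in it labels which function of those values corresponds to a latent like
# "the system is deliberately deceiving us" -- finding that function is the
# remaining (and hard) part of the problem.

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # toy weights, stand-ins for a real model
W2 = rng.normal(size=(2, 4))

def forward_with_trace(x):
    """Run the toy model and record the result of every intermediate computation."""
    trace = {"input": x}
    trace["pre1"] = W1 @ x
    trace["act1"] = np.maximum(trace["pre1"], 0.0)   # ReLU
    trace["output"] = W2 @ trace["act1"]
    return trace["output"], trace

output, trace = forward_with_trace(np.array([1.0, -0.5, 2.0]))
print({name: value.shape for name, value in trace.items()})
```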
If I understand you correctly, the reason that this notion of counterfactable connects with what we normally call a counterfactual is that when an event screens off its own history, it's easy to consider other "values" of the "variable" underlying that event without coming into any logical contradictions with other events ("values of other variables") that we're holding fixed.
For example if I try to consider what would have happened if there had been a snow storm in Vermont last night, while holding fixed the particular weather patterns observed in Vermont ...
I expect you could build a system like this that reliably runs around and tidies your house say, or runs your social media presence, without it containing any impetus to become a more coherent agent (because it doesn’t have any reflexes that lead to pondering self-improvement in this way).
I agree, but if there is any kind of evolutionary variation in the thing then surely the variations that move towards stronger goal-directedness will be favored.
I think that overcoming this Molochian dynamic is the alignment problem: how do you build a powerful system ...
I really appreciate this post!
For instance, employers would often prefer employees who predictably follow rules than ones who try to forward company success in unforeseen ways.
Fascinatingly, EA employers in particular seem to seek employees who do try to forward organization goals in unforeseen ways!
Got it! Good to know.