Thoroughly agree except for what to do with money. I expect that throwing money at orgs that are trying to slow down AI progress (eg PauseAI, or something better if someone builds it) gets you more utility per dollar than investing in nvidia (and it's also more ethical).
Edit: to be clear, I mean actual utility in your utility function. Even if you're fully self-interested and not altruistic at all, I still think your interests are better served by donating to PauseAI-type orgs than investing in nvidia.
"why would they be doing that?"
same reason people make poor decisions all the time. if they had a clear head and hadn't already sunk some cost into AI, they could see that working on AI might make them wealthy in the short term but increases {the risk that they die soon} enough that they'd go "not worth it", as they should. but once you're already working on AI stuff, it's tempting and easy to retroactively justify why doing that is safe, or to just not worry about it and enjoy the money, even though if you thought about the impact of your actions on your own survival in the next few years you'd decide to quit.
at least that's my vague best guess.
people usually think of corporations as either {advancing their own interests and also the public's interests} or {advancing their own interests at cost to the public} — ime mostly the latter. what's actually going on with AI frontier labs, i.e. {going against the interests of everyone including themselves}, is very un-memetic and very far from the overton window.
in fiction, the heads of big organizations are either good (making things good for everyone) or evil (worsening everyone else's outcomes, but improving their own). most of the time, just evil. ver...
There are plenty of examples in fiction of greed and hubris leading to a disaster that takes down its own architects. The dwarves who mined too deep and awoke the Balrog, the creators of Skynet, Peter Isherwell in "Don't Look Up", Frankenstein and his Creature...
^^ Why wouldn't people seeing a cool cyborg tool just lead to more cyborg tools? As opposed to the black boxes that big tech has been building?
You imply a cyborg tool is a "powerful unaligned AI". It's not: it's a tool to improve bandwidth and throughput between the human and any existing AI (which remains untouched by cyborg research).
I was making a more general argument that applies mainly to powerful AI but also to all other things that might help one build powerful AI (such as: insights about AI, cyborg tools, etc). These things-that-help have the...
I think (not sure!) the damage from people/orgs/states going "wow, AI is powerful, I will try to build some" is larger than the upside of people/orgs/states going "wow, AI is powerful, I should be scared of it". It only takes one of the former being strong enough to kill everyone, and the latter are gonna have a very hard time stopping all of them.
By not informing the public that AI is indeed powerful, awareness of that fact is disproportionately allocated to people who will choose to think hard about it on their own, and thus that knowledge is more likely to...
Even if tool AI is controllable, tool AI can be used to assist in building non-tool AI. A benign superassistant is one query away from outputting world-ending code.
Right, Tamsin: so reasonable safety standards would presumably ban fully unrestricted superassistants too, but allow more limited assistants that could still be incredibly helpful. I'm curious what AI safety standards you'd propose – it's not a hypothetical question, since many politicians would like to know.
In my opinion the hard part would not be figuring out where to donate to {decrease P(doom) a lot} rather than {decrease P(doom) a little}, but figuring out where to donate to {decrease P(doom)} rather than {increase P(doom)}.
(oops, this ended up being fairly long-winded! hope you don't mind. feel free to ask for further clarifications.)
There's a bunch of things wrong with your description, so I'll first try to rewrite it in my own words, but still as close to the way you wrote it (so as to try to bridge the gap to your ontology) as possible. Note that I might post QACI 2 somewhat soon, which simplifies a bunch of QACI by locating the user as {whatever is interacting with the computer the AI is running on} rather than by using a beacon.
A first pass is to correct your descriptio...
Hi!
ATA is extremely neglected. The field of ATA is at a very early stage, and currently there does not exist any research project dedicated to ATA. The present post argues that this lack of progress is dangerous and that this neglect is a serious mistake.
I agree it's neglected, but there is in fact at least one research project dedicated to at least designing alignment targets: the part of the formal alignment agenda dedicated to formal outer alignment, which is the design of math problems to which solutions would be world-saving. Our notable attempts ...
I wonder how many of those seemingly idealistic people retained power when it was available because they were indeed only pretending to be idealistic. Assuming one is actually initially idealistic but then gets corrupted by having power in some way, one thing someone can do in CEV that you can't do in real life is reuse the CEV process to come up with even better CEV processes which will be even more likely to retain/recover their just-before-launching-CEV values. Yes, many people would mess this up or fail in some other way in CEV; but we only need one.
the main arguments for the programmers including all of [current?] humanity in the CEV "extrapolation base" […] apply symmetrically to AIs-we're-sharing-the-world-with at the time
I think timeless values might possibly help resolve this; if some {AIs that are around at the time} are moral patients, then sure, just like other moral patients around they should get a fair share of the future.
If an AI grabs more resources than is fair, you do the exact same thing as if a human grabs more resources than is fair: satisfy the values of moral patients (including...
trying to solve morality by themselves
It doesn't have to be by themselves; they can defer to others inside CEV, or come up with better schemes than their initial CEV, from inside CEV, and then defer to those. Whatever other solutions than "solve everything on your own inside CEV" might exist, they can figure those out and defer to them from inside CEV. At least that's the case in my own attempts at implementing CEV in math (eg QACI).
Seems really wonky and like there could be a lot of things that could go wrong in hard-to-predict ways, but I guess I sorta get the idea.
I guess one of the main things I'm worried about is that it seems to require that we either:
So the formalized concept is Get_Simplest_Concept_Which_Can_Be_Informally_Described_As("QACI is an outer alignment scheme consisting of…")
? Is an informal definition written in english?
It seems like "natural latent" here just means "simple (in some simplicity prior)". If I read the first line of your post as:
Has anyone thought about how QACI could be located in some simplicity prior, by searching the prior for concepts matching (??in some way??) some informal description in english?
It sure sounds like I should read the two posts you linked (perhaps especia...
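(For concreteness, here is a hypothetical sketch of the kind of search I mean; everything in it, especially the `matches_informal_description` oracle, is a placeholder of my own rather than anything from QACI.)

```python
from itertools import count

def matches_informal_description(program: str, description: str) -> bool:
    """Placeholder oracle: nothing here specifies how a match is judged,
    and specifying it is exactly the informal part."""
    raise NotImplementedError

def programs_by_length():
    """Yield candidate program encodings in order of increasing length,
    i.e. in order of decreasing weight under a simplicity prior."""
    for length in count(1):
        for i in range(2 ** length):
            yield format(i, f"0{length}b")  # all length-bit binary strings

def simplest_concept_matching(description: str) -> str:
    """Return the first (hence shortest, hence highest-prior-weight) match."""
    for program in programs_by_length():
        if matches_informal_description(program, description):
            return program
```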
To me kinda the whole point of QACI is that it tries to actually be fully formalized. Informal definitions seem very much not robust to when superintelligences think about them; fully formalized definitions are the only thing I know of that keep meaning the same thing regardless of what kind of AI looks at it or with what kind of ontology.
I don't really get the whole natural latents ontology at all, and mostly expect it to be too weak for us to be able to get reflectively stable goal-content integrity even as the AI becomes vastly superintelligent. If defi...
The knightian part of IB is about the limits of which hypotheses you can possibly find/write down, not (if i understand it so far) about an adversary. The adversary stuff is afaict mostly there to make the proofs work.
I don't think this makes a difference here? If you say "what's the best not-blacklisted-by-any-knightian-hypothesis action", then it doesn't really matter if you're thinking of your knightian hypotheses as adversaries trying to screw you over by blacklisting actions that are fine, or if you're thinking of your knightian hypotheses as a more abstract worst...
This is indeed a meaningful distinction! I'd phrase it as:
"Global" and "local" is not the worst nomenclature. Maybe "global" vs "personal" values? I dunno.
...my best idea is to call the former "global preferences" and the latter "local preferences", but that clashes with the pre-existing notion of locality of preferences as the quality of termina
That is, in fact, a helpful elaboration! When you said
Most people who "work on AI alignment" don't even think that thinking is a thing.
my leading hypotheses for what you could mean were:
Now, instead, my leading hypothesis is that you mean:
To be more precise: extrapolated over time, for any undesired selection process or other problem of that kind, either the problem is large enough that it gets exacerbated over time so much that it eats everything — and then that's just extinction, but slower — or it's not large enough to win out and aligned superintelligence(s) + coordinated human action is enough to stamp it out in the long run, which means it won't be an issue for almost all of the future.
It seems like for a problem to be just large enough that coordination doesn't stamp it out, but ...
single-use
Considering how long it took me to get that by this you mean "not dual-use", I expect some others just won't get it.
Some people who are very concerned about suffering might be considering building an unaligned AI that kills everyone just to avoid the risk of an AI takeover by an AI aligned to values which want some people to suffer.
Let this be me being on the record saying: I believe the probability of {alignment to values that strongly diswant suffering for all moral patients} is high enough, and the probability of {alignment to values that want some moral patients to suffer} is low enough, that this action is not worth it.
I think this applies to approximately anyone w...
sigh I wish people realized how useless it is to have money when the singularity happens. Either we die or we get a utopia in which it's pretty unlikely that pre-singularity wealth matters. What you want to maximize is not your wealth but your utility function, and you sure as hell are gonna get more from LDT handshakes with aligned superintelligences in saved worlds, if you don't help OpenAI reduce the number of saved worlds.
downvote and agree. but being financially ruined makes it harder to do other things, and it's probably pretty aversive to go through even if you expect things to turn out better in expectation because of it. the canaries thing seems pretty reasonable to me in light of this.
I wish you would realize that whatever we're looking at, it isn't people not realizing this.
I believe that ChatGPT was not released with the expectation that it would become as popular as it did.
Well, even if that's true, causing such an outcome by accident should still count as evidence of vast irresponsibility imo.
You continue to model OpenAI as this black-box monolith instead of trying to unravel the dynamics inside it and understand the incentive structures that lead these things to occur. It's a common pattern I notice in the way you interface with certain parts of reality.
I don't consider OpenAI as responsible for this as much as Paul Christiano and Jan Leike and his team. Back in 2016 or 2017, when they initiated and led research into RLHF, they focused on LLMs because they expected that LLMs would be significantly more amenable to RLHF. This means that instruct...
I'm surprised at people who seem to be updating only now about OpenAI being very irresponsible, rather than updating when they created a giant public competitive market for chatbots (which contains plenty of labs that don't care about alignment at all), thereby reducing how long everyone has to solve alignment. I still parse that move as devastating the commons in order to make a quick buck.
Half a year ago, I'd have guessed that OpenAI leadership, while likely misguided, was essentially well-meaning and driven by a genuine desire to confront a difficult situation. The recent series of events has made me update significantly against the general trustworthiness and epistemic reliability of Altman and his circle. While my overall view of OpenAI's strategy hasn't really changed, the probability I assign to them possibly "knowing better" has gone down dramatically.
I still parse that move as devastating the commons in order to make a quick buck.
I believe that ChatGPT was not released with the expectation that it would become as popular as it did. OpenAI pivoted hard when it saw the results.
Also, I think you are misinterpreting the sort of 'updates' people are making here.
but that this would be bad if the users aren't one of "us"—you know, the good alignment researchers who want to use AI to take over the universe, totally unlike those evil capabilities researchers who want to use AI to produce economically valuable goods and services.
Rather, " us" — the good alignment researchers who will be careful at all about the long term effects of our actions, unlike capabilities researchers who are happy to accelerate race dynamics and increase p(doom) if they make a quick profit out of it in the short term.
I think these judgements would benefit from more concreteness: that rather than proposing a dichotomy of "capabilities research" (them, Bad) and "alignment research" (us, Good), you could be more specific about what kinds of work you want to see more and less of.
I agree that (say) Carmack and Sutton are doing a bad thing by declaring a goal to "build AGI" while dismissing the reasons that this is incredibly dangerous. But the thing that makes infohazard concerns so fraught is that there's a lot of work that potentially affects our civilization's trajectory...
I am a utilitarian and agree with your comment.
The intent of the post was
I agree people should take the utilitarianly optimal action; I just think they're doing the utilitarian calculus wrong or not doing the calculus at all.
I think research that is mostly about outer alignment (what to point the AI to) rather than inner alignment (how to point the AI to it) tends to be good — quantilizers, corrigibility, QACI, decision theory, embedded agency, indirect normativity, infra bayesianism, things like that. Though I could see some of those backfiring the way RLHF did — in the hands of a very irresponsible org, even not very capabilities-related research can be used to accelerate timelines and increase race dynamics if the org doing it thinks it can get a quick buck out of it.
I don't buy the argument that safety researchers have unusually good ideas/research compared to capability researchers at top labs
I don't think this particularly needs to be true for my point to hold; they only need to have reasonably good ideas/research, not unusually good, for them to publish less to be a positive thing.
That said, if someone hasn't thought at all about concepts like "differentially advancing safety" or "capabilities externalities," then reading this post would probably be helpful, and I'd endorse thinking about those issues.
That's...
I don't think this particularly needs to be true for my point to hold; they only need to have reasonably good ideas/research, not unusually good, for them to publish less to be a positive thing.
There currently seem to be >10x as many people directly trying to build AGI/improve capabilities as trying to improve safety.
Suppose that the safety people have as good ideas and research ability as the capabilities people. (As a simplifying assumption.)
Then, if all the safety people switched to working full time on maximally advancing capabilities, this woul...
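To put rough numbers on the stated ratio (my own back-of-the-envelope arithmetic, not a figure from the thread): with $N_s$ people on safety and $N_c > 10\,N_s$ on capabilities, of comparable ability,

$$
\frac{N_c + N_s}{N_c} \;<\; 1 + \frac{1}{10} \;=\; 1.1,
$$

so even all of the safety people switching over would speed up capabilities by less than roughly 10%.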
One straightforward alternative is to just not do that; I agree it's not very satisfying but it should still be the action that's pursued if it's the one that has more utility.
I wish I had better alternatives, but I don't. But the null action is an alternative.
It certainly is possible! In more decision-theoretic terms, I'd describe this as "it sure would suck if agents in my reference class just optimized for their own happiness; it seems like the instrumental thing for agents in my reference class to do is maximize for everyone's happiness". Which is probly correct!
But as per my post, I'd describe this position as "not intrinsically altruistic" — you're optimizing for everyone's happiness because "it sure would suck if agents in my reference class didn't do that", not because you intrinsically value that everyone be happy, regardless of reasoning about agents and reference classes and veils of ignorance.
some people, upon learning about decision theories such as LDT and how it cooperates on problems such as the prisoner's dilemma, end up believing the following:
my utility function is about what i want for just me; but i'm altruistic (/egalitarian/cosmopolitan/pro-fairness/etc) because decision theory says i should cooperate with other agents. decision-theoretic cooperation is the true name of altruism.
it's possible that this is true for some people, but in general i expect that to be a mistaken anal...
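(For readers less familiar with the referenced idea, here is a minimal, purely illustrative sketch, with hypothetical names and payoffs of my own choosing, of the kind of decision-theoretic cooperation being discussed: an agent that cooperates in a one-shot prisoner's dilemma exactly when it can verify the other player runs the same decision procedure.)

```python
# Sketch: "cooperate with agents that verifiably run my exact decision procedure".
# This is the program-equilibrium toy version of the idea, not a full decision theory.

PAYOFFS = {  # (my move, their move) -> my payoff, standard prisoner's dilemma values
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def cooperate_with_copies(my_source: str, their_source: str) -> str:
    """Cooperate iff the opponent is running this same policy (same source)."""
    return "C" if my_source == their_source else "D"

def defect_bot(my_source: str, their_source: str) -> str:
    return "D"

def play(policy_a, source_a, policy_b, source_b):
    move_a = policy_a(source_a, source_b)
    move_b = policy_b(source_b, source_a)
    return PAYOFFS[(move_a, move_b)], PAYOFFS[(move_b, move_a)]

# Two copies of the policy get mutual cooperation (3, 3) instead of the (1, 1)
# that two always-defectors would get.
print(play(cooperate_with_copies, "src", cooperate_with_copies, "src"))  # (3, 3)
# Against a defector (different source), the policy defects too: (1, 1), not (0, 5).
print(play(cooperate_with_copies, "src", defect_bot, "other-src"))       # (1, 1)
```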
An interesting question for me is how much true altruism is required to give rise to a generally altruistic society under high quality coordination frameworks. I suspect it's quite small.
Another question is whether building coordination frameworks to any degree requires some background of altruism. I suspect that this is the case. It's the hypothesis I've accreted for explaining the success of post-war economies (guessing that war leads to a boom in nationalistic altruism, generally increased fairness and mutual faith).
I would feel better about this if there was something closer to (1) on which to discuss what is probably the most important topic in history (AI alignment). But noted.
I'm generally not a fan of increasing the amount of illegible selection effects.
On the privacy side, can lesswrong guarantee that, if I never click on Recommended, then recombee will never see an (even anonymized) trace of what I browse on lesswrong?
Here the thing that I'm calling evil is pursuing short-term profits at the cost of a non-negligibly higher risk that everyone dies.
Regardless of how good their alignment plans are, the thing that makes OpenAI unambiguously evil is that they created a strongly marketed public product and, as a result, caused a lot of public excitement about AI, and thus lots of other AI capabilities organizations were created that are completely dismissive of safety.
There's just no good reason to do that, except short-term greed at the cost of higher probability that everyone (including people at OpenAI) dies.
(No, "you need huge profits to solve alignment" isn't a good excuse — we had nowhere near exhausted the alignment research that can be done without huge profits.)
It's generally also very questionable that they started creating models for research, then seamlessly pivoted to commercial exploitation without changing any of their practices. A prototype meant as a proof of concept isn't the same as a safe finished product you can sell. Honestly, only in software and ML do we get people doing such shoddy engineering.
Unambiguously evil seems unnecessarily strong. Something like "almost certainly misguided" might be more appropriate? (still strong, but arguably defensible)
There's also the case of harmful warning shots: for example, if it turns out that, upon seeing an AI do a scary but impressive thing, enough people/orgs/states go "woah, AI is powerful, I should make one!" or "I guess we're doomed anyways, might as well stop thinking about safety and just enjoy making profit with AI while we're still alive", to offset the positive effect. This is totally the kind of thing that could be the case in our civilization.
There could be a difference but only after a certain point in time, which you're trying to predict / plan for.
What you propose, ≈"weigh indices by kolmogorov complexity" is indeed a way to go about picking indices, but "weigh indices by one over their square" feels a lot more natural to me; a lot simpler than invoking the universal prior twice.
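To make the comparison concrete (my own notation, just a sketch): the two proposals would weight the $n$-th index roughly as

$$
w_{K}(n) \;\propto\; 2^{-K(n)}
\qquad\text{versus}\qquad
w_{\mathrm{sq}}(n) \;\propto\; \frac{1}{n^{2}},
$$

both of which give a finite total weight (e.g. $\sum_{n \ge 1} 1/n^{2} = \pi^{2}/6$), but the second only uses the ordering of indices rather than invoking a description-length prior a second time.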
If you use the UTMs for cartesian-framed inputs/outputs, sure; but if you're running the programs as entire worlds, then you still have the issue of "where are you in time".
Say there's an infinitely growing conway's-game-of-life program, or some universal program, which contains a copy of me at infinitely many locations. How do I weigh which ones are me?
It doesn't matter that the UTM has a fixed amount of weight, there's still infinitely many locations within it.
Suppose I live inside a simulation run by a computer over which I have some control.
Scenario 1: I make the computer run the following:
pause_simulation()
if is_even(billionth_digit_of_pi()):
    resume_simulation()
Suppose, after running this program, that I observe that I still exist. This is some anthropic evidence for the billionth digit of pi being even.
Thus, one can get anthropic evidence about logical facts.
Scenario 2: I
I didn't see a clear indication in the post about whether the music is AI-generated or not, and I'd like to know; was there an indication I missed?
(I care because I'll want to listen to that music less if it's AI-generated.)
Huh I had the opposite reaction -- I was listening to it and was like "meh these voices are a bit bland, the beats are too but that's fine I guess. Makes sense for an amateur band. Good effort though, and great April Fools joke." Now I'm like "wait this is AI? Cooooooool"
UPDATE: I judged them too harshly. I think the voices and beats are not bland in general, I think just for the first song or two that I happened to listen to. Also, most of the songs are growing on me as I listen to them.
Yes, it doesn't say so explicitly, but it's very clear from the post that it is.
Taboo the word "intelligence".
An agent can superhumanly-optimize any utility function. Even if there are objective values, a superhuman-optimizer can ignore them and superhuman-optimize paperclips instead (and then we die because it optimized for that harder than we optimized for what we want).
(I'm gonna interpret these disagree-votes as "I also don't think this is the case" rather than "I disagree with you tamsin, I think this is the case".)
I don't think this is the case, but I'm mentioning this possibility because I'm surprised I've never seen someone suggest it before:
Maybe the reason Sam Altman is taking decisions that increase p(doom) is because he's a pure negative utilitarian (and he doesn't know-about/believe-in acausal trade).
Reposting myself from discord, on the topic of donating 5000$ to EA causes.
...if you're doing alignment research, even just a bit, then the 5000$ are probly better spent on yourself
if you have any gears level model of AI stuff then it's better value to pick which alignment org to give to yourself; charity orgs are vastly understaffed and you're essentially contributing to the "picking what to donate to" effort by thinking about it yourself
if you have no gears level model of AI then it's hard to judge which alignment orgs it's helpful to donate to (or, if gi
I agree that there's no substitute for thinking about this for yourself, but I think that morally or socially counting "spending thousands of dollars on yourself, an AI researcher" as a donation would be an appalling norm. There are already far too many unmanaged conflicts of interest and trust-me-it's-good funding arrangements in this space for me, and I think it leads to poor epistemic norms as well as social and organizational dysfunction. I think it's very easy for donating to people or organizations in your social circle to have substantial negative ...
yes, edited
10M$ sounds like it'd be a lot for PauseAI-type orgs imo, though admittedly this is not a very informed take.
Anyways, I stand by my comment; I expect throwing money at PauseAI-type orgs is better utility per dollar than nvidia even after taking into account that investing in nvidia to donate to PauseAI later is a possibility.