Thanks for your comment! I didn't get around to answering earlier, but maybe it's still useful to try to clarify a few things.
If I understand the setup correctly, there's no guarantee that the optimal element would be good, right?
My threat model here is that we have access to an Oracle that's not trustworthy (as specified in the first paragraph), so that even if we were able to specify our preferences correctly, we would still have a problem. So in this context you could assume that we managed to specify our preferences correctly. If our problem is simply ...
I think that, if you are wanting a formally verified proof of some maths theorem out of the oracle, then this is getting towards actually likely to not kill you.
Yes, I believe that's within reach using this technique.
You can start with m huge, and slowly turn it down, so you get a long list of "no results", followed by a proof. (Where the optimizer only had a couple of bits of free optimization in choosing which proof.)
This is quite dangerous though if the Oracle is deceptively withholding answers; I commented on this in the last paragraph of t...
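(For concreteness, a minimal sketch of the threshold-lowering procedure quoted above; `ask_oracle` and the threshold schedule are hypothetical stand-ins, and the deceptive-withholding worry is exactly that the oracle might falsely report "no result":)

```python
def query_with_decreasing_threshold(ask_oracle, thresholds):
    """Try successively lower settings of m until the oracle returns a proof.

    ask_oracle(m) is a hypothetical interface: it returns None ("no result")
    or a proof that satisfies the constraint at threshold m.
    """
    for m in thresholds:          # schedule from huge m downwards
        answer = ask_oracle(m)
        if answer is not None:
            return m, answer      # first threshold at which a proof appears
    return None, None

# Toy stand-in oracle: only yields a proof once m has dropped to 40 or below.
fake_oracle = lambda m: "a proof" if m <= 40 else None
print(query_with_decreasing_threshold(fake_oracle, range(100, 0, -10)))  # (40, 'a proof')
```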
How do we prove the AI tools we used didn't insert the most gnarly backdoors possible? Things that require a particular code implementation that is valid, on top of a subtle compiler bug, on top of an exact timing hardware bug that can't be tested for and therefore won't be found in validation....
I believe this is exactly the kind of thing that my proposal would be good for: gnarly backdoors that exploit a compiler bug etc. should be very rare in the set of all valid implementations!
First thought: The oracle is going to choose to systematically answer or not answer the queries we give it. This represents a causal channel of one bit per query it can use to influence the outside world[1]. Can you conquer the world in one awkwardly delivered kilobyte or less? Maybe.
Agreed. I think it's potentially a good bit worse than one kilobyte if we let ourselves be tricked into asking many questions, asking different questions, or lowering the difficulty of the safety constraint too much.
As mentioned in footnote 10, this requires a kind of perfect coordination...
We probably would've been less enthusiastic about hooking up LLMs to the Internet too, but here we collectively are. We do face a superintelligent adversary: all of the incentives and factions of humanity. An Oracle which is simply neutral is still default dangerous.
I completely agree with that. My proposal does not address the global coordination problem that we face, but it might be a useful tool if we collectively get our act together or if the first party with access to superintelligence has enough slack to proceed extra carefully. Even more modestly, I was hoping this might contribute to our theoretical understanding of why soft-optimization can be useful.
The threat model here seems basically wrong and focused on sins of commission when sins of omission are, if anything, an even larger space of threats and which apply to 'safe' solutions reported by the Oracle.
Sure, I mostly agree with the distinction you're making here between "sins of commission" and "sins of omissions". Contrary to you, though, I believe that getting rid of the threat of "sins of commission" is extremely useful. If the output of the Oracle is just optimized to fulfill your satisfaction goal and not for anything else, you've basically got...
The example you gave about the Oracle producing a complicated plan that leaks the source of the Oracle is an example of this: It's trivially defended against by not connecting the device the Oracle is running on to the internet and not using the same device to execute the great "cure all cancer" plan. (I don't believe that either you or I would have made that mistake!)
Ah, I think there was a misunderstanding. I (and maybe also quetzal_rainbow?) thought that in the inverted world no "apparently-very-lucrative deals" that turn out to be scams are known either, whereas you made a distinction between those kinds of deals in general and Ponzi schemes in particular.
I think my interpretation is more in the spirit of the inversion, otherwise the Epistemologist should really have answered as you suggested, and the whole premise of the discussion (people seem to have trouble understanding what the Spokesperson is doing) is broken.
I think this would be a good argument against Said Achmiz's suggested response, but I feel the text doesn't completely support it, e.g. the Epistemologist says "such schemes often go through two phases" and "many schemes like that start with a flawed person", suggesting that such schemes are known to him.
The soft optimization post took 24 person-weeks (assuming 4 people half-time for 12 weeks) plus some of Jeremy's time.
Team member here. I think this is a significant overestimate, I'd guess at 12-15 person-weeks. If it's relevant I can ask all former team members how much time they spent; it was around 10h per week for me. Given that we were beginners and spent a lot of time learning about the topic, I feel we were doing fine and learnt a lot.
Working on this part-time was difficult for me and the fact that people are not working on these things full-time in the camp should be considered when judging research output.
Missile attacks are not piracy, though, right?
It's good that you learned a few things from these incidents, but I'm sceptical of the (different) claim implied by the headline that Peter Zeihan was meaningfully correct here. If you interpret "directions" imprecisely enough, it's not hard to be sometimes directionally correct.
I guess it's hard to keep "they are experimenting with / building huge amounts of tanks" and "they are conducting combined arms exercises" secret from France and Russia, so they would have a lot of advance warning and could then also develop tanks.
But if you have a lot more than a layman's understanding of tank design / combined arms doctrine, you could still come out ahead in this.
Microsoft is the sort of corporate bureaucracy where dynamic orgs/founders/researchers go to die. My median expectation is that whatever former OpenAI group ends up there will be far less productive than they were at OpenAI.
I'm a bit sceptical of that. You gave some reasonable arguments, but all of this should be known to Sam Altman, and he still chose to accept Microsoft's offer instead of founding his own org (I'm assuming he would easily be able to raise a lot of money). So, given that "how productive are the former OpenAI folks at Microsoft?" is the crux of the argument, it seems that recent events are good news iff Sam Altman made a big mistake with that decision.
I'm confused by this statement. Are you assuming that AGI will definitely be built after the research time is over, using the most-plausible-sounding solution?
Or do you believe that you understand NOW that a wide variety of approaches to alignment, including most of those that can be thought of by a community of non-upgraded alignment researchers (CNUAR) in a hundred years, will kill everyone and that in a hundred years the CNUAR will not understand this?
If so, is this because you think you personally know better or do you predict the CNUAR will predictably update in the wrong direction? Would it matter if you got to choose the composition of the CNUAR?
I suspect Wave refers to this company: https://www.wave.com/en/ (they are connected to EA)
Planecrash is a glowfic co-written by Yudkowsky: https://glowficwiki.noblejury.com/books/planecrash
Seconding the recommendation of the "Rest in Motion" post, it has helped me with a maybe-similar feeling.
I think I mostly agree with this, but from my perspective it hints that you're framing the problem slightly wrong. Roughly, the problem with the outsourcing-approaches is our inability to specify/verify solutions to the alignment problem, not that specifying is not in general easier than solving yourself.
(Because of the difficulty of specifying the alignment problem, I restricted myself to speculating about pivotal acts in the post linked above.)
In cases where outsourcing succeeds (to various degrees), I think the primary load-bearing mechanism of success in practice is usually not "it is easier to be confident that work has been done correctly than to actually do the work", at least for non-experts.
I find this statement very surprising. Isn't almost all of software development like this?
E.g., the client asks the developer for a certain feature and then clicks around the UI to check if it's implemented / works as expected.
"This is what it looks like in practice, by default, when someone tries to outsource some cognitive labor which they could not themselves perform."
This proves way too much.
I agree, I think this even proves P=NP.
Maybe a more reasonable statement would be: You cannot outsource cognitive labor if you don't know how to verify the solution. But I think that's still not completely true, given that interactive proofs are a thing. (Plug: I wrote a post exploring the idea of applying interactive proofs to AI safety.)
No, that's not quite right. What you are describing is the NP-Oracle.
On the other hand, with the IP-Oracle we can (in principle, limited by the power of the prover/AI) solve all problems in the PSPACE complexity class.
Of course, PSPACE is again a class of decision problems, but using binary search it's straightforward to extract complete answers like the designs mentioned later in the article.
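(To make the binary-search extraction concrete, here's a minimal sketch in Python; `oracle_decides` is a hypothetical predicate standing in for an IP-verified answer to the decision question "is the true value at least x?":)

```python
def extract_value(oracle_decides, lo, hi):
    """Recover an integer answer from yes/no decision queries.

    oracle_decides(x) is assumed to answer "is the true value >= x?",
    with each answer checked via the interactive-proof protocol.
    """
    # Invariant: the true value lies in [lo, hi].
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if oracle_decides(mid):   # true value >= mid
            lo = mid
        else:                     # true value < mid
            hi = mid - 1
    return lo

# Toy usage: pretend the hidden answer is 42.
hidden = 42
print(extract_value(lambda x: hidden >= x, 0, 1000))  # -> 42
```

The same trick, applied bit by bit, would extract longer outputs like the designs mentioned in the article.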
Your reasoning here relies on the assumption that the learning mostly takes place during the individual organism's lifetime. But I think it's widely accepted that brains are not "blank slates" at birth, but contain a significant amount of information, akin to a pre-trained neural network. Thus, if we consider evolution as the training process, we might reach the opposite conclusion: data quantity and training compute are extremely high, while parameter count (~brain size) and brain compute are restricted and selected against.
Hypothesis: If a part of the computation that you want your trained system to compute "factorizes", it might be easier to evolve a modular system for this computation. By factorization I just mean that (part of) the computation can be performed using mostly independent parts / modules.
Reasoning: Training independent parts to each perform some specific sub-calculation should be easier than training the whole system at once. E.g. training n neural networks of size N/n should be easier (in terms of compute or data needed) than training one of size N, given th...
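(A toy illustration of that scaling intuition, with a made-up super-linear cost exponent rather than anything empirical:)

```python
# Toy comparison, assuming training cost scales roughly like
# (number of parameters) ** alpha for some alpha > 1; the exponent
# is an assumption for illustration, not an empirical claim.
def training_cost(num_params, alpha=1.5):
    return num_params ** alpha

N, n = 1_000_000, 10
monolithic = training_cost(N)          # one network of size N
modular = n * training_cost(N // n)    # n independent modules of size N/n

print(f"monolithic: {monolithic:.3e}")  # ~1.0e+09
print(f"modular:    {modular:.3e}")     # ~3.2e+08, cheaper whenever alpha > 1
```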
Well, if your chances of getting infected are drastically reduced, then so is the value of the mask's "protect others" effect, so overall these masks are likely to be very useful.
That said, a slightly modified design that filters air both on the in- and the out- breath might be a good idea. This way, you keep your in-breath filters dry and have some "protect others" effect.
[...] P3 masks, worn properly, with appropriate eye protection while maintaining basic hand hygiene are efficient in preventing SARS-CoV-2 infection regardless of setting.
If this is true, then this is a great idea and it's somewhat surprising that these masks are not in widespread use already.
I suspect the plan is a bit less practical than stated, as I expect there to be problems with compliance, in particular because the masks are mildly unpleasant to wear for prolonged periods.
The paper had nothing to do with what you talked about in your opening paragraph
What? Your post starts with:
My goal in this essay is to analyze some widely discussed scenarios that predict dire and almost unavoidable negative behavior from future artificial general intelligences, even if they are programmed to be friendly to humans.
Eli's opening paragraph explains the "basic UFAI doomsday scenario". How is this not what you talked about?
It depends on the skill difference and the size of the board; on smaller boards the advantage is probably pretty large: Discussion on LittleGolem
Regarding the drop of unemployment in Germany, I've heard it claimed that it is mainly due to changes in how the unemployment statistics are done, e.g. people who are in temporary 1€/h jobs and still receiving benefits are counted as employed. If this point is still important, I can look for more details and translate.
EDIT: Some details are here:
...It is possible to earn income from a job and receive Arbeitslosengeld II benefits at the same time. [...] There are criticisms that this defies competition and leads to a downward spiral in wages and the l
Almost certainly? That's a bit too confident for my taste.