All of Simon Fischer's Comments + Replies

will almost certainly be a critical period for AGI development.

Almost certainly? That's a bit too confident for my taste.

5Cameron Berg
Note this is not equivalent to saying 'we're almost certainly going to get AGI during Trump's presidency,' but rather that there will be substantial developments that occur during this period that prove critical to AGI development (which, at least to me, does seem almost certainly true).

Thanks for your comment! I didn't get around to answering earlier, but maybe it's still useful to try to clarify a few things.

If I understand the setup correctly, there's no guarantee that the optimal element would be good, right?

My threat model here is that we have access to an Oracle that's not trustworthy (as specified in the first paragraph), so that even if we were able to specify our preferences correctly, we would still have a problem. So in this context you could assume that we managed to specify our preferences correctly. If our problem is simply ... (read more)

I think that, if you are wanting a formally verified proof of some maths theorem out of the oracle, then this is getting towards actually likely to not kill you.

Yes, I believe that's within reach using this technique.

 

You can start with m huge, and slowly turn it down, so you get a long list of "no results", followed by a proof. (Where the optimizer only had a couple of bits of free optimization in choosing which proof.) 

This is quite dangerous though if the Oracle is deceptively withholding answers; I commented on this in the last paragraph of t... (read more)

2Donald Hobson
If the oracle is deceptively withholding answers, give up on using it. I had taken the description to imply that the oracle wasn't doing that. 
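
For concreteness, here's a minimal sketch of the quoted "start with m huge and slowly turn it down" protocol. The `oracle` and `verify` interfaces are hypothetical stand-ins (not from the original discussion): the oracle is asked for a proof of length at most m, and only output that passes a trusted, independent proof checker is ever accepted.

```python
from typing import Callable, Optional

def query_with_shrinking_bound(
    oracle: Callable[[str, int], Optional[str]],  # assumed interface: (theorem, bound m) -> proof or None
    verify: Callable[[str, str], bool],           # trusted, independent proof checker
    theorem: str,
    m_start: int = 10**6,
    shrink: float = 0.9,
) -> Optional[str]:
    """Lower the proof-length bound m until the oracle returns something.

    The oracle's only freedom is at which bound it first answers, which
    amounts to a few bits of free optimization.
    """
    m = m_start
    while m >= 1:
        proof = oracle(theorem, m)
        if proof is not None:
            # Accept the output only if the trusted verifier confirms it.
            return proof if verify(theorem, proof) else None
        m = int(m * shrink)
    return None
```

Whether this is actually safe still hinges on the deceptive-withholding worry discussed above.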

I'm not sure I understand your question. What restriction do you have in mind? A safety restriction on what the generated code should be like? Something like requiring that the code be in some canonical form to remove degrees of freedom for the (potentially malicious) code-generating AI?

2[anonymous]
I gave "changing canon randomly" in the comment you are replying to. Is this how you propose limiting the hostile AIs ability to inject subtle hostile plans? Or similarly, "design the columns for this building. Oh they must all be roman arches." Would be a similar example.

How do we prove the AI tools we used didn't insert the most gnarly backdoors possible? Things that require a particular code implementation that is valid, on top of a subtle compiler bug, on top of an exact timing hardware bug that can't be tested for and therefore won't be found in validation....

I believe this is exactly the kind of thing that my proposal would be good for: Gnarly backdoors that exploit a compiler bug etc. should be very rare in the set of all valid implementations!

2[anonymous]
For this particular situation, can you describe what the restriction would be in concrete terms? Is it "ok write this compiler function to convert C arithmetic to bytecode. Declare any variables used at the latest valid location. Use only 3 registers. " And then elsewhere in the compiler the restriction might be "declare any variables used at the top of the main function and pass it by reference to any child functions. Use all available registers possible, and manually update the instruction pointer"

I think you mean  (two answers that satisfice and fulfill the safety constraint), but otherwise I agree. This is also an example of this whole "let's measure optimization in bits"-business being a lot more subtle than it appears at first sight.

2Lucius Bushnaq
Typo fixed, thanks.

First thought: The oracle is going to choose to systematically answer or not answer the queries we give it. This represents a causal channel of one bit per query it can use to influence the outside world[1]. Can you conquer the world in one awkwardly delivered kilobyte or less? Maybe.

Agreed. I think it's potentially a good bit worse than one kilobyte if we let ourselves be tricked into asking many questions, asking different questions, or lowering the difficulty of the safety constraint too much.

As mentioned in footnote 10, this requires a kind of perfect coordination... (read more)
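
As a rough back-of-the-envelope (all numbers below are made up for illustration, not taken from the comment), the per-query channel adds up quickly once many questions are allowed and the safety constraint leaves the Oracle a choice between several acceptable answers:

```python
import math

queries_answered = 1000    # hypothetical number of queries we allow ourselves
candidates_per_answer = 4  # hypothetical slack: safe-looking answers the Oracle can choose between

# One bit per query from answering vs. withholding, plus log2(k) bits of free
# choice whenever the accepted answer is picked from k acceptable candidates.
bits_from_withholding = queries_answered * 1.0
bits_from_answer_choice = queries_answered * math.log2(candidates_per_answer)
total_bits = bits_from_withholding + bits_from_answer_choice

print(f"~{total_bits:.0f} bits of free optimization (~{total_bits / 8192:.2f} KiB)")
```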

We probably would've been less enthusiastic about hooking up LLMs to the Internet too, but here we collectively are. We do face a superintelligent adversary: all of the incentives and factions of humanity. An Oracle which is simply neutral is still default dangerous.

I completely agree with that. My proposal does not address the global coordination problem that we face, but it might be a useful tool if we collectively get our act together or if the first party with access to superintelligence has enough slack to proceed extra carefully. Even more modestly, I was hoping this might contribute to our theoretical understanding of why soft-optimization can be useful.

The threat model here seems basically wrong and focused on sins of commission when sins of omission are, if anything, an even larger space of threats and which apply to 'safe' solutions reported by the Oracle.

Sure, I mostly agree with the distinction you're making here between "sins of commission" and "sins of omission". Contrary to you, though, I believe that getting rid of the threat of "sins of commission" is extremely useful. If the output of the Oracle is just optimized to fulfill your satisfaction goal and not for anything else, you've basically got... (read more)

The example you gave about the Oracle producing a complicated plan that leaks the source of the Oracle is an example of this: It's trivially defended against by not connecting the device the Oracle is running on to the internet and not using the same device to execute the great "cure all cancer" plan. (I don't believe that either you or I would have made that mistake!)

We probably would've been less enthusiastic about hooking up LLMs to the Internet too, but here we collectively are. We do face a superintelligent adversary: all of the incentives and fact... (read more)

Ah, I think there was a misunderstanding. I (and maybe also quetzal_rainbow?) thought that in the inverted world also no "apparently-very-lucrative deals" that turn out to be scams are known, whereas you made a distinction between those kind of deals and Ponzi schemes in particular.

I think my interpretation is more in the spirit of the inversion, otherwise the Epistemologist should really have answered as you suggested, and the whole premise of the discussion (people seem to have trouble understanding what the Spokesperson is doing) is broken.

3Martin Randall
If I was living in a world where there are zero observed apparently-very-lucrative deals that turn out to be scams then I hope I would conclude that there is some supernatural Creator who is putting a thumb on the scale to be sure that cheaters never win and winners never cheat. So I would invest in Ponzi Pyramid Inc. I would not expect to be scammed, because this is a world where there are zero observed apparently-very-lucrative deals that turn out to be scams. I would aim to invest in a diversified portfolio of apparently-very-lucrative deals, for all the same reasons I have a diversified portfolio in this world. In such a world the Epistemologist is promoting a world model that does not explain my observations and I would not take their investment advice, similarly to how in this world I ignore investment advice from people who believe that the economy is secretly controlled by lizard people.
3Said Achmiz
If the premise is a world where nobody ever does any scams or tries to swindle anyone out of money, then it’s so far removed from our world that I don’t rightly know how to interpret any of the included commentary on human nature / psychology / etc. Lying for personal gain is one of those “human universals”, without which I wouldn’t even recognize the characters as anything resembling humans.

I think this would be a good argument against Said Achmiz's suggested response, but I feel the text doesn't completely support it, e.g. the Epistemologist says "such schemes often go through two phases" and "many schemes like that start with a flawed person", suggesting that such schemes are known to him.

4Said Achmiz
Even setting aside such textual anomalies, why is this a good argument? As I noted in a sibling comment to yours, my response assumes that Ponzi schemes have never happened in this world, because otherwise we’d simply identify the Spokesperson’s plan as a Ponzi scheme! The reasoning that I described is only necessary because we can’t say “ah, a Ponzi scheme”!

In Section 5 we discuss why expect oversight and control of powerful AIs to be difficult.

Another typo, probably missing a "we".

The soft optimization post took 24 person-weeks (assuming 4 people half-time for 12 weeks) plus some of Jeremy's time.

Team member here. I think this is a significant overestimate, I'd guess at 12-15 person-weeks. If it's relevant I can ask all former team members how much time they spent; it was around 10h per week for me. Given that we were beginners and spent a lot of time learning about the topic, I feel we were doing fine and learnt a lot. 

Working on this part-time was difficult for me and the fact that people are not working on these things full-time in the camp should be considered when judging research output.

Missile attacks are not piracy, though, right?

It's good that you learned a few things from these incidents, but I'm sceptical of the (different) claim implied by the headline that Peter Zeihan was meaningfully correct here. If you interpret "directions" imprecisely enough, it's not hard to be sometimes directionally correct.

2Chris_Leong
Yeah, I probably could have framed the post a bit better, but I don't really think that affects the core point.

I know this answer doesn't qualify, but very likely the best you can currently do is: Don't do it. Don't train the model.

1lemonhope
Sorry doesn't count but I appreciate the sentiment

(I downvoted your comment because it's just complaining about downvotes to unrelated comments/posts and not meaningfully engaging with the topic at hand)

0Joseph Van Name
I am pointing out something wrong with the community here. The name of this site is LessWrong. On this site, it is better to acknowledge wrongdoing so that the people here do not fall into traps like FTX again. If you read the article, you would know that it is better to acknowledge wrongdoing or a community weakness than to double down.

"Powerful AIs Are Black Boxes" seems like a message worth sending out

Everybody knows what (computer) scientists and engineers mean by "black box", of course.

1Heron
I laughed when I read 'black box.' Us oldies who predominantly use Microsoft (Word and Excel) and Google Docs* are mostly bewildered by the very language of computer technology. For some, there is overwhelming fear because of ignorance. Regardless of MScs and PhDs, I know someone who refuses to consider the possibilities of AI, insisting that it is simply the means to amass data, perchance to extrapolate. It is, I think, worth educating even the old folk (me) whose ability to update is limited. * And maybe Zoom, and some Social Media.

I guess it's hard to keep "they are experimenting with / building huge amounts of tanks" and "they are conducting combined arms exercises" secret from France and Russia, so they would have a lot of advance warning and could then also develop tanks.

But if you have a lot more than a layman's understanding of tank design / combined arms doctrine, you could still come out ahead in this.

 "6. f6" should be "6. h3".

1Zane
Thanks, fixed.

Microsoft is the sort of corporate bureaucracy where dynamic orgs/founders/researchers go to die. My median expectation is that whatever former OpenAI group ends up there will be far less productive than they were at OpenAI.


I'm a bit sceptical of that. You gave some reasonable arguments, but all of this should be known to Sam Altman, and he still chose to accept Microsoft's offer instead of founding his own org (I'm assuming he would easily be able to raise a lot of money). So, given that "how productive are the former OpenAI folks at Microsoft?" is the crux of the argument, it seems that recent events are good news iff Sam Altman made a big mistake with that decision.

6Thane Ruthenis
Or if Sam Altman isn't actually primarily motivated by the desire to build an AGI, as opposed to standard power-/profit-maximization motives. Accelerationists are now touting him as their messiah, and he'd obviously always been happy to generate hype about OpenAI's business vision. But it's not necessarily the case that it translates into him actually believing, at the gut level, that the best way to maximize prosperity/power is to build an AGI. He may realize that an exodus into Microsoft would cripple OpenAI talent's ability to be productive, and do it anyway, because it offers him personally better political opportunities for growth. It doesn't even have to be a dichotomy of "total AGI believer" vs "total simulacrum-level-4 power-maximizer". As long as myopic political motives have a significant-enough stake in his thinking, they may lead one astray. "Doomers vs. Accelerationists" is one frame on this conflict, but it may not be the dominant one. "Short-sighted self-advancement vs. Long-term vision" is another, and a more fundamental one. Moloch favours capabilities over alignment, so it usually hands the victory to the accelerationists. But that only goes inasmuch as accelerationists' motives coincide with short-sighted power-maximization. The moment there's an even shorter-sighted way for things to go, an even lower energy-state to fall into, Moloch would cast capability-pursuit aside. The current events may (or may not!) be an instance of that.

I'm confused by this statement. Are you assuming that AGI will definitely be built after the research time is over, using the most-plausible-sounding solution?

Or do you believe that you understand NOW that a wide variety of approaches to alignment, including most of those that can be thought of by a community of non-upgraded alignment researchers (CNUAR) in a hundred years, will kill everyone and that in a hundred years the CNUAR will not understand this?

If so, is this because you think you personally know better or do you predict the CNUAR will predictably update in the wrong direction? Would it matter if you got to choose the composition of the CNUAR?

Another big source of potential volunteers: People who are going to be dead soon anyway. I'd probably volunteer if I knew that I'm dying from cancer in a few weeks anyway.

Typo: This should be .
 

after 17... dxc6 or 17. c6

This should probably be "after 17... cxd6 or 17... c6".

1Zane
Thanks, fixed.

I suspect Wave refers to this company: https://www.wave.com/en/ (they are connected to EA)

Planecrash is a glowfic co-written by Yudkowsky: https://glowficwiki.noblejury.com/books/planecrash

2MondSemmel
For Planecrash / Project Lawful there's a LW writeup here.

I don't believe these "practical" problems ("can't try long enough") generalize enough to support your much more general initial statement. This doesn't feel like a true rejection to me, but maybe I'm misunderstanding your point.

2[comment deleted]

I think I mostly agree with this, but from my perspective it hints that you're framing the problem slightly wrong. Roughly, the problem with the outsourcing-approaches is our inability to specify/verify solutions to the alignment problem, not that specifying is not in general easier than solving yourself.

(Because of the difficulty of specifying the alignment problem, I restricted myself to speculating about pivotal acts in the post linked above.)

2johnswentworth
Fair. I am fairly confident that (1) the video at the start of the post is pointing to a real and ubiquitous phenomenon, and (2) attempts to outsource alignment research to AI look like an extremely central example of a situation where that phenomenon will occur. I'm less confident that my models here properly frame/capture the gears of the phenomenon.

But you don't need to be able to code to recognize that a piece of software is slow and buggy!?

About the terrible UI part I agree a bit more, but even there one can think of relatively objective measures to check usability without being able to write Python.

6johnswentworth
True! And indeed my uncle has noticed that it's slow and buggy. But you do need to be able to code to distinguish competent developers, and my uncle did not have so many resources to throw at the problem that he could keep trying long enough to find a competent developer, while paying each one to build the whole app before finding out whether they're any good. (Also I don't think he's fully aware of how bad his app is relative to what a competent developer could produce.)

In cases where outsourcing succeeds (to various degrees), I think the primary load-bearing mechanism of success in practice is usually not "it is easier to be confident that work has been done correctly than to actually do the work", at least for non-experts.

I find this statement very surprising. Isn't almost all of software development like this?
E.g., the client asks the developer for a certain feature and then clicks around the UI to check if it's implemented / works as expected.

2johnswentworth
At least in my personal experience, a client who couldn't have written the software themselves usually gets a slow, buggy product with a terrible UI. (My uncle is a good example here - he's in the septic business, hired someone to make a simple app for keeping track of his customers. It's a mess.) By contrast, at most of the places where I've worked or my friends have worked which produce noticeably good software, the bulk of the managers are themselves software engineers or former software engineers, and leadership always has at least some object-level software experience. The main outsourcing step which jumps between a non-expert and an expert, in that context, is usually between the customer and the company producing an app. And that's exactly where there's a standardized product. The bespoke products for non-expert customers - like e.g. my uncle's app for his business - tend to be a mess.

"This is what it looks like in practice, by default, when someone tries to outsource some cognitive labor which they could not themselves perform."
This proves way too much.

I agree, I think this even proves P=NP.

Maybe a more reasonable statement would be: You can not outsource cognitive labor if you don't know how to verify the solution. But I think that's still not completely true, given that interactive proofs are a thing. (Plug: I wrote a post exploring the idea of applying interactive proofs to AI safety.)

2johnswentworth
I think the standard setups in computational complexity theory assume away the problems which are usually most often blockers to outsourcing in practice - i.e. in complexity theory the problem is always formally specified, there's no question of "does the spec actually match what we want?" or "has what we want been communicated successfully, or miscommunicated?".

No, that's not quite right. What you are describing is the NP-Oracle.

On the other hand, with the IP-Oracle we can (in principle, limited by the power of the prover/AI) solve all problems in the PSPACE complexity class.

Of course, PSPACE is again a class of decision problems, but using binary search it's straightforward to extract complete answers like the designs mentioned later in the article.
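
As a minimal sketch of that extraction step (the wrapper `is_smallest_leq` and its interface are my own illustration, not something from the post), binary search recovers an n-bit answer from n yes/no queries to the decision oracle:

```python
from typing import Callable

def extract_answer(is_smallest_leq: Callable[[int], bool], n_bits: int) -> int:
    """Recover a complete n-bit answer using only yes/no decision queries.

    is_smallest_leq(t) answers "is the (lexicographically) smallest valid
    answer <= t?", so n_bits such queries pin down the full answer.
    """
    lo, hi = 0, 2 ** n_bits - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_smallest_leq(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Toy check with a known answer of 42: extract_answer(lambda t: 42 <= t, 8) == 42
```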

Your reasoning here relies on the assumption that the learning mostly takes place during the individual organism's lifetime. But I think it's widely accepted that brains are not "blank slates" at the birth of the organism, but contain a significant amount of information, akin to a pre-trained neural network. Thus, if we consider evolution as the training process, we might reach the opposite conclusion: Data quantity and training compute are extremely high, while parameter count (~brain size) and brain compute are restricted and selected against.

2jacob_cannell
Much depends on what you mean by learning and mostly, but the evidence for some form of blank slate is overwhelming. Firstly most of the bits in the genome must code for cellular machinery and even then the total genome bits is absolutely tiny compared to brain synaptic bits. Then we have vast accumulating evidence from DL that nearly all the bits come from learning/experience, that optimal model bit complexity is proportional to dataset size (which not coincidentally is roughly on order 1e15 bits for humans - 1e9 seconds * 1e6 bit/s), and that the tiny tiny number of bits needed to specify architecture and learning hyperparams are simply a prior which can be overcome with more data. And there is much more.

Thank you for writing about this! A minor point: I don't think aerosolizing monkeypox suspensions using a nebulizer can be counted as gain-of-function research, not even "at least kind of". (Or do I lack reading comprehension and have misunderstood something?)

Hypothesis: If a part of the computation that you want your trained system to compute "factorizes", it might be easier to evolve a modular system for this computation. By factorization I just mean that (part of) the computation can be performed using mostly independent parts / modules.

Reasoning: Training independent parts to each perform some specific sub-calculation should be easier than training the whole system at once. E.g. training n neural networks of size N/n should be easier (in terms of compute or data needed) than training one of size N, given th... (read more)
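
To make the hypothesis a bit more concrete, here's a toy experiment one could run (a sketch in PyTorch; the target function, widths and training budget are all invented for illustration): train one joint network on a target that factorizes into two independent sub-functions, and compare it against two half-width networks trained on the sub-problems separately.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def target(x):
    # Factorizing target: the first output depends only on x[:, :2],
    # the second output only on x[:, 2:].
    y1 = torch.sin(x[:, 0]) * x[:, 1]
    y2 = torch.cos(x[:, 2]) + x[:, 3] ** 2
    return torch.stack([y1, y2], dim=1)

def mlp(n_in, n_hidden, n_out):
    return nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU(),
                         nn.Linear(n_hidden, n_out))

def train(model, x, y, steps=2000, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(model(x), y).backward()
        opt.step()
    return nn.functional.mse_loss(model(x), y).item()

x = torch.rand(512, 4) * 2 - 1
y = target(x)

# (i) one joint network of width 64 on the full problem
joint_loss = train(mlp(4, 64, 2), x, y)

# (ii) two half-width networks, one per independent sub-problem
loss_a = train(mlp(2, 32, 1), x[:, :2], y[:, :1])
loss_b = train(mlp(2, 32, 1), x[:, 2:], y[:, 1:])

print(f"joint MLP loss (avg over both outputs): {joint_loss:.4f}")
print(f"two modular MLPs (avg of their losses): {(loss_a + loss_b) / 2:.4f}")
```

This only checks the "independent sub-problems are easier" intuition; as the reply below points out, the harder question is whether a single optimiser that only ever sees the joint loss finds the modular solution.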

1Lucius Bushnaq
To clarify, the main difficulty I see here is that this isn't actually like training n networks of size N/n, because you're still using the original loss function.  Your optimiser doesn't get to see how well each module is performing individually, only their aggregate performance. So if module three is doing great, but module five is doing abysmally, and the answer depends on both being right, your loss is really bad. So the optimiser is going to happily modify three away from the optimum it doesn't know it's in. Nevertheless, I think there could be something to the basic intuition of fine tuning just getting more and more difficult for the optimiser as you increase the parameter count, and with it the number of interaction terms. Until the only way to find anything good anymore is to just set a bunch of those interactions to zero.  This would predict that in 2005-style NNs with tiny parameter counts, you would have no modularity. In real biology, with far more interacting parts, you would have modularity. And in modern deep learning nets with billions of parameters, you would also have modularity. This matches what we observe. Really neatly and simply too. It's also dead easy to test. Just make a CNN or something and see how modularity scales with parameter count. This is now definitely on our to do list.  Thanks a lot again, Simon!

Yes, this mask is more of a symbolic pic; perhaps Simon can briefly explain why he chose this one (copyright issues, I think).

Yep, it's simply the first one in the public domain that I found. I hope it's not too misleading; it should get the general idea across.

3Decius
Using a picture of a product to illustrate a discussion about it would be fair use even if there were copyrightable elements of the picture.
1jmh
Responding more to the other post but it seems perhaps more sensible here as this seems more visible. The use of these masks has a serious drawback in the case of verbal communications. So, out of the box, off the shelf, these would not be a long-term solution -- assuming the worst case of no cure/vaccine and the virus staying around for years and years. However, we can use written communications if needed. Moreover, it would not be that hard to put communications into the mask some way. One, just a cheap internal mic and an external speaker. You could also integrate it via Bluetooth and your smartphone, or just dedicated Bluetooth earbuds/headset getting paired with the internal mic. Obviously some new protocols for dealing with a multi-device setting would be needed, but I cannot imagine that we don't already have solutions that are 80 to 90 percent ready to be applied to the specific setting. The upside here is that such innovations to the mask may well have positive value in existing use cases as well.

Well, if your chances of getting infected are drastically reduced, then the value of the "protect others" effect of wearing the mask is also reduced, so overall these masks are likely to be very useful.

That said, a slightly modified design that filters air on both the in- and the out-breath might be a good idea. This way, you keep your in-breath filters dry and have some "protect others" effect.

[...] P3 masks, worn properly, with appropriate eye protection while maintaining basic hand hygiene are efficient in preventing SARS-CoV-2 infection regardless of setting.

If this is true, then this is a great idea and it's somewhat surprising that these masks are not in widespread use already.

I suspect the plan is a bit less practical than stated, as I expect there to be problems with compliance, in particular because the masks are mildly unpleasant to wear for prolonged periods.

1Yandong Zhang
The sight of kids wearing respirators is kind of scary, as shown in HK. I did not see any other issue with this strategy.
3EGI
Thanks for pointing these things out, I probably should have addressed them more. I could think of several reasons for this. * Many (most?) health care professionals do not know of these masks or do not think of them as "medical equipment". * People do not realize that filters can be used multiple times thus dismissing the idea as logistically impossible / even more expensive than FFP masks for everyone * People think that all masks do not work (well) to prevent transmission * People think that these masks are "overkill", not realizing that a well fitting!!! reusable silicone mask is actually much less unpleasant to wear than FFP masks. ... problems with compliance ... unpleasant to wear for prolonged periods. Yes, to a degree that is true. This should be addressed by... * Well fitting masks, at least 5 to 10 different types as described above with state of the art low resistance filters * Requiring people to wear masks only if there is actual risk of infection as described above * Rigorous enforcement especially in places where there are lots of people around (public transit, dense work places, schools and so on)

They have a copy at our university library. I would need to investigate how to scan it efficiently, but I'm up for it if there isn't an easier way and no one else finds a digital copy.

Definitely Main, I found your post (including the many references) and the discussion very interesting.

I still agree with Eli and think you're "really failing to clarify the issue", and claiming that xyz is not the issue does not resolve anything. Disengaging.

The paper had nothing to do with what you talked about in your opening paragraph

What? Your post starts with:

My goal in this essay is to analyze some widely discussed scenarios that predict dire and almost unavoidable negative behavior from future artificial general intelligences, even if they are programmed to be friendly to humans.

Eli's opening paragraph explains the "basic UFAI doomsday scenario". How is this not what you talked about?

0[anonymous]
The paper's goal is not to discuss "basic UFAI doomsday scenarios" in the general sense, but to discuss the particular case where the AI goes all pear-shaped EVEN IF it is programmed to be friendly to humans. That last part (even if it is programmed to be friendly to humans) is the critical qualifier that narrows down the discussion to those particular doomsday scenarios in which the AI does claim to be trying to be friendly to humans - it claims to be maximizing human happiness - but in spite of that it does something insanely wicked. So, Eli says: ... and this clearly says that the type of AI he has in mind is one that is not even trying to be friendly. Rather, he talks about how its And then he adds that ... which has nothing to do with the cases that the entire paper is about, namely the cases where the AI is trying really hard to be friendly, but doing it in a way that we did not intend. If you read the paper all of this is obvious pretty quickly, but perhaps if you only skim-read a few paragraphs you might get the wrong impression. I suspect that is what happened.
1Regex
After having read Worm I will say this much: it engages the creative thinking of the reader.

Awesome, a meetup in Cologne. I'll try to be there, too. :)

It depends on the skill difference and the size of the board; on smaller boards the advantage is probably pretty large: Discussion on LittleGolem

2lukeprog
Thanks!

Regarding the drop of unemployment in Germany, I've heard it claimed that it is mainly due to changing the way the unemployment statistics are done, e.g. people who are in temporary, 1€/h jobs and still receiving benefits are counted as employed. If this point is still important, I can look for more details and translate.

EDIT: Some details are here:

It is possible to earn income from a job and receive Arbeitslosengeld II benefits at the same time. [...] There are criticisms that this defies competition and leads to a downward spiral in wages and the l

... (read more)
4Kaj_Sotala
Damnit. Fixed again, hopefully for real this time.
0Stuart_Armstrong
Cheers!