Raemon

LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.

Comments (sorted by newest)
mrpalmtree19's Shortform
Raemon · 1h

I have an anxious feeling about trying to leverage little free libraries for proselytizing (which doesn't mean it's wrong, but I notice I stopped being interested in Little Free Libraries the more it became clear that the two types of books there are mostly "old books nobody actually liked that much" and "books someone is trying to proselytize")

Raemon's Shortform
Raemon · 2h

I think I mostly agree with everything you say in this last comment, but I don't see how my previous comment disagreed with any of that either?

Yeah, it doesn't necessarily disagree with it. But, re: framing the question:

The non-straightforward-to-me and in fact imo probably in at least some important sense false/confused adjacent thing is captured by stuff like:

  • as a mind M grows, it gets close to never getting stuck
  • as M grows, it gets close to not being silly

It seemed like those things were only in some sense false/confused because they were asking the wrong question.

I think "more advanced" still doesn't feel like really the right way to frame the question, because "advanced" is still very underspecified. 

Raemon's Shortform
Raemon · 2h

Nod, makes sense.

One thing I maybe should note: I don't think Yudkowsky ever actually said "in the limit" per se, that was me glossing various things he said, and I'm suddenly worried about subtle games of telephone about whatever he meant.

Another thing I thought of reading this (and maybe @johnswentworth's Framing Practicum finally paying off) is that a better word than "limit" might be "equilibrium."

i.e. this isn't (necessarily) about "there is some f(x), where if you dial up X from 10 to 11 to 100 to 10,000, you expect f(x) to approach some limit". A different angle of looking at it is "what are the plausible stable equilibria that a mind could end up in, or the solar-system-system could end up in?"

A system reaching equilibrium involves multiple forces pushing on stuff and interacting with each other, until they settle into a shape where it's hard to really move the outcome, at least until something new shocks the system.
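
(As an aside, one toy way to formalize the distinction, purely illustrative and not something from the original exchange: the "limit" frame asks what value a function approaches as you crank one input, while the "equilibrium" frame asks which states are fixed points of the system's own dynamics, and which of those are stable against small shocks.)

\text{Limit frame:} \quad \lim_{x \to \infty} f(x) = L

\text{Equilibrium frame:} \quad g(s^*) = s^*, \quad \text{with } s^* \text{ stable iff small perturbations of } s^* \text{ decay back toward } s^*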

...

Some ~specific things you might care about the equilibrium of:

A. One particular AI mind, given some initial conditions, after somehow achieving a minimum threshold of relentless-creative-resourcefulness and the ability to modify itself and/or its environment, with whatever combo of goals/impulses it turns out to have.

The equilibrium includes "what will the mind end up doing with itself" and also "how will the outside world try to apply pressure to the mind, and how will the mind apply pressure back?".

B. The human economy/geopolitical-system. Given that there are lots of groups trying to build AI, there's a clear economic incentive to do so if you don't believe in doom, and it's going to get easier over time. (But also, there are reasons for various political factions to oppose this). 

Does this eventually produce a mind, with the conditions to kick off the previous point?

C. The collection of AI minds that end up existing, once some of them hit the minimum relentless-creative-resourcefulness necessary to kick off A?

...

But translating back into limits: 

Looking at your list of "which of these f(x)s are we talking about?", the answer is "the humanity meta-system that includes all of B." 

"X" is "human labor + resource capital + time, etc".

The "F" I'm most focused on is "the process of looking at the current set of AI systems, and asking 'is there a way to improve how much profit/fame/power we can get out of this?', and then creatively selecting a thing (such as from your list of things above), and then trying it."

(It's also useful to ask "what's F?" re: a given transformer gradient descent architecture, given a set of training data and a process for generating more training data. But, that's a narrower question, and most such systems will not be the "It" that would kill everyone if anyone built it)
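
(If a toy sketch helps pin down the type I'm gesturing at: the following is purely illustrative, the names are ones I'm making up here rather than anything canonical, but it shows the rough signature of the civilization-level F.)

from dataclasses import dataclass
from typing import Callable

# Purely illustrative, hypothetical types for the "what is the type of f(x)?"
# question above; the names are mine, not from the original discussion.

@dataclass
class Resources:
    human_labor: float   # researcher/engineer effort (part of "X")
    capital: float       # compute, funding
    time: float          # calendar time

@dataclass
class AIEcosystem:
    # Stand-in for "the current set of AI systems and how much they can accomplish."
    capability: float

# The civilization-level "F": look at the current AI systems, ask "is there a way
# to get more profit/fame/power out of this?", try a creative modification, and
# return the updated set of systems.
CivilizationStep = Callable[[AIEcosystem, Resources], AIEcosystem]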

...

Having said that:

"f(x)", where f is "all human ingenuity focused on building AGI, + all opposed political focuses", is a confusing type, yes. 

I mentioned elsewhere, the "confusing type" is the problem (or, "a" problem). We are inside a "Find the Correct Types" problem. The thing to do when you're in a Find the Correct Types problem is bust out your Handle Confusion and Find the Correct Types toolkit.

I am not a Type Theory Catgirl, but, some early steps I'd want to take are:

  • map out everything I am confused by that seems relevant (see if some confusions dissolve when I look at them)
  • map out everything important that seems relevant that I'm not confused by
  • map out at least a few different ways of structuring the problem. (including, maybe this isn't actually best thought of as a Type Theory problem)

And part of my response to "f(x) is confusing" is to articulate the stuff above, which hopefully narrows down the confusion slightly. But, I'd also say, before getting to the point of articulating the above, "a'ight, seems like the structure here is something like"

1. AI will probably eventually get built somewhere. It might FOOM. It might take over. And later, evolution might destroy everything we care about. (You might be uncertain about these, and might be confused about some sub-pieces, but I don't think you-in-particular were confused about this bit)

2. There will be some processes that take in resources and turn them into more intelligence. [FLAG: confused about what this process is and what inputs it involves. But, call this confusing thing f(x)]

3. There are lots of different possible shapes of f(x), I'm confused about that

4. But, the reason I care about f(x) is so that I know: a) will a given AI system FOOM or take over? b) is it capable of stopping other things from FOOMing or taking over? and c) is it capable of preventing death-by-evolution, without causing worse side effects?

And #4 is what specifies which possible ways of resolving confusing bits are most useful. It specifically implies we need to be talking about pretty high power levels. However you choose to wrap your brain around it, it somehow needs to eventually help you think about extremely high power levels.

So, like, yep "in the limit" is confusing and underspecified. But, it's meant to be directing your attention to aspects of the confusingness that are more relevant.

Raemon's Shortform
Raemon · 16h

The thing I care about here is not "what happens as a mind grows", in some abstract sense.

The thing I care about is, "what is the best way for a powerful system to accomplish a very difficult goal quickly/reliably?" (which is what we want the AI for)

As we deliberately scale up the AI's ability to accomplish stuff (or it scales itself up), it will be true that:

  • if it is getting stuck, it'd achieve stuff better if it got stuck less
  • if it is exploitable in ways that are relevant, it'd be better if it wasn't exploitable
  • if it was acting incoherently in ways that wasted resources, it'd accomplish the goal better if it didn't
  • if it plays suboptimal moves, it'd achieve the goals better if it doesn't.
  • if it doesn't have the best possible working memory / processing speed, it'd achieve the goals better if it had more.
  • if it doesn't have enough resources to do any of the above, it'd achieve the goals better if it had more resources
  • if it could accomplish the above faster by deliberately self-modifying, rather than waiting for us to apply more selection pressure to it, it has an incentive to do that.

And... sure, it could not do those things. Then, either Lab A will put more pressure on the AI to accomplish stuff (and some of the above will become more true). Or Lab A won't, and some other Lab B will instead.

And once the AI unlocks "deliberately self-modify" as a strategy to achieve the other stuff, and sufficient resources to do it, then it doesn't matter what Lab A or B does.

Wei Dai's Shortform
Raemon · 22h

Say more about the de-facto eugenics program?

Raemon's Shortform
Raemon · 22h

I've heard ~"I don't really get this concept of 'intelligence in the limit'" a couple times this week. 

Which seems worth responding to, but I'm not sure how.

It seemed like some combination of: "wait, why do we care about 'superintelligence in the limit' as opposed to any particular 'superintelligence-in-practice'?", as well as "what exactly do we mean by The Limit?" and "why would we think The Limit is shaped the way Yudkowsky thinks?"

My impression, based on my two most recent conversations about it, is that this is not only sort of cloudy and confusing to some people, but also intertwined with a few other things that are separately cloudy and confusing. And it's intertwined with other things that aren't cloudy and confusing per se, but there are a lot of individual arguments to keep track of, so it's easy to get lost.

One ontology here is:

  • it's useful to reason with nice abstractions that generalize to different situations.
    • (It's easier to think about such abstractions at extremes, given simple assumptions)
  • it's also useful to reason about the nitty-gritty details of a particular implementation of a thing.
  • it's useful to be able to move back and forth between abstractions, and specific implementations.

One person I chatted with seemed to be simultaneously frustrated with:

[note, not sure if this is a correct summary of them, they can pop up here to clarify if they want]

"Why does this concept of corrigibility need to care about the specifics of 'powerful enough to end the acute risk period?' Why can't we just think about iteratively making an AI more corrigible as we improve it's capabilities? That can't possibly be important to the type-signature of corrigibility?"

and also: "what is this 'The Limit?' and why do I care?"

and also: "Why is MIRI always talking about abstractions and not talking about the specific implementation details of how takeoff will work in practice? It seems sus and castles-on-sand-y"

My answer was "well, the type signature of corrigibility doesn't care about any particular power level, but, it's useful to have typed out 'what does corrigibility look like in the limit?'" But the reason it's useful to specifically think about corrigibility at The Limit is because the actual nitty-gritty details of what we need corrigibility for require absurd power levels (i.e. preventing loss by immediate AI takeover, and also loss from longterm evolution a couple decades later).

It seemed like what was going on, in this case, was they were attempting to loop through the "abstractions" and "gritty details" and "interplay between the two", but they didn't have a good handle on abstraction-in-The-Limit, and then they couldn't actually complete the loop. And because it was fuzzy to think about the gritty-details without a clear abstraction, and hard to think about abstractions without realistic details, this was making it hard to think about either. (Even though, AFAICT, the problem lay mostly in The Abstract Limit side, not the Realistic Details side)

...

A different person recently seemed to similarly not be grokking "why do we care about the Limit?", but the corresponding problem wasn't about any particular other argument, just, there were a lot of other arguments and they weren't keeping track of all of them at once, and it seemed like they were getting lost in a more mundane way.

...

I don't actually know what to do with any of this, because I'm not sure what's confusing about "Intelligence in the limit." 

(Or: I get that there's a lot of fuzziness there you need to keep track of while reasoning about this. But the basic concept of "well, if it was imperfect at either not-getting-resource-pumped, or making suboptimal game theory choices, or if it gave up when it got stuck, it would know that it wasn't as cognitively powerful as it could be, and would want to find ways to be more cognitively powerful all-else-equal"... seems straightforward to me, and I'm not sure what makes it not straightforward seeming to others).

shortplav
Raemon · 1d

I think this is less important than the other confusing terms in this thread, but something I stumbled into yesterday:

"Intelligence"/"Capable" -> "Relentlessly Resourceful/Creative" [1]

(at least in some contexts)

i.e. the reason you expect a superintelligence to be difficult to control, is not exactly the raw intelligence. It's that (some people think) the way something succeeds at being truly superintelligent requires being relentlessly resourceful.

(If it wasn't relentlessly resourceful, it maybe could one-shot a large-but-shallow set of problems in a way that wasn't concerning. But, then, if it hit a snag, it would get stuck, and it would be less useful than something that didn't get stuck when it hit snags)

I like this because

a) it highlights what the problem is, more clearly.

and b) it highlights "if you could build a very useful powerful tool that succeeds without being relentlessly resourceful, that's maybe a useful avenue."

For examples of relentlessly resourceful people, see:

  • Startup founders
  • Prolific Inventors
  • Elon Musk, in a particularly famous way that includes both traditional startup-founder-y stuff and technical innovation
  • Richard Feynman (who found he had a hard time doing important work after working on the atom bomb, but then solved the problem by changing his mindset to deliberately not focus on "important" things and just follow his interests, which eventually led to more good ideas)

For people that are smart but not obviously "relentlessly resourceful", see "one hit wonders", or people who have certain kinds of genius but it only comes in flashes and they don't really know how to cultivate it on purpose.

  1. ^

    Resourceful and Creative sort of mean the same thing, and for conciseness I'm going to mostly say "Relentlessly Resourceful" since it's more fun and evocative, but there are some connotations creativity has that are important to not lose track of. i.e. not just able to fully exhaust all local resources, but able to think from a wide variety of angles and see entirely different solutions that lie completely outside its current set of affordances.

Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most "classic humans" in a few decades.
Raemon · 1d

I definitely think the "benevolent godlike singleton" is just as likely to fail in horrifying ways as any other scenario. Once you permanently give away all your power, how do you guarantee any bargain?

This is why you won't build a benevolent godlike singleton until you have vastly more knowledge than we currently have (i.e., by augmenting human intelligence, etc.)[1]

  1. ^

    I'm not sure I buy the current orientation Eliezer/Nate have to augmented human intelligence in the context of global shutdown, but, does seem like a thing you want before building anything that's likely to escalate to full overwhelming-in-the-limit-superintelligence.

Thomas Kwa's Shortform
Raemon · 1d

Low polarization. If there's high polarization with a strong opposing side, the opposing side can point to the radicals in order to hurt the moderates.[2]

I'm not 100% sure what this means (but it sounds interesting)

Agreed that this is a good frame.

Why Corrigibility is Hard and Important (i.e. "Whence the high MIRI confidence in alignment difficulty?")
Raemon · 1d

Thing I wanted to briefly check before responding to some other comments: does your work here particularly route through criticizing or changing the VNM axioms frame?

Sequences
  • Step by Step Metacognition
  • Feedbackloop-First Rationality
  • The Coordination Frontier
  • Privacy Practices
  • Keep your beliefs cruxy and your frames explicit
  • LW Open Source Guide
  • Tensions in Truthseeking
  • Project Hufflepuff
  • Rational Ritual
Posts
  • Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most "classic humans" in a few decades. (124 points, 2d, 17 comments)
  • </rant> </uncharitable> </psychologizing> (53 points, 3d, 11 comments)
  • Why Corrigibility is Hard and Important (i.e. "Whence the high MIRI confidence in alignment difficulty?") (78 points, 5d, 49 comments)
  • The Illustrated Petrov Day Ceremony (93 points, 8d, 11 comments)
  • "Shut It Down" is simpler than "Controlled Takeoff" (97 points, 10d, 29 comments)
  • Accelerando as a "Slow, Reasonably Nice Takeoff" Story (70 points, 12d, 12 comments)
  • The title is reasonable (193 points, 15d, 128 comments)
  • Meetup Month (45 points, 17d, 10 comments)
  • Simulating the *rest* of the political disagreement (125 points, 1mo, 16 comments)
  • Yudkowsky on "Don't use p(doom)" (98 points, 1mo, 39 comments)
Wikitag Contributions
  • AI Consciousness (a month ago)
  • AI Auditing (2 months ago) (+25)
  • AI Auditing (2 months ago)
  • Guide to the LessWrong Editor (6 months ago)
  • Guide to the LessWrong Editor (6 months ago)
  • Guide to the LessWrong Editor (6 months ago)
  • Guide to the LessWrong Editor (6 months ago) (+317)
  • Sandbagging (AI) (6 months ago)
  • Sandbagging (AI) (6 months ago) (+88)
  • AI "Agent" Scaffolds (6 months ago)