In scenario B, where a random child runs up, I wonder if a non-Bayesian might prefer that you just eliminate (girl, girl) and say that the probability of two boys is 1/3?
In Puzzle 1 in my post, the non-Bayesian has an interpretation that's still plausibly reasonable, but in your scenario B it seems like they'd be clowning themselves to take that approach.
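For what it's worth, here's a minimal Monte Carlo sketch of how I'm picturing scenario B (my assumptions: each child is independently a boy or girl with probability 1/2, and the child who runs up is a uniformly random one of the two and happens to be a boy):

```python
import random

trials = 1_000_000
naive_numer = naive_denom = 0   # "eliminate (girl, girl)" approach
bayes_numer = bayes_denom = 0   # condition on the random child who ran up

for _ in range(trials):
    kids = [random.choice("BG"), random.choice("BG")]
    both_boys = kids.count("B") == 2

    # Non-Bayesian reading: just throw out the (girl, girl) families.
    if "B" in kids:
        naive_denom += 1
        naive_numer += both_boys

    # Scenario B: one of the two children runs up at random and is a boy.
    runner = random.choice(kids)
    if runner == "B":
        bayes_denom += 1
        bayes_numer += both_boys

print("eliminate (girl, girl):", naive_numer / naive_denom)   # ~1/3
print("random child is a boy: ", bayes_numer / bayes_denom)   # ~1/2
```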
So I think we're on the same page that whenever things get real/practical/bigger-picture, then you gotta be Bayesian.
Thanks for this post.
I'd love to have a regular (weekly/monthly/quarterly) post that's just "here's what we're focusing on at MIRI these days".
I respect and value MIRI's leadership on the complex topic of building understanding and coordination around AI.
I spend a lot of time doing AI social media, and I try to promote the best recommendations I know to others. Whatever thoughts MIRI has would be helpful.
Given that I think about this less often and less capably than you folks do, it seems like there's a low hanging fruit opportunity for people like me to s...
I'd ask: If one day your God stopped existing, would anything observably change?
Seems like a meaningless concept: a node in their causal model of reality that has no power to constrain expectations, but which the person likes because knowing the node exists in their own belief network brings them emotional reward.
When an agent is goal-oriented, they want to become more goal-oriented, and to maximize the goal-orientedness of the universe with respect to their own goal.
Because expected value tells us that the more resources you control, the more robustly you can maximize your probability of success in the face of whatever comes at you, and the higher your maximum possible utility is (if you have a utility function without an easy-to-hit max score).
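To make that concrete, here's a minimal toy sketch, with entirely hypothetical numbers and shock model, just to illustrate the direction of the effect: an agent whose goal fails if random shocks ever exhaust its resources gets a strictly better chance of success by controlling more of them.

```python
import random

# Toy model with made-up numbers: the agent's goal fails if random "shocks"
# ever exhaust its resources. More resources controlled -> higher probability
# of still being in the game at the end (and a higher achievable ceiling).
def p_success(starting_resources, n_shocks=10, trials=50_000):
    successes = 0
    for _ in range(trials):
        r = starting_resources
        for _ in range(n_shocks):
            r -= random.uniform(0, 1)  # each shock destroys up to 1 unit
            if r <= 0:
                break
        else:
            successes += 1
    return successes / trials

for resources in (3, 5, 8, 12):
    print(resources, round(p_success(resources), 3))
# Survival probability climbs steadily with the amount of resources controlled.
```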
“Maximizing goal-orientedness of the universe” was how I phrased the prediction that conquering resources involves having them aligned to your goal / aligned agents helping you control them.
I'm happy to have that kind of debate.
My position is "goal-directedness is an attractor state that is incredibly dangerous and uncontrollable if it's somewhat beyond human-level in the near future".
The form of those arguments seems to be "technically it doesn't have to be". But realistically it will be lol. Not sure how much more there will be to say.
Thanks. Sure, I’m always happy to update on new arguments and evidence. The most likely way I see possibly updating is to realize the gap between current AIs and human intelligence is actually much larger than it currently seems, e.g. 50+ years as Robin seems to think. Then AI alignment research has a larger chance of working.
I also might lower P(doom) if international govs start treating this like the emergency it is and do their best to coordinate to pause. Though unfortunately even that probably only buys a few years of time.
Finally I can imagine someho...
Thanks for your comments. I don’t get how nuclear and biosafety represent models of success. Humanity rose to meet those challenges only somewhat adequately, and half the reason society hasn’t collapsed from, e.g., a first thermonuclear explosion being set off either intentionally or accidentally is pure luck. All it takes to topple humanity is something like nukes but a little harder to coordinate on (or much harder).
Here's a better transcript hopefully: https://share.descript.com/view/yfASo1J11e0
I updated the link in the post.
I guess I just don’t see it as a weak point in the doom argument.
This is kind of baffling to read, particularly in light of the statement by Eliezer that I quoted at the very beginning of my post.
If the argument is (and indeed it is) that "many superficially appealing solutions like corrigibility, moral uncertainty etc are in general contrary to the structure of things that are good at optimization" and the way we see this is by doing homework exercises within an expected utility framework, and the reason why we must choose an EU framework is because ...
Context is a huge factor in all these communications tips. The scenario I'm optimizing for is when you're texting someone who has a lot of options, and you think it's high expected value to get them to invest in a date with you, but the most likely way that won't happen is if they hesitate to reply to you and tap away to something else. That's not always the actual scenario though.
Imagine you're the recipient, and the person who's texting you met your minimum standard to match with, but is still a-priori probably not worth your time and effort going on a d...
Bonus points in a dating context: by being specific and authentic you drive away people who won't be compatible. In the egg example, even if the second party knows nothing about the topic, they can continue the conversation with "I can barely boil water, so I always take a frozen meal in to work" or "I don't like eggs, but I keep pb&j at my desk" or just swipe left and move on to the next match.
Yeah nice. A statement like "I'm looking for something new to watch" lowers the stakes by making the interaction more like what friends talk about rather than about an interview for a life partner, increasing the probability that they'll respond rather than pausing for a second and ending up tapping away.
You can do even more than just lowering the stakes if you inject a sense that you're subconsciously using the next couple conversation moves to draw out evidence about the conversation partner, because you're naturally perceptive and have various standards...
So you simply ask them: "What do you want to do?" And maybe you add "I'm completely fine with anything!" to ensure you're really introducing no constraints whatsoever and you two can do exactly what your friend desires.
This error reminds me of people on a dating app who kill the conversation by texting something like "How's your week going?"
When texting on a dating app, if you want to keep the conversation flowing nicely instead of getting awkward/strained responses or nothing, I believe the key is to anticipate that a couple seconds of low-effort processi...
Hmm, I think people have occasionally asked me "how's your week going" on dating apps and I've liked it overall - I'm pretty sure I'd prefer it over your suggested alternative! No doubt to a large extent because I suck at cooking and wouldn't know what to say. Whereas a more open-ended question feels better: I can just ramble a bunch of things that happen to be on my mind and then go "how about yourself?" and then it's enough for either of our rambles to contain just one thing that the other party might find interesting.
It feels like your proposed question...
Can confirm, I also didn't have a good experience with open-ended questions on dating apps. I get more responses with binary-choice questions that invite elaboration, e.g. "Are you living here or just visiting?" and "How was your Friday night, did you go out or stay in?".
Outside of dating, another example that comes to mind is questions like "What's your favorite movie?". I now avoid the "what's your favorite" questions because they require the respondent to assess their entire life history and make a revealing choice, as if I'm giving them a personality ...
Your baseline scenario (0 value) thus assumes away the possibility that civilization permanently collapses (in some sense) in the absence of some path to greater intelligence (whether via AI or whatever else), which would also wipe out any future value. This is a non-negligible possibility.
Yes, my mainline no-superintelligence-by-2100 scenario is that the trend toward a better world continues to 2100.
You're welcome to set the baseline number to a negative, or tweak the numbers however you want to reflect any probability of a non-ASI existential disas...
Founder here :) I'm biased now, but FWIW I was also saying the same thing before I started this company in 2017: a good dating/relationship coach is super helpful. At this point we've coached over 100,000 clients and racked up many good reviews.
I've personally used a dating coach and a couples counselor. IMO it helps in two ways:
Personally, I just have the habit of reaching for specifics when I begin a communication, to help make things clear. This post may help.
Unlike the other animals, humans can represent any goal in a large domain like the physical universe, and then, in a large fraction of cases, think of useful actions that steer the universe toward that goal to an appreciable degree.
Some goals are more difficult than others / require giving the human control over more resources than others, and measurements of optimization power are hard to define, but this definition is taking a step toward formalizing the claim that humans are more of a "general intelligence" than animals. Presumably you agree with t...
I don’t get what point you’re trying to make about the takeaway of my analogy by bringing up the halting problem. There might not even be something analogous to the halting problem in my analogy of goal-completeness, but so what?
I also don’t get why you’re bringing up the detail that “single correct output” is not 100% the same thing as “single goal-specification with variable degrees of success measured on a utility function”. It’s in the nature of analogies that details are different yet we’re still able to infer an analogous conclusion on some dimension...
These 4 beefs aren't about the original accusations; Ozy's previous post was about the original accusations. Rather, these 4 beefs are concerns that Ozy already had about Effective Altruism in general, and which the drama around Nonlinear ended up highlighting as a side-effect.
Because these beefs are more general, they won't capture the ways Alice and Chloe were harmed as specifically. However, I think on a community level these 4 dynamics should arguably be a bigger concern than the more specific abuse Alice and Chloe faced, because they seem to some extent self-reinforcing, e.g. "Do It For The Gram" will attract and reward a certain kind of person who isn't going to be effectively altruistic.
Meaningful claims don't have to be specific; they just have to be able to be substantiated by a nonzero number of specific examples. Here's how I imagine this conversation:
Chris: Love your neighbor!
Liron: Can you give me an example of a time in your life where that exhortation was relevant?
Chris: Sure. People in my apartment complex like to smoke cigarettes in the courtyard and the smoke wafts up to my window. It's actually a nonsmoking complex, so I could complain to management and get them to stop, but I understand the relaxing feeling of a good smoke, s...
I agree that if a goal-complete AI steers the future very slowly, or very weakly - as by just trying every possible action one at a time - then at some point it becomes a degenerate case of the concept.
(Applying the same level of pedantry to Turing-completeness, you could similarly ask if the simple Turing machine that enumerates all possible output-tape configurations one-by-one is a UTM.)
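Here's a toy illustration of that degenerate case in code rather than Turing machines (my own analogy, not from the post): a "program" that just enumerates every possible output string technically covers every target output eventually, but applies no selection pressure toward any particular one, which is the same trivial sense in which an enumerate-all-actions agent "covers" every goal.

```python
from itertools import product

def enumerate_all_outputs(alphabet="01", length=8):
    """Degenerate 'universal' generator: emits every possible output string
    of the given length, one by one, with zero regard for any particular goal."""
    for symbols in product(alphabet, repeat=length):
        yield "".join(symbols)

# Every 8-bit string shows up eventually, so in a trivial sense this covers
# all targets -- but it exerts no optimization pressure toward any of them,
# which is why it's a degenerate case rather than a useful notion of generality.
gen = enumerate_all_outputs()
print([next(gen) for _ in range(3)])  # ['00000000', '00000001', '00000010']
```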
The reason "goal-complete" (or "AGI") is a useful coinage, is that there's a large cluster in plausible-reality-space of goal-complete agents with a reasonable amount of...
Hmm it seems to me that you're just being pedantic about goal-completeness in a way that you aren't symmetrically being for Turing-completeness.
You could point out that "most" Turing machines output tapes full of 10^100 1s and 0s in a near-random configuration, and every computing device on earth is equally hopeless at doing that.
That's getting into details of the scenario that are hard to predict. Like I said, I think most scenarios where goal-complete AI exists are just ones where humans get disempowered and then a single AI fooms (or a small number make a deal to split up the universe and foom together).
As to whether humans will prevent goal-complete AI: some of us are yelling "Pause AI!"
Humans will trust human-brain-capable AI models to, say, drive a bus, despite the poor reliability, as long as they crash less than humans?
Yes, because the goal-complete AI won't just perform better than humans, it'll also perform better than narrower AIs.
(Well, I think we'll actually be dead if the premise of the hypothetical is that goal-complete AI exists, but let's assume we aren't.)
A goal is essentially a specification of a function to optimise, and all optimisation algorithms perform equally well (or rather poorly) when averaged across all functions.
Well, I've never met a monkey that has an "optimization algorithm" by your definition. I've only met humans who have such optimization algorithms. And that distinction is what I'm pointing at.
Goal-completeness points to the same thing as what most people mean by "AGI".
E.g. I claim humans are goal-complete General Intelligences because you can give us any goal-specification and we'll very...
Fine, I agree that if computation-specific electronics, like logic gates, weren't reliable, then it would introduce reliability as an important factor in the equation. Or in the case of AGI, that you can break the analogy to Turing-complete convergence by considering what happens if a component specific to goal-complete AI is unreliable.
I currently see no reason to expect such an unreliable component in AGI, so I expect that the reliability part of the analogy to Turing-completeness will hold.
In scenario (1) and (2), you're giving descriptions at a level o...
But microcontrollers are reliable for the same reason that video-game circuit boards are reliable: They both derive their reliability from the reliability of electronic components in the same manner, a manner which doesn't change during the convergence from application-specific circuits to Turing-complete chips.
The engineer who designed it didn't trust the microcontroller not to fail in a way that left the heating element on all the time. So it had a thermal fuse to prevent this failure mode.
If the microcontroller fails to turn off the heating element, tha...
A great post that helped inspire me to write this up is Steering Systems. The "goal engine + steering code" architecture that we're anticipating for AIs is analogous to the "computer + software" architecture whose convergence I got to witness in my lifetime.
I'm surprised this post isn't getting any engagement (yet), because for me the analogy to Turing-complete convergence is a deep source of my intuition about powerful broad-domain goal-optimizing AIs being on the horizon.
I made a short clip highlighting how Legg seems to miss an opportunity to acknowledge the inner alignment problem, since his proposed alignment solution seems to be a fundamentally training-based / black-box approach.
Here’s a 2-min edited video of the protest.
Most people who hear our message do so well after the protest, via sharing of this kind of media.
The SF one went great! Here’s a first batch of pics. A lot of the impact will come from sharing the pics and videos.
I think the impact will be pretty significant:
Just in case you missed that link at the top:
This is a historic event, the first time hundreds of people are coming out in 8 countries to protest AI.
I'm helping with logistics for the San Francisco one which you can join here. Feel free to contact me or Holly on DM/email for any reason.
Hey Quintin thanks for the diagram.
Have you tried comparing the cumulative amount of genetic info over 3.5B years?
Isn't it a big coincidence that the time of brains that process info quickly / increase information rapidly is also the time when those brains are much more powerful than all other products of evolution?
(The obvious explanation in my view is that brains are vastly better optimizers/searchers per computation step, but I'm trying to make sure I understand your view.)
Appreciate the detailed analysis.
I don’t think this was a good debate, but I felt I was in a position where I would have had to invest a lot of time to do better by the other side’s standards.
Quintin and I have agreed to do an X Space debate, and I’m optimistic that format can be more productive. While I don’t necessarily expect to update my view much, I am interested to at least understand what the crux is, which I’m not super clear on atm.
Here’s a meta-level opinion:
I don’t think it was the best choice of Quintin to keep writing replies that were dispropor...
Actually, the only time I know of that they cashed in early was selling half their Coinbase shares at the direct listing after holding for 7 years.
Their racket was to be the #1 crypto fund with the most assets under management ($7.6B total) so that they could collect the most management fees (probably about $1B total). It's great business for a16z to be in the sector-leader AUM game even when the sector makes no logical sense.
I'm just saying Marc's reputation for publicly making logically-flimsy arguments and not updating on evidence should be considered when he enters a new area of discourse.
This article is just saying "doomers are failing to prevent doom for various reasons, and also they might be wrong that doom is coming soon". But we're probably not wrong, and not being doomers isn't a better strategy. So it's a lame article IMO.