Lukas_Gloor


In order to submit a question to the benchmark, people had to run it against the listed LLMs; the question would only advance to the next stage once the LLMs used for this testing got it wrong. 

So I think the more rational and cognitively capable a human is, the more likely they'll optimize more strictly and accurately for future reward.

If this is true at all, it's not going to be a very strong effect, meaning you can find very rational and cognitively capable people who do the opposite of this in decision situations that directly pit reward against the things they hold most dearly. (And it may not be true, because a lot of personal hedonists tend to "lack sophistication," in the sense that they don't understand that their own feeling of valuing nothing but their own pleasure isn't how everyone else who's smart experiences the world. So, there's at least a midwit level of "sophistication" where hedonists seem overrepresented.)

Maybe it's the case that there's a weak correlation that makes the quote above "technically accurate," but that's not enough to speak of reward being the optimization target. For comparison, even if it is the case that more intelligent people prefer classical music over k-pop, that doesn't mean classical music is somehow inherently superior to k-pop, or that classical music is "the music taste target" in any revealing or profound sense. After all, some highly smart people can still be into k-pop without making any mistake.

I've written about this extensively here and here. Some relevant excerpts from the first linked post:

One of many takeaways I got from reading Kaj Sotala’s multi-agent models of mind sequence (as well as comments by him) is that we can model people as pursuers of deep-seated needs. In particular, we have subsystems (or “subagents”) in our minds devoted to various needs-meeting strategies. The subsystems contribute behavioral strategies and responses to help maneuver us toward states where our brain predicts our needs will be satisfied. We can view many of our beliefs, emotional reactions, and even our self-concept/identity as part of this set of strategies. Like life plans, life goals are “merely” components of people’s needs-meeting machinery.[8]

Still, as far as components of needs-meeting machinery go, life goals are pretty unusual. Having life goals means caring about an objective enough to (do one’s best to) disentangle success on it from the reasons we adopted said objective in the first place. The objective takes on a life of its own, and the two aims (meeting one’s needs vs. progressing toward the objective) come apart. Having a life goal means having a particular kind of mental organization so that “we” – particularly the rational, planning parts of our brain – come to identify with the goal more so than with our human needs.[9]

To form a life goal, an objective needs to resonate with someone’s self-concept and activate (or get tied to) mental concepts like instrumental rationality and consequentialism. Some life goals may appeal to a person’s systematizing tendencies and intuitions for consistency. Scrupulosity or sacredness intuitions may also play a role, overriding the felt sense that other drives or desires (objectives other than the life goal) are of comparable importance.

[...]

Adopting an optimization mindset toward outcomes inevitably leads to a kind of instrumentalization of everything “near term.” For example, suppose your life goal is about maximizing the number of your happy days. The rational way to go about your life probably implies treating the next decades as “instrumental only.” On a first approximation, the only thing that matters is optimizing the chances of obtaining indefinite life extension (potentially leading to more happy days). Through adopting an outcome-focused optimizing mindset, seemingly self-oriented concerns such as wanting to maximize the number of happiness moments turn into an almost “other-regarding” endeavor. After all, only one’s far-away future selves get to enjoy the benefits – which can feel essentially like living for someone else.[12]

[12] This points at another line of argument (in addition to the ones I gave in my previous post) to show why hedonist axiology isn’t universally compelling: 
To be a good hedonist, someone has to disentangle the part of their brain that cares about short-term pleasure from the part of them that does long-term planning. In doing so, they prove they’re capable of caring about something other than their pleasure. It is now an open question whether they use this disentanglement capability for maximizing pleasure or for something else that motivates them to act on long-term plans.

I like all the considerations you point out, but based on that reasoning alone, you could also argue that a con man who ran a lying scheme for 1 year and stole only like $20,000 should get life in prison -- after all, con men are pathological liars and that phenotype rarely changes all the way. And that seems too harsh?

I'm in two minds about it: On the one hand, I totally see the utilitarian argument of just locking up people who "lack a conscience" forever the first time they get caught for any serious crime. On the other hand, they didn't choose how they were born, and some people without prosocial system-1 emotions do in fact learn how to become a decent citizen. 

It seems worth mentioning that punishments for financial crime often include measures like "person gets banned from their industry" or bans on participating in all kinds of financial schemes. In reality, the rules there are probably too lax, and people who got banned in finance or pharma just transition to running crypto scams or selling predatory online courses on how to be successful (lol). But in theory, I like the idea of adding things to the sentencing that make re-offending less likely. This way, you can maybe justify giving people second chances. 

Suppose that a researcher's conception of current missing pieces is a mental object M, their timeline estimate is a probability function P, and their forecasting expertise F is a function that maps M to P. In this model, F can be pretty crazy, creating vast differences in P depending on how you ask, while M is still solid.

Good point. This would be reasonable if you think someone can be super bad at F and still great at M.

Still, I think estimating "how big is this gap?" and "how long will it take to cross it?" might be quite related, so I expect the skills to be correlated or even strongly correlated.
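
For concreteness, here's a minimal toy sketch of the quoted M/P/F setup (my own illustration; the numbers, the "framing" multiplier, and the noise model are all made up): the same M, pushed through a framing-sensitive F, produces very different medians P.

```python
import numpy as np

rng = np.random.default_rng(0)

# M: a toy model of the remaining missing pieces, each with a guessed
# "effort" in years (made-up numbers, purely for illustration).
M = {"agency": 4.0, "reliability": 3.0, "long-horizon reasoning": 5.0}

def F(M, framing_multiplier, n_samples=10_000):
    """A framing-sensitive forecasting function mapping M to a timeline
    distribution P. The multiplier stands in for elicitation effects
    (e.g., how the question is phrased); the lognormal noise stands in
    for the forecaster being much shakier at F than at M."""
    base = sum(M.values())
    return base * framing_multiplier * rng.lognormal(mean=0.0, sigma=0.8, size=n_samples)

for multiplier, label in [(1.0, "framing A"), (4.0, "framing B")]:
    P = F(M, multiplier)
    print(f"{label}: median timeline ≈ {np.median(P):.0f} years")
# Same M both times; only F's framing sensitivity changes, and the medians diverge widely.
```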

It surveyed 2,778 AI researchers who had published peer-reviewed research in the prior year in six top AI venues (NeurIPS, ICML, ICLR, AAAI, IJCAI, JMLR); the median time for a 50% chance of AGI was either in 23 or 92 years, depending on how the question was phrased.

Doesn't that discrepancy (how much answers vary between different ways of asking the question) tell you that the median AI researcher who published at these conferences hasn't thought about this question sufficiently and/or sanely?

It seems irresponsible to me to update even a small bit toward the views of the specific reference class of which your statement above is true.

If you take people who follow progress closely and have thought more and longer about AGI as a research target specifically, my sense is that the ones who have longer timeline medians tend to say more like 10-20y rather than 23y+. (At the same time, there's probably a bubble effect in who I follow or talk to, so I can get behind maybe lengthening that range a bit.)

Doing my own reasoning, here are the considerations that I weigh heavily:  

  • we're already within the human range of most skill types (which is where many of us in the past would have predicted that progress speeds up, and I don't see any evidence that should change our minds on that past prediction – deep learning visibly hitting a wall would have been one conceivable way, but it hasn't happened yet)
  • that the time to cross and overshoot the human range at a given skill has historically gotten a lot shorter and may still be shrinking (e.g., it admittedly took a long time to cross the human expert range in chess, but it took less long in Go, and less long again at various academic tests or essays, to the point that chess certainly doesn't constitute a typical baseline anymore)
  • that progress has been quite fast lately, so it's not intuitive to me that there's a lot of room left to go (sure, agency and reliability and "get even better at reasoning" remain)
  • that we're pushing through compute milestones rather quickly because scaling is still strong with some more room to go, so on priors, the chance that we cross AGI compute thresholds during this scale-up is higher than that we'd cross it once compute increases slow down
  • that o3 seems to me like significant progress in reliability, one of the things people thought would be hard to make progress on

    Given all that, it seems obvious that we should have quite a lot of probability of getting to AGI in a short time (e.g., 3 years). Placing the 50% forecast feels less obvious, because I have some sympathy for the view that these things are notoriously hard to forecast and we should smear out uncertainty more than we'd intuitively think (that said, lately the trend has been that people consistently underpredict progress, and maybe we should just hard-update on that).

    Still, even on that "it's prudent to smear out the uncertainty" view, let's say the median would be something like 10-20 years away. Even then, if we spread out the earlier half of the probability mass uniformly over those 10-20 years, with an added near-term bump because of the compute scaling arguments (we're increasing training and runtime compute now, but this will have to slow down eventually if AGI isn't reached in the next 3-6 years or whatever), that IMO very much implies at least 10% for the next 3 years. Which feels practically enormously significant. (And I don't agree with smearing things out too much anyway, so my own probability is closer to 50%.) 
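
To spell out the arithmetic behind that "at least 10%" figure, here's a quick back-of-the-envelope sketch (assuming, purely for illustration, a 15-year median and the first half of the probability mass spread uniformly from now until the median):

```python
# Back-of-the-envelope for the "at least 10% within 3 years" claim.
# Assumptions (mine, for illustration): the median sits at 15 years and the
# first 50% of probability mass is spread uniformly over years 0-15.
median_years = 15
mass_before_median = 0.5
horizon_years = 3

p_within_horizon = mass_before_median * horizon_years / median_years
print(f"P(AGI within {horizon_years} years) ≈ {p_within_horizon:.0%}")  # ≈ 10%
# Any near-term bump from the compute scale-up argument only pushes this higher.
```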

Well, the update for me would go both ways. 

On one side, as you point out, it would mean that the model's single pass reasoning did not improve much (or at all). 

On the other side, it would also mean that you can get large performance and reliability gains (on specific benchmarks) by just adding simple stuff. This is significant because you can do this much more quickly than the time it takes to train a new base model, and there's probably more to be gained in that direction – similar tricks we can add by hardcoding various "system-2 loops" into the AI's chain of thought and thinking process. 

You might reply that this only works if the benchmark in question has easily verifiable answers. But I don't think it is limited to those situations. If the model itself (or some subroutine in it) has some truth-tracking intuition about which of its answer attempts are better/worse, then running it through multiple passes and trying to pick the best ones should get you better performance even without easy and complete verifiability (since you can also train on the model's guesses about its own answer attempts, improving its intuition there).
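
As a sketch of the kind of multi-pass selection loop I have in mind (the `generate` and `score_own_answer` callables are hypothetical stand-ins for the model's sampling and its imperfect truth-tracking self-evaluation):

```python
from typing import Callable, List, Tuple

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],                 # samples one answer attempt
    score_own_answer: Callable[[str, str], float],  # model's own guess at answer quality
    n: int = 8,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Sample n answer attempts and keep the one the model itself rates highest.

    No external verifier is required: the selection signal is the model's own
    (imperfect) truth-tracking intuition, and the (answer, self-score) pairs
    could later serve as training data to improve that intuition."""
    attempts = [generate(prompt) for _ in range(n)]
    scored = [(answer, score_own_answer(prompt, answer)) for answer in attempts]
    best_answer, _ = max(scored, key=lambda pair: pair[1])
    return best_answer, scored
```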

Besides, I feel like humans do something similar when we reason: we think up various ideas and answer attempts and run them by an inner critic, asking "is this answer I just gave actually correct/plausible?" or "is this the best I can do, or am I missing something?"

(I'm not super confident in all the above, though.)

Lastly, I think the cost bit will go down by orders of magnitude eventually (I'm confident of that). I would have to look up trends to say how quickly I expect $4,000 in runtime costs to come down to $40, but I don't think it will take all that long. Also, if you can do extremely impactful things with some model, like automating further AI progress on training runs that cost billions, then willingness to pay for model outputs could be high anyway. 


When the issue is climate change, a prevalent rationalist take goes something like this:

"Climate change would be a top priority if it weren't for technological progress. However, because technological advances will likely help us to either mitigate the harms from climate change or will create much bigger problems on their own, we probably shouldn't prioritize climate change too much." 

We could say the same thing about these trends of demographic aging that you highlight. So, I'm curious why you're drawn to this topic and where the normative motivation in your writing is coming from.

In the post, you use normative language like, "This suggests that we need to lower costs along many fronts of both money and time, and also we need to stop telling people to wait until they meet very high bars." (In the context of addressing people's cited reasons for why they haven't had kids – money, insecurity about money, not being able to afford kids or the house to raise them in, and mental health.) 

The way I conceptualize it, one can zoom in on different, plausibly-normatively-central elements of the situation:

(1) The perspective of existing people.

1a Nation-scale economic issues from an aging demographic, such as collapse of pension schemes, economic stagnation from the aging workforce, etc. 

1b Individual happiness and life satisfaction (e.g., a claim that having children tends to make people happier, also applying to parents 'on the margin,' people who, if we hadn't encouraged them, would have decided against children). 

(2) Some axiological perspective that considers the interests of both existing and newly created people/beings.

It seems uncontroversial that both 1a and 1b are important perspectives, but it's not obvious to me whether 1a is a practical priority for us in light of technological progress (cf. the parallel to climate change) or how the empirics of 1b shake out (whether parents 'on the margin' are indeed happier). (I'm not saying 1b is necessarily controversial – for all I know, maybe the science already exists and is pretty clear. I'm just saying: I'm not personally informed on the topic even though I have read your series of posts on fertility.)

And then, (2) seems altogether subjective and controversial in the sense that smart people hold different views on whether it's all-things-considered good to encourage people to have lower standards for bringing new people into existence. Also, there are strong reasons (I've written up a thorough case for this here and here) why we shouldn't expect there to be an objective answer to "how to do axiology?"

This series would IMO benefit from a "Why I care about this" note, because without it, I get the feeling of "Zvi is criticizing things governments do/don't do in a way that might underhandedly bias readers into thinking that the implied normative views on population ethics are unquestionably correct." The way I see it, governments are probably indeed behaving irrationally here, given that they're not bought into the prevalent rationalist worldview on imminent technological progress (and that's an okay thing to sneer at), but this doesn't mean that we have to go "boo!" to all things associated with not choosing children, and "yeah!" to all things associated with choosing them.

That said, I still found the specific information in these roundups interesting, since this is clearly a large societal trend and it's interesting to think through causes, implications, etc. 

The tabletop game sounds really cool!

Interesting takeaways.

The first was exactly the above point, and that at some point, ‘I or we decide to trust the AIs and accept that if they are misaligned everyone is utterly f***ed’ is an even stronger attractor than I realized.

Yeah, when you say it like that... I feel like this is gonna be super hard to avoid!

The second was that depending on what assumptions you make about how many worlds are wins if you don’t actively lose, ‘avoid turning wins into losses’ has to be a priority alongside ‘turn your losses into not losses, either by turning them around and winning (ideal!) or realizing you can’t win and halting the game.’

There's also the option of, once you realize that winning is no longer achievable, trying to lose less badly than you otherwise would have. For instance, if out of all the trajectories where humans lose, you can guess that some of them seem more likely to bring about some extra-bad dystopian scenario, you can try to prevent at least those. Some examples I'm thinking of are AIs being spiteful or otherwise anti-social (on top of not caring about humans), or AIs being conflict-prone in AI-vs-AI interactions (including perhaps with AIs aligned to alien civilizations). Of course, it may not be possible to form strong opinions about what makes for a better or worse "losing" scenario – if you remain very uncertain, all losing scenarios will seem roughly equally bad.

The third is that certain assumptions about how the technology progresses had a big impact on how things play out, especially the point at which some abilities (such as superhuman persuasiveness) emerge.

Yeah, but I like the idea of rolling dice for various options that we deem plausible (and having this built into the game). 

I'm curious to read takeaways from more groups if people continue to try this. I'm also curious about players' thoughts on good group sizes (how many people played at once, and whether you would have preferred more or fewer players).

I agree that it sounds somewhat premature to write off Larry Page based on attitudes he had a long time ago, when AGI seemed more abstract and far away, and to then never seek out communication with him again later on. If that were Musk's true and only reason for founding OpenAI, then I agree that this was a communication fuckup.

However, my best guess is that this story about Page was interchangeable with a number of alternative plausible criticisms of his competition on building AGI that Musk would likely have come up with in nearby worlds. People like Musk (and Altman too) tend to have a desire to do the most important thing and the belief that they can do this thing a lot better than anyone else. On that assumption, it's not too surprising that Musk found a reason for having to step in and build AGI himself. In fact, on this view, we should expect to see surprisingly little sincere exploration of "joining someone else's project to improve it" solutions.

I don't think this is necessarily a bad attitude. Sometimes people who think this way are right in the specific situation. It just means that we see the following patterns a lot:

  • Ambitious people start their own thing rather than join some existing thing.
  • Ambitious people have fallouts with each other after starting a project together where the question of "who eventually gets de facto ultimate control" wasn't totally specified from the start. 

(Edited away a last paragraph that used to be here 50 minutes after posting. Wanted to express something like "Sometimes communication only prolongs the inevitable," but that sounds maybe a bit too negative, because even if you're going to fall out eventually, good communication can probably help make it less bad.)

I thought the part you quoted was quite concerning, also in the context of what comes afterwards: 

Hiatus: Sam told Greg and Ilya he needs to step away for 10 days to think. Needs to figure out how much he can trust them and how much he wants to work with them. Said he will come back after that and figure out how much time he wants to spend.

Sure, the email by Sutskever and Brockman gave off some nonviolent-communication vibes, and maybe it isn't "the professional thing" to air one's feelings and perceived mistakes like that, but they seemed genuine in what they wrote, and they raised incredibly important concerns that are by their nature difficult to bring up. Also, with hindsight especially, it seems like they had valid reasons to be concerned about Altman's power-seeking tendencies!

When someone expresses legitimate-given-the-situation concerns about your alignment, and your reaction is to basically gaslight them into thinking they did something wrong for finding it hard to trust you, and then you make it seem like you are the poor victim who needs 10 days off work to figure out whether you can still trust them, that feels messed up! (It's also a bit hypocritical, because the whole "I need 10 days to figure out if I can still trust you for thinking I like being CEO a bit too much" routine seems childish too.) 

(Of course, these emails are just snapshots and we might be missing things that happened in between via other channels of communication, including in-person talks.)

Also, I find it interesting that they (Sutskever and Brockman) criticized Musk just as much as Altman (if I understood their email correctly), so this should have made it easier for Altman to react with grace. I guess, given Musk's own annoyed reaction, maybe Altman was calling the others' email childish to side with Musk's dismissive reaction to that same email.

Lastly, this email thread made me wonder what happened between Brockman and Sutskever in the meantime, since it now seems like Brockman no longer holds the same concerns about Altman, even though recent events seem to have given them a lot of new fuel.
