All of Erich_Grunewald's Comments + Replies

This is a great post, thanks for writing it. I agree that, when it comes to creative endeavours, there's just no "there" there with current AI systems. They just don't "get it". I'm reminded of this tweet:

Mark Cummins: After using Deep Research for a while, I finally get the “it’s just slop” complaint people have about AI art.

Because I don’t care much about art, most AI art seems pretty good to me. But information is something where I’m much closer to a connoisseur, and Deep Research is just nowhere near a good human output. It’s not useless, I think may

... (read more)
2JustisMills
Yeah, I agree that I'm probably too attached to the attractor basin idea here. It seems like some sort of weighted combination between that and what you suggest, though I'd frame the "all over the place" as the chatbots not actually having enough of something (parameters? training data? oomph?) to capture the actual latent structure of very good short (or longer) fiction. It could be as simple as there being an awful lot of terrible poetry online that doesn't have the latent structure that the great stuff has. If that's a big part of the problem, we should solve it sooner than I'd otherwise expect.

I think this article far overstates the extent to which these AI policy orgs (maybe with the exception of MIRI? but I don’t think so) are working towards an AI pause, or see the goal of policy/regulation as slowing AI development. (I mean policy orgs, not advocacy orgs.) The policy objectives I see as much more common are: creating transparency around AI development, directing R&D towards safety research, laying groundwork for international agreements, slowing Chinese AI development, etc. — things that (so the hope goes) are useful on their own, not because of any effect on timelines.

On the advice of @adamShimi, I recently read Hasok Chang's Inventing Temperature. The book is terrific and full of deep ideas, many of which relate in interesting ways to AI safety. What follows are some thoughts on that relationship, from someone who is not an AI safety researcher and only somewhat follows developments there, and who probably got one or two things wrong.

(Definitions: By "operationalizing", I mean "giving a concept meaning by describing it in terms of measurable or closer-to-measurable operations", whereas "abstracting" means "removing pro... (read more)

Hmm, if the Taiwan tariff announcement caused the NVIDIA stock crash, then why did Apple stock (which should be similarly impacted by those tariffs) go up that day? I think DeepSeek -- as illogical as it is -- is the better explanation.

In their official financial statement, Nvidia projected that those diffusion regulations would not have a substantive impact on their bottom line.

 

I don't think that's true? AFAIK there's no requirement for companies to report material impact on an 8-K form. In a sense, the fact that NVIDIA even filed an 8-K form is a signal that the diffusion rule is significant for their business -- which it obviously is, though it's not clear whether the impact will be substantially material. I think we have to wait for their 10-Q/10-K filings to see what NVIDIA sig... (read more)

You talk later about evolution being selfish; not only is the story for humans far more complicated (why do humans often offer an even split in the ultimatum game?), but humans also talk a nicer game than they act (see construal level theory, or social-desirability bias). Once you start looking at AI agents that have affordances and incentives similar to those humans have, I think you'll see a lot of the same behaviors.

Some people have looked at this, sorta:

... (read more)

In the New York example, it could be that when someone says “Guys, we should really buy those Broadway tickets. The trip to New York is next month already.” they prompt the response “What? I thought we were going the month after!”, hence the disagreement. If this detail had been discussed earlier, there might have been the “February trip” and the “March trip” in order to disambiguate the trip(s) to New York.

I guess I don't understand what focusing on disagreements adds. Sure, in this situation, the disagreement stems from some people thinking the trip is n... (read more)

1tangerine
And what do you conclude based on that? The relation between the real world and our intuition is an interesting topic. When people’s intuitions are violated (e.g., the Turing test is passed but it doesn’t “feel like” AGI), there’s a temptation to try to make the real world fit the intuition, when it is more productive to accept that the intuition is wrong. That is, maybe achieving AGI doesn’t feel like you expect. But that can be a fine line to walk. In any case, privileging an intuitive map above the actual territory is about as close as you can get to a “cardinal sin” for someone who claims to be rational. (To be clear, I’m not saying you are doing that.)

I think that, in your New York example, the increasing disagreement is driven by people spending more time thinking about the concrete details of the trip. They do so because it is obviously more urgent, because they know the trip is happening soon. The disagreements were presumably already there in the form of differing expectations/preferences, and were only surfaced later on as they started discussing things more concretely. So the increasing disagreements are driven by increasing attention to concrete details.

It seems likely to me that the increasing d... (read more)

3tangerine
They spend more time thinking about the concrete details of the trip, not because they know the trip is happening soon, but because some think the trip is happening soon. Disagreement on and attention to concrete details is driven by only some people saying that the current situation looks like, or is starting to look like, the event occurring according to their interpretation. If the disagreement had happened at the beginning, they would soon have started using different words.

In the New York example, it could be that when someone says “Guys, we should really buy those Broadway tickets. The trip to New York is next month already.” they prompt the response “What? I thought we were going the month after!”, hence disagreement. If this detail had been discussed earlier, there might have been the “February trip” and the “March trip” in order to disambiguate the trip(s) to New York.

In the case of AGI, some people’s alarm bells are currently going off, prompting others to say that more capabilities are required. What seems to have happened is that people at one point latched on to the concept of AGI, thinking that their interpretation was virtually the same as those of others because of its lack of definition. Again, if they had disagreed with the definition to begin with, they would have used a different word altogether. Now that some people are claiming that AGI is here or here soon, it turns out that the interpretations do in fact differ.

The most obnoxious cases are when people disagree with their own past interpretation once that interpretation is threatened to be satisfied, on the basis of some deeper, undefined intuition (or, in the case of OpenAI and Microsoft, ulterior motives). This of course is also known as “moving the goalposts”.

Once upon a time, not that long ago, AGI was interpreted by many as “it can beat anyone at chess”, “it can beat anyone at go” or “it can pass the Turing test”. We are there now, according to those interpretations. Whether or not

Thanks for writing this -- it's very concrete and interesting.

Have you thought about using company market caps as an indicator of AGI nearness? I would guess that an AI index -- maybe NVIDIA, Alphabet, Meta, and Microsoft -- would look really significantly different in the two scenarios you paint. To control for general economic conditions, you could look at those companies relative to the NASDAQ-100 (minus AI companies). An advantage of this is that it tracks a lot of different indicators, including ones that are really fuzzy or hard to discover, thro... (read more)
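To make that a bit more concrete, here is a rough sketch of the kind of relative index I have in mind. Everything here is hypothetical: the tickers are just the four companies named above, the prices are made up, and "NDX_EX_AI" stands in for a NASDAQ-100 series with the AI names stripped out.

```python
# Sketch: equal-weighted AI basket vs. a non-AI benchmark, both rebased to 1.0.
# A rising ratio would suggest markets pricing in relatively more AI progress.
import pandas as pd

AI_TICKERS = ["NVDA", "GOOGL", "META", "MSFT"]
BENCHMARK = "NDX_EX_AI"  # hypothetical: NASDAQ-100 minus the AI companies

def ai_relative_index(prices: pd.DataFrame) -> pd.Series:
    """Equal-weighted AI basket divided by the benchmark, both rebased to 1.0."""
    rebased = prices / prices.iloc[0]              # start every series at 1.0
    ai_basket = rebased[AI_TICKERS].mean(axis=1)   # equal-weighted AI basket
    return ai_basket / rebased[BENCHMARK]          # > 1.0 means AI is outperforming

# Toy example with made-up numbers:
prices = pd.DataFrame(
    {"NVDA": [100, 120, 150], "GOOGL": [100, 105, 110], "META": [100, 102, 108],
     "MSFT": [100, 103, 107], "NDX_EX_AI": [100, 101, 103]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)
print(ai_relative_index(prices))
```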

The jury is still out, but it's currently available even in Direct Chat on Chatbot Arena; there will be more data on this soon.

Fyi, it's also available on https://chat.deepseek.com/, as is their reasoning model DeepSeek-R1-Lite-Preview ("DeepThink"). (I suggest signing up with a throwaway email and not inputting any sensitive queries.) From quickly throwing at it a few requests I had recently asked 3.5 Sonnet, DeepSeek-V3 seems slightly worse, but nonetheless solid.

I'm not totally sure of this, but it looks to me like there's already more scientific consensus around mirror life being a threat worth taking seriously, than is the case for AI. E.g., my impression is that this paper was largely positively received by various experts in the field, including experts that weren't involved in the paper. AI risk looks much more contentious to me even if there are some very credible people talking about it. That could be driving some of the difference in responses, but yeah, the economic potential of AI probably drives a bunch of the difference too.

I sorta agree, but sorta don't. Remember the CAIS statement? There have been plenty of papers about AI risk that were positively received by various experts in the field who were uninvolved in those papers. I agree that there is more contention about AI risk than about chirality risk though... which brings me to my other point, which is that part of the contention around AGI risks seems to be downstream of the incentives rather than downstream of scientific disputes. Like, presumably the fact that there are already powerful corporations that stand to make ... (read more)

To add to that, Oeberst (2023) argues that all cognitive biases at heart are just confirmation bias based around a few "fundamental prior" beliefs. (A "belief" would be a hypothesis about the world bundled with an accuracy.) The fundamental beliefs are:

  • My experience is a reasonable reference
  • I make correct assessments of the world
  • I am good
  • My group is a reasonable reference
  • My group (members) is (are) good
  • People's attributes (not context) shape outcomes

That is obviously rather speculative, but I think it's some further weak reason to think motivated reasoning... (read more)

It seems like an obviously sensible thing to do from a game-theoretic point of view.

Hmm, seems highly contingent on how well-known the gift would be? And even if potential future Petrovs are vaguely aware that this happened to Petrov's heirs, it's not clear that it would be an important factor when they make key decisions; if anything, it would probably feel pretty speculative/distant as a possible positive consequence of doing the right thing. Especially if those future decisions are not directly analogous to Petrov's, such that it's not clear whether i... (read more)

Specific examples might include criticisms of RSPs, Kelsey’s coverage of the OpenAI NDA stuff, alleged instances of labs or lab CEOs misleading the public/policymakers, and perspectives from folks like Tegmark and Leahy (who generally see a lot of lab governance as safety-washing and probably have less trust in lab CEOs than the median AIS person).

Isn't much of that criticism also forms of lab governance? I've always understood the field of "lab governance" as something like "analysing and suggesting improvements for practices, policies, and organisational... (read more)

3Orpheus16
Yeah, I think there's a useful distinction between two different kinds of "critiques":

  • Critique #1: I have reviewed the preparedness framework and I think the threshold for "high-risk" in the model autonomy category is too high. Here's an alternative threshold.
  • Critique #2: The entire RSP/PF effort is not going to work because [they're too vague // labs don't want to make them more specific // they're being used for safety-washing // labs will break or weaken the RSPs // race dynamics will force labs to break RSPs // labs cannot be trusted to make or follow RSPs that are sufficiently strong/specific/verifiable].

I feel like critique #1 falls more neatly into "this counts as lab governance" whereas IMO critique #2 falls more into "this is a critique of lab governance." In practice the lines blur. For example, I think last year there was a lot more "critique #1" style stuff, and then over time as the list of specific object-level critiques grew, we started to see more support for things in the "critique #2" bucket.

Yes, this seems right to me. The OP says

The key point I will make is that, from a game-theoretic point of view, this race is not an arms race but a suicide race. In an arms race, the winner ends up better off than the loser, whereas in a suicide race, both parties lose massively if either one crosses the finish line.

But from a game-theoretic perspective, it can still make sense for the US to aggressively pursue AGI, even if one believes there's a substantial risk of an AGI takeover in the case of a race, especially if the US acts in its own self interest. ... (read more)

for people who are not very good at navigating social conventions, it is often easier to learn to be visibly weird than to learn to adapt to the social conventions.


are you basing this on intuition or personal experience or something else? I guess we should avoid basing it on observations of people who did succeed in that way. People who try and succeed in adapting to social conventions are likely much less noticeable/salient than people who succeed at being visibly weird. 

Yeah that makes sense. I think I underestimated the extent to which "warning shots" are largely defined post-hoc, and events in my category ("non-catastrophic, recoverable accident") don't really have shared features (or at least features in common that aren't also there in many events that don't lead to change).

One man's 'warning shot' is just another man's "easily patched minor bug of no importance if you aren't anthropomorphizing irrationally", because by definition, in a warning shot, nothing bad happened that time. (If something had, it wouldn't be a 'warning shot', it'd just be a 'shot' or 'disaster'.)

I agree that "warning shot" isn't a good term for this, but then why not just talk about "non-catastrophic, recoverable accident" or something? Clearly those things do sometimes happen, and there is sometimes a significant response going beyond "we can just p... (read more)

gwern*337

I think all of your examples are excellent demonstrations of why there is no natural kind there, and they are defined solely in retrospect, because in each case there are many other incidents, often much more serious, which however triggered no response or are now so obscure you might not even know of them.

  • Three Mile Island: no one died, unlike at least 8 other more serious nuclear accidents (not to mention Chernobyl or Fukushima). Why did that trigger such a hysterical backlash?

    (The fact that people are reacting with shock and bafflement that "Am

... (read more)

I don't really have a settled view on this; I'm mostly just interested in hearing a more detailed version of MIRI's model. I also don't have a specific expert in mind, but I guess the type of person that Akash occasionally refers to -- someone who's been in DC for a while, focuses on AI, and has encouraged a careful/diplomatic communication strategy.

“Be careful what you say, try to look normal, and slowly accumulate political capital and connections in the hope of swaying policymakers long-term” isn’t an unconditionally good strategy, it’s a strategy ada

... (read more)

That's one reason why an outspoken method could be better. But it seems like you'd want some weighing of the pros and cons here? (Possible drawbacks of such messaging could include it being more likely to be ignored, or cause a backlash, or cause the issue to become polarized, etc.)

Like, presumably the experts who recommend being careful what you say also know that some people discount obviously political speech, but still recommend/practice being careful what you say. If so, that would suggest this one reason is not on its own enough to override the experts' opinion and practice.

4Rob Bensinger
Could we talk about a specific expert you have in mind, who thinks this is a bad strategy in this particular case? AI risk is a pretty weird case, in a number of ways: it's highly counter-intuitive, not particularly politically polarized / entrenched, seems to require unprecedentedly fast and aggressive action by multiple countries, is almost maximally high-stakes, etc. "Be careful what you say, try to look normal, and slowly accumulate political capital and connections in the hope of swaying policymakers long-term" isn't an unconditionally good strategy, it's a strategy adapted to a particular range of situations and goals. I'd be interested in actually hearing arguments for why this strategy is the best option here, given MIRI's world-model. (Or, separately, you could argue against the world-model, if you disagree with us about how things are.)

Everything that happened since then has made it clear that this is not the case; that all these big flashy commitments like Superalignment were just safety-washing and virtue signaling. They were only going to do alignment work inasmuch as that didn't interfere with racing full-speed towards greater capabilities.

It's not clear to me that it was just safety-washing and virtue signaling. I think a better model is something like: there are competing factions within OAI that have different views, that have different interests, and that, as a result, prioritize... (read more)

9Thane Ruthenis
Sure, that's basically my model as well. But if the faction (b) only cares about alignment due to perceived PR benefits or in order to appease faction (a), and faction (b) turns out to have overriding power such that it can destroy or drive out faction (a) and then curtail all the alignment efforts, I think it's fair to compress all that into "OpenAI's alignment efforts are safety-washing". If (b) has the real power within OpenAI, then OpenAI's behavior and values can be approximately rounded off to (b)'s behavior and values, and (a) is a rounding error. Not if (b) is concerned about fortifying OpenAI against future challenges, such as hypothetical futures in which the AGI Doomsayers get their way and the government/the general public wakes up and tries to nationalize or ban AGI research. In that case, having a prepared, well-documented narrative of going above and beyond to ensure that their products are safe, well before any other parties woke up to the threat, will ensure that OpenAI is much more well-positioned to retain control over its research. (I interpret Sam Altman's behavior at Congress as evidence for this kind of longer-term thinking. He didn't try to downplay the dangers of AI, which would be easy and what someone myopically optimizing for short-term PR would do. He proactively brought up the concerns that future AI progress might awaken, getting ahead of it, and thereby established OpenAI as taking them seriously and put himself into the position to control/manage these concerns.) And it's approximately what I would do, at least, if I were in charge of OpenAI and had a different model of AGI Ruin. And this is the potential plot whose partial failure I'm currently celebrating.

Fwiw, there is also AI governance work that is neither policy nor lab governance, in particular trying to answer broader strategic questions that are relevant to governance, e.g., timelines, whether a pause is desirable, which intermediate goals are valuable to aim for, and how much computing power Chinese actors will have access to. I guess this is sometimes called "AI strategy", but often the people/orgs working on AI governance also work on AI strategy, and vice versa, and they kind of bleed into each other.

How do you feel about that sort of work relati... (read more)

Let's go through those:

  • Timelines appear to me to be at least one and maybe two orders-of-magnitude more salient than they are strategically relevant, in EA/rationalist circles. I think the right level of investment in work on them is basically "sometimes people who are interested write blogposts on them in their spare time", and it is basically not worthwhile for anyone to focus their main work on timelines at current margins. Also, the "trying to be serious" efforts on timelines typically look-to-me to be basically bullshit - i.e. they make basically-impl
... (read more)

Open Philanthropy did donate $30M to OpenAI in 2017, and got in return the board seat that Helen Toner occupied until very recently. However, that was when OpenAI was a non-profit, and was done in order to gain some amount of oversight and control over OpenAI. I very much doubt any EA has donated to OpenAI unconditionally, or at all since then.

They often do things of the form "leaving out info, knowing this has misleading effects"

On that, here are a few examples of Conjecture leaving out info in what I think is a misleading way.

(Context: Control AI is an advocacy group, launched and run by Conjecture folks, that is opposing RSPs. I do not want to discuss the substance of Control AI’s arguments -- nor whether RSPs are in fact good or bad, on which question I don’t have a settled view -- but rather what I see as somewhat deceptive rhetoric.)

One, Control AI’s X account features a banner image with ... (read more)

2momom2
I'm surprised to hear they're posting updates about CoEm. At a conference held by Connor Leahy, I said that I thought it was very unlikely to work, and asked why they were interested in this research area, and he answered that they were not seriously invested in it. We didn't develop the topic and it was several months ago, so it's possible that 1- I misremember or 2- they changed their minds 3- I appeared adversarial and he didn't feel like debating CoEm. (For example, maybe he actually said that CoEm didn't look promising and this changed recently?) Still, anecdotal evidence is better than nothing, and I look forward to seeing OliviaJ compile a document to shed some light on it.

I think it is reasonable to treat this as a proxy for the state of the evidence, because lots of AI policy people specifically praised it as a good and thoughtful paper on policy.

All four of those AI policy people are coauthors on the paper -- that does not seem like good evidence that the paper is widely considered good and thoughtful, and therefore a good proxy (though I think it probably is an ok proxy).

9Sean_o_h
(disclaimer: one of the coauthors) Also, none of the linked comments by the coauthors actually praise the paper as good and thoughtful? They all say the same thing, which is "pleased to have contributed" and "nice comment about the lead author" (a fairly early-career scholar who did lots and lots of work and was good to work with). I called it "timely", as the topic of open-sourcing was very much live at the time.   (FWIW, I think this post has valid criticism re: the quality of the biorisk literature cited and the strength with which the case was conveyed; and I think this kind of criticism is very valuable and I'm glad to see it).

When Jeff Kaufman shared one of the papers discussed here on the EA Forum, there was a highly upvoted comment critical of the paper (more upvoted than the post itself). That would suggest to me that this post would be fairly well received on the EA Forum, though its tone is definitely more strident than that comment, so maybe not.

ARC & Open Philanthropy state in a press release “In a sane world, all AGI progress should stop. If we don’t, there’s more than a 10% chance we will all die.”

Could you spell out what you mean by "in a sane world"? I suspect a bunch of people you disagree with do not favor a pause due to various empirical facts about the world (e.g., there being competitors like Meta).

Well, it's not like vegans/vegetarians are some tiny minority in EA. Pulling together some data from the 2022 ACX survey, people who identify as EA are about 40% vegan/vegetarian, and about 70% veg-leaning (i.e., vegan, vegetarian, or trying to eat less meat and/or offsetting meat-eating for moral reasons). (That's conditioning on identifying as an LW rationalist, since anecdotally I think being vegan/vegetarian is somewhat less common among Bay Area EAs, and the ACX sample is likely to skew pretty heavily rationalist, but the results are not that differen... (read more)

Israel's strategy since Hamas took over the strip in 2007 has been to try to contain it and keep it weak through periodic, limited confrontations (the so-called Mowing the Lawn doctrine), while trying to economically develop the strip in order to give Hamas incentives to avoid confrontation. While Hamas grew stronger, the general feeling was that the strategy worked and the last 15 years were not that bad.

I am surprised to read the bolded part! What actions have the Israeli government taken to develop Gaza, and did Gaza actually develop economically in t... (read more)

Sorry it took me some time. 
I agree with your assessment. I did say Israel tried to do that, but it's a hard problem. I didn't want to elaborate on this point in the original comment since it felt off topic, so here goes:

TL;DR: Blockade is the baseline from which we try to improve, since Hamas are genocidal terrorists and use any aid for military needs. Under that constraint Israel has supplied water, food, and electricity, and has tried to build more generators and let Palestinians work within its borders.

Some links will be in hebrew, sorry in advance. I'll on... (read more)

5homunq_homunq
"Economically develop" is only meaningful against some baseline. Israel has had policies that clearly massively harm Gaza's development, and other policies that somewhat help it.  There are also other factors Israel doesn't control, which probably are a net positive; economies in general tend to develop over time. So if the baseline is some past year, or if it's the counterfactual situation with a blockade and no mitigating policies, there's been development. But if it's the counterfactual with no blockade and no mitigating policies, I'd guess not so much. In other words: Israel's "strategy" has included at least some things that in themselves help Gaza's development, but Israel has still hurt its development overall / on net. (I'm not an expert or an insider here.)

Assuming you have the singular "you" in mind: no, I do not think I am running a motte and bailey. I said above that if you accept the assumptions, I think using the ranges as (provisional, highly uncertain) moral weights is pretty reasonable, but I also think it's reasonable to reject the assumptions. I do think it is true that some people have (mis)interpreted the report and made stronger claims than is warranted, but the report is also full of caveats and (I think) states its assumptions and results clearly.

The report:

Instead, we’re usually comparing

... (read more)

e.g. 12 (ETA: 14) bees are worth 1 human

This is a misrepresentation of what the report says. The report says that, conditional on hedonism, valence symmetry, the animals being sentient, and other assumptions, the intensity of positive/negative valence that a bee can experience is 7% that of the positive/negative intensity that a human can experience. How to value creatures based on the intensities of positively/negatively valenced states they are capable of is a separate question, even if you fully accept the assumptions. (ETA: If you assume utilitarianism... (read more)
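(For what it's worth, my guess is that the "14" is just the reciprocal of that intensity figure, i.e. 1 / 0.07 ≈ 14.3, rounded to 14 bees per human -- which only becomes an exchange rate if you add further assumptions, like utilitarian aggregation over those valence intensities. That's my reconstruction, not something stated in the excerpt above.)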

0Richard_Kennaway
I'm going by the summary by jefftk that I linked to. Having glanced at the material it's based on, and your links, I am not inclined to root through it all to make a more considered assessment. I suspect I would only end up painting a similar picture with a finer brush. Their methods of getting to the strange places they end up already appear to require more of my attention to understand than I am willing to spend on the issue.

e.g. 12 (ETA: 14) bees are worth 1 human

This is a misrepresentation of what the report says.

The report:

Instead, we’re usually comparing either improving animal welfare (welfare reforms) or preventing animals from coming into existence (diet change → reduction in production levels) with improving human welfare or saving human lives.


I don't think he's misrepresenting what the report says at all. Trevor gets the central point of the post perfectly. The post's response to the heading "So you’re saying that one person = ~three chickens?" is, no, t... (read more)

Answer by Erich_Grunewald10

I think this is a productivity/habit question disguised as something else. You know you want to do thing X, but instead procrastinate by doing thing Y. Here are some concrete suggestions for getting out of this trap:

  • Try Focusmate. Sign up and schedule a session. The goal of your first session will be to come up with a concrete project/exercise to do, if you have not already done so. The goal of your second session will be to make some progress on that project/exercise (e.g., write 1 page).
    • You can also use the same accountability technique with a friend, bu
... (read more)
1matto
I can see how this can look like procrastination from the outside. But I think in my case, it really is some weird jedi trickery where meta-level replaced the object-level (at much less energy cost--so why would I ever do object level?) I've written more this week than in a long time just by clearly asking myself whether I'm doing something meta (fun, leisure) or object-level (building stuff) and there's no ugh-field at all!

Kelsey Piper wrote this comment on the EA Forum:

It could be that I am misreading or misunderstanding these screenshots, but having read through them a couple of times trying to parse what happened, here's what I came away with:

On December 15, Alice states that she'd had very little to eat all day, that she'd repeatedly tried and failed to find a way to order takeout to their location, and tries to ask that people go to Burger King and get her an Impossible Burger which in the linked screenshots they decline to do because they don't want to get fast food. S

... (read more)

Rob Bensinger replied with:

I think that there's a big difference between telling everyone "I didn't get the food I wanted, but they did get/offer to cook me vegan food, and I told them it was ok!" and "they refused to get me vegan food and I barely ate for 2 days".

Agreed.

And this:

This also updates me about Kat's take (as summarized by Ben Pace in the OP):

> Kat doesn’t trust Alice to tell the truth, and that Alice has a history of “catastrophic misunderstandings”.

When I read the post, I didn't see any particular reason for Kat to think this, and I worrie

... (read more)

Some possibly relevant data:

  • As of 2020, anti-government protests in North America rose steadily from 2009 to 2017, when they peaked (at ~7x the 2009 number), and then started to decline (to ~4x the 2009 number in 2019).
  • Americans' trust in the US government is very low (only ~20% say they trust the USG to do what's right most of the time) and has been for over a decade. It seems to have locally peaked at ~50% after 9/11, and then declined to ~15% in 2010, after the financial crisis.
  • Congressional turnover rates have risen somewhat since the 90s, and are now at about
... (read more)

Actually Charles Babbage was not trying to disrupt the industry of printed logarithmic tables, he was trying to print accurate tables.

Hmm, Babbage wanted to remove errors from tables by doing the calculations by steam. He was also concerned with how tedious and time-consuming those calculations were, though, and I guess the two went hand in hand. ("The intolerable labour and fatiguing monotony of a continued repetition of similar arithmetical calculation, first excited the desire and afterwards suggested the idea, of a machine, which, by the aid of gravity... (read more)

Great post!

But let's back up and get some context first. The year was 1812, and mathematical tables were a thing.

What are mathematical tables, you ask? Imagine that you need to do some trigonometry. What's sin(79)?

Well, today you'd just look it up online. 15 years ago you'd probably grab your TI-84 calculator. But in the year 1812, you'd have to consult a mathematical table. Something like this:

They'd use computers to compute all the values and write them down in books. Just not the type of computers you're probably thinking of. No, they'd use human comput

... (read more)
1Jon "maddog" Hall
Actually Charles Babbage was not trying to disrupt the industry of printed logarithmic tables, he was trying to print accurate tables.   His difference engine included a mechanism to transfer the calculated tables directly to a print plate so there would be no transcription errors between the calculated numbers and going to the printer. Babbage's work on the engine started when he was working with another engineer doing calculations in parallel as was often done in those days.  They did one set of calculations and got different answers.   They retried the calculations and each got their same answer, but again they were different.   Then they looked at the values in the tables and realized that the two books had two different numbers in the table.   This frightened Babbage because he realized that if they had been using the same (wrong) book, they would not have discovered the error.   So he set out to create a machine that would calculate the tables correctly every time and create the print plate so transcription errors would not happen. The Difference Engine #2, built for the CHM to Babbage's plans, created these print plates perfectly. As a side note, in 2008 Linus Torvalds was inducted to the Hall of Fellows for the CHM.  The only way I (who had nominated him for the Fellowship) could get Linus to attend the ceremony was to tell him he could turn the crank of the Difference Engine.  And it was so.

I think your analysis makes sense if using a "center" name really should require you to have some amount of eminence or credibility first. I've updated a little bit in that direction now, but I still mostly think it's just synonymous with "institute", and on that view I don't care if someone takes a "center" name (any more than if someone takes an "institute" name). It's just, you know, one of the five or so nouns non-profits and think tanks use in their names ("center", "institute", "foundation", "organization", "council", blah).

Or actually, maybe it's mo... (read more)

Ben Pace*2211

I think your analysis makes sense if using a "center" name really should require you to have some amount of eminence or credibility first... I still mostly think it's just synonymous with "institute"

I'm finding it hard to explain why I think that naming yourself "The Center for AI Safety" is taking ownership of a prime piece of namespace real estate and also positioning yourselves as representing the field more so than if you call yourself "Conjecture" or "Redwood Research" or "Anthropic".

Like, consider an outsider trying to figure out who to go to in... (read more)

The only criticism of you and your team in the OP is that you named your team the "Center" for AI Safety, as though you had much history leading safety efforts or had a ton of buy-in from the rest of the field.

Fwiw, I disagree that "center" carries these connotations. To me it's more like "place where some activity of a certain kind is carried out", or even just a synonym of "institute". (I feel the same about the other 5-10 EA-ish "centers/centres" focused on AI x-risk-reduction.) I guess I view these things more as "a center of X" than "the center of X". Maybe I'm in the minority on this but I'd be kind of surprised if that were the case.

1Ben Pace
Interesting. Good point about there being other examples. I'll list some of them that I can find with a quick search, and share my impressions of whether the names are good/bad/neutral.

  • UC Berkeley's Center for Human-Compatible AI
  • Paul Christiano's Alignment Research Center
  • Center for Long-Term Risk
  • Center for Long-Term Resilience

The first feels pretty okay to me because (a) Stuart Russell has a ton of leadership points to spend in the field of AI, given he wrote the standard textbook (prior to deep learning), and (b) this is within the academic system, where I expect the negotiation norms for naming have been figured out many decades ago.

I thought about the second one at the time. It feels slightly like it's taking central namespace, but I think Paul Christiano is on a shortlist of people who can reasonably take the mantle as foremost alignment researcher, so I am on net supportive of him having this name (and I expect many others in the field are perfectly fine with it too). I also mostly do not expect Paul to use the name for that much politicking, who in many ways seems to try to do relatively inoffensive things.

I don't have many associations with the third and fourth, and don't see them as picking up much political capital from the names. Insofar as they're putting themselves as representative of the 'longtermism' flag, I don't particularly feel connected to that flag and am not personally interested in policing its use.

And otherwise, if I made a non-profit called (for example) "The Center for Mathematical Optimization Methods" I think that possibly one or two people would be annoyed if they didn't like my work, but mostly I don't think there's a substantial professional network or field of people that I'd be implicitly representing, and those who knew about my work would be glad that anyone was making an effort.

I'll repeat that one of my impressions here is that Dan is picking up a lot of social and political capital that others have earn

It's valuable to flag the causal process generating an idea, but it's also valuable to provide legible argumentation, because most people can't describe the factors which led them to their beliefs in sufficient detail to actually be compelling.

To add to that, trying to provide legible argumentation can also be good because it can convince you that your idea actually doesn't make sense, or doesn't make sense as stated, if that is indeed the case.

Have you considered writing (more) shortforms instead? If not, this comment is a modest nudge for you to consider doing so.

4MSRayne
I'm never really sure what there's any point in saying. My main interests have nothing to do with AI alignment, which seems to be the primary thing people talk about here. And a lot of my thoughts require the already existing context of my previous thoughts. Honestly, it's difficult for me to communicate what's going on in my head to anyone.

  1. Seems to me like (a), (b) and maybe (d) are true for the airplane manufacturing industry, to some degree.
  2. But I'd still guess that flying is safer with substantial regulation than it would be in a counterfactual world without substantial regulation.

That would seem to invalidate your claim that regulation would make AI x-risk worse. Do you disagree with (1), and/or with (2), and/or see some important dissimilarities between AI and flight that make a difference here?

1a3orn3514

I do think there are important dissimilarities between AI and flight.

For instance: People disagree massively over what is safe for AI in ways they do not over flight; i.e., are LLMs going to plateau and provide us a harmless and useful platform for exploring interpretability, while maybe robustifying the world somewhat; or are they going to literally kill everyone?

I think pushing for regulations under such circumstances is likely to promote the views of an accidental winner of a political struggle; or to freeze in amber currently-accepted views that everyo... (read more)

It's not clear whether that will mean the end of humanity in the sense of the systems we've created destroying us. It's not clear if that's the case, but it's certainly conceivable. If not, it also just renders humanity a very small phenomenon compared to something else that is far more intelligent and will become incomprehensible to us, as incomprehensible to us as we are to cockroaches.

It's interesting that he seems so in despair over this now. To the extent that he's worried about existential/catastrophic risks, I wonder if he is unaware of efforts to m... (read more)

I am working on human capability enhancement via genetics. I think it's quite plausible that we could create humans smarter than any that have ever lived within a decade. But even I think that digital intelligence wins in the end.

Like it just seems obvious to me. The only reason I'm even working in the field is because I think that enhanced humans could play an extremely critical role in the development of aligned AI. Of course this requires time for them to grow up and do research, which we are increasingly short of. But in case AGI takes longer than projected or we get our act together and implement a ban on AI capabilities improvements until alignment is solved, it still seems worth continuing the work to me.

I’m confused about the parallelization part and what it implies. It says the model was trained on 2K GPUs, but GPT-4 was probably trained on 1 OOM more than that, right?

5sanxiyn
Parallelization part (data parallelism, tensor parallelism, pipeline parallelism, ZeRO) is completely standard. See Efficient Training on Multiple GPUs by Hugging Face for a standard description. Failure recovery part is relatively unusual.
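For readers unfamiliar with those terms, here is a toy, single-process sketch of the idea behind the simplest of them, data parallelism: each worker computes gradients on its own shard of the batch, and the gradients are averaged before the shared weights are updated. This is purely illustrative (a linear model in NumPy), not DeepSeek's or Hugging Face's actual implementation; tensor and pipeline parallelism instead split the model itself across devices.

```python
# Toy data parallelism: shard the batch, compute per-worker gradients, average them.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(64, 3))        # one global batch
y = X @ true_w                      # targets from a "true" linear model

w = np.zeros(3)                     # shared weights, replicated on every worker
n_workers, lr = 4, 0.1
for _ in range(100):
    grads = []
    for Xs, ys in zip(np.array_split(X, n_workers), np.array_split(y, n_workers)):
        err = Xs @ w - ys                       # each worker: forward pass on its shard
        grads.append(2 * Xs.T @ err / len(ys))  # local gradient of the MSE loss
    w -= lr * np.mean(grads, axis=0)            # "all-reduce": average grads, update

print(w)  # ≈ [1.0, -2.0, 0.5]
```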

They state that their estimated probability for each event is conditional on all previous events happening.

4followthesilence
Thanks, I suppose I'm taking issue with sequencing five distinct conditional events that seem to be massively correlated with one another. The likelihoods of Events 1-5 seem to depend upon each other in ways such that you cannot assume point probabilities for each event and multiply them together to arrive at 1%. Event 5 certainly doesn't require Events 1-4 as a prerequisite, and arguably makes Events 1-4 much more likely if it comes to pass.
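To make the two framings concrete, here is a small simulation with made-up numbers (not the post's estimates). It shows the difference between chaining conditional probabilities, which is what the authors say they are doing, and naively multiplying unconditional probabilities of events that share a common driver:

```python
# Two positively correlated events driven by a shared latent factor.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
latent = rng.normal(size=n)            # shared driver that correlates the events
e1 = latent + rng.normal(size=n) > 1.0
e2 = latent + rng.normal(size=n) > 1.0

joint = np.mean(e1 & e2)                 # true P(E1 and E2)
chained = np.mean(e1) * np.mean(e2[e1])  # P(E1) * P(E2 | E1) -- the chain rule
naive = np.mean(e1) * np.mean(e2)        # P(E1) * P(E2) -- ignores the correlation

print(f"joint={joint:.3f}  chained={chained:.3f}  naive={naive:.3f}")
```

The chained product equals the joint probability by construction; the naive product understates it here because the events are positively correlated. The practical worry, as I read the comment above, is that the conditional probabilities themselves are very hard to estimate once a shared driver links all five events.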

I think this is an excellent, well-researched contribution and am confused about why it's not being upvoted more (on LW that is; it seems to be doing much better on EAF, interestingly).

3RobertM
At a guess (not having voted on it myself): because most of the model doesn't engage with the parts of the question that those voting consider interesting/relevant, such as the many requirements laid out for "transformative AI" which don't seem at all necessary for x-risk.  While this does seem to be targeting OpenPhil's given definition of AGI, they do say in a footnote: While some people do have AI x-risk models that route through ~full automation (or substantial automation, with a clearly visible path to full automation), I think most people here don't have models that require that, or even have substantial probability mass on it.

I see, that makes sense. I agree that holding all else constant more neurons implies higher intelligence.

Within a particular genus or architecture, more neurons would be higher intelligence.

I'm not sure that's necessarily true? Though there's probably a correlation. See e.g. this post:

[T]he raw number of neurons an organism possesses does not tell the full story about information processing capacity. That’s because the number of computations that can be performed over a given amount of time in a brain also depends upon many other factors, such as (1) the number of connections between neurons, (2) the distance between neurons (with shorter distances allowing f

... (read more)
1boazbarak
Yes, the point is that once you fix architecture and genus (e.g., connections etc.), more neurons/synapses lead to more capabilities.

Once we start searching over policies that understand the world well enough, we run into a problem: any influence-seeking policies we stumble across would also score well according to our training objective, because performing well on the training objective is a good strategy for obtaining influence.

I'm slightly confused by this. It sounds like "(1) ML systems will do X because X will be rewarded according to the objective, and (2) X will be rewarded according to the objective because being rewarded will accomplish X". But (2) sounds circular -- I see that... (read more)

5paulfchristiano
Consider a competent policy that wants paperclips in the very long run. It could reason "I should get a low loss to get paperclips," and then get a low loss. As a result, it could be selected by gradient descent.

I vaguely remember OpenAI citing US law as a reason they don't allow Chinese users access, maybe legislation passed as part of the chip ban?

Nah, the export controls don't cover this sort of thing. They just cover chips, devices that contain chips (i.e. GPUs and AI ASICs), and equipment/materials/software/information used to make those. (I don't know the actual reason for OpenAI's not allowing Chinese customers, though.)

6M. Y. Zuo
Any online service hosted outside of mainland China wanting to sell within has to meet quite onerous regulations. I've thought for some time about why these exist and develop to the extent that it's harmful (and why they exist nearly everywhere). It's partly by design, similar to the EU's regulatory apparatus that created GDPR, partly because of the natural distrust of government officials towards the intentions of foreign actors, but also because of the office dynamics: scoring points against foreigners is low-hanging fruit for the career prospects of middle-management government officials in many, many government offices. If their attempts succeed in delivering proof of misdeeds, or a public apology, then it's a guaranteed promotion. If their attempts fail, how will the overseas stakeholders retaliate against these nameless official(s) without pointing the finger at the department/division/ministry/government/China... as a whole? If they do anyway, it would turn into a status fight, thus facilitating a promotion for whoever proposed it. And then some stricter regulations will be imposed as retaliation, which increases the power and authority of the instigators. I.e., in either case the instigator wins, so it becomes a ratcheting mechanism that encourages ever more extreme proposals and tit-for-tat behaviour, and that's even before the geopolitics come into play. In this sense increasing geopolitical tension might ironically reduce the day-to-day risk of being on the unfortunate end of such schemes, since higher-level officials will be extra motivated to make sure their subordinates are more disciplined.

If only we could spread the meme of irresponsible Western powers charging head-first into building AGI without thinking through the consequences and how wise the Chinese regulation is in contrast.

That sort of strategy seems like it could easily backfire, where people only pick up the first part of that statement ("irresponsible Western powers charging head-first into building AGI") and think "oh, that means we need to speed up". Or maybe that's what you mean by "if only" -- that it's hard to spread even weakly nuanced messages?
