All of JakubK's Comments + Replies

Relevant tweet/quote from Mustafa Suleyman, the co-founder and CEO:

Powerful AI systems are inevitable. Strict licensing and regulation is also inevitable. The key thing from here is getting the safest and most widely beneficial versions of both.

Suleyman's statements are either very specific capabilities predictions or incredibly vague remarks, like the one you brought up, that don't really inform us much. His interviews often revolve around how big and smart their future models will be, while also putting in a good word for their financial backers (mainly NVIDIA). I find myself frustrated seeing a company with a lot of compute and potential impact on timelines whose CEO and main spokesperson seems very out of touch with the domain he does business in.

Thanks for writing and sharing this. I've added it to the doc.

What happened to black swan and tail risk robustness (section 2.1 in "Unsolved Problems in ML Safety")?

It's hard to say. This CLR article lists some advantages that artificial systems have over humans. Also see this section of 80k's interview with Richard Ngo:

Rob Wiblin: One other thing I’ve heard, that I’m not sure what the implication is: signals in the human brain — just because of limitations and the engineering of neurons and synapses and so on — tend to move pretty slowly through space, much less than the speed of electrons moving down a wire. So in a sense, our signal propagation is quite gradual and our reaction times are really slow compared to wha

... (read more)

The cyborgism post might be relevant:

Executive summary: This post proposes a strategy for safely accelerating alignment research. The plan is to set up human-in-the-loop systems which empower human agency rather than outsource it, and to use those systems to differentially accelerate progress on alignment. 

  1. Introduction: An explanation of the context and motivation for this agenda.
  2. Automated Research Assistants: A discussion of why the paradigm of training AI systems to behave as autonomous agents is both counterproductive and dangerous.
  3. Becoming a Cybor
... (read more)

Does current AI hype cause many people to work on AGI capabilities? Different areas of AI research differ significantly in their contributions to AGI.

1Ryan Kidd
We agree, which is why we note, "We think that ~1 more median MATS scholar focused on AI safety is worth 5-10 more median capabilities researchers (because most do pointless stuff like image generation, and there is more low-hanging fruit in safety)."
2Zach Stein-Perlman
Thank you!!
  • Will add this; too bad it's so meta
  • Will read this-- probably it's worth adding and maybe it points to more specific sources also worth adding
  • Will add this; too bad it's so meta
  • Will add this
  • Already have this one

I've grown increasingly alarmed and disappointed by the number of highly-upvoted and well-received posts on AI, alignment, and the nature of intelligent systems, which seem fundamentally confused about certain things.

Can you elaborate on how all these linked pieces are "fundamentally confused"? I'd like to see a detailed list of your objections. It's probably best to make a separate post for each one.

1Max H
I think commenting is a more constructive way of engaging in many cases. Before and since publishing this post, I've commented on some of the pieces I linked (or related posts or subthreads). I've also made one top-level post which is partially an objection to the characterization of alignment that I think is somewhat common among many of the authors I linked. Some of these threads have resulted in productive dialogue and clarity, at least from my perspective.

Links:
  • Top-level post on model alignment
  • Thread on computational anatomy post
  • Comments on Behavioural statistics for a maze-solving agent
  • On brain efficiency
  • Comments on coherence theorems: top-level comment, subthread I participated in

There are probably some others in my comment history. Most of these aren't fundamental objections to the pieces they respond to, but they gesture at the kind of thing I am pointing to in this post.

If I had to summarize (without argument) the main confusions as I see them:
  • An implicit or explicit assumption that near-future intelligent systems will look like current DL-paradigm research artifacts. (This is partially what this post is addressing.)
  • I think a lot of people mostly accept orthogonality and instrumental convergence, without following the reasoning through or engaging directly with all of the conclusions they imply. I think this leads to a view that explanations of human value formation or arguments based on precise formulations of coherence have more to say about near-future intelligent systems than is actually justified. Or at least, that results and commentary about these things are directly relevant as objections to arguments for danger based on consequentialism and goal-directedness more generally. (I haven't expanded on this in a top-level post yet, but it is addressed obliquely by some of the comments and posts in my history.)

That was arguably the hardest task, because it involved multi-step reasoning. Notably, I didn't even notice that GPT-4's response was wrong.

I believe that Marcus' point is that there are classes of problems that tend to be hard for LLMs (biological reasoning, physical reasoning, social reasoning, practical reasoning, object and individual tracking, non sequiturs). The argument is that problems in these classes will continue to be hard.

Yeah this is the part that seems increasingly implausible to me. If there is a "class of problems that tend to be hard ... [and] will continue to be hard," then someone should be able to build a benchmark that models consistently struggle with over time.

Oh I see; I read too quickly. I interpreted your statement as "Anthropic clearly couldn't care less about shortening timelines," and I wanted to show that the interpretability team seems to care. 

Especially since this post is about capabilities externalities from interpretability research, and your statement introduces Anthropic as "Anthropic, which is currently the biggest publisher of interp-research." Some readers might conclude corollaries like "Anthropic's interpretability team doesn't care about advancing capabilities."

2habryka
Makes sense, sorry for the confusion.

Ezra Klein listed some ideas (I've added some bold):

The first is the question — and it is a question — of interpretability. As I said above, it’s not clear that interpretability is achievable. But without it, we will be turning more and more of our society over to algorithms we do not understand. If you told me you were building a next generation nuclear power plant, but there was no way to get accurate readings on whether the reactor core was going to blow up, I’d say you shouldn’t build it. Is A.I. like that power plant? I’m not sure. But that’s a questi

... (read more)

Anthropic, which is currently the biggest publisher of interp-research, clearly does not have a commitment to not work towards advancing capabilities

This statement seems false based on this comment from Chris Olah.

2habryka
I am not sure what you mean. Anthropic clearly is aiming to make capability advances. The linked comment just says that they aren't seeking capability advances for the sake of capability advances, but want some benefit like better insight into safety, or better competitive positioning.

Thus, we decided to ask multiple people in the alignment scene about their stance on this question.

Richard

Any reason you're not including people's last names? To a newcomer this would be confusing. "Who is Richard?"

5Marius Hobbhahn
People could choose how they wanted to publish their opinion. In this case, Richard chose to be identified by first name only. To be fair, though, there aren't that many Richards in the alignment community, and it probably won't be very hard for you to find out who Richard is.

To argue for that level of confidence, I think the post needs to explain why AI labs will actually utilize the necessary techniques for preventing deceptive alignment.

5DavidW
I have a whole section on the key assumptions about the training process and why I expect them to be the default. It's all in line with what's already happening, and the labs don't have to do anything special to prevent deceptive alignment. Did I miss anything important in that section?

The model knows it’s being trained to do something out of line with its goals during training and plays along temporarily so it can defect later. That implies that differential adversarial examples exist in training.

I don't think this implication is deductively valid; I don't think the premise entails the conclusion. Can you elaborate?

I think this post's argument relies on that conclusion, along with an additional assumption that seems questionable: that it's fairly easy to build an adversarial training setup that distinguishes the design objective from al... (read more)

1DavidW
In the deceptive alignment story, the model wants to take action A, because its goal is misaligned, but chooses to take apparently aligned action B to avoid overseers noticing that it is misaligned. In other words, in the absence of deceptive tendencies, the model would take action A, which would identify it as a misaligned model, because overseers wanted it to take action B. That's the definition of a differential adversarial example.

If there were an unaligned model with no differential adversarial examples in training, that would be an example of a perfect proxy, not deceptive alignment. That's outside the scope of this post. But also, if the goal were to follow directions subject to ethical constraints, what would that perfect proxy be? What would result in the same actions across a diverse training set? It seems unlikely that you'd get even a near-perfect proxy here. And even if you did get something fairly close, the model would understand the necessary concepts for the base goal at the beginning of reinforcement learning, so why wouldn't it just learn to care about that? Setting up a diverse training environment seems likely to be a training strategy by default.

Some comments:

A large amount of the public thinks AGI is near.

This links to a poll of Lex Fridman's Twitter followers, which doesn't seem like a representative sample of the US population.

they jointly support a greater than 10% likelihood that we will develop broadly human-level AI systems within the next decade.

Is this what you're arguing for when you say "short AI timelines"? I think that's a fairly common view among people who think about AI timelines.

AI is starting to be used to accelerate AI research. 

My sense is that Copilot is by far ... (read more)

2Aaron_Scher
I agree that Lex's audience is not representative. I also think this is the biggest sample size poll on the topic that I've seen by at least 1 OOM, which counts for a fair amount. Perhaps my wording was wrong.

I think what is implied by the first half of the Anthropic quote is much more than 10% on AGI in the next decade. I included the second part to avoid strongly-selective quoting. It seems to me that saying >10% is mainly a PR-style thing to do to avoid seeming too weird or confident; after all, it is compatible with 15% or 90%. When I read the first part of the quote, I think something like '25% on AGI in the next 5 years, and 50% in 10 years,' but this is not what they said and I'm going to respect their desire to write vague words.

Sorry. This framing was useful for me and I hoped it would help others, but maybe not.

I probably disagree about how strong the evidence from the existence of "sparks of AGI" is. The thing I am aiming for here is something like "imagine the set of possible worlds that look a fair amount like earth, then condition on worlds that have a "sparks of AGI" paper, then how much longer do those worlds have until AGI", and I think that even not knowing that much else about these worlds, they don't have very many years.

Yep, the graph is per year; I've updated my wording to be clearer. Thanks.

When I think about when we will see AGI, I try to use a variety of models weighted by how good and useful they seem. I believe that, when doing this, at least 20% of the total weight should come from models/forecasts that are based substantially in extrapolating from recent ML progress. This recent literature review is a good example of how one might use such weightings.

Thanks for all the links!

Is this still happening? The website has stopped working for me.

This comment makes many distinct points, so I'm confused why it currently has -13 agreement karma. Do people really disagree with all of these points?

From maybe 2013 to 2016, DeepMind was at the forefront of hype around AGI. Since then, they've done less hype.

I'm confused about the evidence for these claims. What are some categories of hype-producing actions that DeepMind did between 2013 and 2016 and hasn't done since? Or just examples.

One example is the AlphaGo documentary -- DeepMind has not made any other documentaries about their results. Another related example is "playing your Go engine against the top Go player in a heavily publicized event."

In the wake of big public releases like ChatGPT and Sy

... (read more)

I would imagine that first, the AGI must be able to create a growing energy supply and a robotic army capable of maintaining and extending this supply. This will require months or years of having humans help produce raw materials and the factories for materials, maintenance robots and energy systems.

An AGI might be able to do these tasks without human help. Or it might be able to coerce humans into doing these tasks.

Third, assuming the AGI used us to build the energy sources, robot armies, and craft to help them leave this planet, (or build this themselves

... (read more)

Imagine you are the CEO of OpenAI, and your team has finished building a new, state-of-the-art AI model. You can:

  1. Test the limits of its power in a controlled environment.
  2. Deploy it without such testing.

Do you think (1) is riskier than (2)? I think the answer depends heavily on the details of the test.

On the other hand, in your view all deep learning progress has been empirical, often via dumb hacks and intuitions (this isn't true imo). 

Can you elaborate on why you think this is false? I'm curious.

On a related note, this part might be misleading:

I’m just really, really skeptical that a bunch of abstract work on decision theory and similar [from MIRI and similar independent researchers] will get us there. My expectation is that alignment is an ML problem, and you can’t solve alignment utterly disconnected from actual ML systems.

I think earlier forms of this research focused on developing new, alignable algorithms, rather than aligning existing deep learning algorithms. However, a reader of the first quote might think "wow, those people actually thoug... (read more)

In my opinion it was the right call to spend this amount of funding on the office for the last ~6 months of its existence even when we thought we'd likely do something quite different afterwards

This is confusing to me. Why not do "something quite different" from the start?

I'm trying to point at opportunity costs more than "gee, that's a lot of money, the outcome had better be good!" There are many other uses for that money besides the Lightcone offices.

A smart, competent, charismatic, person with horrible ethics will enter the office because they've manage

... (read more)
5Ben Pace
No, he gained good standing from being around the EA community for so many years and having sophisticated ethical views (veganism, a form of utilitarianism, etc) and convincing well-respected EAs to work with him and fund him, as well as from having a lot of money and donating it to these spaces. Had the Lightcone Offices existed 5 years ago I expect he would have been invited to work from there (had he asked to), and that was at the start of Alameda.
2Ben Pace
Sorry if I wrote unclearly. For most of the time (even in the last 6 months) I thought it was worth continuing to support the ecosystem, and certainly to support the people in the office, even if I was planning later to move on. I wanted to move on primarily because of the opportunity cost — I thought we could do something greater.

But I believe Habryka wanted to separate from the whole ecosystem and question whether the resources we were providing were actually improving the world at all, and at that point it's not simply a question of opportunity cost but of whether you're helping or hurting. If you're worried that you're not even helping but just making the problem worse, then it's a much stronger reason to stop.

You seem to think it wasn't worth it because of opportunity costs alone? I have been used to the world for a while now where there are two multi-billion dollar funders who are interested in funding x-risk work who don't have enough things to spend their money on, so I didn't feel like this was really competing with much else. Just because Lightcone was spending that money didn't mean another project didn't get money; none of the major funders were (or are) spending close to their maximum burn rate.

I'm confused. Here's a conversation I just had with GPT-4, with prompts in italics:

a, b, c, d, e, f, g. What is the third item in the list?

The third item in the list is c.

What is the third word of this sentence?

To accurately identify the third word of the sentence, I need the sentence you are referring to. Please provide the sentence in question.

The sentence is "What is the third word of this sentence?" See what I did there?

Yes, I see what you did there. The third word of the sentence "What is the third word of this sentence?" is "the".

What is the third le

... (read more)
1Bezzi
This part is indeed wrong. The third word of that sentence is "the", not "third" as GPT4 claims.

I pasted the YouTube video link into AssemblyAI's Playground (which I think uses Conformer-1 for speech to text) and generated a transcript, available at this link. However, the transcript lacks labels for who is speaking.

I asked GPT-4 to summarize the article and then come up with some alternative terms, here are a few I like:

  • One-way summary
  • Insider mnemonic
  • Contextual shorthand
  • Familiarity trigger
  • Conceptual hint
  • Clue for the familiar
  • Knowledge spark
  • Abbreviated insight
  • Expert's echo
  • Breadcrumb for the well-versed
  • Whisper of the well-acquainted
  • Insider's underexplained aphorism

I also asked for some idioms. "Seeing the forest but not the trees" seems apt.

Brain computation speed is constrained by upper neuron firing rates of around 1 kHz and axon propagation velocity of up to 100 m/s [43], which are both about a million times slower than current computer clock rates of near 1 GHz and wire propagation velocity at roughly half the speed of light.

Can you provide some citations for these claims? At the moment the only citation is a link to a Wikipedia article about nerve conduction velocity.

Transistors can fire about 10 million times faster than human brain cells

Does anyone have a citation for this claim?

3Steven Byrnes
I think we’re dividing 1GHz by 100Hz. The 1GHz clock speed for microprocessors is straightforward. The 100Hz for the brain is a bit complicated. If we’re just talking about how frequently a neuron can fire, then sure, 100Hz is about right I think. Or if we’re talking about, like, what would be a plausible time-discretization of a hypothetical brain simulation, then it’s more complicated. There are certainly some situations like sound localization where neuron firing timestamps are meaningful down to the several-microsecond level, I think. But leaving those aside, by and large, I would guess that knowing when each neuron fired to 10ms (100Hz) accuracy is probably adequate, plus or minus an order of magnitude, I think, maybe? ¯\_(ツ)_/¯
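Spelled out, the arithmetic behind the "10 million times faster" figure quoted above is just the ratio of the two rates:

\[
\frac{1\,\mathrm{GHz}}{100\,\mathrm{Hz}} = \frac{10^{9}\,\mathrm{Hz}}{10^{2}\,\mathrm{Hz}} = 10^{7}
\]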

The post title seems misleading to me. First, the outputs here seem pretty benign compared to some of the Bing Chat failures. Second, do all of these exploits work on GPT-4?

I greatly appreciate this post. I feel like "argh yeah it's really hard to guarantee that actions won't have huge negative consequences, and plenty of popular actions might actually be really bad, and the road to hell is paved with good intentions." With that being said, I have some comments to consider.

The offices cost $70k/month on rent [1], and around $35k/month on food and drink, and ~$5k/month on contractor time for the office. It also costs core Lightcone staff time which I'd guess at around $75k/year.

That is ~$185k/month and ~$2.22m/year. I won... (read more)

6Ben Pace
A few replies:

I don't think cost had that much to do with the decision; I expect that Open Philanthropy thought it was worth the money and would have been willing to continue funding at this price point.

In general I think the correct response to uncertainty is not half-speed. In my opinion it was the right call to spend this amount of funding on the office for the last ~6 months of its existence even when we thought we'd likely do something quite different afterwards, because it was still marginally worth doing it and the cost-effectiveness calculations for the use of billions of dollars of x-risk money on the current margin are typically quite extreme.

Not quite. They were houses where people could book to visit for up to 3 weeks at a time, commonly used by people visiting the office or in town for a bit for another event/conference/retreat. Much more like AirBnbs than group houses.

I think the most similar story is "A smart, competent, charismatic, person with horrible ethics will enter the office because they've managed to get good standing in the EA/longtermist ecosystem, cause a bunch of other very smart and competent people to work for them on the basis of expecting to do good in the world, and then do something corrupting and evil with them instead." There are other stories too.

Agreed. Stuart was more open to the possibility that current techniques are enough.

To be clear, I haven't seen many designs that people I respect believed to have a chance of actually working. If you work on the alignment problem or at an AI lab and haven't read Nate Soares' On how various plans miss the hard bits of the alignment challenge, I'd suggest reading it.

Can you explain your definition of the sharp left turn and why it will cause many plans to fail?

0espoire
The "sharp left turn" refers to a breakdown in alignment caused by capabilities gain. An example: the sex drive was a pretty excellent adaptation at promoting inclusive genetic fitness, but when humans capabilities expanded far enough, we invented condoms. "Inventing condoms" is not the sort of behavior that an agent properly aligned with the "maximize inclusive genetic fitness" goal ought to execute. At lower levels of capability, proxy goals may suffice to produce aligned behavior. The hypothesis is that most or all proxy goals will suddenly break down at some level of capability or higher, as soon as the agent is sufficiently powerful to find strategies that come close enough to maximizing the proxy. This can cause many AI plans to fail, because most plans (all known so far?) fail to ensure the agent is actually pursuing the implementor's true goal, and not just a proxy goal.

Is GPT-4 better than Google Translate?

1Harold
Yes. But Deepl is also in the running. All three have different use cases and nuances. Google translate, for example, provides more straightforward translation with phonetics when translating EN>JP. GPT-4 provides more natural but also more often incorrect translations. (This has been the state of affairs for a month. At the current rate of change, I expect this comment will be out of date fast.)

Does RP have any results to share from these studies? What arguments seem to resonate with various groups?

Yeah, the author is definitely making some specific claims. I'm not sure if the comment's popularity stems primarily from its particular arguments or from its emotional sentiment. I was just pointing out what I personally appreciated about the comment.

At the time of me writing, this comment is still the most recommended comment with 910 recommendations. 2nd place has 877 recommendations:

Never has a technology been potentially more transformative and less desired or asked for by the public.

3rd place has 790 recommendations:

“A.I. is probably the most important thing humanity has ever worked on. I think of it as something more profound than electricity or fire.”

Sundar Pichai’s comment beautifully sums up the arrogance and grandiosity pervasive in the entire tech industry—the notion that building machines t

... (read more)

I didn't read it as an argument so much as an emotionally compelling anecdote that excellently conveys this realization:

I had had the upper hand for so long that it became second nature, and then suddenly, I went to losing every game.

1M. Y. Zuo
The final paragraph does seem to be making several arguments, or at least presuming multiple things that are not universally accepted as axioms.

 is probably an Iverson bracket

Does anyone have thoughts on Justin Sung? He has a popular video criticizing active recall and spaced repetition. The argument: if you use better strategies for initially encountering an idea and storing it in long-term memory, then the corresponding forgetting curve will exhibit a more gradual decline, and you won't need to use flashcards as frequently.
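For concreteness, here's a minimal sketch of that claim under a simple exponential forgetting-curve model (an illustrative assumption on my part, not something taken from the video): higher initial "stability" means retention stays above a review threshold for longer, so flashcard reviews can be spaced further apart.

```python
import math

# Illustrative exponential forgetting-curve model (an assumption, not Justin
# Sung's actual model): retention(t) = exp(-t / stability). A larger
# "stability" means the memory decays more gradually.

def days_until_review(stability_days: float, threshold: float = 0.8) -> float:
    """Days until retention drops below `threshold`, i.e. when a review is due."""
    return -stability_days * math.log(threshold)

print(days_until_review(stability_days=2.0))   # weak initial encoding: ~0.45 days
print(days_until_review(stability_days=10.0))  # strong initial encoding: ~2.23 days
```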

I see some red flags about Justin:

  • clickbait video titles
  • he's selling an online course
  • he spends a lot of time talking about how wild it is that everyone else is wrong about this stuff and he is right
  • he rarel
... (read more)

Microsoft is currently the #2 largest company on earth and is valued at almost 2 Trillion.

What does "largest" mean? By revenue, Microsoft is 33rd (according to Wikipedia).

EDIT: I'm guessing you mean 2nd largest public corporation based on market capitalization.

That makes sense. My main question is: where is the clear evidence of human negligibility in chess? People seem to be misleadingly confident about this proposition (in general; I'm not targeting your post).

When a friend showed me the linked post, I thought "oh wow that really exposes some flaws in my thinking surrounding humans in chess." I believe some of these flaws came from hearing assertive statements from other people on this topic. As an example, here's Sam Harris during his interview with Eliezer Yudkowsky (transcript, audio):

Obviously we’ll be get

... (read more)

AIs overtake humans. Humans become obsolete and their contribution is negligible to negative.

I'm confused why chess is listed as an example here. This StackExchange post suggests that cyborg teams are still better than chess engines. Overall, I'm struggling to find evidence for or against this claim (that humans are obsolete in chess), even though it's a pretty common point in discussions about AI.

2Jan_Kulveit
I'm not really convinced by the linked post:
  • The chart is from someone selling financial advice, and the illustrated Elo ratings of chess programs differ from e.g. Wikipedia ("Stockfish estimated Elo rating is over 3500") (maybe it's just old?)
  • The linked interview in the "yes" answer is from 2016.
  • Elo ratings are relative to other players; it is not trivial to directly compare cyborgs and AI: engine ratings are usually computed in tournaments where programs run with the same hardware limits.

In summary, in my view, in something like "correspondence chess" the limit clearly is "AIs ~ human+AI teams" / "human contribution is negligible": the human can just do what the engine says. My guess is the current state is: you would be able to compensate for what the human contributes to the team by just adding more hardware (i.e., instead of the top human part of the cyborg, spending $1M on compute would get you better results). I'd classify this as being in the AI period, for most practical purposes.

Also, as noted by Lone Pine, it seems the game itself becomes somewhat boring with increased power of the players, mostly ending in draws.
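For readers unfamiliar with Elo, a minimal sketch of the standard expected-score formula may help show why ratings are only meaningful relative to the pool they were computed in (the ratings below are hypothetical):

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A 200-point gap predicts ~76% expected score, but only against opponents
# rated within the same pool of games; engine-tournament ratings and
# human/cyborg ratings come from different pools, so the absolute numbers
# aren't directly comparable.
print(elo_expected_score(3500.0, 3300.0))  # ~0.76
print(elo_expected_score(2800.0, 2800.0))  # 0.50, regardless of absolute level
```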
4Lone Pine
Thinking about it analytically, the human+AI chess player cannot be dominated by an equivalent AI (since the human could always just play the move suggested by the engine.) In practice, people play correspondence chess for entertainment or for money, and the money is just payment for someone else's entertainment. Therefore, chess will properly enter the AI era (post-cyborg) when correspondence chess becomes so boring and rote that players stop even bothering to play.
4Lone Pine
Reading that StackExchange post, it sounds like AI/cyborgs are approaching perfect play, as indicated by the frequency of draws. Perfect play, in chess! That's absolutely mind blowing to me.

A conceptual Doppelgänger of some concept Z is a concept Z' that serves some overlapping functions in the mind as Z serves, but is psychically distinct from Z.

What is a concrete example of a conceptual Doppelgänger?

3TsviBT
The ELK report is about such. See Steven's example above. Another example is politics: there's the idea you use to talk about X in press conferences, and the idea you use to make governance decisions related to X. Another example is blindsight: your verbal behavior about X is disjointed from your hand's behavior about X. Another example is the Abendstern and Morgenstern, before you realize it's the same star.

I think it's worth noting Joe Carlsmith's thoughts on this post, available starting on page 7 of Kokotajlo's review of Carlsmith's power-seeking AI report (see this EA Forum post for other reviews).

JC: I do think that the question of how much probability mass you concentrate on APS-AI by 2030 is helpful to bring out – it’s something I’d like to think more about (timelines wasn’t my focus in this report’s investigation), and I appreciate your pushing the consideration. 

I read over your post on +12 OOMs, and thought a bit about your argument here. One b

... (read more)
3Daniel Kokotajlo
Great point, thanks for linking this! I think 65% that +12 OOMs would be enough isn't crazy. I'm obviously more like 80%-90%, and therein lies the crux. (Fascinating that such a seemingly small difference can lead to such big downstream differences! This is why I wrote the post.) If you've got 65% by +12, and 25% by +6, as Joe does, where is your 50% mark? idk maybe it's at +10? So, going to takeoffspeeds.com, and changing the training requirements parameter to +10 OOMs more than GPT-3 (so, 3e33) we get the following: So, I think it's reasonable to say that Joe's stated credences here roughly imply 50% chance of singularity by 2033.
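As a rough sanity check on the "+10" guess, linearly interpolating between Joe's 25% at +6 OOMs and 65% at +12 OOMs (a simplifying assumption; nobody in the thread claims the curve is linear) puts the 50% mark at:

\[
6 + \frac{0.50 - 0.25}{0.65 - 0.25}\,(12 - 6) = 6 + 3.75 \approx 10\ \text{OOMs}
\]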
Load More