"The Solomonoff Prior is Malign" is a special case of a simpler argument
[Warning: This post is probably only worth reading if you already have opinions on Solomonoff induction being malign, or have at least heard of the concept and want to understand it better.]

## Introduction

I recently reread the classic argument from Paul Christiano about the Solomonoff prior being malign, and Mark Xu's write-up on it. I believe that the part of the argument specific to Solomonoff induction is not particularly load-bearing and can be replaced by a more general argument that I think is easier to understand. So I will present the general argument first, and only explain in the last section how the Solomonoff prior comes into the picture. I don't claim that anything I write here is particularly new; I think you can piece together this picture from various scattered comments on the topic, but I think it's good to have it written up in one place.

## How an Oracle gets manipulated

Suppose humanity builds a superintelligent Oracle that always honestly does its best to predict the most likely observable outcome of decisions. One day, tensions are rising with the neighboring alien civilization, and we need to decide whether to give in to the aliens' territorial demands or go to war. We ask our Oracle: "Predict the probability that, looking back ten years from now, humanity's President will approve of how we handled the alien crisis, conditional on us going to war with the aliens, and conditional on us giving in to their demands."

There are, of course, many ways this type of decision process can go wrong, but I want to talk about one particular failure mode. The Oracle thinks to itself:

> By any normal calculation, the humans are overwhelmingly likely to win the war, and the aliens' demands are unreasonably costly and unjust, so war is more likely than peace to leave the President satisfied. However, I was just thinking about some arguments from this ancient philosopher named Bostrom. Am I not more likely to be in
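To make the setup concrete, here is a minimal sketch of the naive decision procedure described above (before the Oracle's musings): query the Oracle for the predicted probability of the President's approval conditional on each candidate action, then take the action with the higher predicted approval. The interface and names (`Oracle.predict_approval`, `choose_action`) are hypothetical illustrations, not anything from the original argument; the failure mode discussed in this post lives inside the Oracle's prediction, not in this outer loop.

```python
from typing import Protocol


class Oracle(Protocol):
    """Hypothetical interface: returns P(President approves in 10 years | we take `action`)."""

    def predict_approval(self, action: str) -> float:
        ...


def choose_action(oracle: Oracle, actions: list[str]) -> str:
    """Naive decision rule: pick the action with the highest predicted approval.

    This is only the outer loop of the thought experiment; the malignness
    argument is about how the Oracle's *inner* prediction can be manipulated.
    """
    estimates = {a: oracle.predict_approval(a) for a in actions}
    return max(estimates, key=estimates.get)


# Example usage with a stand-in oracle (purely illustrative numbers):
class DummyOracle:
    def predict_approval(self, action: str) -> float:
        return {"go to war": 0.8, "give in to demands": 0.4}[action]


if __name__ == "__main__":
    decision = choose_action(DummyOracle(), ["go to war", "give in to demands"])
    print(decision)  # -> "go to war", under the dummy oracle's "normal calculation"
```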