All of David Matolcsi's Comments + Replies

What is your take on how far removed "the AI itself" and "the character it is playing" need to be for it to be okay for the character to take deontologically bad actions (like blackmail)? Here are some scenarios; I'm interested in where you would draw the line. I think there can be many reasonable lines here.

1. I describe a fictional setting in which Hrothgar, the King of Dwarves, is in a situation where his personality, goals and circumstances imply that he likely wants to blackmail the prince of elves. At the end of the description, I ask Claude what is Hrothg... (read more)

1Matrice Jacobine
I think your tentative position is correct and public-facing chatbots like Claude should lean toward harmlessness in the harmlessness-helpfulness trade-off, but (post-adaptation buffer) open-source models with no harmlessness training should be available as well.

If I open LW on my phone, clicking the X on the top right only makes the top banner disappear, but the dark theme remains. 
Relatedly, if it's possible to disentangle how the frontpage looks on computer and phone, I would recommend removing the dark theme on phone altogether; you don't see the cool space visuals on the phone anyway, so the dark theme is just annoying for no reason.

2habryka
Yep, this is on my to-do list for the day, was just kind of hard to do for dumb backend reasons.

Maybe the crux is whether the dark color significantly degrades user experience. For me it clearly does, and my guess is that's what Sam is referring to when he says "What is the LW team thinking? This promo goes far beyond anything they've done or that I expected they would do." 

For me, that's why this promotion feels like a different reference class than seeing the curated posts on the top or seeing ads on the SSC sidebar.

4habryka
Yes, the dark mode is definitely a more visually intense experience, though the reference class here is not curated posts at the top, but like, previous "giant banner on the right advertising a specific post, or meetup series or the LW books, etc.". I do think it's still more intense than that, and I am going to ship some easier ways to opt out of that today, just haven't gotten around to it (like, within 24 hours there should be a button that just gives you back whatever normal color scheme you previously had on the frontpage). It's pretty plausible the shift to dark mode is too intense, though that's really not particularly correlated with this specific promotion, and would just be the result of me having a cool UI design idea that I couldn't figure out a way to make work on light mode. If I had a similar idea for e.g. promoting the LW books, or LessOnline or some specific review winner, I probably would have done something similar.

It's interesting to see that Gemini stuck to its guns even after being shown the human solution; I would have expected it to apologize and agree with the human solution.

Gemini's rebuttal goes wrong when it makes the assertion "For the set of visited positions to eventually be the set of \textit{all} positive integers, it is a necessary condition that this density must approach 1" without justification. This assertion is unfortunately not true.
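To illustrate why (a minimal sketch, assuming "density" here means the fraction of $\{1,\dots,N\}$ already visited at the time the walk first reaches $N$; I don't have the full problem statement in front of me): enumerate the positive integers in the order

$$ 1,\; 2,\; 4,\; 3,\; 8,\; 5,\, 6,\, 7,\; 16,\; 9,\, \dots,\, 15,\; 32,\; \dots $$

i.e. jump to $2^k$ and only afterwards fill in the gap between $2^{k-1}$ and $2^k$. Every positive integer is eventually visited, but at the moment $2^k$ is first reached the visited set is $\{1,\dots,2^{k-1}\}\cup\{2^k\}$, so its density within $\{1,\dots,2^k\}$ is only about $1/2$. So "the walk eventually visits every positive integer" does not by itself force the density to approach $1$.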

It's a nice paper, and I'm glad they did the research, but importantly, the paper reports a negative result about our agenda. The main result is that the method inspired by our ideas underperforms the baseline. Of course, these are just the first experiments, work is ongoing, and this is not conclusive negative evidence for anything. But the paper certainly shouldn't be counted as positive evidence for ARC's ideas.

1fencebuilder
Thanks for the clarification! Not in the field and wasn't sure I understood the meaning of the results correctly.

Unfortunately, while it's true that the Pope has a math degree, the person who wrote papers on theology and Bayes' theorem is a different Robert Prevost.

https://www.researchgate.net/profile/Robert-Prevost

That's not how I see it. I think the argument tree doesn't go very deep until I lose the thread. Here are a few, slightly stylized but real, conversations I had with friends who had no context on what ARC was doing, when I tried to explain our research to them:


Me: We want to do Low Probability Estimation.

Them: Does this mean you want to estimate the probability that ChatGPT says a specific word after a hundred words of chain of thought? Isn't this clearly impossible?

Me: No, you see, we only want to estimate the probabilities as well as the model kn... (read more)
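For concreteness, here is one way to write down the quantity from the dialogue's example (a sketch; the notation is illustrative, not ARC's official definition):

$$ p \;=\; \Pr_{\,\text{CoT}\,\sim\,M(\cdot \mid \text{prompt})}\bigl[\text{the word following the first 100 words of chain of thought is } w\bigr], $$

and the point of Low Probability Estimation is to estimate $p$ even when it is far too small to be estimated by drawing samples.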

1fencebuilder
What is your opinion on the Low Probability Estimation paper published this year at ICLR? I don't have a background in the field, but it seems like they were able to get some results that indicate the approach can extract something. https://arxiv.org/pdf/2410.13211

If you don't believe in your work, consider looking for other options

I spent 15 months working for ARC Theory. I recently wrote up why I don't believe in their research. If one reads my posts, I think it should become very clear to the reader that either ARC's research direction is fundamentally unsound, or I'm still misunderstanding some of the very basics after more than a year of trying to grasp it. In either case, I think it's pretty clear that it was not productive for me to work there. Throughout writing my posts, I felt an intense shame imagining re... (read more)

1Sheikh Abdur Raheem Ali
How exactly are you measuring coding ability? What are the ways you've tried to upskill, and what are common failure modes? Can you describe your workflow at a high-level, or share a recording? Are you referring to competence at real world engineering tasks, or performance on screening tests? There's a chrome extension which lets you download leetcode questions as jupyter notebooks: https://github.com/k-erdem/offlineleet. After working on a problem, you can make a markdown cell with notes and convert it into flashcards for regular review: https://github.com/callummcdougall/jupyter-to-anki.  I would suggest scheduling calls with friends for practice sessions so that they can give you personalized feedback about what you need to work on.

If one reads my posts, I think it should become very clear to the reader that either ARC's research direction is fundamentally unsound, or I'm still misunderstanding some of the very basics after more than a year of trying to grasp it.

I disagree. Instead, I think that either ARC's research direction is fundamentally unsound, or you're still misunderstanding some of the finer details after more than a year of trying to grasp it. Like, your post is a few layers deep in the argument tree, and the discussions we had about these details (e.g. in January) went e... (read more)

6Mateusz Bagiński
IME, in the majority of cases, when I strongly felt like quitting but was also inclined to justify "staying just a little bit longer because XYZ", and listened to my justifications, staying turned out to be the wrong decision.
2tailcalled
If you want to upskill in coding, I'm open to tutoring you for money.
3Algon
I keep seeing the first clause as "I don't believe in your work".

I still don't see it, sorry. If I think of deep learning as an approximation of some kind of simplicity prior + updating on empirical evidence, I'm not very surprised that it solves the capacity allocation problem and learns a productive model of the world. [1] The price is that the simplicity prior doesn't necessarily get rid of scheming. The big extra challenge for heuristic explanations is that you need to do the same capacity allocation in a way that scheming reliably gets explained (even though it's not relevant for the model's performance a... (read more)

Thanks for the reply. I agree that it would be exciting in itself to create "a formal framework for heuristic arguments that is well-developed enough that we can convincingly apply it to neural networks", and I agree that for that goal, LPE and MAD are more of a test case than a necessary element. However, I think you probably can't get rid of the question of empirical regularities. 

I think you certainly need to resolve the question of empirical regularities if you want to apply your methods to arbitrary neural networks, and I strongly suspect that yo... (read more)

8Jacob_Hilton
I thought about this a bit more (and discussed with others) and decided that you are basically right that we can't avoid the question of empirical regularities for any realistic alignment application, if only because any realistic model with potential alignment challenges will be trained on empirical data. The only potential application we came up with is LPE for a formalized distribution and formalized catastrophe event, but we didn't find this especially compelling, for several reasons.[1] To me the challenges we face in dealing with empirical regularities do not seem bigger than the challenges we face with formal heuristic explanations, but the empirical regularities challenges should become much more concrete once we have a notion of heuristic explanations to work with, so it seems easier to resolve them in that order. But I have moved in your direction, and it does seem worth our while to address them both in parallel to some extent.

1. ^ Objections include: (a) the model is trained on empirical data, so we need to only explain things relevant to formal events, and not everything relevant to its loss; (b) we also need to hope that empirical regularities aren't needed to explain purely formal events, which remains unclear; and (c) the restriction to formal distributions/events limits the value of the application.

I agree 4 and 5 are not really separate. The main point is that using formal input distributions for explanations just passes the buck to explaining things about the generative AI that defines the formal input distribution, and at some point something needs to have been trained on real data, and we need to explain behavior there.

Yes, this is part of the appeal of catastrophe detectors: we can make an entire interesting statement fully formal by asking how often a model causes a catastrophe (as defined by a neural net catastrophe detector) on a formal distribution (defined by a generative neural net with a Gaussian random seed). This is now a fully formal statement, but I'm skeptical it helps much (a formal sketch of the statement follows below the list). Among other issues:

  1. It's probably not enough to only explain this type of statement to actualize all of ARC's plans.
  2. As I will explain in my next post, I'm skeptical that f
... (read more)
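As an illustration of the fully formal statement above (a minimal sketch; the symbols $G$, $M$ and $C$ are just illustrative names, not notation from the thread), the quantity in question is something like

$$ p_{\text{cat}} \;=\; \Pr_{z \sim \mathcal{N}(0, I)}\Bigl[\, C\bigl(M(G(z))\bigr) = 1 \,\Bigr], $$

where $G$ is the generative network defining the formal input distribution, $M$ is the model being analyzed, and $C$ is the neural-net catastrophe detector. Estimating $p_{\text{cat}}$ when it is far too small to sample is the kind of Low Probability Estimation problem discussed earlier in the thread.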
3ryan_greenblatt
Gotcha. I agree with 1-4, but I'm not sure I agree with 5, at least I don't agree that 5 is separate from 4. In particular: If we can make an explanation for some AI while we're training it and this is actually a small increase in cost, then we can apply this to the input distribution generator. This doesn't make training uncompetitive with just the agent, as it only adds a small factor to some AI (the generator) that we needed to train anyway. So, we shouldn't need to waste a bunch of resources on the formal input distribution. I agree this implies that you have to handle making explanations for AIs trained to predict the input distribution, which causes you to hit issues with 4 again.

I think this is a good and important post, but there was one point I felt was missing from it: What if the company, being caught in a race, not only wants to keep using their proven schemer model, but also wants to continue its training to be smarter, or quickly build other smarter models with similar techniques? I think it's likely they will want to do that, and I think most of your recommendations in the post become very dubious if the scheming AI is continuously trained to be smarter.

Do you have recommendations on what to do if the company wants to trai... (read more)

Unclear if we can talk about "humans" in a simulation where logic works differently, but I don't know, it could work. I remain uncertain how feasible trades across logical counterfactuals will be; it's all very confusing.

Thanks for the reply, I broadly agree with your points here. I agree we should probably eventually try to do trades across logical counterfactuals. Decreasing logical risk is one good framing for that, but in general, there are just positive trades to be made.

However, I think you are still underestimating how hard it might be to strike these deals. "Be kind to other existing agents" is a natural idea to us, but it's still unclear to me if it's something you should assign high probability to as a preference of logically counterfactual beings. Sure, there ... (read more)

1Knight Lee
If one concern is the low specificity of being kind to weaker agents, what do you think about directly trading with Logical Counterfactual Simulations? Directly trading with Logical Counterfactual Simulations is very similar to the version by Rolf Nelson (and you): the ASI is directly rewarded for sharing with humans, rather than rewarded for being kind to weaker agents. The only part of math and logic that the Logical Counterfactual Simulation alters, is "how likely the ASI succeeds in taking over the world." This way, the ASI can never be sure that it won (and humans lost), even if math and logic appears to prove that humans have 99.9999% frequency of losing. I actually spent more time working on this direct version, but I still haven't turned it into a proper post (due to procrastination, and figuring out how to convince all the Human-AI Trade skeptics like Nate Soares and Wei Dai).

Maybe your idea works too, it's an interesting concept, but I'm unsure. The crucial question is always how the AI is supposed to know who is creating the simulations, what the simulators' values might be, and with whom it should trade. In this logical counterfactual trade, who are the other "agents" that the AI is supposed to be nice to? Are rocks agents, should it preserve every rock in the Universe? Usually, I wouldn't be that worried about this, as I think 'agent' is a fairly natural concept that might even have some nice mathematical definition. But onc... (read more)

2Knight Lee
Thank you so much for the thorough reply :) My answer for "which weaker agents should the AI be kind to" is "all weaker agents."

Enough room for everyone

Our universe contains humans, octopuses, insects and many different weak agents. A superintelligent AI which has a very philosophically uncertain chance of being in a Karma Test would be kind to all of these agents, just in case the Karma Test executors had a particular one in mind. Earth's future lightcone is extremely big (10^22 stars), so there is room to be kind to all of these weaker agents, if you are morally uncertain about which ones to care about, and do not put all your weight on "caring for rocks."

Caring for every weak agent is a net positive

The only worry would be if weaker agents who are diametrically opposed to human values (or certain human values) end up having more weight than human-like agents. For example, they want human misery more than we don't want human misery, and they outvote us. Such agents make a compelling argument in an online debate, but are unlikely to exist in real life due to convergent evolution. We humans may enjoy certain blood sports where smaller animals suffer, but it is very unlikely we will outvote them, or that we would even want to outvote them. If an AI told us it was choosing to spare our lives due to the risk it is inside a Karma Test telling it to be kind to weaker agents, and asked us how to spend our vote, it's unlikely we'll spend our vote against fish in order to enjoy the blood-sport of killing them, due to Karma and everything. Even in the cynical case we do do that, we'll at least allow the average life of a fish-like creature to be happy rather than miserable. All in all, having the AI be kind to weaker agents should be a net positive.

Which acausal trade first?

From the point of view of a superintelligence, it may make sense to do acausal trades in the order you described. From the point of view of humans, we can debate the feasibility of acausal

Thanks for the reply. If you have time, I'm still interested in hearing what would be a realistic central example of non-concentrated failure that's good to imagine while reading the post.

This post was a very dense read, and it was hard for me to digest what the main conclusions were supposed to be. Could you write some concrete scenarios that you think are central examples of schemers causing non-concentrated failures? While reading the post, I never knew what situation to imagine: An AI is doing philosophical alignment research but intentionally producing promising-looking crackpottery? It is building cyber-sec infrastructure but leaving in a lot of vulnerabilities? Advising the President, but having a bias towards advocating for integratin... (read more)

  • I probably should have used a running example in this post - this just seems like a mostly unforced error.
  • I considered writing a conclusion, but decided not to because I wanted to spend the time on other things and I wasn't sure what I would say that was useful and not just a pure restatement of things from earlier. This post is mostly a high level framework + list of considerations, so it doesn't really have a small number of core points.
  • This post is a relatively low-effort post, as indicated by "Notes on"; possibly I should have flagged this more.
  • I th
... (read more)

My strong guess is that OpenAI's results are real; it would really surprise me if they were literally cheating on the benchmarks. It looks like they are just using much more inference-time compute than is available to any outside user, and they use a clever scaffold that makes the model productively utilize the extra inference time. Elliot Glazer (creator of FrontierMath) says in a comment on my recent post on FrontierMath:

A quick comment: the o3 and o3-mini announcements each have two significantly different scores, one <= 10%, the other >= 25

... (read more)

I like the idea of IMO-style releases, always collecting new problems, testing the AIs on them, then releasing them to the public. What do you think: how important is it to only have problems with numerical solutions? If you can test the AIs on problems with proofs, then there are already many competitions that regularly release high-quality problems. (I'm shilling KöMaL again as one that's especially close to my heart, but there are many good monthly competitions around the world.) I think if we instruct the AI to present its solution in one page at the end, t... (read more)

Thanks a lot for the answer, I put in an edit linking to it. I think it's a very interesting update that the models get significantly better at catching and correcting their mistakes in OpenAI's scaffold with longer inference time. I am surprised by this, given how much it feels like the models can't distinguish their plausible fake reasoning from good proofs at all. But I assume there is still a small signal in the right direction, and that can be amplified if the model thinks the question through a lot of times (and does something like a majority voti... (read more)

2Elliot Glazer
Yes, the privacy constraints make the implications of these improvements less legible to the public. We have multiple plans for how to disseminate info within this constraint, such as publishing author survey comments regarding the reasoning traces and our competition at the end of the month to establish a sort of human baseline. Still, I don't know that the privacy of FrontierMath is worth all the roundabout efforts we must engage in to explain it. For future projects, I would be interested in other approaches to balancing preventing models from training on public discussion of problems vs being able to clearly show the world what the models are tackling. Maybe it would be feasible to do IMO-style releases? "Here's 30 new problems we collected this month. We will immediately test all the models and then make the problems public."

I like the main idea of the post. It's important to note though that the setup assumed that we have a bunch of alignment ideas that all have an independent 10% chance of working. Meanwhile, in reality I expect a lot of correlation: there is a decent chance that alignment is easy and a lot of our ideas will work, and a decent chance that it's hard and basically nothing works.

3mattmacdermott
Agreed, this only matters in the regime where some but not all of your ideas will work. But even in alignment-is-easy worlds, I doubt literally everything will work, so testing would still be helpful.

Does anyone know of a non-peppermint-flavored zinc acetate lozenge? I really dislike peppermint, so I'm not sure it would be worth it to drink 5 peppermint-flavored glasses of water a day to decrease the duration of a cold by one day, and I haven't found other zinc acetate lozenge options yet; the acetate version seems to be rare among zinc supplements. (Why?)

1Drake Thomas
Note that the lozenges dissolve slowly, so (bad news) you'd have the taste around for a while but (good news) it's really not a very strong peppermint flavor while it's in your mouth, and in my experience it doesn't really have much of the menthol-triggered cooling effect. My guess is that you would still find it unpleasant, but I think there's a decent chance you won't really mind. I don't know of other zinc acetate brands, but I haven't looked carefully; as of 2019 the claim on this podcast was that only Life Extension brand are any good.
1Lucie Philippon
Earlier discussion on LW on zinc lozenges effectiveness mentioned that other flavorings which make it taste nice actually prevent the zinc effect. From this comment by philh (quite a chain of quotes haha): That's why the peppermint zinc acetate lozenge from Life Extension is the recommended one. So your only other option might be somehow finding unflavored zinc lozenges, which might taste even worse? Not sure where that might be available

Fair, I also haven't made any specific commitments, I phrased it wrongly. I agree there can be extreme scenarios with trillions of digital minds tortured where you'd maybe want to declare war on the rest of society. But I would still like people to write down that "of course, I wouldn't want to destroy Earth before we can save all the people who want to live in their biological bodies, just to get a few years of acceleration in the cosmic conquest". I feel a sentence like this should really have been included in the original post about dismantling the Sun... (read more)

As I explain in more detail in my other comment, I expect market-based approaches not to dismantle the Sun anytime soon. I'm interested in whether you know of any governance structure that you support that you think will probably lead to dismantling the Sun within the next few centuries.

I feel reassured that you don't want to Eat the Earth while there are still biological humans who want to live on it. 

I still maintain that under governance systems I would like, I would expect the outcome to be very conservative with the solar system in the next thousand years. Like, one default governance structure I quite like is to parcel out the Universe equally among the people alive during the Singularity, have a binding constitution on what they can do in their fiefdoms (no torture, etc.), and allow them to trade and give away their stuff ... (read more)

I maintain that biological humans will need to do population control at some point. If they decide that enacting population control in the solar system at a later population level is worth it for them to dismantle the Sun, then they can go for it. My guess is that they won't, and will have population control earlier.

I think that the coder looking up and saying that the Sun burning is distasteful but the Great Transhumanist Future will come in 20 years, along with a later mention of "the Sun is a battery", together implies that the Sun is getting dismantled in the near future. I guess you can debate how strong the implication is; maybe they just want to dismantle the Sun in the long term, and are currently only using the Sun as a battery in some benign way, but I think that's not the most natural interpretation.

2habryka
I think the 20 years somewhat unambiguously refers to timelines until AGI is built. Separately, "the sun is a battery" I think also doesn't really imply anything about the sun getting dismantled; if anything it seems to me to imply explicitly that the sun is still intact (and probably surrounded by a Dyson swarm or sphere).

Yeah, maybe I just got too angry. As we discussed in other comments, I believe that from the astronomical acceleration perspective the real deal is maximizing the initial industrialization of Earth and its surroundings, which does require killing off (and mind uploading) the Amish and everyone else. Sure, if people are only arguing that we should only dismantle the Sun and Earth after millennia, that's more acceptable, but I really don't see what the point is then; we can build out our industrial base on Alpha Centauri by then.

The part that is frustrating to m... (read more)

6Ben Pace
It is good to have deontological commitments about what you would do with a lot of power. But this situation is very different from "a lot of power", it's also "if you were to become wiser and more knowledgeable than anyone in history so far".

One can imagine the Christians of old asking for a commitment that "If you get this new scientific and industrial civilization that you want in 2,000 years from now, will you commit to following the teachings of Jesus?" and along the way I sadly find out that even though it seemed like a good and moral commitment at the time, it totally screwed my ability to behave morally in the future because Christianity is necessarily predicated on tons of falsehoods and many of its teachings are immoral.

But there is some version of this commitment I think might be good to make... something like "Insofar as the players involved are all biological humans, I will respect the legal structures that exist and the existence of countries, and will not relate to them in ways that would be considered worthy of starting a war in its defense". But I'm not certain about this, for instance what if most countries in the world build 10^10 digital minds and are essentially torturing them? I may well wish to overthrow a country that is primarily torture with a small number of biological humans sitting on thrones on top of these people, and I am not willing to commit not to do that presently.

I understand that there are bad ethical things one can do with post-singularity power, but I do not currently see a clear way to commit to certain ethical behaviors that will survive contact with massive increases in knowledge and wisdom. I am interested if anyone has made other commitments about post-singularity life (or "on the cusp of singularity life") that they expect to survive contact with reality?

Added: At the very least I can say that I am not going to make commitments to do specific things that violate my current ethics. I have certainly made no positive
2Ben Pace
(Meta: Apologies for running the clock, but it is 1:45am where I am and I'm too sleepy to keep going on this thread, so I'm bowing out for tonight. I want to respond further, but I'm on vacation right now so I do wish to disclaim any expectations of a speedy follow-up.)

I expect non-positional material goods to be basically saturated for Earth people in a good post-Singularity world, so I don't think you can promise them to become twice as rich. And also, people dislike drastic change and new things they don't understand. 20% of the US population refused the potentially life-saving covid vaccine out of distrust of new things they don't understand. Do you think they would happily move to a new planet with artificial sky maintained by supposedly benevolent robots? Maybe you could buy off some percentage of the population if... (read more)

4habryka
Twenty years seems indeed probably too short, though it’s hard to say how post-singularity technology will affect things like public deliberation timelines.  My best guess is 200 years will very likely be enough. I agree with you that there exist some small minority of people who will have a specific attachment to the sun, but most people just want to live good and fulfilling lives, and don’t have strong preferences about whether the sun in the sky is exactly 1 AU away and feels exactly like the sun of 3 generations past. Also, people will already experience extremely drastic change in the 20 years after the singularity, and my sense is marginal cost of change is decreasing, and this isn’t the kind of change that would most affect people’s lived experience.  To be clear, for me it’s a crux whether not dismantling the sun is basically committing everyone who doesn’t want to be uploaded to relative cosmic poverty. It would really suck if all remaining biological humans would be unable to take advantage of the vast majority of the energy in the solar system.  I am not at present compelled that the marginal galaxies are worth destroying the sun and earth for (though I am also not confident it isn’t, I feel confused about it, and also don’t know where most people would end up after having been made available post-singularity intelligence enhancing drugs and deliberation technologies, which to be clear not everyone would use, but most people probably would). 

Are you arguing that, if technologically possible, the Sun should be dismantled in the first few decades after the Singularity, as is implied in the Great Transhumanist Future song (the main thing I'm complaining about here)? In that case, I don't know of any remotely just and reasonable (democratic, market-based or other) governance structure that would allow that to happen, given how the majority of people feel.

If you are talking about population dynamics, ownership and voting shifting over millennia to the point that they decide to dismantle the Sun, then sure, that's possible, though that's not what I expect to happen, see my other comment on market trades and my reply to Habryka on population dynamics.

2habryka
(It is not implied in the song, to be clear, you seem to have a reading of the lyrics I do not understand.  The song talks about there being a singularity in ~20 years, and separately that the sun is wasteful, but I don’t see any reference to the sun being dismantled in 20 years. For reference, lyrics are here: https://luminousalicorn.tumblr.com/post/175855775830/a-filk-of-big-rock-candy-mountain-one-evening-as) 

You mean that people on Earth and the solar system colonies will have enough biological children, and space travel to other stars for biological people will be hard enough, that they will want the resources from dismantling the Sun? I suppose that's possible, though I expect they will put some kind of population control for biological people in place before that happens. I agree that also feels aversive, but at some point it needs to be done anyway; otherwise exponential population growth just brings us back to the Malthusian limit in a few tens of thousands of years ... (read more)

3habryka
Someone will live on old earth in your scenario. Unless those people are selected for extreme levels of attachment to specific celestial bodies, as opposed to the function and benefit of those celestial bodies, I don't see why those people would decide to not replace the sun with a better sun, and also get orders of magnitude richer by doing so. It seems to me that the majority of those inhabitants of old earth would simply be people who don't want to be uploaded (which is a much more common preference I expect than maintaining the literal sun in the sky) and so have much more limited ability to travel to other solar systems. I don't see why I would want to condemn most people who don't want to be uploaded to relative cosmic poverty just because a very small minority of people want to keep burning away most of the usable energy in the solar system for historical reasons.

I agree that not all decisions about the cosmos should be made in a majoritarian democratic way, but I don't see how replacing the Sun with artificial light can be done by market forces under normal property rights. I think you currently would not be allowed to build a giant glass dome around someone's plot of land, and this feels at least as strong.

I'm broadly sympathetic to having property rights and markets in the post-Singularity future, and probably the people with scope-sensitive and longtermist preferences will be able to buy out the futu... (read more)

3habryka
People don’t generally have strong preferences about celestial objects. I really don’t understand why you think most people care about the sun qua the sun, as opposed to the things the sun provides.  Most people when faced with the choice to be more than twice as rich in new-earth, which they get to visualize and explore using the best of digital VR and sensory technology, with a fake sun indistinguishable for all intends and purposes from the real sun, will of course choose that over the attachment to maintaining that specific ball of plasma in the sky. 

I agree that I don't viscerally feel the loss of the 200 galaxies, and maybe that's a deficiency. But I still find this position crazy. I feel this is a decent parallel dialogue:
Other person: "Here is a something I thought of that would increase health outcomes in the world by 0.00000004%."
Me: "But surely you realize that this measure is horrendously unpopular, and the only way to implement it is through a dictatorial world government."
Other person: "Well yes, I agree it's a hard dilemma, but in absolute terms, 0.00000004% of the world population is 3 peop... (read more)

5Raemon
It sounds like there's actually like 3-5 different object level places where we're talking about slightly different things. I also updated on the practical aspect from Ryan's comment. So, idk here's a bunch of distinct points.

1. Ryan Greenblatt's comment updated me that the energy requirements here are minimal enough that "eating the sun" isn't really going to come up as a consideration for astronomical waste. (Eating the Earth or most of the solar system seems like it still might be. But, I agree we shouldn't Eat the Earth)

2. I'd interpreted most past comments for nearterm (i.e. measured in decades) crazy shit to be about building Dyson spheres, not Star Lifting. (i.e. I expected the '20 years from now in some big ol' computer' in the solstice song to be about dyson spheres and voluntary uploads). I think many people will still freak out about Dyson Sphering the sun (not sure if you would). I would personally argue "it's just pretty damn important to Dyson Sphere the sun even if it makes people uncomfortable (while designing it such that Earth still gets enough light)."

3. I agree in 1000 years it won't much matter whether you Starlift, for astronomical waste reasons. But I do expect in 1000 years, even assuming a maximally consent-oriented / conservative-with-regards-to-bio-human-values, and all around "good" outcome, most people will have shifted to running on computronium and experienced much more than 1000 years of subjective time and their intuitions about what's good will just be real different. There may be small groups of people who continue living in bio-world but most of them will still probably be pretty alien by our lights. I think I do personally hope they preserve the Earth as sanctuary and/or historical relic. But I think there's a lot of compromises like "starlift a lot of material out of the sun, but move the Earth closer to the sun to compensate" (I haven't looked into the physics here, the details are obviously cruxy). When I imagi
6Ben Pace
Most decisions are not made democratically, and pointing out that a majoritarian vote is against a decision is no argument that it will not happen, nor that it should not happen. This is true of the vast majority of resource allocation decisions, such as how to divvy up physical materials.

Yes, I wanted to argue something like this. 

I think this is a false dilemma. If all human cultures on Earth come to the conclusion in 1000 years that they would like the Sun to be dismantled (which I very much doubt), then sure, we can do that. But at that point, we could already have built awesome industrial bases by dismantling Alpha Centauri, or just building them up by dismantling 0.1% of the Sun that doesn't affect anything on Earth. I doubt that totally dismantling the Sun after centuries would significantly accelerate the time we reach the cosmic event horizon. 

The thing that actually ha... (read more)

6Raemon
I have my own actual best guesses for what happens in reasonably good futures, which I can get into. (I'll flag for now I think "preserve Earth itself for as long as possible" is a reasonable Schelling point that is compatible with many "go otherwise quite fast" plans) Why do you doubt this? (to be clear, it depends on exact details. But my original query was about a 2 year delay. Proxima Centauri is 4 lightyears away. What is your story for how only taking 0.1% of the sun's energy while we spin up doesn't slow us down by at least 2 years?) I have more to say but maybe should wait on your answer to that. Mostly, I think your last comment still had its own missing mood of horror, and/or seemed to be assuming away any tradeoffs. (I am with you on "many rationalists seem gung ho about this in a way I find scary")

I might write a top level post or shortform about this at some point. I find it baffling how casually people talk about dismantling the Sun around here. I recognize that this post makes no normative claim that we should do it, but it doesn't say that it would be bad either, and expects that we will do it even if humanity remains in power. I think we probably won't do it if humanity remains in power, we shouldn't do it, and if humanity disassembles the Sun, it will probably happen for some very bad reason, like a fanatical dictatorship getting in power... (read more)

2Said Achmiz
Is this true?! (Do you have a link or something?)
Ben Pace2015

You are putting words in people's mouths to accuse lots of people of wanting to round up the Amish and haul them to extermination camps, and I am disappointed that you would resort to such accusations.

Raemon*181

So, I'm with you on "hey guys, uh, this is pretty horrifying, right? Uh, what's with the missing mood about that?".

The issue is that not-eating-the-sun is also horrifying, i.e. see also All Possible Views About Humanity's Future Are Wild. To not eat the sun is to throw away orders of magnitude more resources than anyone has ever thrown away before. Is it percentage-wise "a small fraction of the cosmos"? Sure. But (quickly checks Claude, which wrote up a fermi code snippet before answering, I can share the work if you want to double-check yourself), a two ... (read more)

AnthonyC133

Without making any normative arguments: if you're in a position (industrially and technologically) to disassemble the sun at all, or build something like a Dyson swarm, then it's probably not too difficult to build an artificial system to light the Earth in such a way as to mimic the sun, and make it look and feel nearly identical to biological humans living on the surface, using less than a billionth of the sun's normal total light output. The details of tides might be tricky, but probably not out of reach.
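As a rough sanity check on the "less than a billionth" figure above (a back-of-the-envelope sketch, not from the original comment): Earth only intercepts the solid angle it subtends as seen from the Sun, so the required fraction of total solar output is about

$$ \frac{\pi R_\oplus^2}{4\pi d^2} \;=\; \left(\frac{R_\oplus}{2d}\right)^{2} \;\approx\; \left(\frac{6.4\times 10^{6}\ \mathrm{m}}{2 \times 1.5\times 10^{11}\ \mathrm{m}}\right)^{2} \;\approx\; 4.5\times 10^{-10}, $$

i.e. roughly half a billionth of the Sun's luminosity, consistent with the claim.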

9Seth Herd
You're such a traditionalist! More seriously, accusing rationalists of hauling the Amish and their mothers to camps doesn't seem quite fair. Like you said, most rationalists seem pretty nice and aren't proposing involuntary rapid changes. And this post certainly didn't. You'd need to address the actual arguments in play to write a serious post about this. "Don't propose weird stuff" isn't a very good argument. You could argue that went very poorly with communism, or come up with some other argument. Actually I think rationalists have come up with some. It looks to me like the more respected rationalists are pretty cautious about doing weird drastic stuff just because the logic seems correct at the time. See the unilateralist curse and Yudkowsky's and others' pleas that nobody do anything drastic about AGI even though they think it's very likely going to kill us all. This stuff is fun to think about, but it's planning the victory party before planning how to win the war. How to put the future into kind and rational hands seems like an equally interesting and much more urgent project right now. I'd be fine with a pretty traditional utopian future or a very weird one, but not fine with joyless machines eating the sun, or worse yet all of the suns they can reach.

What is an infra-Bayesian Super Mario supposed to mean? I studied infra-Bayes under Vanessa for half a year, and I have no idea what this could possibly mean. I asked Vanessa when this post came out, and she also said she can't guess what you might mean by this. Can you explain what this is? It makes me very skeptical that the only part of the plan I know something about seems to be nonsense.

Also, can you give more information or link to a resource on what Davidad's team is currently doing? It looks like they are the best-funded AI safety group that currently exists (except if you count Anthropic), but I never hear about them.

2Quinn
(i'm guessing) super mario might refer to a simulation of the Safeguarded AI / Gatekeeper stack in a videogame. It looks like they're skipping videogames and going straight to cyberphysical systems (1, 2).

I don't agree with everything in this post, but I think it's a true and underappreciated point that "if your friend dies in a random accident, that's actually only a tiny loss according to MWI."

I usually use this point to ask people to retire the old argument that "Religious people don't actually believe in their religion, otherwise they would be no more sad at the death of a loved one than if their loved one sailed to Australia." I think this "should be" true of MWI believers too, and we still feel very sad when a loved one dies in an accident.

I don't think t... (read more)

1Jonah Wilberg
Yes, very much agree with those points. Virtue ethics is another angle to come at the same point that there's a process whereby you internalise system 2 beliefs into system 1. Virtues need to be practised and learned, not just appreciated theoretically. That's why stoicism has been thought of (e.g. by Pierre Hadot) as promoting 'spiritual exercises' rather than systematic philosophy - I draw some further connections to stoicism in the next post in the sequence.

I fixed some parts that were easy to misunderstand. I meant the $500k to be the LW hosting + Software subscriptions and the Dedicated software + accounting stuff together. And I didn't mean to imply that the labor cost of the 4 people is $500k; that was a separate term in the costs.

Is Lighthaven still cheaper if we take into account the initial funding spent on it in 2022 and 2023? I was under the impression that buying Lighthaven was one of the things that made a lot of sense when the community believed it would have access to FTX funding, and once we bought it, i... (read more)

9habryka
Ah yeah, I did misunderstand you there. Makes sense now. It's tricky because a lot of that is capital investment, and it's extremely unclear what the resell price of Lighthaven would end up being if we ended up trying to sell, since we renovated it in a pretty unconventional way. Total renovations cost around ~$7M-$8M. About $3.5M of that was funded as part of the mortgage from Jaan Tallinn, and another $1.2M of that was used to buy a property right next to Lighthaven which we are hoping to take out an additional mortgage on (see footnote #3), and which we currently own in full. The remaining ~$3M largely came from SFF and Open Phil funding. We also lost a total of around ~$1.5M in net operating costs so far. Since the property is super hard to value, let's estimate the value of the property after our renovations at our current mortgage value ($20M).[1] During the same time, the Lightcone Offices would have cost around $2M, so if you view the value we provided in the meantime as roughly equivalent, we are out around $2.5M, but also, property prices tend to increase over time at least some amount, so by default we've probably recouped some fraction of that in appreciated property values, and will continue to recoup more as we break even. My honest guess is that Lighthaven would make sense even without FTX, from an ex-post perspective, but that if we hadn't had FTX there wouldn't have been remotely enough risk appetite for it to get funded ex-ante. I think in many worlds Lighthaven turned out much worse than it did (and for example, renovation costs already ended up in the like 85th percentile of my estimates due to much more extensive water and mold damage than I was expecting in the mainline).

1. ^ I think this is a potentially controversial choice, though I think it makes sense. I think most buyers would not be willing to pay remotely as much for the venue as that, since they would basically aim to return the property back to its standard hote

I donated $1000. Originally I was worried that this would be a bottomless money pit, but looking at the cost breakdown, it's actually very reasonable. If Oliver is right that Lighthaven funds itself apart from the labor cost, then the real costs are $500k for the hosting, software and accounting cost of LessWrong (this is probably an unavoidable cost and seems obviously worthy of being philanthropically funded), plus paying 4 people (equivalent to 65% of 6 people) to work on LW moderation and upkeep (it's an unavoidable cost to have some people working on LW, 4 ... (read more)

habryka100

Thank you so much!

Some quick comments: 

then the real costs are $500k for the hosting and hosting cost of LessWrong 

Raw server costs for LW are more like ~$120k (and to be clear, you could drive this lower with some engineering, though you would have to pay for that engineering cost). See the relevant line in the budget I posted.

Total labor cost for the ~4 people working on LW is closer to ~$800k, instead of the $500k you mention.

(I'm not super convinced it was a good decision to abandon the old Lightcone offices for Lighthaven, but I guess it mad

... (read more)

I'm considering donating. Can you give us a little more information on the breakdown of the costs? What are typical large expenses that the 1.6 million upkeep of Lighthaven consists of? Is this a usual cost for a similar-sized event space, or is there something about the location or the specialness of the place that makes it more expensive?

How much money does running LW cost? The post says it's >1M, which somewhat surprised me, but I have no idea what the usual cost of running such a site is. Is the cost mostly server hosting, or salaries for content moderation, or salaries for software development, or something I haven't thought of?

habryka*380

Very reasonable question! Here is a breakdown of our projected budget:

Type | Cost
Core Staff Salaries, Payroll, etc. (6 people) | $1.4M
Lighthaven (Upkeep):
  Operations & Sales | $240k
  Repairs & Maintenance Staff | $200k
  Porterage & Cleaning Staff | $320k
  Property Tax | $300k
  Utilities & Internet | $180k
  Additional Rental Property | $180k
  Supplies (Food + Maintenance) | $180k
Lighthaven Upkeep Total | $1.6M
Lighthaven Mortgage | $1M
LW Hosting + Software Subscriptions | $120k
Dedicated Software + Accounting Staff | $330k
Total Costs | $4.45M
Expected
... (read more)

Importantly, the oracle in the story is not making an elementary mistake; I think it's true that it's "probably" in a simulation. (Most of the measure of beings like it are in simulations.) It is also not maximizing reward; it is just honestly reporting what it expects its future observations to be about the President (which is within the simulation).

I agree with many of the previous commenters, and I acknowledged in the original post, that we don't know how to build such an AI that just honestly reports its probabilities of observables (even if they depend on crazy simulation things), so all of this is hypothetical, but having such a truthful Oracle was the initial assumption of the thought experiment.

Even assuming that the simulators have wildly different values, why would doing something insane be a good thing to do?

0Logan Zoellner
yes

I always assume when thinking about future dangerous models that they have access to some sort of black-box memory. Do we think there is a non-negligible chance that an AI that doesn't have hidden memory, only English-language CoT, will be able to evade our monitoring and execute a rogue deployment? (Not a rhetorical question, there might be a way I haven't thought of.)

So I think that assuming the AI is stateless when thinking about future risk is not a good idea, as I think the vast majority of the risk comes from AIs for which this assumption is not t... (read more)

5Buck
Yes, I do think this. I think the situation looks worse if the AI has hidden memory, but I don't think we're either fine if the model doesn't have it or doomed if it does.

Hm, probably we disagree on something. I'm very confused about how to mesh epistemic uncertainty with these "distribution over different Universes" types of probability. When I say "Boltzmann brains are probably very low measure", I mean "I think Boltzmann brains are very low measure, but this is a confusing topic and there might be considerations I haven't thought of and I might be totally mistaken". I think this epistemic uncertainty is distinct from the type of "objective probabilities" I talk about in my post, and I don't really know how to use language with... (read more)

2Noosphere89
This part IMO is a crux, in that I don't truly believe an objective measure/magical reality fluid can exist in the multiverse, if we allow the concept to be sufficiently general, ruining both probability and expected value/utility theory in the process. Heck, in the most general cases, I don't believe any coherent measure exists at all, which basically ruins probability and expected utility theory at the same time.
3Richard_Ngo
The part I was gesturing at wasn't the "probably" but the "low measure" part. Yes, that's a good summary of my position—except that I think that, like with ethics, there will be a bunch of highly-suggestive logical/mathematical facts which make it much more intuitive to choose some priors over others. So the choice of prior will be somewhat arbitrary but not totally arbitrary. I don't think this is a fully satisfactory position yet, it hasn't really dissolved the confusion about why subjective anticipation feels so real, but it feels directionally correct.

I think it only came up once for a friend. I translated it and it makes sense; it just replaces the appropriate English verb with a Chinese one in the middle of a sentence. (I note that this often happens to me too when I talk with my friends in Hungarian: I'm sometimes more used to the English phrase for something, and say one word in English in the middle of the sentence.)

2Michael Roe
As someone who, in a previous job, got to go to a lot of meetings where the European commission is seeking input about standardising or regulating something - humans also often do the thing where they just use the English word in the middle of a sentence in another language, when they can't think what the word is. Often with associated facial expression / body language to indicate to the person they're speaking to "sorry, couldn't think of the right word". Also used by people speaking English, whose first language isn't English, dropping into their own language for a word or two. If you've been the editor of e.g. an ISO standard, fixing these up in the proposed text is such fun.  So, it doesn't surprise me at all that LLMs do this. I have, weirdly, seen LLMs put a single Chinese word in the middle of English text … and consulting a dictionary reveals that it was, in fact, the right word, just in Chinese.

I like your poem on Twitter.

I think that Boltzmann brains in particular are probably very low measure though, at least if you use Solomonoff induction. If you think that weighting observer moments within a Universe by their description complexity is crazy (which I kind of feel), then you need to come up with a different measure on observer moments, but I expect that if we find a satisfying measure, Boltzmann brains will be low measure in that too.
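To spell out one way that weighting could work (a rough sketch, not a worked-out proposal): under a Solomonoff-style measure, the weight of an observer moment $x$ is something like

$$ w(x) \;\approx\; \sum_{U} 2^{-K(U)} \cdot 2^{-K(x \mid U)}, $$

where $K(U)$ is the description length of a universe's laws and initial conditions, and $K(x \mid U)$ is the number of extra bits needed to locate the observer moment within that universe. One intuition for why Boltzmann brains come out low measure here is that a Boltzmann brain sitting at an arbitrary spot in a vast thermal region needs a huge number of locating bits.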

I agree that there's no real answer to "where you are"; you are a superposition of beings across the multiverse... (read more)

3Richard_Ngo
Hmmm, uncertain if we disagree. You keep saying that these concepts are cursed and yet phrasing your claims in terms of them anyway (e.g. "probably very low measure"), which suggests that there's some aspect of my response you don't fully believe. In particular, in order for your definition of "what beings are sufficiently similar to you" to not be cursed, you have to be making claims not just about the beings themselves (since many Boltzmann brains are identical to your brain) but rather about the universes that they're in. But this is kinda what I mean by coalitional dynamics: a bunch of different copies of you become more central parts of the "coalition" of your identity based on e.g. the types of impact that they're able to have on the world around them. I think describing this as a metric of similarity is going to be pretty confusing/misleading. You still need a prior over worlds to calculate impacts, which is the cursed part.

I think that pleading total agnosticism towards the simulators' goals is not enough. I write "one common interest of all possible simulators is for us to cede power to an AI whose job is to figure out the distribution of values of possible simulators as best as it can, then serve those values." So I think you need a better reason to guard against being influenced than "I can't know what they want, everything and its opposite is equally likely", because the action proposed above is pretty clearly more favored by the simulators than not doing it.

Btw, I... (read more)

The reason for agnosticism is that it is no more likely for them to be on one side or the other. As a result, you don't know without evidence who is influencing you. I don't really think this class of Pascal's Wager attack is very logical for this reason - an attack is supposed to influence someone's behavior but I think that without special pleading this can't do that. Non-existent beings have no leverage whatsoever and any rational agent would understand this - even humans do. Even religious beliefs aren't completely evidenceless; the type of evidence ex... (read more)
