David Matolcsi

Thanks for the reply. If you have time, I'm still interested in hearing what a realistic central example of a non-concentrated failure would be, one that's good to keep in mind while reading the post.

This post was a very dense read, and it was hard for me to digest what the main conclusions were supposed to be. Could you write some concrete scenarios that you think are central examples of schemers causing non-concentrated failures? While reading the post, I never knew what situation to imagine: An AI is doing philosophical alignment research but intentionally producing promising-looking crackpottery? It is building cybersecurity infrastructure but leaving in a lot of vulnerabilities? Advising the President, but with a bias towards advocating for integrating AI into the military?

I think these problems are all pretty different in terms of which approaches are promising for preventing them, so it would be useful to see what you think the most likely non-concentrated failures are, so we can read the post with that in mind.

As another point, it would really help if you wrote conclusion sections. There are a lot of different points made in the post, and it's hard to see which ones you consider the most important to get across to the reader. A conclusion section would help a lot with that.

In general, I think that among all the people I know, you might be the one with the biggest gap between how good you are at explaining concepts in person and how bad you are at communicating them in blog posts. (Strangely, your LW comments are also very good and digestible, more similar to your in-person communication than to your long-form posts; I don't know why.) I think it could be high leverage for you to experiment some with making your posts more readable. Using more concrete examples and writing conclusion sections would go a long way towards improving your posts in general, but I felt compelled to comment here because this post was especially hard to read without them.

My strong guess is that OpenAI's results are real; it would really surprise me if they were literally cheating on the benchmarks. It looks like they are just using much more inference-time compute than is available to any outside user, and they use a clever scaffold that makes the model productively utilize the extra inference time. Elliot Glazer (creator of FrontierMath) says in a comment on my recent post on FrontierMath:

A quick comment: the o3 and o3-mini announcements each have two significantly different scores, one <= 10%, the other >= 25%. Our own eval of o3-mini (high) got a score of 11% (it's on Epoch's Benchmarking Hub). We don't actually know what the higher scores mean, could be some combination of extreme compute, tool use, scaffolding, majority vote, etc., but we're pretty sure there is no publicly accessible way to get that level of performance out of the model, and certainly not performance capable of "crushing IMO problems." 

I do have the reasoning traces from the high-scoring o3-mini run. They're extremely long, and one of the ways it leverages the higher resources is to engage in an internal dialogue where it does a pretty good job of catching its own errors/hallucinations and backtracking until it finds a path to a solution it's confident in. I'm still writing up my analysis of the traces and surveying the authors for their opinions on the traces, and will also update e.g. my IMO predictions with what I've learned.

I like the idea of IMO-style releases: always collecting new problems, testing the AIs on them, then releasing them to the public. How important do you think it is to only have problems with numerical solutions? If you can test the AIs on problems with proofs, then there are already many competitions that regularly release high-quality problems. (I'm shilling KöMaL again as one that's especially close to my heart, but there are many good monthly competitions around the world.) I think if we instruct the AI to present its solution in one page at the end, then it's not that hard to get an experienced competition grader to read the solution and score it according to the normal competition scoring, so the result won't be much less objective than if there were only numerical solutions. If you want to stick to problems with numerical solutions, I'm worried that you will have a hard time regularly assembling high-quality numerical problems, and even if the problems are released publicly, people will have a harder time evaluating them than if they actually came from a competition where we can compare to the natural human baseline of the competing students.

Thanks a lot for the answer, I put in an edit linking to it. I think it's a very interesting update that the models get significantly better at catching and correcting their mistakes in OpenAI's scaffold with longer inference time. I am surprised by this, given how much it feels like the models can't distinguish their plausible fake reasoning from good proofs at all. But I assume there is still a small signal in the right direction, and that can be amplified if the model thinks the question through a lot of times (and does something like majority voting within its chain of thought?). I think this is an interesting update towards the viability of inference-time scaling.

I think many of my other points still stand, however: I still don't know how capable I should expect the internally scaffolded model to be given that it got 32% on FrontierMath, and I would much rather have them report results on the IMO or a similar competition than on a benchmark I can't see and whose difficulty I can't easily assess.

I like the main idea of the post. It's important to note, though, that the setup assumed we have a bunch of alignment ideas that all have an independent 10% chance of working. Meanwhile, in reality I expect a lot of correlation: there is a decent chance that alignment is easy and a lot of our ideas will work, and a decent chance that it's hard and basically nothing works.

Does anyone know of a zinc acetate lozenge that isn't peppermint-flavored? I really dislike peppermint, so I'm not sure it would be worth it to drink 5 peppermint-flavored glasses of water a day to decrease the duration of a cold by one day, and I haven't found other zinc acetate lozenge options yet; the acetate version seems to be rare among zinc supplements. (Why?)

Fair, I also haven't made any specific commitments; I phrased that wrongly. I agree there can be extreme scenarios, with trillions of digital minds tortured, where you'd maybe want to declare war on the rest of society. But I would still like people to write down that "of course, I wouldn't want to destroy Earth before we can save all the people who want to live in their biological bodies, just to get a few years of acceleration in the cosmic conquest". I feel a sentence like this should really have been included in the original post about dismantling the Sun, and as long as people are not willing to write this down, I remain paranoid that they would in fact haul the Amish to extermination camps if it feels like a good idea at the time. (As I said, I met people who really held this position.)

As I explain in more detail in my other comment, I expect market-based approaches not to dismantle the Sun anytime soon. I'm interested whether you know of any governance structure that you support and that you think will probably lead to dismantling the Sun within the next few centuries.

I feel reassured that you don't want to Eat the Earth while there are still biological humans who want to live on it. 

I still maintain that under governance systems I would like, I would expect the outcome to be very conservative with the solar system over the next thousand years. One default governance structure I quite like is to parcel out the Universe equally among the people alive during the Singularity, have a binding constitution on what they can do in their fiefdoms (no torture, etc.), and allow them to trade and give away their stuff to their biological and digital descendants. There could also be a basic income going to all biological people,[1] though not to digital ones, as it's too easy to mass-produce them.

One year of delay in cosmic expansion costs us around 1 in a billion of the reachable Universe, under some assumptions on where the grabby aliens are (if they exist). One year also costs us around 1 in a billion of the Sun's mass being burned, if, like Habryka, you care about using the solar system optimally for the sake of the biological humans who want to stay. So one year of delay can be bought by 160 people paying out 10% of their wealth. I really think that you won't do things like moving the Earth closer to the Sun in the next 200 years; there will just always be enough people to pay out. It just takes 10,000 traditionalist families; literally the Amish could easily do it. And it won't matter much: the cosmic acceleration will soon become a moot point as we build out other industrial bases, and I don't expect the biological people to feel much of a personal need to dismantle the Sun anytime soon. Maybe in 10,000 years the objectors will run out of money, and the bio people will either overpopulate or have expensive hobbies like building planets for themselves and decide to dismantle the Sun, though I expect them to be rich enough to just haul in matter from other stars if they want to.
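Here is a minimal back-of-envelope sketch of how the "160 people paying 10%" figure can come out, assuming the Universe is parceled into equal shares among roughly 16 billion people; that population number is my own assumption, not something stated above:

```python
# Sketch of the delay-buying arithmetic. The population figure is an
# assumption chosen so the numbers line up; the comment doesn't state it.

cost_per_year = 1e-9            # fraction of the reachable Universe lost per year of delay
population = 1.6e10             # assumed number of equal shareholders
share_per_person = 1 / population

payers = 160
wealth_fraction_paid = 0.10
total_paid = payers * wealth_fraction_paid * share_per_person

print(f"cost of one year of delay: {cost_per_year:.1e} of the Universe")
print(f"160 people paying 10% of their shares: {total_paid:.1e}")
# Both come out to ~1e-9, so with ~16 billion equal shares the claim checks out.
```

With a smaller or larger shareholder population the required number of payers scales proportionally, but it stays a tiny group either way.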

By the way, I recommend Tim Underwood's sci-fi novel The Accord as a very good exploration of these topics; I think it's my favorite sci-fi novel.

As for the 80 trillion stars, I agree it's a real loss, but for me this type of sadness feels "already priced in". I already accepted that the world won't and shouldn't be all my personal absolute kingdom, so other people's decisions will cause a lot of waste from my perspective, and 0.00000004% is just a really negligible part of this loss. In this, I think my analogy to current governments is quite apt: I feel similarly about them, in that I already accepted that the world will be wasteful compared to the rule of a dictatorship perfectly aligned with me, but that's how it needs to be.

  1. ^

    Though you need to pay attention to overpopulation. If the average biological couple has 2.2 children, the Universe runs out of atoms to support humans in 50 thousand years. Exponential growth is crazy fast. 
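A rough sanity check of this footnote; all the parameter values below are my own guesses (atoms available, atoms needed per person, generation length), and they land in the same tens-of-thousands-of-years range as the 50,000-year figure:

```python
import math

# Back-of-envelope exponential-growth check. Every number below is a guess.
children_per_couple = 2.2
growth_per_generation = children_per_couple / 2    # 1.1x population growth per generation
years_per_generation = 30                          # assumed generation length

atoms_in_reachable_universe = 1e80                 # rough order of magnitude
atoms_per_supported_human = 1e30                   # body plus habitat and life support (guess)
max_population = atoms_in_reachable_universe / atoms_per_supported_human

starting_population = 1e10
generations = math.log(max_population / starting_population, growth_per_generation)
years = generations * years_per_generation

print(f"max supportable population: {max_population:.0e}")
print(f"years until the atoms run out: {years:,.0f}")
# Roughly 29,000 years with these guesses -- the same ballpark as the footnote,
# and the answer barely moves even if the atom budget is off by several orders
# of magnitude, because the growth is exponential.
```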
