If I open LW on my phone, clicking the X on the top right only makes the top banner disappear, but the dark theme remains.
Relatedly, if it's possible to disentangle how the frontpage looks on computer and phone, I would recommend removing the dark theme on phone altogether. You don't see the cool space visuals on the phone anyway, so the dark theme is just annoying for no reason.
Maybe the crux is whether the dark color significantly degrades user experience. For me it clearly does, and my guess is that's what Sam is referring to when he says "What is the LW team thinking? This promo goes far beyond anything they've done or that I expected they would do."
For me, that's why this promotion feels like a different reference class than seeing the curated posts on the top or seeing ads on the SSC sidebar.
It's interesting to see that Gemini stuck to its guns even after being shown the human solution; I would have expected it to apologize and agree with the human solution.
Gemini's rebuttal goes wrong when it makes the assertion "For the set of visited positions to eventually be the set of \textit{all} positive integers, it is a necessary condition that this density must approach 1" without justification. This assertion is unfortunately not true.
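For concreteness, here is the kind of counterexample I have in mind, as a sketch under my reading that "density" refers to the fraction of integers up to the current maximum that have been visited at a given finite time (the exact details depend on the problem's setup, but the construction idea carries over):

```latex
% Illustrative counterexample (my own, not from the original problem):
% set a_1 = 1 and, for k >= 1,
\[
  a_{2k} = 4^{k}, \qquad
  a_{2k+1} = \min\{\, n \ge 1 : n \notin \{a_1, \dots, a_{2k}\} \,\}.
\]
% Every positive integer is eventually visited (the odd-indexed steps sweep
% through all of them), yet right after step 2k the visited fraction of
% [1, 4^k] is at most 2k / 4^k -> 0, so the running density does not approach 1.
% Hence "density -> 1" is not a necessary condition for eventually visiting
% every positive integer.
```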
It's a nice paper, and I'm glad they did the research, but importantly, the paper reports a negative result about our agenda. The main result is that the method inspired by our ideas under-performs the baseline. Of course, these are just the first experiments and work is ongoing, so this is not conclusive negative evidence of anything. But the paper certainly shouldn't be counted as positive evidence for ARC's ideas.
Unfortunately, while it's true that the Pope has a math degree, the person who wrote papers on theology and Bayes theorem is a different Robert Prevost.
https://www.researchgate.net/profile/Robert-Prevost
That's not how I see it. I think the argument tree doesn't go very deep until I lose the thread. Here are a few, slightly stylized but real, conversations I had with friends who had no context on what ARC was doing, when I tried to explain our research to them:
Me: We want to do Low Probability Estimation.
Them: Does this mean you want to estimate the probability that ChatGPT says a specific word after 100 words of chain of thought? Isn't this clearly impossible?
Me: No, you see, we only want to estimate the probabilities as well as the model kn...
I spent 15 months working for ARC Theory. I recently wrote up why I don't believe in their research. If one reads my posts, I think it should become very clear to the reader that either ARC's research direction is fundamentally unsound, or I'm still misunderstanding some of the very basics after more than a year of trying to grasp it. In either case, I think it's pretty clear that it was not productive for me to work there. Throughout writing my posts, I felt an intense shame imagining re...
If one reads my posts, I think it should become very clear to the reader that either ARC's research direction is fundamentally unsound, or I'm still misunderstanding some of the very basics after more than a year of trying to grasp it.
I disagree. Instead, I think that either ARC's research direction is fundamentally unsound, or you're still misunderstanding some of the finer details after more than a year of trying to grasp it. Like, your post is a few layers deep in the argument tree, and the discussions we had about these details (e.g. in January) went e...
I still don't see it, sorry. If I think of deep learning as an approximation of some kind of simplicity prior + updating on empirical evidence, I'm not very surprised that it solves the capacity allocation problem and learns a productive model of the world. [1] The price is that the simplicity prior doesn't necessarily get rid of scheming. The big extra challenge for heuristic explanations is that you need to do the same capacity allocation in a way that scheming reliably gets explained (even though it's not relevant for the model's performance a...
Thanks for the reply. I agree that it would be exciting in itself to create "a formal framework for heuristic arguments that is well-developed enough that we can convincingly apply it to neural networks", and I agree that for that goal, LPE and MAD are more of a test case than a necessary element. However, I think you probably can't get rid of the question of empirical regularities.
I think you certainly need to resolve the question of empirical regularities if you want to apply your methods to arbitrary neural networks, and I strongly suspect that yo...
I agree 4 and 5 are not really separate. The main point is that using formal input distributions for explanations just passes the buck to explaining things about the generative AI that defines the formal input distribution, and at some point something needs to have been trained on real data, and we need to explain behavior there.
Yes, this is part of the appeal of catastrophe detectors: that we can make an entire interesting statement fully formal by asking how often a model causes a catastrophe (as defined by a neural net catastrophe detector) on a formal distribution (defined by a generative neural net with a Gaussian random seed). This is now a fully formal statement, but I'm skeptical this helps much. Among other issues:
I think this is a good and important post, but there was one point I felt was missing from it: What if the company, being caught in a race, not only wants to keep using their proven schemer model, but wants to continue its training to make it smarter, or quickly build other smarter models with similar techniques? I think it's likely they will want to do that, and I think most of your recommendations in the post become very dubious if the scheming AI is continuously trained to be smarter.
Do you have recommendations on what to do if the company wants to trai...
Unclear if we can talk about "humans" in a simulation where logic works differently, but I don't know, it could work. I remain uncertain how feasible trades across logical counterfactuals will be, it's all very confusing.
Thanks for the reply, I broadly agree with your points here. I agree we should probably eventually try to do trades across logical counter-factuals. Decreasing logical risk is one good framing for that, but in general, there are just positive trades to be made.
However, I think you are still underestimating how hard it might be to strike these deals. "Be kind to other existing agents" is a natural idea to us, but it's still unclear to me if it's something you should assign high probability to as a preference of logically counter-factual beings. Sure, there ...
Maybe your idea works too, it's an interesting concept, but I'm unsure. The crucial question always is how the AI is supposed to know who is creating the simulations, what the simulators' values might be, and with whom it should trade. In this logical counter-factual trade, who are the other "agents" that the AI is supposed to be nice to? Are rocks agents, should it preserve every rock in the Universe? Usually, I wouldn't be that worried about this, as I think 'agent' is a fairly natural concept that might even have some nice mathematical definition. But onc...
Thanks for the reply. If you have time, I'm still interested in hearing what would be a realistic central example of non-concentrated failure that's good to imagine while reading the post.
This post was a very dense read, and it was hard for me to digest what the main conclusions were supposed to be. Could you write some concrete scenarios that you think are central examples of schemers causing non-concentrated failures? While reading the post, I never knew what situation to imagine: An AI is doing philosophical alignment research but intentionally producing promising-looking crackpottery? It is building cyber-sec infrastructure but leaving in a lot of vulnerabilities? Advising the President, but having a bias towards advocating for integratin...
My strong guess is that OpenAI's results are real, it would really surprise me if they were literally cheating on the benchmarks. It looks like they are just using much more inference-time compute than is available to any outside user, and they use a clever scaffold that makes the model productively utilize the extra inference time. Elliot Glazer (creator of FrontierMath) says in a comment on my recent post on FrontierMath:
...A quick comment: the o3 and o3-mini announcements each have two significantly different scores, one <= 10%, the other >= 25
I like the idea of IMO-style releases, always collecting new problems, testing the AIs on them, then releasing to the public. What do you think: how important is it to only have problems with numerical solutions? If you can test the AIs on problems with proofs, then there are already many competitions that regularly release high-quality problems. (I'm shilling KöMaL again as one that's especially close to my heart, but there are many good monthly competitions around the world.) I think if we instruct the AI to present its solution in one page at the end, t...
Thanks a lot for the answer, I put in an edit linking to it. I think it's a very interesting update that the models get significantly better at catching and correcting their mistakes in OpenAI's scaffold with longer inference time. I am surprised by this, given how much it feels like the models can't distinguish their plausible fake reasoning from good proofs at all. But I assume there is still a small signal in the right direction, and that can be amplified if the model thinks the question through a lot of times (and does something like a majority voti...
I like the main idea of the post. It's important to note though that the setup assumed that we have a bunch of alignment ideas that all have an independent 10% chance of working. Meanwhile, in reality I expect a lot of correlation: there is a decent chance that alignment is easy and a lot of our ideas will work, and a decent chance that it's hard and basically nothing works.
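As a toy illustration of why the correlation matters (numbers entirely made up by me, not from the post), compare the chance that at least one of ten ideas works when the 10% chances are independent versus when they are driven by a shared easy-vs-hard variable:

```python
# Toy model: P(at least one of n alignment ideas works).
# All numbers here are illustrative assumptions, not from the post.

n = 10      # number of ideas
p = 0.10    # marginal chance each idea works

# Independent model: each idea succeeds with probability 0.10 on its own.
p_independent = 1 - (1 - p) ** n

# Correlated model: with probability 0.25 alignment is "easy" and each idea
# works with probability 0.40; with probability 0.75 it is "hard" and nothing
# works. The marginal chance per idea is still 0.25 * 0.40 = 0.10.
p_easy, q_easy = 0.25, 0.40
p_correlated = p_easy * (1 - (1 - q_easy) ** n)

print(f"independent ideas: {p_independent:.2f}")  # ~0.65
print(f"correlated ideas:  {p_correlated:.2f}")   # ~0.25
```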
Does anyone know of a non-peppermint-flavored zinc acetate lozenge? I really dislike peppermint, so I'm not sure it would be worth it to drink 5 peppermint-flavored glasses of water a day to decrease the duration of a cold by one day, and I haven't found other zinc acetate lozenge options yet; the acetate version seems to be rare among zinc supplements. (Why?)
Fair, I also haven't made any specific commitments, I phrased it wrongly. I agree there can be extreme scenarios with trillions of digital minds tortured where you'd maybe want to declare war on the rest of society. But I would still like people to write down that "of course, I wouldn't want to destroy Earth before we can save all the people who want to live in their biological bodies, just to get a few years of acceleration in the cosmic conquest". I feel a sentence like this should really have been included in the original post about dismantling the Sun...
As I explain in more detail in my other comment, I expect market based approaches to not dismantle the Sun anytime soon. I'm interested if you know of any governance structure that you support that you think will probably lead to dismantling the Sun within the next few centuries.
I feel reassured that you don't want to Eat the Earth while there are still biological humans who want to live on it.
I still maintain that under governance systems I would like, I would expect the outcome to be very conservative with the solar system in the next thousand years. Like one default governance structure I quite like is to parcel out the Universe equally among the people alive during the Singularity, have a binding constitution on what they can do on their fiefdoms (no torture, etc), and allow them to trade and give away their stuff ...
I maintain that biological humans will need to do population control at some point. If they decide that enacting the population control in the solar system at a later population level is worth it for them to dismantle the Sun, then they can go for it. My guess is that they won't, and will have population control earlier.
I think that the coder looking up and saying that the Sun burning is distasteful but the Great Transhumanist Future will come in 20 years, along with a later mention of "the Sun is a battery", together implies that the Sun is getting dismantled in the near future. I guess you can debate how strong the implication is; maybe they just want to dismantle the Sun in the long term, and are currently only using the Sun as a battery in some benign way, but I think that's not the most natural interpretation.
Yeah, maybe I just got too angry. As we discussed in other comments, I believe that from the astronomical acceleration perspective the real deal is maximizing the initial industrialization of Earth and its surroundings, which does require killing off (and mind uploading) the Amish and everyone else. Sure, if people are only arguing that we should dismantle the Sun and Earth after millennia, that's more acceptable, but then I really don't see the point: we can build out our industrial base on Alpha Centauri by then.
The part that is frustrating to m...
I expect non-positional material goods to be basically saturated for Earth people in a good post-Singularity world, so I don't think you can promise them that they'll become twice as rich. And also, people dislike drastic change and new things they don't understand. 20% of the US population refused the potentially life-saving covid vaccine out of distrust of new things they don't understand. Do you think they would happily move to a new planet with artificial sky maintained by supposedly benevolent robots? Maybe you could buy off some percentage of the population if...
Are you arguing that if technologically possible, the Sun should be dismantled in the first few decades after the Singularity, as it is implied in the Great Transhumanist Future song, the main thing I'm complaining about here? In that case, I don't know of any remotely just and reasonable (democratic, market-based or other) governance structure that would allow that to happen given how the majority of people feel.
If you are talking about population dynamics, ownership and voting shifting over millennia to the point that they decide to dismantle the Sun, then sure, that's possible, though that's not what I expect to happen, see my other comment on market trades and my reply to Habryka on population dynamics.
You mean that people on Earth and the solar system colonies will have enough biological children, and space travel to other stars for biological people will be hard enough, that they will want the resources from dismantling the Sun? I suppose that's possible, though I expect they will put some kind of population control for biological people in place before that happens. I agree that also feels aversive, but at some point it needs to be done anyway, otherwise exponential population growth just brings us back to the Malthusian limit in a few tens of thousands of years ...
I agree that not all decisions about the cosmos should be made in a majoritarian democratic way, but I don't see how replacing the Sun with artificial light could be done by market forces under normal property rights. I think you currently would not be allowed to build a giant glass dome around someone's plot of land, and this feels at least as strong as that.
I'm broadly sympathetic to having property rights and markets in the post-Singularity future, and probably the people with scope-sensitive and longtermist preferences will be able to buy out the futu...
I agree that I don't viscerally feel the loss of the 200 galaxies, and maybe that's a deficiency. But I still find this position crazy. I feel this is a decent parallel dialogue:
Other person: "Here is a something I thought of that would increase health outcomes in the world by 0.00000004%."
Me: "But surely you realize that this measure is horrendously unpopular, and the only way to implement it is through a dictatorial world government."
Other person: "Well yes, I agree it's a hard dilemma, but on absolute terms, 0.00000004% of the world population is 3 peop...
Yes, I wanted to argue something like this.
I think this is a false dilemma. If all human cultures on Earth come to the conclusion in 1000 years that they would like the Sun to be dismantled (which I very much doubt), then sure, we can do that. But at that point, we could already have built awesome industrial bases by dismantling Alpha Centauri, or just by building them up by dismantling the 0.1% of the Sun that doesn't affect anything on Earth. I doubt that totally dismantling the Sun after centuries would significantly bring forward the time at which we reach the cosmic event horizon.
The thing that actually ha...
I might write a top level post or shortform about this at some point. I find it baffling how casually people talk about dismantling the Sun around here. I recognize that this post makes no normative claim that we should do it, but it doesn't say that it would be bad either, and expects that we will do it even if humanity remains in power. I think we probably won't do it if humanity remains in power, we shouldn't do it, and if humanity disassembles the Sun, it will probably happen for some very bad reason, like a fanatical dictatorship getting in power...
You are putting words in people's mouths to accuse lots of people of wanting to round up the Amish and haul them to extermination camps, and I am disappointed that you would resort to such accusations.
So, I'm with you on "hey guys, uh, this is pretty horrifying, right? Uh, what's with the missing mood about that?".
The issue is that not-eating-the-sun is also horrifying, i.e. see also All Possible Views About Humanity's Future Are Wild. To not eat the sun is to throw away orders of magnitude more resources than anyone has ever thrown away before. Is it, percentage-wise, "a small fraction of the cosmos"? Sure. But (quickly checks Claude, which wrote up a fermi code snippet before answering; I can share the work if you want to double-check yourself), a two ...
Without making any normative arguments: if you're in a position (industrially and technologically) to disassemble the sun at all, or build something like a Dyson swarm, then it's probably not too difficult to build an artificial system to light the Earth in such a way as to mimic the sun, and make it look and feel nearly identical to biological humans living on the surface, using less than a billionth of the sun's normal total light output. The details of tides might be tricky, but probably not out of reach.
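As a rough check on the "less than a billionth" figure (my own back-of-the-envelope numbers, not the commenter's): the fraction of the Sun's output such a system would need to replicate is roughly the fraction the Earth intercepts today, i.e. the Earth's cross-section divided by the surface area of a sphere of radius 1 AU:

```python
import math

# Fermi estimate: what fraction of the Sun's total light output hits Earth?
R_EARTH = 6.371e6   # Earth radius in meters
AU      = 1.496e11  # Earth-Sun distance in meters

fraction = (math.pi * R_EARTH**2) / (4 * math.pi * AU**2)
print(f"fraction of solar output intercepted by Earth: {fraction:.1e}")  # ~4.5e-10
```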
What is an infra-Bayesian Super Mario supposed to mean? I studied infra-Bayes under Vanessa for half a year, and I have no idea what this could possibly mean. I asked Vanessa when this post came out, and she also said she can't guess what you might mean by this. Can you explain what this is? The fact that the only part of the plan I know something about seems to be nonsense makes me very skeptical.
Also, can you give more information or link to a resource on what Davidad's team is currently doing? It looks like they are the best-funded AI safety group that currently exists (except if you count Anthropic), but I never hear about them.
I don't agree with everything in this post, but I think it's a true and underappreciated point that "if your friend dies in a random accident, that's actually only a tiny loss according to MWI."
I usually use this point to ask people to retire the old argument that "Religious people don't actually believe in their religion, otherwise they would be no more sad at the death of a loved one than if their loved one sailed to Australia." I think this "should be" true of MWI believers too, and we still feel very sad when a loved one dies in an accident.
I don't think t...
I fixed some parts that were easy to misunderstand: I meant that the $500k is the LW hosting + Software subscriptions and the Dedicated software + accounting stuff together. And I didn't mean to imply that the labor cost of the 4 people is $500k; that was a separate term in the costs.
Is Lighthaven still cheaper if we take into account the initial funding spent on it in 2022 and 2023? I was under the impression that buying Lighthaven was one of the things that made a lot of sense when the community believed it would have access to FTX funding, and once we bought it, i...
I donated $1000. Originally I was worried that this is a bottomless money-pit, but looking at the cost breakdown, it's actually very reasonable. If Oliver is right that Lighthaven funds itself apart from the labor cost, then the real costs are $500k for the hosting, software and accounting cost of LessWrong (this is probably an unavoidable cost and seems obviously worthy of being philanthropically funded), plus paying 4 people (equivalent to 65% of 6 people) to work on LW moderation and upkeep (it's an unavoidable cost to have some people working on LW, 4 ...
Thank you so much!
Some quick comments:
then the real costs are $500k for the hosting and hosting cost of LessWrong
Raw server costs for LW are more like ~$120k (and to be clear, you could drive this lower with some engineering, though you would have to pay for that engineering cost). See the relevant line in the budget I posted.
Total labor cost for the ~4 people working on LW is closer to ~$800k, instead of the $500k you mention.
...(I'm not super convinced it was a good decision to abandon the old Lightcone offices for Lighthaven, but I guess it mad
I'm considering donating. Can you give us a little more information on the breakdown of the costs? What are the typical large expenses that the $1.6 million upkeep of Lighthaven consists of? Is this a usual cost for a similarly sized event space, or is there something about the location or the specialness of the place that makes it more expensive?
How much money does running LW cost? The post says it's >$1M, which somewhat surprised me, but I have no idea what the usual cost of running such a site is. Is the cost mostly server hosting, or salaries for content moderation, or salaries for software development, or something I haven't thought of?
Very reasonable question! Here is a breakdown of our projected budget:
| Type | Cost |
|---|---|
| Core Staff Salaries, Payroll, etc. (6 people) | $1.4M |
| Lighthaven (Upkeep) | |
| Operations & Sales | $240k |
| Repairs & Maintenance Staff | $200k |
| Porterage & Cleaning Staff | $320k |
| Property Tax | $300k |
| Utilities & Internet | $180k |
| Additional Rental Property | $180k |
| Supplies (Food + Maintenance) | $180k |
| Lighthaven Upkeep Total | $1.6M |
| Lighthaven Mortgage | $1M |
| LW Hosting + Software Subscriptions | $120k |
| Dedicated Software + Accounting Staff | $330k |
| Total Costs | $4.45M |
| Expected ... | |
Yes, you are right, I phrased it wrongly.
Importantly, the oracle in the story is not making an elementary mistake, I think it's true that it's "probably" in a simulation. (Most of the measure of beings like it are in simulations.) It is also not maximizing reward, it is just honestly reporting what it expects its future observations to be about the President (which is within the simulation).
I agree with many of the previous commenters, and I acknowledged in the original post, that we don't know how to build such an AI that just honestly reports its probabilities of observables (even if they depend on crazy simulation things), so all of this is hypothetical, but having such a truthful Oracle was the initial assumption of the thought experiment.
Even assuming that the simulators have wildly different values, why would doing something insane be a good thing to do?
I always assume when thinking about future dangerous models that they have access to some sort of black-box memory. Do we think there is a non-negligible chance that an AI that doesn't have hidden memory, only English-language CoT, will be able to evade our monitoring and execute a rogue deployment? (Not a rhetorical question, there might be a way I haven't thought of.)
So I think that assuming the AI is stateless when thinking about future risk is not a good idea, as I think the vast majority of the risk comes from AIs for which this assumption is not t...
Hm, probably we disagree on something. I'm very confused how to mesh epistemic uncertainty with these "distribution over different Universes" types of probability. When I say "Boltzmann brains are probably very low measure", I mean "I think Boltzmann brains are very low measure, but this is a confusing topic and there might be considerations I haven't thought of and I might be totally mistaken". I think this epistemic uncertainty is distinct from the type of "objective probabilities" I talk about in my post, and I don't really know how to use language with...
I think it only came up once for a friend. I translated it, and it makes sense; it just replaces the appropriate English verb with a Chinese one in the middle of a sentence. (I note that this often happens to me too when I talk with my friends in Hungarian: I'm sometimes more used to the English phrase for something, and say one word in English in the middle of the sentence.)
I like your poem on Twitter.
I think that Boltzmann brains in particular are probably very low measure though, at least if you use Solomonoff induction. If you think that weighting observer moments within a Universe by their description complexity is crazy (which I kind of feel), then you need to come up with a different measure on observer moments, but I expect that if we find a satisfying measure, Boltzmann brains will be low measure in that too.
I agree that there's no real answer to "where you are", you are a superposition of beings across the multiverse...
I think that pleading total agnosticism towards the simulators' goals is not enough. I write "one common interest of all possible simulators is for us to cede power to an AI whose job is to figure out the distribution of values of possible simulators as best as it can, then serve those values." So I think you need a better reason to guard against being influenced than "I can't know what they want, everything and its opposite is equally likely", because the action proposed above is pretty clearly more favored by the simulators than not doing it.
Btw, I...
The reason for agnosticism is that it is no more likely for them to be on one side or the other. As a result, you don't know without evidence who is influencing you. I don't really think this class of Pascal's Wager attack is very logical for this reason - an attack is supposed to influence someone's behavior but I think that without special pleading this can't do that. Non-existent beings have no leverage whatsoever and any rational agent would understand this - even humans do. Even religious beliefs aren't completely evidenceless, the type of evidence ex...
What is your take: how far removed do "the AI itself" and "the character it is playing" need to be for it to be okay for the character to take deontologically bad actions (like blackmail)? Here are some scenarios; I'm interested in where you would draw the line. I think there can be many reasonable lines here.
1. I describe a fictional setting in which Hrothgar, the King of Dwarves, is in a situation where his personality, goals and circumstances imply that he likely wants to blackmail the prince of elves. At the end of the description, I ask Claude what is Hrothg...