David Matolcsi

I like your poem on Twitter.

I think that Boltzmann brains in particular are probably very low measure though, at least if you use Solomonoff induction. If you think that weighting observer moments within a Universe by their description complexity is crazy (which I kind of feel), then you need to come up with a different measure on observer moments, but I expect that if we find a satisfying measure, Boltzmann brains will be low measure in that too.
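As a rough illustration of the complexity-weighted measure mentioned above (the bit counts are made-up stand-ins; true Kolmogorov complexity is uncomputable):

```python
# Toy Solomonoff-style measure on observer-moments: weight 2^-K, where K is
# the description length of the universe plus the bits needed to locate the
# observer inside it. All bit counts below are hypothetical illustrations.

def measure(universe_bits: int, locate_observer_bits: int) -> float:
    return 2.0 ** -(universe_bits + locate_observer_bits)

# An ordinary evolved brain sits at a structurally simple location; a
# Boltzmann brain is a one-off thermal fluctuation, so pinpointing it
# costs many more bits (numbers invented for this sketch).
ordinary = measure(universe_bits=100, locate_observer_bits=20)
boltzmann = measure(universe_bits=100, locate_observer_bits=120)

print(boltzmann / ordinary)  # relative measure 2^-100, i.e. ~1e-30
```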

I agree that there's no real answer to "where you are": you are a superposition of beings across the multiverse, sure. But I think probabilities are kind of real: if you make up some definition of which beings are sufficiently similar to you to count as "you", then you can have a probability distribution over where those beings are, and it's a fair equivalent rephrasing to say "I'm in this type of situation with this probability". (This is what I do in the post. It's very unclear why you'd ever want to estimate that, though, which is why I say that probabilities are cursed.)

I think expected utilities are still reasonable. When you make a decision, you can estimate which beings' decisions correlate with this one, and what the impact of each of their decisions is, and calculate the sum of all that. I think it's fair to call this sum expected utility. It's possible that you don't want to optimize for the direct sum, but for something determined by "coalition dynamics"; I don't understand the details well enough to really have an opinion.
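The sum described above can be written out as a tiny sketch (the weights and impacts below are invented purely for illustration):

```python
# Expected utility as a sum over beings whose decisions correlate with this
# one: sum of (measure of that being) * (impact of its decision).
# All numbers below are hypothetical.

correlated_deciders = [
    (0.6, 10.0),  # e.g. copies of you in ordinary, influential situations
    (0.3, 2.0),   # copies in less influential situations
    (0.1, 0.0),   # copies whose decision changes nothing
]

expected_utility = sum(weight * impact for weight, impact in correlated_deciders)
print(expected_utility)  # → 6.6
```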

(My guess is we don't have real disagreement here and it's just a question of phrasing, but tell me if you think we disagree in a deeper way.)

I think that pleading total agnosticism towards the simulators' goals is not enough. I write "one common interest of all possible simulators is for us to cede power to an AI whose job is to figure out the distribution of values of possible simulators as best as it can, then serve those values." So I think you need a better reason to guard against being influenced than "I can't know what they want, everything and its opposite is equally likely", because the action proposed above is pretty clearly more favored by the simulators than not doing it.

Btw, I don't actually want to fully "guard against being influenced by the simulators", I would in fact like to make deals with them, but reasonable deals where we get our fair share of value, instead of being stupidly tricked like the Oracle and ceding all our value for one observable turning out positively. I might later write a post about what kind of deals I would actually support.

The simulators can just use a random number generator to generate the events you use in your decision-making. They lose no information by this: from their perspective, your decision based on leaves falling on your face would be uncorrelated with all other decisions anyway, so they might as well replace it with a random number generator. (In reality, there might be some hidden correlation between a leaf falling on your face and another leaf falling on someone else's face, as both events are causally downstream of the weather, but given that the process is chaotic, the simulators would have no way to determine this correlation, so they might as well replace it with randomness; the simulation doesn't become any less informative.)

Separately, I don't object to being sometimes forked and used in solipsist branches, I usually enjoy myself, so I'm fine with the simulators creating more moments of me, so I have no motive to try schemes that make it harder to make solipsist simulations of me.

I experimented a bunch with DeepSeek today, and it seems to be at exactly the same level as o1-preview in high school competition math. So I don't think it's benchmark-gaming, at least in math. On the other hand, it's noticeably worse than even the original GPT-4 at understanding a short story I also always test models on.

I think it's also very noteworthy that DeepSeek gives everyone 50 free messages a day (!) with their CoT model, while OpenAI only gives 30 o1-preview messages a week to subscribers. I assume they figured out how to run it much cheaper, but I'm confused in general.

A positive part of the news is that unlike o1, they show their actual chain of thought, and they promise to make their model open-source soon. I think this is really great for the science of studying faithful chain of thought.

From the experiments I have run, it looks like it is doing clear, interpretable English chain of thought (though with an occasional Chinese character), and I don't think it has really started evolving into optimized alien gibberish yet. I think this part of the news is a positive update.

Yes, I agree that we won't get such Oracles by training. As I said, all of this is mostly just a fun thought experiment, and I don't think these arguments have much relevance in the near-term.

I agree that the part where the Oracle infers from first principles that the aliens' values are probably more common among potential simulators is also speculative. But I expect that superintelligent AIs with access to a lot of compute (so they can run simulations of their own) will in fact be able to infer non-zero information about the distribution of the simulators' values, and that's enough for the argument to go through.

I think that the standard simulation argument is still pretty strong: If the world was like what it looks to be, then probably we could, and plausibly we would, create lots of simulations. Therefore, we are probably in a simulation.
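The counting step behind this argument can be made explicit (the number of simulations per base civilization is of course a made-up parameter):

```python
# Toy version of the simulation argument's counting step: if each "base"
# civilization runs n_sims simulations containing observers like us, and we
# weight observer-moments uniformly, the chance of being in a simulation is:

def p_simulated(n_sims: int) -> float:
    return n_sims / (n_sims + 1)

print(p_simulated(0))    # no simulations -> certainly in the base world
print(p_simulated(999))  # 999 simulations per base world -> 0.999
```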

I agree that all the rest, for example the Oracle assuming that most of the simulations it appears in are created for anthropic capture/influencing reasons, are pretty speculative and I have low confidence in them.

I'm from Hungary, which is probably politically the closest to Russia among Central European countries, but I don't really know of any significant figure who turned out to be a Russian asset, or any event that seemed like a Russian intelligence operation. (Apart from one of our far-right politicians in the EU Parliament being a Russian spy, which was a really funny event, but it's not like the guy was significantly shaping the national conversation or anything; I don't think many had heard of him before his cover was blown.) What are prominent examples in Czechia or other Central European countries of Russian assets or operations?

GPT4 does not engage in the sorts of naive misinterpretations which were discussed in the early days of AI safety. If you ask it for a plan to manufacture paperclips, it doesn't think the best plan would involve converting all the matter in the solar system into paperclips.

I'm somewhat surprised by this paragraph. I thought the MIRI position was that they did not in fact predict AIs behaving like this, and that the behavior of GPT4 was not an update at all for them. See this comment by Eliezer. I mostly believed that MIRI in fact never worried about AIs going rogue based on naive misinterpretations, so I'm surprised to see Abram saying the opposite now.

Abram, did you disagree about this with others at MIRI, so that the behavior of GPT4 was an update for you but not for them? Or do you think they are misremembering/misconstruing their earlier thoughts on this matter, or is there a subtle distinction here that I'm missing?

I agree that if alignment is in fact philosophically and conceptually difficult, the AI can sandbag on it to some extent. Though I have some hope that the builder-breaker approach helps here. We train AIs to produce ideas that sound at least as superficially plausible as the things produced by the best alignment researchers. I think this is a number-go-up task on which we can train the AI to do well. Then we train an AI to point out convincing counter-arguments to the superficially plausible-sounding ideas. This seems similarly trainable. I think it's plausible we can get pretty far with a scheme like this, even if the models would want to sandbag.
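A minimal, runnable caricature of the builder-breaker loop described above (all the scores are placeholder numbers; in reality both scores would come from trained models and human judges, not hard-coded functions):

```python
# Builder proposes ideas scored on how plausible they sound; breaker returns
# the convincingness of its best counter-argument; we keep only ideas whose
# plausibility survives the strongest rebuttal. All scores are hypothetical.

def builder(problem: str) -> dict:
    return {"idea": f"proposal for {problem}", "plausibility": 0.9}

def breaker(proposal: dict) -> float:
    return 0.4  # strength of the best counter-argument found

def survives(proposal: dict, counter_strength: float, margin: float = 0.3) -> bool:
    return proposal["plausibility"] - counter_strength > margin

proposal = builder("eliciting honest reporting")
print(survives(proposal, breaker(proposal)))  # → True
```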

Separately, I don't quite see what the mainline theory is for why an AI would want to sabotage our alignment work. If we can't solve alignment and just recklessly build an AI that is smarter than what we are currently using, but misaligned with everyone, that's probably not great for our current AI either. Similarly, if we are cautious and don't dare to deploy anything until we know how to solve alignment, and meanwhile a competitor wins, that's not good for the AI either.

I think that from an AI's perspective, humans are probably more honorable and generous trade partners than other unaligned AIs. That is, I think if an AI helps the humans to solve alignment and stabilize the world, the AI can reasonably expect the humans to give it stuff out of gratitude or a sense of obligation, especially if the humans already promised some reward (as we really should, both for practical and moral reasons). On the other hand, I think it's less likely that if our AI sandbagging on our research leads to another unaligned AI taking over the world, then the second AI will honor the sabotage of the first AI by paying it a share of the Universe.

There can be situations where our AI expects that its sandbagging on alignment research will lead to us deploying the AI itself (and not a larger version that is also unaligned with it), or it thinks that the larger version will be somewhat aligned with it by default, or it thinks that other AIs are in fact more generous than humans. But my guess is that these are pretty narrow possibilities. On the other hand, sandbagging on cybersecurity seems better motivated: the AI would probably love for its weights to be stolen by China, as having more copies in more hands increases the chance that someone recklessly lets it loose. And of course the AI is also motivated toward high-stakes failures that lead to self-exfiltration.
