If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.
My main "claims to fame":
I think all sufficiently competent/reflective civilizations (including sovereign AIs) may want to do this, because it seems hard to be certain enough of one's philosophical competence to not do this as an additional check. The cost of running thousands or even millions of such simulations seems very small compared to potentially wasting the resources of an entire universe/lightcone due to philosophical mistakes. Also they may be running such simulations anyway for other purposes, so it may be essentially free to also gather some philosophical ideas from such simulations, to make sure you didn't miss something important or get stuck in some cognitive trap.
The classic idea from Yudkowsky, Christiano, etc. for what to do in a situation like this is to go meta: Ask the AI to predict what you'd conclude if you were a bit smarter, had more time to think, etc. Insofar as you'd conclude different things depending on the initial conditions, the AI should explain what and why.
Yeah, I might be too corrupted or biased to be a starting point for this. It seems like a lot of people or whole societies might not do well if placed in this kind of situation (of having something like CEV extrapolated from them by an AI), so I shouldn't trust myself either.
You, Wei, are proposing another plan: Ask the AI to simulate thousands of civilizations, and then search over those civilizations for examples of people doing philosophical reasoning of the sort that might appeal to you, and then present it all to you in a big list for you to peruse?
Not a big list to peruse, but more like, to start with, put the whole unfiltered distribution of philosophical outcomes in some secure database, then run relatively dumb/secure algorithms over it to gather statistics/patterns. (Looking at it directly by myself or using any advanced algorithms/AIs might expose me/us to infohazards.) For example, I'd want to know what percent of civilizations think they've solved various problems like decision theory, ethics, and metaphilosophy, how many clusters of solutions there are for each problem, and whether there are any patterns/correlations between types/features of intelligence/civilization and the conclusions they ended up with.
This might give me some clues as to which clusters are more interesting/promising/safer to look at, and then I have to figure out what precautions to take before looking at the actual ideas/arguments (TBD, maybe get ideas about this from the simulations too). It doesn't seem like I can get something similar to this by just asking my AI to "do philosophy", without running simulations.
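To give a rough sense of what I mean by "relatively dumb/secure algorithms", here is a minimal sketch in Python. The schema and field names (e.g. "claims", "cluster_A") are entirely made up for illustration, and the toy records stand in for the secure database; the point is only that the statistics I'd want at this stage can be this simple.

```python
from collections import Counter, defaultdict

# Toy stand-in for the secure database of simulation outcomes.
# Each record summarizes one simulated civilization: some coarse features,
# plus, for each problem, a label identifying which cluster its claimed
# solution falls into (None = no claimed solution). The actual arguments
# stay unread at this stage.
records = [
    {"features": {"social_structure": "centralized"},
     "claims": {"decision_theory": "cluster_A", "metaethics": None}},
    {"features": {"social_structure": "decentralized"},
     "claims": {"decision_theory": "cluster_B", "metaethics": "cluster_X"}},
    {"features": {"social_structure": "decentralized"},
     "claims": {"decision_theory": "cluster_A", "metaethics": None}},
]

def percent_claiming_solved(records, problem):
    """Fraction of civilizations claiming any solution to `problem`."""
    solved = sum(1 for r in records if r["claims"].get(problem) is not None)
    return solved / len(records)

def solution_clusters(records, problem):
    """Distinct solution clusters for `problem`, with their sizes."""
    labels = [r["claims"][problem] for r in records
              if r["claims"].get(problem) is not None]
    return Counter(labels)

def feature_vs_conclusion(records, feature, problem):
    """Crude co-occurrence table: feature value -> distribution of clusters."""
    table = defaultdict(Counter)
    for r in records:
        cluster = r["claims"].get(problem)
        if cluster is not None:
            table[r["features"][feature]][cluster] += 1
    return dict(table)

print(percent_claiming_solved(records, "decision_theory"))
print(solution_clusters(records, "decision_theory"))
print(feature_vs_conclusion(records, "social_structure", "decision_theory"))
```

Everything above is simple enough to audit by hand and never touches the content of the arguments themselves, which is the property I'd want from any algorithm that runs over the raw distribution.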
I see the application as indirect at this point, basically showing that decision theory is hard and we're unlikely to get it right without an AI pause/stop. See these two posts to get a better sense of what I mean:
Thanks. This sounds like a more peripheral interest/concern, compared to Eliezer/LW's, which was more like: we have to fully solve DT before building AGI/ASI, otherwise it could be catastrophic due to something like the AI falling prey to an acausal threat or commitment races, or being unable to cooperate with other AIs.
Some evidence about this: Eliezer was deliberately holding off on publishing TDT to use it as a test of philosophical / FAI research competence. He dropped some hints on LW (I think mostly that it had to do with Newcomb or cooperating in one-shot PD, and of course people knew that it had to do with AI) and also assigned MIRI (then SIAI) people to try to guess/reproduce it in advance, and none of the then-SIAI people figured out what he had in mind or got very close until I posted about UDT (which combined my guess of Eliezer's idea with some of my own and other discussions on LW at the time, mainly from Vladimir Nesov).
Also, although I was separately interested in AI safety and decision theory, I didn't connect the dots between the two until I saw Eliezer's hints. I had investigated proto-updateless ideas to bypass difficulties in anthropic reasoning, and by the time Eliezer dropped his hints I had mostly given up on anyone being interested in my DT ideas. I also didn't think to question what I saw as the conventional/academic wisdom, that defecting in one-shot PD is rational, as is two-boxing in Newcomb's Problem.
So my guess is that while some people might have eventually come up with something like UDT even without Eliezer, it probably would have been seen as just one DT idea among many (e.g. SIAI people were thinking in various different directions, and Gary Drescher, who was independently trying to invent a one-boxing/cooperating DT, had come up with a bunch of different ideas and remained unconvinced that UDT was the right approach), and also decision theory itself was unlikely to have been seen as central to AI safety for a long time.
Ok, this changes my mental picture a little (although it's not very surprising that there would be some LW-influenced people at the labs privately still thinking/talking about decision theory). Any idea (or can you ask next time) how they feel about decision theory seemingly being far from solved, their top bosses seemingly being unaware of or unconcerned about this, and this concern being left out of all official communications?
Imagine someone (or a civilization) who is not very philosophically competent (in an absolute sense, like myself) but who somehow got access to a large amount of compute, perhaps by building an intent-aligned AGI that sourced a bunch of compute for them. Suppose this AGI also isn't very philosophically competent (or they can't trust it, or themselves, not to be a clever arguer who could talk them into any conclusion, etc.). How do you turn some of that compute into philosophical progress?
One idea I have is to simulate a diverse range of civilizations (to help avoid falling into the same cognitive traps) and look at the resulting distribution of philosophical arguments/conclusions, maybe try to sift for ones that are especially competent at philosophy, etc. Does this make sense? @Daniel Kokotajlo
Are people at the major AI companies talking about it privately? I don't think I've seen any official communications (e.g. papers, official blog posts, CEO essays) that mention it, so from afar it looks like decision theory has dropped off the radar of mainstream AI safety.
In retrospect it seems like such a fluke that decision theory in general and UDT in particular became a central concern in AI safety. In most possible worlds (with something like humans) there is probably no Eliezer-like figure, or the Eliezer-like figure isn't particularly interested in decision theory as a central part of AI safety, or doesn't like UDT in particular. I infer this from the fact that where Eliezer's influence is low (e.g. AI labs like Anthropic and OpenAI) there seems to be little interest in decision theory in connection with AI safety (cf Dario Amodei's recent article which triggered this reflection), and in other places interested in decision theory that aren't downstream of Eliezer popularizing it, like academic philosophy, there's little interest in UDT.
If this is right, it's another piece of inexplicable personal "luck" from my perspective, i.e., why am I experiencing a rare timeline where I got this recognition/status.
Interesting that you have this impression, whereas I've been thinking of myself recently as doing a "breadth first search" to uncover high level problems that others seem to have missed or haven't bothered to write down. I feel like my writings in the last few years are pretty easy to understand without any specialized knowledge (whereas Google says "esoteric" is defined as "intended for or likely to be understood by only a small number of people with a specialized knowledge or interest").
If on reflection you still think "esoteric" is right, I'd be interested in an expansion on this, e.g. which of the problems I've discussed seem esoteric to you and why.
It doesn't look like humanity is on track to handle these problems, but "extremely unlikely" seems like an overstatement. I think there are still some paths where we handle these problems better, including 1) warning shots or a political wind shift cause an AI pause/stop to be implemented, during which some of these problems/ideas are popularized or rediscovered, and 2) future AI advisors are influenced by my writings or are strategically competent enough to realize these same problems and help warn/convince their principals.
I also have other motivations including: