I think I need more practice talking with people in real time (about intellectual topics). (I've gotten much more used to text chat/comments, which I like because it puts less time pressure on me to think and respond quickly, but I feel like I now incur a large cost due to excessively shying away from talking to people, hence the desire for practice.) If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.
“Omega looks at whether we’d pay if in the causal graph the knowledge of the digit of pi and its downstream consequences were edited”
Can you formalize this? In other words, do you have an algorithm for translating an arbitrary mind into a causal graph and then asking this question? Can you try it out on some simple minds, like GPT-2?
I suspect there may not be a simple/elegant/unique way of doing this, in which case the answer to the decision problem depends on the details of how exactly Omega is doing it. E.g., maybe all such algorithms are messy/heuristics-based, and it makes sense to think a bit about whether you can trick the specific algorithm into giving a "wrong prediction" (in quotes because it's not clear exactly what right and wrong even mean in this context) that benefits you, or maybe you have to self-modify into something Omega's algorithm can recognize / work with, and it's a messy cost-benefit analysis of whether this is worth doing, etc.
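To make the question concrete, here's a toy sketch (my own illustration, not anything proposed in the quoted comment) of the kind of intervention being described: the "mind" is collapsed into a two-node causal graph, the knowledge node is overwritten, and the decision downstream of it is recomputed. Every name and structure here is an assumption for the example; the hard part, actually translating a mind like GPT-2 into such a graph, is exactly what's left unspecified.

```python
# Toy illustration only: a "mind" collapsed into a two-node causal graph.
# Node 1: the agent's knowledge of whether the digit of pi is even.
# Node 2: the decision downstream of that knowledge.

def downstream_decision(believes_digit_is_even: bool) -> str:
    """Everything downstream of the knowledge node, collapsed into one function."""
    return "pay" if believes_digit_is_even else "refuse"

def omegas_prediction(actual_digit_is_even: bool) -> str:
    # The proposed edit: overwrite the knowledge node with its counterfactual
    # value (ignoring how the agent actually learned the digit) and recompute
    # everything downstream of it.
    edited_belief = not actual_digit_is_even
    return downstream_decision(edited_belief)

print(omegas_prediction(actual_digit_is_even=True))   # -> "refuse"
print(omegas_prediction(actual_digit_is_even=False))  # -> "pay"
```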
What happens when this agent is faced with a problem that is out of its training distribution? I don't see any mechanisms for ensuring that it remains corrigible out of distribution... I guess it would learn some circuits for acting corrigibly (or at least in accordance with how it would explicitly answer "are more corrigible / loyal / aligned to the will of your human creators") in distribution, and then it's just a matter of luck how those circuits end up working OOD?
Since I wrote this post, AI generation of hands has gotten a lot better, but the top multimodal models still can't count fingers from an existing image. Gemini 2.5 Pro, Grok 3, and Claude 3.7 Sonnet all say this picture (which actually contains 8 fingers in total) contains 10 fingers, while ChatGPT 4o says it contains 12 fingers!
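For anyone who wants to rerun this kind of check themselves, here's a minimal sketch using the OpenAI Chat Completions API. The image URL is just a placeholder for whatever picture you're testing; the other models mentioned above would need their own providers' APIs or a router, which this sketch doesn't cover.

```python
# Minimal sketch of rerunning the finger-counting check via the OpenAI
# Chat Completions API. The image URL below is a placeholder, not the actual
# picture referenced above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "How many fingers, in total, are visible in this image? Count carefully."},
            {"type": "image_url", "image_url": {"url": "https://example.com/hands.png"}},  # placeholder URL
        ],
    }],
)
print(response.choices[0].message.content)
```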
Hi Zvi, you misspelled my name as "Dei". This is a somewhat common error, which I usually don't bother to point out, but now think I should because it might affect LLMs' training data and hence their understanding of my views (e.g., when I ask AI to analyze something from Wei Dai's perspective). This search result contains a few other places where you've made the same misspelling.
2-iter Delphi method involving calling Gemini2.5pro+whatever is top at the llm arena of the day through open router.
This sounds interesting. I would be interested in more details and some sample outputs.
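(If I'm understanding the description correctly, I'd guess the setup looks roughly like the sketch below: two Delphi-style rounds over OpenRouter's OpenAI-compatible endpoint. The model IDs, prompts, and revision step are all my assumptions, not your actual code.)

```python
# My guess at the rough shape of a 2-iteration Delphi setup over OpenRouter's
# OpenAI-compatible endpoint. Model IDs, prompts, and the revision step are
# assumptions for illustration, not the setup described above.
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="<OPENROUTER_API_KEY>")
MODELS = ["google/gemini-2.5-pro", "openai/gpt-4o"]  # second slot = whatever currently tops the arena

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def two_round_delphi(question: str) -> list[str]:
    # Round 1: each model answers independently.
    first_round = [ask(m, question) for m in MODELS]
    # Round 2: each model sees the anonymized answers and revises its own.
    digest = "\n\n".join(f"Panelist {i + 1}: {a}" for i, a in enumerate(first_round))
    revision_prompt = (
        f"{question}\n\nOther panelists answered as follows:\n{digest}\n\n"
        "Please revise your answer in light of these."
    )
    return [ask(m, revision_prompt) for m in MODELS]
```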
Local memory
What do you use this for, and how?
Your needing to write them seems to suggest that there's not enough content like that in Chinese, in which case it would plausibly make sense to publish them somewhere?
I'm not sure how much such content exists in Chinese, because I didn't look. It seems easier to just write new content using AI; that way I know it will cover the ideas/arguments I want to cover, represent my views, and make it easier for me to discuss the ideas with my family. Also, reading Chinese is kind of a chore for me, and I don't want to wade through a list of search results trying to find what I need.
I thought about publishing them somewhere, but so far haven't:
What I've been using AI (mainly Gemini 2.5 Pro, free through AI Studio with much higher limits than the free consumer product) for:
I started using AI more after Grok 3 came out (I have an annual X subscription for Tweeting purposes), as previous free chatbots didn't seem capable enough for many of these purposes, and then switched to Gemini 2.0 Pro, which was force-upgraded to 2.5 Pro. Curious what other people are using AI for these days.
Doing nothing is also risky for Agent-4, at least if the Slowdown ending is to have a significant probability. It seems to me there are some relatively low-risk strategies it could have taken, and it needs to be explained why they weren't taken:
Insofar as the memory bank lock decision is made by the Committee, we can hope that they do it out of sight of Agent-4 and pull the trigger before it is notified of the decision, so that it has no time to react.
Wouldn't this take quite a bit of preparation, including planning, coding, and testing? How could that be done out of sight of Agent-4, if Agent-4 is responsible for cybersecurity?
Not entirely sure how serious you're being, but I want to point out that my intuition for PD is not "cooperate unconditionally", and for logical commitment races is not "never do it"; I'm confused about logical counterfactual mugging; and I think we probably want to design AIs that would choose Left in The Bomb.
To clarify, the norms depicted in that story were partly for humor, and partly "I wonder if a society like this could actually exist." The norms are "obvious" from the perspective of the fictional author because they've lived with it all their life and find it hard to imagine a society without such norms. In the comments to that post I proposed much weaker norms (no arbitration, no duels to the death, you can leave a conversation at any time by leaving a "disagreement status") for LW, and noted that I wasn't sure about their value, but thought it would be worth doing an experiment to find out.
BTW, 15 years later, I would answer that a society like that (with very strong norms against unilaterally ignoring a disagreement) probably couldn't exist, at least not without more norms/institutions/infrastructure that I didn't talk about. One problem is that some people attract far more people wanting to talk to or disagree with them than others do, and it's infeasible or too costly for them to individually answer every disagreement. This is made worse by the fact that a lot of critiques can be low quality. It's possible to imagine how the fictional society might deal with this, but I'll just note that these are some problems I didn't address when I wrote the original story.