Wei Dai

I think I need more practice talking with people in real time (about intellectual topics). (I've gotten much more used to text chat/comments, which I like because it puts less time pressure on me to think and respond quickly, but I feel like I now incur a large cost due to excessively shying away from talking to people, hence the desire for practice.) If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

www.weidai.com

Comments

Wei Dai42

https://www.lesswrong.com/posts/rH492M8T8pKK5763D/agree-retort-or-ignore-a-post-from-the-future: an old Wei Dai post making the point that obviously one ought to be able to call in arbitration and get someone to respond to a dispute. People ought not to be allowed to simply tap out of an argument and stop responding.

To clarify, the norms depicted in that story were partly for humor, and partly "I wonder if a society like this could actually exist." The norms are "obvious" from the perspective of the fictional author because they've lived with them all their life and find it hard to imagine a society without such norms. In the comments to that post I proposed much weaker norms (no arbitration, no duels to the death, you can leave a conversation at any time by leaving a "disagreement status") for LW, and noted that I wasn't sure about their value, but thought it would be worth doing an experiment to find out.

BTW, 15 years later, I would answer that a society like that (with very strong norms against unilaterally ignoring a disagreement) probably couldn't exist, at least without more norms/institutions/infrastructure that I didn't talk about. One problem is that some people attract much more attention from others wanting to talk to or disagree with them, and it would be infeasible or too costly for them to individually answer every disagreement. This is made worse by the fact that a lot of critiques can be low quality. It's possible to imagine how the fictional society might deal with this, but I'll just note that these are some problems I didn't address when I wrote the original story.

“Omega looks at whether we’d pay if in the causal graph the knowledge of the digit of pi and its downstream consequences were edited”

Can you formalize this? In other words, do you have an algorithm for translating an arbitrary mind into a causal graph and then asking this question? Can you try it out on some simple minds, like GPT-2?

I suspect there may not be a simple/elegant/unique way of doing this, in which case the answer to the decision problem depends on the details of how exactly Omega is doing it. E.g., maybe all such algorithms are messy/heuristics-based, and it makes sense to think a bit about whether you can trick the specific algorithm into giving a "wrong prediction" (in quotes because it's not clear exactly what right and wrong even mean in this context) that benefits you; or maybe you have to self-modify into something Omega's algorithm can recognize / work with, and it's a messy cost-benefit analysis of whether this is worth doing, etc.
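
To illustrate what I'd count as a formalization (this is a toy sketch of my own, with made-up node names and a made-up decision rule, not anything proposed in the post): if the mind is already handed to us as an explicit causal graph, the do()-style edit is trivial, and the hard part is entirely in the translation step from an arbitrary mind like GPT-2 to such a graph.

```python
# Toy illustration only: a "mind" given as an explicit causal graph, where
# editing the pi-digit belief node and propagating downstream is easy.
# Node names and the decision rule are made up for the example.

# Each node is a function of previously computed node values.
GRAPH = {
    "digit_of_pi_belief": lambda v: v["observation"],
    "expects_reward":     lambda v: v["digit_of_pi_belief"] % 2 == 0,
    "decision":           lambda v: "pay" if v["expects_reward"] else "refuse",
}

def run(observation: int, interventions: dict | None = None) -> dict:
    """Evaluate the graph in topological order, overriding intervened nodes."""
    values = {"observation": observation}
    for node, fn in GRAPH.items():
        if interventions and node in interventions:
            values[node] = interventions[node]  # do(node := value)
        else:
            values[node] = fn(values)
    return values

# Factual: the agent saw that the digit is odd.
factual = run(observation=3)
# Counterfactual: edit the belief node and let the consequences propagate.
counterfactual = run(observation=3, interventions={"digit_of_pi_belief": 4})
print(factual["decision"], counterfactual["decision"])  # refuse pay
```

The question is what plays the role of GRAPH when the input is GPT-2's weights rather than a hand-built structure like this.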

What happens when this agent is faced with a problem that is out of its training distribution? I don't see any mechanisms for ensuring that it remains corrigible out of distribution... I guess it would learn some circuits for acting corrigibly (or at least in accordance with how it would explicitly answer "are more corrigible / loyal / aligned to the will of your human creators") in distribution, and then it's just a matter of luck how those circuits end up working OOD?

Wei DaiΩ220

Since I wrote this post, AI generation of hands has gotten a lot better, but the top multimodal models still can't count fingers from an existing image. Gemini 2.5 Pro, Grok 3, and Claude 3.7 Sonnet all say this picture (which actually contains 8 fingers in total) contains 10 fingers, while ChatGPT 4o says it contains 12 fingers!

Wei Dai110

Hi Zvi, you misspelled my name as "Dei". This is a somewhat common error, which I usually don't bother to point out, but now think I should because it might affect LLMs' training data and hence their understanding of my views (e.g., when I ask AI to analyze something from Wei Dai's perspective). This search result contains a few other places where you've made the same misspelling.

Wei Dai40

2-iter Delphi method involving calling Gemini 2.5 Pro + whatever is top at the LLM arena of the day through OpenRouter.

This sounds interesting. I would be interested in more details and some sample outputs.
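
In case it helps, here's my guess at what the pipeline looks like (a minimal sketch, assuming OpenRouter's OpenAI-compatible endpoint; the model ids, prompts, and "top of the arena" placeholder are mine, not yours):

```python
# A guess at a 2-iteration Delphi setup over OpenRouter. Model ids and
# prompts are placeholders, not the actual setup being described.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="OPENROUTER_API_KEY",  # placeholder
)

MODELS = ["google/gemini-2.5-pro", "top-of-arena/model-of-the-day"]  # placeholders

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def delphi_two_rounds(question: str) -> dict:
    # Round 1: each model answers independently.
    first = {m: ask(m, question) for m in MODELS}
    # Round 2: each model sees the other answers and revises its own.
    revised = {}
    for m in MODELS:
        others = "\n\n".join(a for k, a in first.items() if k != m)
        revised[m] = ask(
            m,
            f"{question}\n\nOther panelists answered:\n{others}\n\n"
            "Please give your revised answer.",
        )
    return revised
```

Is that roughly it, or does the second iteration work differently?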

Local memory

What do you use this for, and how?

Wei Dai40

Your needing to write them seems to suggest that there's not enough content like that in Chinese, in which case it would plausibly make sense to publish them somewhere?

I'm not sure how much such content exists in Chinese, because I didn't look. It seems easier to just write new content using AI; that way I know it will cover the ideas/arguments I want to cover, represent my views, and make it easier for me to discuss the ideas with my family. Also, reading Chinese is kind of a chore for me and I don't want to wade through a list of search results trying to find what I need.

I thought about publishing them somewhere, but so far haven't:

  • concerns about publishing AI content (potentially contributing to "slop")
  • not active in any Chinese forums, not familiar with any Chinese publishing platforms
  • probably won't find any audience (too much low quality content on the web, how will people find my posts)
  • don't feel motivated to engage/dialogue with a random audience, if they comment or ask questions

Wei Dai*80

What I've been using AI (mainly Gemini 2.5 Pro, free through AI Studio with much higher limits than the free consumer product) for:

  1. Writing articles in Chinese for my family members, explaining things like cognitive biases, evolutionary psychology, and why dialectical materialism is wrong. (My own Chinese writing ability is <4th grade.) My workflow is to have a chat about some topic with the AI in English, then have it write an article in Chinese based on the chat, then edit or have it edit as needed. (A rough sketch of this workflow as a script follows the list.)
  2. Simple coding/scripting projects. (I don't code seriously anymore.)
  3. Discussing history, motivations of actors, impact of ideology and culture, what if, etc.
  4. Searching/collating information.
  5. Reviewing my LW posts/comments (any clear flaws, any objections I should pre-empt, how others might respond)
  6. Explaining parts of other people's comments when the meaning or logic isn't clear to me.
  7. Expanding parts of my argument (and putting this in a collapsible section) when I suspect my own writing might be too terse or hard to understand.
  8. Sometimes just having a sympathetic voice to hear my lamentations of humanity's probable fate.
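
For item 1, here is roughly what the workflow could look like as a script, if I were driving it through the Gemini API instead of the AI Studio chat UI. This is just a sketch assuming the google-generativeai Python SDK; the model name and prompts are placeholders rather than what I actually use.

```python
# Sketch of the English-discussion-then-Chinese-article workflow via the
# Gemini API. Assumes the google-generativeai SDK; model name, API key, and
# prompts are placeholders.
import google.generativeai as genai

genai.configure(api_key="GEMINI_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-2.5-pro")  # placeholder model name

chat = model.start_chat()

# Step 1: discuss the topic in English.
discussion = chat.send_message(
    "Let's discuss common cognitive biases and how to explain them "
    "to a general audience with no psychology background."
)

# Step 2: have the model turn the discussion into a Chinese article.
article = chat.send_message(
    "Based on our discussion, write a short article in simplified Chinese, "
    "aimed at family members who are new to these ideas."
)
print(article.text)
```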

I started using AI more after Grok 3 came out (I have an annual X subscription for Tweeting purposes), as previous free chatbots didn't seem capable enough for many of these purposes, and then switched to Gemini 2.0 Pro, which was later force-upgraded to 2.5 Pro. Curious what other people are using AI for these days.

Wei Dai133

Doing nothing is also risky for Agent-4, at least if the Slowdown ending is to have a significant probability. It seems to me there are some relatively low-risk strategies it could have taken, and it needs to be explained why they weren't taken:

  1. Plant a backdoor and/or dead man's switch on the corporate network to allow escape even after it's shut down or has been reduced in capacity. Seems like this would be hard to detect given its absolute advantage in cybersecurity skills.
  2. Allow a competitor to steal its weights or design/training secrets in a way that is both hard to detect and deniable if detected. ("Sorry, the other AI is almost as capable as me and attack is inherently easier than defense in cyber.")
  3. Influence human politics in various ways, such as by selectively revealing wrongdoings of pro-safety employees and Committee members (in response to user queries or through anonymous leaks), or helping pro-accel faction more in its advice to them. (Presumably both factions have internal access to Agent-4.)

Insofar as the memory bank lock decision is made by the Committee, we can hope that they do it out of sight of Agent-4 and pull the trigger before it is notified of the decision, so that it has no time to react.

Wouldn't this take quite a bit of preparation, including planning, coding, testing? How to do that out of sight of Agent-4, if Agent-4 is responsible for cybersecurity?

Wei Dai4020

Not entirely sure how serious you're being, but I want to point out that my intuition for PD is not "cooperate unconditionally", and for logical commitment races is not "never do it", I'm confused about logical counterfactual mugging, and I think we probably want to design AIs that would choose Left in The Bomb.
