momom2

AIS student, self-proclaimed aspiring rationalist, very fond of game theory.
"The only good description is a self-referential description, just like this one."

Posts

15 Two arguments against longtermist thought experiments

5mo

7 Piling bounded arguments

6mo

8 What criterion would you use to select companies likely to cause AI doom?

19 Cheat sheet of AI X-risk

2 Was Eliezer Yudkowsky right to give himself 10% to succeed with HPMoR in 2010?

3 Do you like excessive sugar?

7 How can there be a godless moral world ?

Comments

Recent AI model progress feels mostly like bullshit

momom2 10h

I think what's going on is that large language models are trained to "sound smart" in a live conversation with users, and so they prefer to highlight possible problems instead of confirming that the code looks fine, just like human beings do when they want to sound smart.

This matches my experience, but I'd be interested in seeing proper evals of this specific point!

What did you learn from leaked documents?

momom2 2d

The advice in there sounds very conducive to a productive environment, but also very toxic. Definitely an interesting read, but I wouldn't model my own workflow on it.

We need (a lot) more rogue agent honeypots

momom2 2d

Honeypots should not be made public or described here, since this post could end up in a rogue AI's training data.
But it's helpful for people interested in this topic to look at existing honeypots (to learn how to make their own, evaluate effectiveness, get intuitions about how honeypots work, etc.), so what you should do is mention that you made a honeypot or know of one, but not say what or where. Interested people can contact you privately if they care to.

What is Interpretability?

momom2 4d

Thank you very much, this was very useful to me.

Twelve Virtues of Rationality

momom2 6d

They're a summarization of a lot of vibes from the Sequences.
Artistic choice, I assume. It doesn't bear on the argument.
Yudkowsky explains all about the virtues in the Sequences.
For studies, there are broad studies on cognitive science (especially relating to bias) but you'll be hard-pressed to match them precisely to one virtue or another. Mostly, Yudkowsky's opinions on these virtues are supported by academic literature, but I'm not aware of any work that showcases this clearly.
For practical experience, you can look into the legacy of the Center for Applied Rationality (CFAR), which tried for years to do just that: train people to get better at life using rationality. My impression is that they had moderate success, but I haven't looked deeply into it.

Why it's so hard to talk about Consciousness

momom2 8d

Do you know what it feels like to feel pain? Then congratulations, you know what it feels like to have qualia. Pain is a qualia. It's that simple. If I told you that I was going to put you in intense pain for an hour, but I assured you there would be no physical damage or injury to you whatsoever, you would still be very much not ok with that. You would want to avoid that experience. Why? Because pain hurts! You're not afraid of the fact that you're going to have an "internal representation" of pain, nor are you worried about what behavior you might display as a result of the pain. You're worried first and foremost about the fact that it's going to hurt! The "hurt" is the qualia.

I still don't grok qualia, and I'm not sure I get your thought experiment.

To be more detailed, let's imagine the following:
"I'll cut off your arm, but you'll be perfectly fine: no pain, no injury. Would you be okay with that? No! That's because you care about your arm for itself and not just for the negative effects..."
"How can you cut off my arm without any negative effect?"
"I'll anesthetize you and put you to sleep, cut off your arm, then before you wake up, I'll have it regrown using technanobabble. Out of 100 patients, none reported having felt anything bad before, during or after the experiment. The procedure is perfectly side-effect-free."
"Well, in that case I guess I don't mind you cutting my arm."

Compare:
"I'll put you in immense pain, but there will be no physical damage or injury whatsoever. No long-term brain damage or lingering pain or anything."
"How can you put me in pain without any negative effect?"
"I'll cut out the part of your brain that processes pain and replace it with technanobabble so your body will work exactly as before. Meanwhile, I'll stimulate this bit of brain in a jar. Then, I'll put it back. Out of 100 patients, all displayed exactly the same behavior as if nothing had been done to them."
"Well, in that case, I don't mind you putting me in this 'immense pain'."

I think the article's explanation of the difference between our intuitions is quite crisp, but it still seems self-evident to me that when you try to operationalize the thing, it disappears. The self-evidence is the problem, since you intuit differently: I am fairly confident from past conversations that my comparison will seem flawed to you in some important way, but I can't predict in what way. (If you have some general trick for telling how qualia-realist people answer such questions, I'd love to hear it; it sounds like a big step towards grokking your perspective.)

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?

momom2 9d

As for making an AI Safety video, we at CeSIA have also had some success with it, and we'd be happy to help by providing technical expertise, proofreading, and translation into French.
Other channels you could reach out to:

Rational Animations (bit redundant with Rob Miles, but it can't hurt)
Siliconversations
AI Explained

Bogdan Ionut Cirstea's Shortform

momom2 12d

The first thing that comes to mind is to ask what proportion of human-generated papers are more publication-worthy (since a lot of them are slop). But let's not forget that publication matters little for catastrophic risk; what would matter is actually getting results.
So I recommend not updating at all on AI risk based on Sakana's results (or updating negatively if you expected that R&D automation would come faster, or that this might slow down human augmentation).

How to Make Superbabies

momom2 26d

In that case, per my other comment, I think it's much more likely that superbabies would concern only a small fraction of the population and exacerbate inequality without bringing the massive benefits that a generally more capable population would.

Do you think superbabies would be put to work on alignment in a way that makes a difference because geniuses drive the field? I'm having trouble understanding how, concretely, you think superbabies could significantly improve the chances of helping alignment.

How to Make Superbabies

momom2 1mo

I'm having trouble understanding your theory of change (ToC) in a future influenced by AI. What's the point of investigating this if it takes 20 years to become significant?