All of MadHatter's Comments + Replies

How to Train Your Shoggoth, Part 2

The Historical Teams Framework for Alignment Research

Eliezer Yudkowsky has famously advocated for embracing "security mindset" when thinking about AI safety. This is a mindset where you think about how to prevent things from going wrong, rather than how to make things go right. This seems obviously correct to me, so for the purposes of this post I'll just take this as a given.

But I think there's a piece missing from the AI Safety community's understanding of security mindset, one that is a key part of the mindset that computer... (read more)

How to Train Your Shoggoth, Part 1


What if Didactic Fiction is the Answer to Aligning Large Language Models?

This is a linkshortform for https://bittertruths.substack.com/p/how-to-train-your-shoggoth-part-1

Epistemic status: this sounds very much like an embarrassingly dumb take, but I haven't been able to convince myself that it is false, and indeed am somewhat convinced that it would at least help to align large language models (and other models that meaningfully incorporate them). So I'm writing at least partly to see if anyone has any fundamental objectio... (read more)

Very cool post! We need a theory of valence that is grounded in real neuroscience, since understanding valence is pretty much required for any alignment agenda that works the first time.

I have read the sequences. Not all of them, because, who has time. 

Here is a video of me reading the sequences (both Eliezer's and my own):

https://bittertruths.substack.com/p/semi-adequate-equilibria

Well what if he bets a significant amount of money at 2000:1 odds that the Pope will officially add his space Bible to the real Bible as a third Testament after the New Testament within the span of a year?

What if he records a video of himself doing Bible study? What if he offers to pay people their current hourly rate to watch him do Bible study?

I guess the thrust of my questions here is, at what point do you feel that you become the dick for NOT helping him publish his own space Bible? At what point are you actively impeding new religious discoveries by... (read more)

Recorded a sort of video lecture here: https://open.substack.com/pub/bittertruths/p/semi-adequate-equilibria

Agree that it should be time-based rather than karma-based.

I'm currently on a very heavy rate limit that I think is being manually adjusted by the LessWrong team.

Is this clear enough:

I posit that the reason that humans are able to solve any coordination problems at all is that evolution has shaped us into game players that apply something vaguely like a tit-for-tat strategy meant to enforce convergence to a nearby Schelling Point / Nash Equilibrium, and to punish defectors from this Schelling Point / Nash Equilibrium. I invoke a novel mathematical formalization of Kant's Categorical Imperative as a potential basis for coordination towards a globally computable Schelling Point. I believe that this constitutes a prom... (read more)
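
To make the game-theoretic core of that claim concrete, here is a minimal Python sketch of the tit-for-tat dynamic I am gesturing at; this is my own toy illustration, not the novel formalization itself.

```python
# A minimal sketch: cooperate by default, mirror the opponent's previous move,
# and thereby make sustained defection from the shared convention unprofitable.

COOPERATE, DEFECT = "C", "D"

def tit_for_tat(my_history, their_history):
    """Cooperate on the first round, then mirror the opponent's last move."""
    return COOPERATE if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return DEFECT

# Standard prisoner's-dilemma payoffs: (row player, column player).
PAYOFFS = {
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, DEFECT): (0, 5),
    (DEFECT, COOPERATE): (5, 0),
    (DEFECT, DEFECT): (1, 1),
}

def play(strategy_a, strategy_b, rounds=100):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (300, 300): mutual cooperation is stable
print(play(tit_for_tat, always_defect))  # (99, 104): defection is punished from round 2 on
```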

5Viliam
Much better. So, this could be an abstract at the beginning of the sequence, and the individual articles could approximately provide evidence for sentences in this abstract.

Or you could do it Eliezer's way, and start by posting the articles that provide evidence for the individual sentences (each article containing its own summary), and only afterwards post an article that ties it all together. This way would allow readers to evaluate each article on its own merits, without being distracted by whether they agree or disagree with the conclusion. It is possible that you have actually tried to do exactly this, but speaking for myself, I never would have guessed so from reading the original articles.

(Also, if your first article gets downvoted, please pause and reflect on that fact. Either your idea is wrong and readers express disagreement, or it is just really badly written and readers express confusion. In either case, pushing forward is not helpful.)

My understanding of the current situation is that I am not in fact rate-limited purely by automatic processes, but rather by some sort of policy decision on the part of LessWrong's moderators.

Which is fine, I'll just continue to post my alignment research on my substack, and occasionally dump linkposts to them in my shortform, which the mods have allowed me continued access to.

https://github.com/epurdy/ethicophysics/blob/main/writeup1.pdf

1frontier64
The votes on this comment imply long vol on LW rate limiting.

Yes, this is a valid and correct point. The observed and theoretical Nash Equilibrium of the Wittgensteinian language game of maintaining consensus reality is indeed not to engage with cranks who have not Put In The Work in a way that is visible and hard-to-forge.

8Dagon
It's worth keeping clear in your mind the distinction between "put in the work" and "ideas that are both clear and correct (or at least promising)". They're related, especially the work and the clarity of what the idea is, but they are not the same.

Thank you for this answer. I agree that I have not visibly been putting in the work to make falsifiable predictions relevant to the ethicophysics. Such predictions can indeed be made in the ethicophysics, but they're less predictions and more "self-fulfilling prophecies" that have the effect of compelling the reader to comply with a request to the extent that they take the request seriously. Which, in plain language, is some combination of wagers, promises, and threats.

And it seems impolite to threaten people just to get them to read a PDF.

And I also have the courage to apply to Y Combinator to start either a 501c3 or a for-profit company to actually perform this trial through legal, official channels. Do you think that I will be denied entry into their program with such a noble goal and the collaboration of a domain expert?

I have the courage to commit an act of civil disobedience in which I ask people caring for Alzheimer's patients to request a Zoloft and/or Trazodone prescription for their loved ones, and then track the results.

Do you think I lack the persistence and capital to organize something of that nature? Why or why not?

2ChristianKl
That setup doesn't give you a randomized controlled trial, which is what's usually meant by the term "clinical trial". The system has a lot of incentives against doctors cooperating with illegal clinical trials. I don't think there's a notable example of anyone who pulled off a comparable trial, which suggests that it's hard.

Well then, I submit that courage is a virtue, when tempered with the wisdom not to pick fights you do not plan to finish.

This comment continues to annoy me. I composed a whole irrational response in my mind where I would make credible threats to burn significant parts of the capabilities commons every time someone called me delusional on LessWrong.

But that's probably not a reasonable way to live my life, so this response is not that response.

I get that history is written by the victors. I get that what is accepted by consensus reality is dictated by the existing power structures. The fact that you would presume to explain these things to the author of Ethicophysics I and Eth... (read more)

But it uses the tools of physics, so the math would best be checked by someone who understands Lagrangian mechanics at a professional level.

Yes, it is a specification of a set of temporally adjacent computable Schelling Points. It thus constitutes a trajectory through the space of moral possibilities that can be used by agents to coordinate and punish defectors from a globally consistent morality whose only moral stipulations are such reasonable sounding statements as "actions have consequences" and "act more like Jesus and less like Hitler".


And I'm happy to code up the smartphone app and run the clinical trial from my own funds. My uncle is starting to have memory trouble, I believe.

2ChristianKl
Clinical trials are highly regulated. The median cost of a clinical trial is on the order of US$19 million. Do you have that kind of money available to run a clinical trial?

Oh, come on. If the rationality community disapproved of Einstein predicting the anomalous precession of Mercury's orbit, that's an L for the rationality community, not for Einstein.

I have offered to say why I believe it to be true, as soon as I can get clearance from my company to publish capabilities-relevant theoretical neuroscience work.

6ChristianKl
Whether someone has epistemic virtue depends on whether they use the epistemic tools available to them. We have made a lot of progress in epistemics over the last hundred years.

That's fair, and I need to do a better job of building on-ramps for different readers. My most recent shortform is an attempt to build such an on-ramp for the LessWrong memeplex.

That's fair (strong up/agree vote).

If you consult my recent shortform, I lay out a more measured, skeptical description of the project. Basically, ethicophysics constitutes a globally computable Schelling Point, such that it can be used as a protocol between different RL agents that believe in "oughts" to achieve Pareto-optimal outcomes. As long as the largest coalition agrees to prefer Jesus to Hitler, I think (and I need to do far more to back this up) defectors can be effectively reined in, the same way that Bitcoin works because the majority of the computers hooked up to it don't want to destroy faith in the Bitcoin protocol.
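
As a toy illustration of the "largest coalition reins in defectors" point (my own sketch, with made-up numbers, not anything derived from the ethicophysics): once enough agents enforce the shared convention, a one-shot defection stops paying.

```python
def defection_payoff(coalition_fraction: float,
                     defection_gain: float = 1.0,
                     punishment: float = 0.1,
                     population: int = 100) -> float:
    """Payoff of defecting once if every conforming agent levies a small
    punishment against an observed defector (all numbers are illustrative)."""
    n_enforcers = population * coalition_fraction
    return defection_gain - punishment * n_enforcers

for frac in (0.05, 0.25, 0.51, 0.90):
    print(f"coalition={frac:.0%}: defection payoff = {defection_payoff(frac):+.1f}")
# coalition=5%:  +0.5  (defection still pays)
# coalition=25%: -1.5
# coalition=51%: -4.1
# coalition=90%: -8.0  (defection is strongly deterred)
```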

Ethicophysics for Skeptics

Or, what the fuck am I talking about?

In this post, I will try to lay out my theories of computational ethics in as simple, skeptic-friendly, non-pompous language as I am able to do. Hopefully this will be sufficient to help skeptical readers engage with my work.

The ethicophysics is a set of computable algorithms that suggest (but do not require) specific decisions in response to ethical decision problems in a multi-player reinforcement learning setting.

The design goal that the various equations need to satisfy is that they should select a... (read more)
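
For readers who think better in code, this is roughly the shape of the interface I have in mind. It is a sketch only; the names `EthicsOracle` and `suggest_action` are hypothetical placeholders, not anything from the actual write-up. The point is just that the suggestion is a shared coordination signal that each agent remains free to ignore.

```python
from typing import Any, Callable, Sequence

class EthicsOracle:
    """Wraps any computable rule mapping (shared state, agent id) to a suggested
    action, serving as a common tie-breaker (a Schelling-point device)."""
    def __init__(self, rule: Callable[[Any, int], Any]):
        self.rule = rule

    def suggest(self, state: Any, agent_id: int) -> Any:
        return self.rule(state, agent_id)

def step(agents: Sequence[Callable[[Any, Any], Any]],
         oracle: EthicsOracle, state: Any) -> list:
    """Each agent sees the shared suggestion but chooses its own action."""
    return [agent(state, oracle.suggest(state, i)) for i, agent in enumerate(agents)]

# Example: one agent follows the suggestion, one ignores it.
compliant = lambda state, suggestion: suggestion
stubborn = lambda state, suggestion: "defect"
print(step([compliant, stubborn], EthicsOracle(lambda s, i: "cooperate"), state=None))
# ['cooperate', 'defect']
```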

They really suck. The old paradigm of Alzheimer's research is very weak and, as I understand it, no drug has an effect size sufficient to offset even a minimal side effect profile, to the point where I think only one real drug has been approved by the FDA in the old paradigm, and that approval was super controversial. That's my understanding, anyway. I welcome correction from anyone who knows better.

So maybe we should define the effect size in terms of cognitive QALYs? Say, an effective treatment should at least halve the rate of decline of the experiment... (read more)

2interstice
Yeah, that sounds good.

Here is the best I could muster on short notice: https://bittertruths.substack.com/p/ethicophysics-for-skeptics

Since I'm currently rate-limited, I cannot post it officially.

How will we handle the file drawer effect, where insignificant results are quietly shelved? I guess if the trial is preregistered this won't happen...

2interstice
Yes, we could require the study to be preregistered. Or to have significant-enough results: say, effect sizes greater than in RCTs of the current standard treatment? (Unless the current treatments really suck; I haven't looked into it.)

https://chat.openai.com/share/068f5311-f11a-43fe-a2da-cbfc2227de8e

Here are ChatGPT's speculations on how much it would cost to run this study. I invite any interested reader to work on designing this study. I can also write up my theories as to why this etiology is plausible, in arbitrary detail, if that is decision-relevant to someone with either grant money or interest in helping to code up the smartphone app we would need to collect the relevant measurements cheaply. (Intuitively, it would be something like a Dual N-Back app, but more user-friendly for Alzheimer's patients.)
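
To make the measurement side concrete, here is a rough sketch of the core scoring logic such an app might record; this is a hypothetical design of my own, not an existing app. The idea is simply to log per-session N-back accuracy and track it over time.

```python
from typing import Sequence

def nback_targets(stimuli: Sequence[str], n: int) -> list:
    """True at position i if the stimulus matches the one shown n steps earlier."""
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]

def session_accuracy(stimuli: Sequence[str], responses: Sequence[bool], n: int = 2) -> float:
    """Fraction of trials where the user's yes/no judgment matched the ground truth."""
    targets = nback_targets(stimuli, n)
    correct = sum(r == t for r, t in zip(responses, targets))
    return correct / len(stimuli)

stimuli   = ["A", "B", "A", "B", "B", "A"]             # true 2-back matches at positions 2 and 3
responses = [False, False, True, False, False, False]  # user missed the match at position 3
print(session_accuracy(stimuli, responses))            # 0.833... (5 of 6 trials correct)
```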

I can put together some sort of proposal tonight, I suppose.

OK, let's do it. Your nickel against my $100.

What resolution criteria should we use? Perhaps the first RCT that studies a treatment I deem sufficiently similar has to find a statistically significant effect with a publishable effect size? Or should we require that the first RCT that studies a similar treatment is halted halfway through because it would be unethical for the control group not to receive the treatment? (We could have a side bet on the latter, perhaps.)

What would the study look like? Presumably scores on a standard cognitive test designed to m... (read more)
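
For concreteness, a back-of-the-envelope sketch of what "a statistically significant effect with a publishable effect size" implies about trial size, using the standard two-arm normal approximation. The effect sizes here are illustrative assumptions, not claims about the proposed treatment.

```python
from scipy.stats import norm

def per_arm_sample_size(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate patients per arm to detect standardized effect size d
    with a two-sided test (normal approximation to the two-sample t-test)."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = norm.ppf(power)            # quantile for the desired power
    return int(2 * ((z_alpha + z_beta) / d) ** 2) + 1  # round up, conservatively

print(per_arm_sample_size(0.5))   # ~63 patients per arm for a medium effect
print(per_arm_sample_size(0.25))  # ~252 per arm: halving the effect roughly quadruples n
```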

2interstice
Yeah this sounds good to me.

Hey @MadHatter - Eliezer confirms that I've won our bet.

I ask that you donate my winnings to GiveWell's All Grants fund, here, via credit card or ACH (preferred due to lower fees).  Please check the box for "I would like to dedicate this donation to someone" and include zac@zhd.dev as the notification email address so that I can confirm here that you've done so.

6Zac Hatfield-Dodds
Done! Setting a calendar reminder; see you in a year.

I wouldn't say I really do satire? My normal metier is more "the truth, with jokes". If I'm acting too crazy to be considered a proper rationalist, it's usually because I am angry or at least deeply annoyed.

OK. I can only personally afford to be wrong to the tune of about $10K, which would be what, $5 on your part? Did I do that math correctly?
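
Spelling out the arithmetic, assuming the 2000:1 odds discussed above:

$$\frac{\$10{,}000}{2000} = \$5$$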

8Zac Hatfield-Dodds
Yep, arithmetic matches. However, if $10K is the limit you can reasonably afford, I'd be more comfortable betting my $1 against your $2,000.

OK, anybody who publicly bets on my predicted outcome to the RCT wins the right to engage me in a LessWrong dialogue on a topic of their choosing, in which I will politely set aside my habitual certainty and trollish demeanor.

Well that should be straightforward, and is predicted by my model of serotonin's function in the brain. It would require an understanding of the function of orexin, which I do not currently possess, beyond the standard intuition that it modulates hunger. 

The evolutionary story would be this:

  • serotonin functions (in my model) to make an agent satisfice, which has many desirable safety properties, e.g. not getting eaten by predators when you forage unnecessarily (see the toy sketch below)
  • the most obvious and important desire to satisfy (and neurally mark as satisfied) is the hun
... (read more)
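
Here is a toy simulation of the satisficer-versus-maximizer contrast; this is my own illustration, the foraging setup and all numbers are made up, and nothing in it is a claim about actual neural circuitry.

```python
import random

random.seed(0)

def forage_episode(policy: str, threshold: float = 5.0, steps: int = 20,
                   predation_risk: float = 0.05) -> float:
    """Each foraging step yields a bit of reward but risks losing everything."""
    total = 0.0
    for _ in range(steps):
        if policy == "satisficer" and total >= threshold:
            break                      # "satisfied" signal: stop seeking more
        if random.random() < predation_risk:
            return 0.0                 # eaten while foraging unnecessarily
        total += random.uniform(0.5, 1.5)
    return total

for policy in ("satisficer", "maximizer"):
    runs = [forage_episode(policy) for _ in range(10_000)]
    wiped_out = sum(r == 0.0 for r in runs) / len(runs)
    print(policy, "mean reward:", round(sum(runs) / len(runs), 2),
          "wiped out:", f"{wiped_out:.0%}")
# With these made-up parameters the maximizer's mean haul is higher, but it is
# wiped out roughly 64% of the time versus roughly a quarter of the time for the
# satisficer, which is the safety property the model points at.
```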

Well, what's the appropriate way to act in the face of the fact that I AM sure I am right? I've been offering public bets of a high-karma person's nickel against my $100, which seems like a fair and attractive bet for anyone who doubts my credibility and my ability to reason about the things I am talking about.

I will happily bet anyone with significant karma that Yudkowsky will find my work on the ethicophysics valuable a year from now, at the odds given above.

7Jalex Stark
1. Change your beliefs
2. Convince literally one specific other person that you're right and your quest is important, and have them help translate for a broader audience

I have around 2K karma and will take that bet at those odds, for up to 1000 dollars on my side.

Resolution criteria: ask EY about his views on this sequence as of December 1st, 2024, literally "which of Zac or MadHatter won this bet"; the bet resolves with no payment if he declines to respond or does not explicitly rule for any other reason.

I'm happy to pay my loss by eg Venmo, and would request winnings as a receipt for your donation to GiveWell's all-grants fund.

-2lc
Oh come on, I was on board with your other satire but no rationalist actually says this sort of thing
6Seth Herd
Same here. Ask yourself: do you want personal credit, or do you want to help save the world?

Anyway, don't get discouraged; just learn from those answers and keep writing about those ideas, and keep learning about related ideas so you can reference them and thereby show what's new in your own. You only got severely downvoted on one post, so don't let it get to you any more than you can help. If the ideas are strong, they'll win through if you keep at it.

And now I am officially rate-limited to one post per week. Be sure to go to my substack if you are curious about what I am up to.

Well, I'll just have to continue being first out the door, then, won't I?

4Seth Herd
No, it's not going to get you credit. That's not how credit works in science or anywhere else. It goes not to the first who had the idea, but to the first who successfully popularized it. That's not fair, but that's how it works. You can give yourself credit or try to argue for it based on evidence of early publication, but would delaying another day to polish your writing a little really matter for being first out the door?

I'm sympathetic to your position here; I've struggled with similar questions, including wondering why I'm getting downvoted even after trying to get my tone right and having what seem to me like important, well-explained contributions.

Recognizing that the system isn't going to be completely fair or efficient, and working with it instead of against it, is unfortunate, but it's the smart thing to do in most situations. Attempts to work outside of the existing system only work when they're either carefully thought out and based on a thorough understanding of why the system works as it does, or extremely lucky.

And if people refuse to take such an attractive bet for the reason that my proposed cure sounds like it couldn't possibly hurt anyone, and might indeed help, then I reiterate the point I made in The Alignment Agenda THEY Don't Want You to Know About: the problem is not that my claims are prima facie ridiculous, it is that I myself am prima facie ridiculous.

1Shankar Sivarajan
Since you're willing to straightforwardly exchange cash for status boosts, you could offer some comparable reward for people fitting the same criteria who will publicly take your side of the bet.

I will publicly wager $100 against a single nickel with the first 10 people with extremely high LessWrong karma who want to publicly bet against my predicted RCT outcome.

2interstice
I only have moderately high karma but I'd be happy to take this bet.

https://alzheimergut.org/research/ is the place to look for all the latest research from the gut microbiome hypothesis community.

Agreed, this is a crucial lesson of history.

Young people forget important stuff, get depressed, struggle to understand the world. That is the prediction of my model: that a bad gut microbiome would cause more neural pruning than is strictly optimal.

It is well documented that starving young people have lower IQs, I believe? Certainly the claim does not seem prima facie ridiculous to me.

The older you get, the more chances you have to develop a bad gut microbiome. Perhaps the actual etiology of bad gut microbiomes (which I do not claim to understand) is heavily age-correlated. Or maybe we simply do no... (read more)


Also, for interested readers, I am happy to post a more detailed mechanistic neuroscience explanation of my theory, but I want to make sure I'm not breaking my company NDAs by sharing it first.

What's so bad about keeping a human in the loop forever? Do we really think we can safely abdicate our moral responsibilities?

4Brendan Long
  • It defeats the purpose of AI, so realistically no one will do it
  • It doesn't actually solve the problem if the AI is deceptive
I'm not convinced we can safely run AGI, with or without a human in the loop. That's what the alignment problem is.

I'm not trying to generate revenue for Wayne. I'm trying to spread his message to force the hand of the judicial system to not imprison him for longer than they already have.

Well, perhaps we can ask, what is reading about? Surely it involves reading through clearly presented arguments and trying to understand the process that generated them, and not presupposing any particular resolution to the question "is this person crazy" beyond the inevitable and unenviable limits imposed by our finite time on Earth.

2ChristianKl
There's a lot of material to read. Part of being good at reading is spending one's attention in the most effective way and not wasting it with low-value content. 

That's fair. I just want Wayne to get out of jail soon because he's a personal friend of mine.
