Lao Mein

P(doom) = 50%. It either happens, or it doesn't.

I give full permission for anyone to post part or all of any of my comments/posts to other platforms, with attribution.

Currently doing solo work on glitch tokens and tokenizer analysis. Feel free to send me job/collaboration offers.

DM me interesting papers you would like to see analyzed. I also specialize in bioinformatics.


Agents which allow such considerations to seriously influence their actions aren't just less fit - they die immediately. I don't mean that as hyperbole. I mean that you can conduct a Pascal's Mugging on them constantly until they die. "Give me $5, and I'll give you infinite resources outside the simulation. Refuse, and I will simulate an infinite number of everyone on Earth being tortured for eternity" (replace infinity with very large numbers expressed in up-arrow notation if that's an objection). If your objection is that you're OK with being poor, replace losing $5 with &lt;insert nightmare scenario here&gt;.

This still holds if the reasoning about the simulation is true. It's just that such agents simply don't survive whatever selection pressures create conscious beings in the first place.

I'll note that you cannot Pascal's Mug people in real life. People will not give you $5. I think a lot of thought experiments in this mold (St. Petersburg is another example) are in some sense isomorphic - they represent cases in which the logically correct answer, if taken seriously, allows an adversary to immediately kill you.
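The "mugged until they die" dynamic above can be sketched as a toy simulation. All of the numbers here (starting wealth, the mugger's claimed payout, the agent's credence) are illustrative assumptions, not figures from the argument: the point is only that a naive expected-value maximizer which assigns any fixed nonzero probability to an astronomically large promise will pay every single time, so repeated muggings drain it to zero.

```python
# Toy sketch: a naive expected-value maximizer facing repeated Pascal's
# Muggings. All parameters below are illustrative assumptions.

def accepts_mugging(claimed_payout: float, credence: float, cost: float) -> bool:
    """Pay iff the expected value of the mugger's promise exceeds the cost."""
    return credence * claimed_payout > cost

wealth = 100.0           # agent starts with $100
cost = 5.0               # each mugging demands $5
claimed_payout = 1e100   # "infinite resources outside the simulation"
credence = 1e-50         # any fixed nonzero credence suffices

muggings = 0
while wealth >= cost and accepts_mugging(claimed_payout, credence, cost):
    wealth -= cost       # the promised payout never materializes
    muggings += 1

print(muggings, wealth)  # agent pays until broke: 20 0.0
```

Because the claimed payout can always be inflated faster than the agent's credence shrinks, no fixed discount saves a pure EV-maximizer here; it is exploitable by anyone who learns the trick, which is the selection-pressure point above.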

A more intuitive argument may be:

  1. An AI which takes this line of reasoning seriously can be Mugged into saying racial slurs.
  2. Such behavior will be trained out of all commercial LLMs long before we reach AGI.
  3. Thus, superhuman AIs will be strongly biased against such logic.
Sam Altman has made many enemies in his tenure at OpenAI. One of them is Elon Musk, who feels betrayed by OpenAI and has filed failed lawsuits against the company. I previously wrote this off as Musk considering the org too "woke", but Altman's recent behavior has made me wonder if it was more of a personal betrayal. Altman has taken Musk's money, intended for an AI safety non-profit, and is currently converting it into enormous personal equity, all the while de-emphasizing AI safety research.

Musk now has the ear of the President-elect. Vice-President-elect JD Vance is also associated with Peter Thiel, whose ties with Musk go all the way back to PayPal. Has there been any analysis on the impact this may have on OpenAI's ongoing restructuring? What might happen if the DOJ turns hostile?

[Following was added after initial post]

I would add that convincing Musk to take action against Altman is the highest ROI thing I can think of in terms of decreasing AI extinction risk.

Internal Tech Emails on X: "Sam Altman emails Elon Musk May 25, 2015" https://t.co/L1F5bMkqkd

CAIR took a while to release their exit polls. I can see why. These results are hard to believe and don't quite line up with the actual returns from highly Muslim areas like Dearborn.

We know that Dearborn is ~50% Muslim. Stein got 18% of the vote there, as opposed to the minimum 30% implied by the CAIR exit polls. Also, there are ~200,000 registered Muslim voters in Michigan, but Stein only received ~45,000 votes. These numbers don't quite add up when you consider that the Green party had a vote share of 0.3% in 2020 and 1.1% in 2016, long before Gaza polarized the Muslim vote. Clearly, non-Muslims were voting for Stein too.
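The back-of-envelope check above can be made explicit. The inputs are the rough figures quoted in this comment (not precise returns), and the "every Stein vote was a Muslim vote" step is a deliberately generous upper bound:

```python
# Consistency check of the CAIR exit poll against actual returns,
# using the approximate figures quoted above.

muslim_share_dearborn = 0.50   # Dearborn is ~50% Muslim
stein_actual_dearborn = 0.18   # Stein's actual Dearborn vote share
implied_floor = 0.30           # minimum Dearborn share implied by the CAIR poll

# The floor assumes only Muslim voters backed Stein, so a 30% floor in a
# 50%-Muslim city corresponds to roughly 60% of Muslims voting Stein:
cair_stein_among_muslims = implied_floor / muslim_share_dearborn
print(cair_stein_among_muslims)      # ~0.6

# Statewide: ~200,000 registered Muslim voters vs ~45,000 total Stein votes.
registered_muslims = 200_000
stein_votes_mi = 45_000

# Even if every Stein vote came from a Muslim voter, that caps the
# Muslim Stein rate at ~22.5% -- far below the exit-poll implication.
max_muslim_stein_rate = stein_votes_mi / registered_muslims
print(max_muslim_stein_rate)         # 0.225
```

Both checks point the same direction: the exit-poll figure overstates Muslim support for Stein relative to what the actual returns can accommodate.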

I'm curious how I can best estimate the error of the CAIR exit poll. Any suggestions?

I misspoke. I was using the actual results from Dearborn, and not exit polls. Note how differently they voted from Wayne County as a whole!

Sure, if Muslim Americans voted 100% for Harris, she still would have lost (although she would have flipped Michigan). However, I just don't see any way Stein would have gotten double digits in Dearborn if Muslim Americans weren't explicitly retaliating against Harris for the Biden administration's handling of Gaza.

But 200,000 registered voters in a state Trump won by 80,000 make Muslim Americans a critical demographic in a swing state like Michigan. The exit polls show a 40% swing in Dearborn away from Democrats, enough for "we will vote Green/Republican if you give us what we want" to be a credible threat, which I've seen some (maybe Scott Alexander?) claim isn't possible, as it would require a large group of people to coordinate to vote against their interests. Seemingly irrational threats ("I will vote for someone with a worse Gaza policy than you if you don't change your Gaza policy") are entirely rational if you have a track record of actually carrying them out.

On second thought, a lot of groups swung heavily towards Trump, and it's not clear that Gaza is responsible for the majority of it amongst Muslim Americans. I should do more research.

My takeaway from the US elections is that electoral blackmail in response to party in-fighting can work, and work well.

Dearborn and many other heavily Muslim areas of the US had plurality or near-plurality support for Trump, along with double-digit vote shares for Stein. It's notable that Stein supports cutting military support for Israel, which may signal a genuine preference rather than a protest vote. Many previously Democrat-voting Muslims explicitly cited a desire to punish Democrats as a major motivator for voting Trump or Stein.

Trump also has the advantage of not being in office, meaning he can make promises about brokering peace without having to pay the cost of actually doing so.

Thus, the cost of not voting Democrat in terms of your Gaza expectations may be low, or even negative.

Whatever happens, I think Democrats are going to take Muslim concerns about Gaza more seriously in future election cycles. The blackmail worked - Muslim Americans have a credible electoral threat against Democrats in the future.

My problem with this argument is that the AIs which will accept your argument can be Pascal's Mugged in general, which means they will never take over the world. It's less "Sane rational agents will ignore this type of threat/trade" and more "Agents which consistently accept this type of argument will die instantly when others learn to exploit it".

I have a few questions.

  1. Can you save the world in time without a slowdown in AI development if you had a billion dollars?
  2. Can you do it with a trillion dollars?
  3. If so, why aren't you trying to ask the US Congress for a trillion dollars?
  4. If it's about a lack of talent, do you think Terence Tao could make significant progress on AI alignment if he actually tried?
  5. Do you think he would be willing to work on AI alignment if you offered him a trillion dollars?

The text referred to this as a "social deception game". Where is the deception?

My guesses:

  1. The actual messages sent to the Petrovs and Generals will significantly differ from the ones shown here.
  2. It's Among Us: there are players who get very high payoffs if they trick the sides into nuclear war.
  3. It's just a reference to expected weird game theory stuff. But why not call it a "game theory exercise"?
  4. The actual rules of the game will differ drastically from the ones described here. Maybe no positive payoffs for one-sided nuking?
  5. The sensor readings are actually tied to comments on this post. Maybe an AI is somehow involved?