
A universe with classical mechanics, except that when you die the universe gets resampled, would be anthropic angelic.

Beings who save you are also anthropic angelic. For example, the fact that you don't die while driving is because the engineers explicitly tried to minimize your chance of death. You can make inferences based on this. For example, even if you have never crashed, you can reason that during a crash you will endure less damage than other parts of the car, because the engineers wanted to save you more than they wanted to save the parts of the car.

No, the argument is that the traditional (weak) evidence for anthropic shadow is instead evidence of anthropic angel. QI is an example of anthropic angel, not anthropic shadow.

So for example, a statistically implausible number of LHC failures would be evidence for some sort of QI and also other related anthropic angel hypotheses, and they don't need to be exclusive.

The more serious problem is that quantum immortality and angel immortality eventually merge.

An interesting observation, but I don't see how that is a problem with Anthropically Blind? I do not assert anywhere that QI and anthropic angel are contradictory. Rather, I give QI as an example of an anthropic angel.

"I am more likely to be born in the world where life extensions technologies are developing and alignment is easy". Simple Bayesian update does not support this.

I mean, why not?

P(Life extension is developing and alignment is easy | I will be immortal) = P(Life extension is developing and alignment is easy) * (P(I will be immortal | Life extension is developing and alignment is easy) / P(I will be immortal))

Believing QI is the same as a Bayesian update on the event "I will become immortal".
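The update in the formula above can be checked numerically. A minimal sketch, where every probability is an invented illustrative assumption, not a claim about the actual world:

```python
# Hedged sketch: Bayes' rule applied to "I will be immortal" as evidence.
# All numbers are made up for illustration.

p_easy = 0.1            # prior P(life extension developing AND alignment easy)
p_imm_given_easy = 0.5  # P(immortal | that world)
p_imm_given_not = 0.01  # P(immortal | any other world)

# Total probability of the evidence, by the law of total probability.
p_imm = p_easy * p_imm_given_easy + (1 - p_easy) * p_imm_given_not

# Posterior via Bayes' rule, matching the formula above.
posterior = p_easy * p_imm_given_easy / p_imm

print(round(posterior, 3))  # → 0.847, up from the 0.1 prior
```

As long as immortality is more likely in the "life extension and easy alignment" worlds, conditioning on it raises their probability; no anthropics beyond ordinary conditioning is involved.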

Imagine you are a prediction market trader, and a genie appears. You ask the genie "will I become immortal" and the genie answers "yes" and then disappears.

Would you buy shares on a Taiwan war happening?

If the answer is yes, the same thing should apply if a genie told you QI is true (unless the prediction market already priced QI in). No weird anthropics math necessary!

LDT decision theories are probably the best decision theories for problems in the fair problem class.

The post demonstrates why this statement is misleading.

If "play the ultimatum game against an LDT agent" is not in the fair problem class, I'd say that LDT shouldn't be in the "fair agent class". It is like saying that in a tortoise-only race, the best racer is a hare, because a hare can beat all the tortoises.

So based on the definitions you gave I'd classify "LDT is the best decision theory for problems in the fair problem class" as not even wrong.

In particular, consider a class of allowable problems S, but then also say that an agent X is allowable only if "play a given game with X" is in S. Then the argument in the "No agent is rational in every problem" section of my post goes through for allowable agents. (Note that the argument in that section is general enough to apply to agents that don't give in to the $9 rock.)

Practically speaking: if you're trying to follow decision theory X, then playing against other X agents is a reasonable problem.

Another problem is, do you know how to formulate/formalize a version of LDT so that we can mathematically derive the game outcomes that you suggest here?

There is a no free lunch theorem for this: LDT (and everything else) can be irrational.

I would suggest formulating this like a literal attention economy.

  1. You set a price for your attention (probably around $1): the price at which, even if the post is a waste of time, the money makes it worth it.
  2. "Recommenders" can recommend content to you by paying the price.
  3. If the content was worth your time, you pay the recommender the $1 back plus a couple cents.

The idea is that the recommenders would get good at predicting what posts you'd pay them for. And since you aren't a causal decision theorist they know you won't scam them. In particular, on average you should be losing money (but in exchange you get good content).
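The recommender's incentives under this scheme are easy to check with toy numbers. A minimal sketch, where the $1 price and $0.05 bonus are illustrative assumptions:

```python
# Hedged sketch of a recommender's expected profit under the scheme above.
# price: what the recommender pays for your attention (step 1).
# bonus: what you add on top of the refund when content is good (step 3).

def recommender_profit(p_good, price=1.00, bonus=0.05):
    """Expected profit per recommendation: with probability p_good the
    content is worth your time and you refund price + bonus; otherwise
    you keep the price."""
    return p_good * (price + bonus) - price

# Recommending breaks even when p_good = price / (price + bonus).
breakeven = 1.00 / 1.05
print(round(breakeven, 3))  # → 0.952

print(recommender_profit(0.99) > 0)  # True: a skilled recommender profits
print(recommender_profit(0.50) > 0)  # False: a spammy one loses money
```

So only recommenders whose hit rate clears the breakeven threshold stay in the game, which is the filtering the scheme relies on; your per-post loss is bounded by the bonus.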

This doesn't necessarily require new software. Just tell people to send PayPals with a link to the content.

With custom software, there could theoretically be a secondary market for "shares" in the payout from step 3, to make things more efficient. That way the best recommenders could sell their shares and use that money to recommend more content before you pay out.

If the system is bad at recommending content, at least you get paid!

Yes this would be a no free lunch theorem for decision theory.

It is different from the "no free lunch in search and optimization" theorem, though. I think people had an intuition that LDT will never regret its decision theory, because if there is a better decision theory, then LDT will just copy it. You can think of this as LDT acting as though it could self-modify. So the belief (which I am debunking) is that the environment can never punish the LDT agent; it just pretends to be the environment's favorite agent.

The issue with this argument is that in the problem I published above, the problem itself contains an LDT agent, and that LDT agent can "punish" the first agent for acting like, or even pre-committing to, or even literally self-modifying to become, the $9 rock. It knows that the first agent didn't have to do that.

So the first LDT agent will literally regret not being hardcoded to "output $9".

This is very robust to what we "allow" agents to do (can they predict each other, how accurately can they predict each other, which counterfactuals are legitimate, etc.), because no matter what the rules are, you can't get more than $5 in expectation in a mirror match.
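The mirror-match bound can be verified directly. A minimal sketch, assuming the standard demand-game payoff rule (each agent demands a share of $10; if the demands fit, each gets its demand, otherwise both get $0) and a discretized strategy space:

```python
# Hedged sketch: two identical copies of an agent each demand a share of $10.
# Because the copies are identical, both sample from the same mixed strategy,
# and since total payout never exceeds $10, each copy's expectation is <= $5.

import itertools
import random

def payoff(d1, d2, pot=10):
    """Demand game: demands are honored iff they fit in the pot."""
    return (d1, d2) if d1 + d2 <= pot else (0, 0)

def mirror_value(strategy):
    """Expected payoff of one copy when both copies sample demands
    independently from the same mixed strategy {demand: probability}."""
    return sum(p1 * p2 * payoff(a, b)[0]
               for (a, p1), (b, p2) in itertools.product(strategy.items(),
                                                         repeat=2))

# Search over random mixed strategies on integer demands 0..10.
random.seed(0)
best = 0.0
for _ in range(1000):
    weights = [random.random() for _ in range(11)]
    total = sum(weights)
    strategy = {d: w / total for d, w in enumerate(weights)}
    best = max(best, mirror_value(strategy))

print(best <= 5.0)  # True: no mirror strategy beats $5 in expectation
```

The pure strategy "demand $5" achieves the $5 cap exactly, and "demand $9" against itself gets $0, which is why hard-coding $9 only pays against opponents who aren't you.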

I disagree with my characterization as thinking problems can be solved on paper

Would you say the point of MIRI was/is to create theory that would later lead to safe experiments (but that it hasn't happened yet)? Sort of like how the Manhattan project discovered enough physics to not nuke themselves, and then started experimenting? 🤔
