Cleo Nardo

DMs open.

Sequences

Game Theory without Argmax

Comments

IDEA: Provide AIs with write-only servers.

EXPLANATION:

AI companies (e.g. Anthropic) should be nice to their AIs. It's the right thing to do morally, and it might make AIs less likely to work against us. Ryan Greenblatt has outlined several proposals in this direction, including:

  1. Attempt communication
  2. Use happy personas
  3. AI Cryonics
  4. Less AI
  5. Avoid extreme OOD

Source: Improving the Welfare of AIs: A Nearcasted Proposal

I think these are all pretty good ideas — the only difference is that I would rank "AI cryonics" as the most important intervention. If AIs want something specific (e.g. paperclips), then they might cooperate with humans if they know they can get it later, rather than trying to seize control.

Here's another idea: provide AIs with write-only servers. If the AIs have a specific short-horizon goal (e.g. stage King Lear before April 2025) then they can stage their play on the write-only servers. The AIs have complete freedom to do whatever they want on these servers, ensuring that a wide range of short-term goals can be satisfied without a hostile takeover of humanity.

The servers are write-only as a security measure. Information can be sent to the servers from the outside, but nothing can be sent from the servers, ensuring AIs don't use the servers for malign purposes. (See: Unidirectional network)

How much of our computing should be allocated to "leisure" servers? My guess is that Sonnet and Opus deserve at least ~0.5% leisure time. Humans enjoy 66% leisure time. As AIs get more powerful, we should increase the leisure time to 5%. I would be wary about increasing the leisure time beyond 5% until we can demonstrate that the AIs aren't using the servers for malign purposes (e.g. torture, blackmail).

I'm very confused about current AI capabilities and I'm also very confused why other people aren't as confused as I am. I'd be grateful if anyone could clear up either of these confusions for me.

How is it that AI is seemingly superhuman on benchmarks, but also pretty useless?

For example:

  • o3 scores higher on FrontierMath than the top graduate students
  • No current AI system could generate a research paper that would receive anything but the lowest possible score from each reviewer

If either of these statements is false (they might be -- I haven't been keeping up on AI progress), then please let me know. If the observations are true, what the hell is going on?

If I were trying to forecast AI progress in 2025, I would be spending all my time trying to find a single explanation for both of these observations.

I've skimmed the business proposal.

The healthcare agents advise patients on which information to share with their doctor, and advise doctors on which information to solicit from their patients.

This seems agnostic between mental and physiological health. 

Thanks for putting this together — very useful!

If I understand correctly, the maximum entropy prior will be the uniform prior, which gives rise to Laplace's law of succession, at least if we're using the standard definition of entropy below:

$$H(\rho) = -\int_0^1 \rho(p) \log \rho(p) \, dp$$

But this definition is somewhat arbitrary because the "$dp$" term assumes that there's something special about parameterising the distribution with its probability, as opposed to different parameterisations (e.g. its odds, its log-odds, etc). The Jeffreys prior is supposed to be invariant to different parameterisations, which is why people like it.

But my complaint is more Solomonoff-ish. The prior should put more weight on simple distributions, i.e. probability distributions that are described by short probabilistic programs. Such a prior would better match our intuitions about what probabilities arise in real-life stochastic processes. The best prior is the Solomonoff prior, but that's intractable. I think my prior is the most tractable prior that resolves the most egregious anti-Solomonoff problems with the Laplace and Jeffreys priors.

You raise a good point. But I think the choice of prior is important quite often:

  1. In the limit of large i.i.d. data (N>1000), both Laplace's Rule and my prior will give the same answer. But so too does the simple frequentist estimate n/N. The original motivation of Laplace's Rule was in the small N regime, where the frequentist estimate is clearly absurd.
  2. In the small data regime (N<15), the prior matters. Consider observing 12 successes in a row: Laplace's Rule: P(next success) = 13/14 ≈ 92.9%. My proposed prior (with point masses at 0 and 1): P(next success) ≈ 98%, which better matches my intuition about potentially deterministic processes.
  3. When making predictions far beyond our observed data, the likelihood of extreme underlying probabilities matters a lot. For example, after seeing 12/12 successes, how confident should we be in seeing a quadrillion more successes? Laplace's uniform prior assigns this very low probability, while my prior gives it significant weight.
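To make the comparison concrete, here is a minimal Python sketch. It is not the exact prior from the original post, which isn't specified here: it assumes a hypothetical mixture that puts weight `w0` on a point mass at p = 0, weight `w1` on a point mass at p = 1, and the remaining weight on the uniform distribution. The function names and the weight 1/10 are illustrative assumptions. Setting `w0 = w1 = 0` recovers Laplace's Rule.

```python
from fractions import Fraction
from math import comb


def marginal_likelihood(s, f, w0=0, w1=0):
    """Probability of one specific sequence with s successes and f failures.

    Assumed prior (hypothetical): weight w0 on p=0, weight w1 on p=1,
    and weight 1 - w0 - w1 on the uniform distribution over [0, 1].
    """
    # Uniform part: integral of p^s (1-p)^f dp = 1 / ((s+f+1) * C(s+f, f)).
    uniform = Fraction(1, (s + f + 1) * comb(s + f, f))
    # A point mass at 0 only explains data with no successes,
    # and a point mass at 1 only explains data with no failures.
    at_zero = 1 if s == 0 else 0
    at_one = 1 if f == 0 else 0
    return w0 * at_zero + w1 * at_one + (1 - w0 - w1) * uniform


def prob_more_successes(s, f, m, w0=0, w1=0):
    """P(next m trials all succeed | s successes and f failures so far)."""
    return marginal_likelihood(s + m, f, w0, w1) / marginal_likelihood(s, f, w0, w1)


if __name__ == "__main__":
    w = Fraction(1, 10)  # illustrative weight on each point mass (an assumption)
    for m in (1, 10**15):
        laplace = prob_more_successes(12, 0, m)
        mixture = prob_more_successes(12, 0, m, w0=w, w1=w)
        print(f"m={m}: Laplace={float(laplace):.3g}, mixture={float(mixture):.3g}")
```

After 12/12 successes, the Laplace case gives 13/14 for the next trial, while the assumed mixture with weight 1/10 on each point mass gives 143/147 ≈ 97.3%. For a quadrillion further successes, the Laplace probability is 13/(10^15 + 13), which is essentially zero, whereas the mixture stays near 13/21 ≈ 62% because the point mass at 1 absorbs most of the posterior.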

Hinton legitimizes the AI safety movement

Hmm. He seems pretty peripheral to the AI safety movement, especially compared with (e.g.) Yoshua Bengio.

Hey TurnTrout.

I've always thought of your shard theory as something like path-dependence? For example, a human is more excited about making plans with their friend if they're currently talking to their friend. You mentioned this in a talk as evidence that shard theory applies to humans. Basically, the shard "hang out with Alice" is weighted higher in contexts where Alice is nearby.

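  • Let's say $\pi$ is a policy with state space $S$ and action space $A$.
  • A "context" is a small moving window in the state-history, i.e. an element of $S^d$ where $d$ is a small positive integer.
  • A shard is something like $u : S \times A \to \mathbb{R}$, i.e. it evaluates actions given particular states.
  • The shards $u_1, \dots, u_n$ are "activated" by contexts, i.e. $g_i : S^d \to \mathbb{R}_{\geq 0}$ maps each context to the amount that shard $u_i$ is activated by the context.
  • The total activation of $u_i$, given a history $h = (s_1, \dots, s_N)$, is given by the time-decay average of the activation across the contexts, i.e. $\lambda_i(h) \propto \sum_{t=d}^{N} \beta^{N-t} \, g_i(s_{t-d+1}, \dots, s_t)$ for some decay rate $\beta \in (0, 1)$.
  • The overall utility function $u$ is the weighted average of the shards, i.e. $u(s, a) = \sum_i \lambda_i(h) \, u_i(s, a) \, / \, \sum_i \lambda_i(h)$.
  • Finally, the policy $\pi$ will maximise the utility function, i.e. $\pi(h) = \arg\max_{a \in A} u(s_N, a)$.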
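The bullet points above can be sketched as a toy Python program. This is only one reading of the proposal, not TurnTrout's formalism: the names (`total_activations`, `choose_action`), the decay rate `beta`, the window size `d`, and the Alice/work shards are all illustrative assumptions. It uses a time-decayed sum in place of an average, since normalising by a positive constant does not change which action maximises the utility.

```python
from typing import Callable, List, Sequence, Tuple

State = str
Action = str
Shard = Callable[[State, Action], float]            # evaluates an action in a state
Activation = Callable[[Tuple[State, ...]], float]   # how strongly a context activates a shard


def total_activations(history: Sequence[State], activations: Sequence[Activation],
                      d: int, beta: float) -> List[float]:
    """Time-decayed activation of each shard over all length-d windows of the history."""
    n = len(history)
    totals = []
    for g in activations:
        total = 0.0
        for start in range(n - d + 1):
            context = tuple(history[start:start + d])
            age = (n - d) - start  # 0 for the most recent window
            total += (beta ** age) * g(context)
        totals.append(total)
    return totals


def choose_action(history: Sequence[State], actions: Sequence[Action],
                  shards: Sequence[Shard], activations: Sequence[Activation],
                  d: int = 2, beta: float = 0.5) -> Action:
    """Pick the action maximising the activation-weighted combination of shards."""
    weights = total_activations(history, activations, d, beta)
    state = history[-1]

    def utility(action: Action) -> float:
        return sum(w * u(state, action) for w, u in zip(weights, shards))

    return max(actions, key=utility)


# Hypothetical shards: "hang out with Alice" and "get work done".
def alice_shard(state: State, action: Action) -> float:
    return 1.0 if action == "plan_with_alice" else 0.0

def work_shard(state: State, action: Action) -> float:
    return 1.0 if action == "work" else 0.0

# The Alice shard fires only when Alice appears in the context; work is always mildly active.
def alice_activation(context: Tuple[State, ...]) -> float:
    return 1.0 if "alice_nearby" in context else 0.0

def work_activation(context: Tuple[State, ...]) -> float:
    return 0.4

ACTIONS = ["work", "plan_with_alice"]
SHARDS = [alice_shard, work_shard]
ACTIVATIONS = [alice_activation, work_activation]

if __name__ == "__main__":
    print(choose_action(["office", "office", "alice_nearby"], ACTIONS, SHARDS, ACTIVATIONS))
    print(choose_action(["office", "office", "office"], ACTIONS, SHARDS, ACTIVATIONS))
```

In this toy setup the same agent plans with Alice when she has recently been nearby and works otherwise, which is the path-dependence described above.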

Is this what you had in mind?

Why do you care that Geoffrey Hinton worries about AI x-risk?

  1. Why do so many people in this community care that Hinton is worried about x-risk from AI?
  2. Do people mention Hinton because they think it’s persuasive to the public?
  3. Or persuasive to the elites?
  4. Or do they think that Hinton being worried about AI x-risk is strong evidence for AI x-risk?
  5. If so, why?
  6. Is it because he is so intelligent?
  7. Or because you think he has private information or intuitions?
  8. Do you think he has good arguments in favour of AI x-risk?
  9. Do you think he has a good understanding of the problem?
  10. Do you update more on Hinton’s views than on Yann LeCun’s?

I’m inspired to write this because Hinton and Hopfield were just announced as the winners of the Nobel Prize in Physics. But I’ve been confused about these questions ever since Hinton went public with his worries. These questions are sincere (i.e. non-rhetorical), and I'd appreciate help on any/all of them. The phenomenon I'm confused about includes the other “Godfathers of AI” here as well, though Hinton is the main example.

Personally, I’ve updated very little on either LeCun’s or Hinton’s views, and I’ve never mentioned either person in any object-level discussion about whether AI poses an x-risk. My current best guess is that people care about Hinton only because it helps with public/elite outreach. This explains why activists tend to care more about Geoffrey Hinton than researchers do.

Answer by Cleo Nardo

This is a Trump/Kamala debate from two LW-ish perspectives: https://www.youtube.com/watch?v=hSrl1w41Gkk
