All of Joern Stoehler's Comments + Replies

Here's my best quickly-written story of why I expect AGI to understand human goals, but not share them. The intended audience is mostly myself, so I use personal jargon.

What a system at AGI level wants depends a lot on how it coheres its different goal-like instincts during self-reflection. Introspection & (less strongly) neuroscience tell us that humans start with internal impulses and self-reflection processes that are very similar to each other's, and also end up with similar goals. AI has a quite different set of genes/architecture and environment/training data, and... (read more)

1Bogdan Ionut Cirstea
Arguably, though, we do have the beginnings of theories of metacognition, see e.g. A Theoretical Understanding of Self-Correction through In-context Alignment.

Imo mildly misleading. I expect large parts of the 85% to just not have read their emails, or to have been too busy to answer what may look to them like a mildly useful survey.

4ChristianKl
I agree that it isn't perfect, but it's important information that should not be left out. 

Why are you concerned in that scenario? Any more concrete details on what you expect to go wrong?

I don't think there's a cure-it-all solution, except "don't build it", and even that might be counterproductive in some edge cases.

3lemonhope
Very broad concerns but two totally random example risks:

* During training, the model hacks out of my cluster and sends a copy of itself or a computer virus elsewhere on the internet. Later on, chaos ensues.
* AI lawyer has me assassinated and impersonates me to steal my company.

Addendum: I just learned that dipole-dipole interactions are classified as a type of vdW force in chemistry. This is different from solid state physics, where vdW is reserved for the quantum mechanical interaction between induced dipoles.

So it's indeed vdW forces that keep a protein in its shape. (This might also explain why OP found a different order of magnitude for their strength?)

When discussing the stability of proteins, I mostly think of their folding, not whether their primary or secondary structure breaks.

The free energy difference between folded and unfolded states of a typical protein is allegedly (not an expert!) in the range 21-63 kJ/mol. So way less than a single covalent bond.
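For scale (my own rough comparison, using a textbook C-C single-bond energy of roughly 350 kJ/mol rather than a number from the thread), the folding free energy comes out at only a small fraction of one covalent bond:

$$ \frac{\Delta G_{\text{fold}}}{E_{\text{C-C}}} \;\approx\; \frac{21\text{ to }63\ \mathrm{kJ/mol}}{\sim\!350\ \mathrm{kJ/mol}} \;\approx\; 0.06\text{ to }0.18 $$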

I have a friend who does his physics PhD on protein folding, and from what I remember he mostly simulates the surface charge of proteins, i.e. cares about dipole-dipole interactions (the weaker version of ionic bonds) and interaction effects with the... (read more)

See Table 2 in https://www.emilkirkegaard.com/p/skill-vs-luck-in-games for

[...] the corresponding winning probability of a player who is exactly one standard deviation better than his opponent. We refer to this probability as p^sd. For comparison, we also provide the winning probabilities when a 99% percentile player is matched against a 1% percentile player, which we call p_{99,1}.

Go & Chess (p^sd = 83.3% and 72.9%, respectively) are notably above Backgammon (p^sd = 53.6%).
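As a quick illustration of what these per-game numbers mean in practice (my own sketch, not from the linked post; games assumed independent), here is how p^sd translates into the chance that the one-SD-better player wins a best-of-N match:

```python
from math import comb

def match_win_prob(p_game: float, n_games: int) -> float:
    """Probability that the stronger player wins a majority of n_games,
    assuming independent games each won with probability p_game
    (use an odd n_games so there is no tie)."""
    wins_needed = n_games // 2 + 1
    return sum(comb(n_games, k) * p_game**k * (1 - p_game)**(n_games - k)
               for k in range(wins_needed, n_games + 1))

# Per-game winning probabilities quoted above (one-SD skill gap).
for game, p_sd in [("Go", 0.833), ("Chess", 0.729), ("Backgammon", 0.536)]:
    print(game, round(match_win_prob(p_sd, 11), 3))
# Under these assumptions the better player wins a best-of-11 match
# roughly 99% of the time in Go, ~95% in Chess, but only ~59% in Backgammon.
```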

2Joe Collman
Oh that's cool - nice that someone's run the numbers on this. I'm actually surprised by quite how close to 50% both backgammon and poker are.

I expect that other voters correlate with my choice, and so I am not just deciding 1 vote, but actually a significant fraction of votes.

If the fraction of uncorrelated blue voters, plus the fraction of people who vote identically to me, exceeds 50%, then I can save the uncorrelated blue voters.

More formally: let R, B, C denote the fractions of uncorrelated red voters, uncorrelated blue voters, and correlated voters who will vote the same as you do. Let S be how large a fraction of people you'd let die in order to save yourself (i.e. some measure of selfishness).

Then choosing bl... (read more)
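The rest of that derivation is cut off above; here is a minimal toy version of the comparison under the stated assumptions (deterministic, with B, C, S known), my own sketch rather than the condition the comment actually derives:

```python
def deaths(vote_blue: bool, B: float, C: float, S: float) -> float:
    """Cost of my choice, in fractions of the population.
    Assumptions: blue voters die iff blue gets < 50%; the correlated
    fraction C votes the same way I do; my own death is weighted as S.
    (R = 1 - B - C, the uncorrelated red fraction, drops out.)"""
    blue_share = B + C if vote_blue else B
    if blue_share >= 0.5:
        return 0.0                     # blue majority: nobody dies
    others = blue_share                # blue voters die
    me = S if vote_blue else 0.0       # I only die if I'm among them
    return others + me

def vote_blue_is_better(B: float, C: float, S: float) -> bool:
    return deaths(True, B, C, S) < deaths(False, B, C, S)

# B = 0.3, C = 0.25: my (correlated) choice tips blue over 50%,
# saving the 30% uncorrelated blue voters -> vote blue.
print(vote_blue_is_better(B=0.3, C=0.25, S=0.1))   # True
# B = 0.1, C = 0.1: blue still loses, so voting blue just adds
# the correlated 10% (and me) to the casualties -> vote red.
print(vote_blue_is_better(B=0.1, C=0.1, S=0.1))    # False
```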

2Ege Erdil
If people vote as if their individual vote determines the vote of a non-negligible fraction of the voter pool, then you only need ν=O(1/N) (averaged over the whole population, so the value of the entire population is νN=O(1)) instead of ν=O(1), which seems much more realistic. So voting blue can make sense for a sufficiently large coalition of "ordinary altruists" with ν≫1/N who are able to pre-commit to their vote and think people outside the coalition might vote blue by mistake etc., rather than the "extraordinary altruists" we need in the original situation with ν=O(1). Ditto if you're using a decision theory where it makes sense to suppose such a commitment already exists when making your decision.

Thanks for this concise post :) If we set the agent's goal to preserving F(a), I actually worry that the agent will not do nothing, but instead prevent us from doing anything that reduces F(a). Imo it is not easy to formalize F(a) such that we no longer want to reduce it ourselves. For example, we may want to glue a vase onto a fixed location inside our house, preventing it from accidentally falling and breaking. This however also prevents us from constantly moving the vase around the house, or from breaking it and scattering the pieces for maximum entropy.

Building an aligned superintelli... (read more)

[This comment is no longer endorsed by its author]
2Logan Zoellner
F(a) is the set of futures reachable by agent a at some initial t=0. F_b(a) is the set of futures reachable at time t=0 by agent a if agent b exists. There's no way for F_b(a) > F(a), since creating agent b is under our assumptions one of the things agent a can do.
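A toy illustration of that containment claim (my own sketch, with hypothetical state names, not from the post): model futures as nodes in a small action graph in which "create agent b" is one of a's own actions, so every future reachable once b exists was already reachable by a alone.

```python
from collections import deque

# Toy world: directed graph of states; edges are actions available to agent a.
# One of a's actions from the start state is creating agent b.
edges = {
    "start":      ["status_quo", "with_b"],        # a may or may not create b
    "with_b":     ["cured_cancer", "paperclips"],  # futures that go through b
    "status_quo": ["slow_progress"],
}

def reachable(graph: dict, source: str) -> set:
    """All states reachable from `source` by following edges."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

F_a   = reachable(edges, "start")    # futures reachable by a alone
F_b_a = reachable(edges, "with_b")   # futures reachable by a once b exists

# Since creating b was one of a's own actions, everything reachable
# with b's help was already in F(a):
assert F_b_a <= F_a
print(sorted(F_a), sorted(F_b_a))
```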