Fictional evidence that this isn't obvious: in Blindsight, which I otherwise thought was a reasonably smart book (for example, it goes out of its way to make its aliens genuinely alien), the protagonists allow an unknown alien intelligence to communicate with them using a human voice. Because I was already armed with the idea of AI-boxing, this seemed so stupid to me that it actually broke my suspension of disbelief; but it isn't an obvious thought to have.
Spoiler: Gura ntnva, gur nyvra qbrf nccneragyl znantr gb chg n onpxqbbe va bar bs gur uhzna'f oenvaf.
I think this is making a five-inch fence half an inch higher. It's just not relevant on the scale of an agent to which a human is a causal system made of brain areas and a group of humans is just another causal system made of several interacting copies of those brain areas.
I agree that the AI you envision would be dangerously likely to escape a "competent" box too; and in any case, even if you manage to keep the AI in the box, attempts to actually use any advice it gives are extremely dangerous.
That said, I think your "half an inch" is off by multiple orders of magnitude.
It is, after all, much harder to convince a group of mutually-suspicious humans than to convince one lone person.
That sounds right. Would you have evidence to back up the intuition? (This knowledge would also be useful for marketing and other everyday persuasion purposes.)
(
TL;DR: Mo' people - mo' problems?
I can think of effects that could theoretically make it easier to convince a group:
- For some reason, Boxy might be more skilled at manipulating social/group dynamics than at influencing a lone wolf.
- More people make the system more complex. Complexity generally increases the likelihood of security holes.
- Every extra person is another target and brings new soft spots to the table for the AI to pounce on.
- Supposing that the most competent person available would be chosen as the lone Gatekeeper, the average competence would fall as more staff are added.
- Then the machine could go for an inductive approach - convince the weakest link first, proceed from there with this human ally on her side.
- Persuaded humans could in principle be employed as actuators, e.g. for pressuring or even attacking opposing group members.
- The lone wolf could be strong against a computer but weak against fellow humans.
- Surely you will say "But any communication with the terminal will be supervised by everyone!" But that does not strictly make such influence impossible as far as I can tell.
- Also the superintelligence could get creative, e.g. instigate a discussion among the colleagues so that most of them are distracted.
(You could take preemptive measures against these worries, but Boxy might find security holes in every 'firewall' you come up with - is that an arms race we could win?)
)
My comment was mostly inspired by (known effective) real-world examples. Note that relieving from duty anyone who shows signs of being persuaded is a de-emphasized but vital part of this policy, as is carefully vetting people before trusting them.
Actually implementing an "N people at a time" rule can be done using locks, guards and/or cryptography (note that many such algorithms are provably secure against an adversary with unlimited computing power, i.e. "information-theoretic security").
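As an illustration of the kind of cryptographic enforcement that's possible here (a minimal sketch, not anything the parent comments specify: the "release key" framing, the function names, and the choice of a 5-person all-or-nothing split are my own assumptions), XOR secret splitting is one of the information-theoretically secure schemes alluded to above:

```python
import secrets

# Toy sketch: split a "release key" into n shares so that all n gatekeepers
# must contribute before the key can be reconstructed. XOR splitting is
# information-theoretically secure: any n-1 shares are statistically
# independent of the key, however much computing power the adversary has.

def xor_all(chunks: list[bytes]) -> bytes:
    """XOR a list of equal-length byte strings together."""
    out = bytes(len(chunks[0]))
    for c in chunks:
        out = bytes(a ^ b for a, b in zip(out, c))
    return out

def split_key(key: bytes, n: int) -> list[bytes]:
    """Split `key` into n shares; all n are needed to recover it."""
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    shares.append(xor_all(shares + [key]))  # last share makes the XOR close to `key`
    return shares

def recover_key(shares: list[bytes]) -> bytes:
    """Reconstruct the key; works only when every share is present."""
    return xor_all(shares)

if __name__ == "__main__":
    key = secrets.token_bytes(16)           # the secret that authorises release
    shares = split_key(key, n=5)            # one share per gatekeeper
    assert recover_key(shares) == key       # all five together: key recovered
    assert recover_key(shares[:4]) != key   # any four alone: just noise
```

A threshold variant ("any N of M gatekeepers") would use Shamir secret sharing instead, which has the same information-theoretic property.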
Note that the AI box setting is not one which security-minded people would consider "competent"; once you're convinced that AI is dangerous and persuasive, the minimum safeguard would be to require multiple people to be present when interacting with the box, and to only allow release with the assent of a significant number of people.
It is, after all, much harder to convince a group of mutually-suspicious humans than to convince one lone person.
(This is not a knock on EY's experiment, which does indeed test a level of security that really was proposed by several real-world people; it is a knock on their security systems.)
[META] Why is this so heavily upvoted? Does that indicate actual value to LW, or just a majority of lurking septemberites captivated by cute pixel art?
It was just hacked out in a couple of hours to organize my thoughts for the meetup. It has little justification for anything, very little coherent overarching structure, and it's not even really serious. It's only 90% true, with many bugs. Very much a worse-is-better sort of post.
Now it's promoted with 50-something upvotes. I notice that I would not have predicted this, and feel the need to update.
What should I (we) learn from this?
Am I underestimating the value of a given post-idea? (i.e. should we all err on the side of writing more?)
Are structure, seriousness, watertightness and such trumped by fun and clarity? Is it safe to run with this? This could save a lot of work.
Are people just really interested in morality, or re-framing of problems, or well-linked integration posts?
For me, high (insight + fun) per (time + effort).
Haiti today is a situation that makes my moral intuition throw error codes. Population density is three times that of Cuba. Should we be sending aid? It would be kinder to send helicopter gunships and carry out a cull. Cut the population back to one tenth of its current level, then build paradise. My rival moral intuition is that culling humans is always wrong.
Trying to stay concrete and present, should I restrict my charitable giving to helping countries make the demographic transition? Within a fixed aid budget one can choose between package A = (save one child, provide education, provide entry into the global economy; 30 years later the child, now an adult, feeds his own family and has some money left over to help others) and package B = (save four children; that's it, money all used up; thirty years later there are 16 children needing saving and it's not going to happen). Concrete choice of A over B: ignore Haiti and send money to Karuna trust to fund education for untouchables in India, preferring to raise a few children out of poverty by letting other children die.
(Are you sure you want this posted under what appears to be a real name?)
Assume the subject of reprogramming is an existing human being, otherwise minimally altered by this reprogramming, i.e., we don't do anything that isn't necessary to switch their motivation to paperclips. So unless you do something gratuitously non-minimal like moving the whole decision-action system out of the range of introspective modeling, or cutting way down on the detail level of introspective modeling, or changing the empathic architecture for modeling hypothetical selves, the new person will experience themselves as having ineffable 'qualia' associated with the motivation to produce paperclips.
The only way to make it seem to them like their motivational quales hadn't changed over time would be to mess with the encoding of their previous memories of motivation, presumably in a structure-destroying way since the stored data and their introspectively exposed surfaces will not be naturally isomorphic. If you carry out the change to paperclip-motivation in the obvious way, cognitive comparisons of the retrieved memories to current thoughts will return 'unequal ineffable quales', and if the memories are visualized in different modalities from current thoughts, 'incomparable ineffable quales'.
Doing-what-leads-to-paperclips will also be a much simpler 'quale', both from the outside perspective looking at the complexity of cognitive data, and in terms of the internal experience of complexity - unless you pack an awful lot of detail into the question of what constitutes a more preferred paperclip. Otherwise, compared to the old days when you thought about justice and fairness, introspection will show that less questioning and uncertainty is involved, and that there are fewer points of variation among the motivational thought-quales being considered.
I suppose you could put in some extra work to make the previous motivations map in cognitively comparable ways along as many joints as possible, and try to edit previous memories without destroying their structure so that they can be visualized in a least common modality with current experiences. But even if you did, memories of the previous quales for rightness-motivation would appear as different in retrospect when compared to current quales for paperclip-motivation as a memory of a 3D greyscale forest landscape vs. a current experience of a 2D red-and-green fractal, even if they're both articulated in the visual sensory modality and your modal workspace allows you to search for, focus on, and compare commonly 'experienced' shapes between them.
I have no problem with this passage. But it does not seem obviously impossible to create a device that stimulates that-which-feels-rightness proportionally to (its estimate of) the clippiness of the universe - it's just a very peculiar kind of wireheading.
As you point out, it'd be obvious, on reflection, that one's sense of rightness has changed; but that doesn't necessarily make it a different quale, any more than having your eyes opened to the suffering of (group) changes your experience of (in)justice qua (in)justice.
Yes, thank you. As far as I can tell, (1) and (2) are closest to the meaning I inferred. I understand that we can consider them separately, but IMO (2) implies (1).
If an agent seeks to maximize its sense of well-being (as it would be reasonable to assume humans do), then we would expect the agent to take actions which it believes will achieve this effect. Its beliefs could be wrong, of course, but since the agent is descended from a long line of evolutionarily successful agents, we can expect it to be right a lot more often than it's wrong.
Thus, if the agent's sense of well-being can be accurately predicted as being proportional to its status (regardless of whether the agent itself is aware of this or not), then it would be reasonable to assume that the agent will take actions that, on average, lead to raising its status.
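A toy sketch of that inference (entirely illustrative; the action names, numbers, and the proportionality constant are my own assumptions, not anything from the comment above):

```python
# Toy model: an agent that picks whichever action it predicts will maximise
# its sense of well-being. If well-being happens to be proportional to
# status, the actions chosen are exactly the status-raising ones, whether or
# not the agent ever represents "status" explicitly.

STATUS_WEIGHT = 1.0  # assumed proportionality constant (illustrative)

def predicted_well_being(status_change: float) -> float:
    """Hypothetical inner evaluation: well-being tracks status change."""
    return STATUS_WEIGHT * status_change

# Hypothetical actions and the status change each would produce.
actions = {
    "share credit publicly": +2.0,
    "hoard a minor resource": -1.0,
    "do nothing": 0.0,
}

best = max(actions, key=lambda a: predicted_well_being(actions[a]))
print(best)  # -> share credit publicly
```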
Löwenheim-Skolem is going to give you trouble, unless "coherently-thinkable" is meant as a substantive restriction. You might be able to enumerate finitely-axiomatisable models, up to isomorphism, up to aleph-omega, if you limit yourself to k-categorical theories for k < aleph-omega, though. Then you could use Will's strategy and enumerate axioms.
Edit: I realised I'm being pointlessly obscure.
The upward Löwenheim-Skolem theorem means that, for every set of axioms in your list that has an infinite model, you'll have multiple (non-isomorphic) models.
You might avoid this if "coherently thinkable" were taken to mean "of small cardinality".
If you didn't enjoy this restriction, you could, for any given set of axioms, enumerate the k-categorical models of that set of axioms - or at least enumerate the models whose cardinality can be expressed as 2^2^...^2^omega, for some finite number of 2's. This is because k-categoricity means you'll only have one model of each cardinality, up to isomorphism.
So then you just enumerate all the possible countable combinations of axioms, and you have an enumeration of all countably axiomatisable, k-categorical, models.
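For readers without the model-theory background, here is a hedged restatement of the two standard facts this argument leans on (nothing below is specific to the comment above; T and kappa are just the usual symbols):

```latex
% Upward Löwenheim–Skolem: why a set of axioms with an infinite model
% cannot have a unique model.
\textbf{Upward L\"owenheim--Skolem.}\quad If a countable first-order theory
$T$ has an infinite model, then for every infinite cardinal $\kappa$ it has
a model of cardinality $\kappa$. Models of different cardinalities are never
isomorphic, so $T$ has many non-isomorphic models.

% Categoricity: the restriction that recovers uniqueness at a fixed size.
\textbf{$\kappa$-categoricity.}\quad $T$ is $\kappa$-categorical if all of
its models of cardinality $\kappa$ are isomorphic. By Morley's theorem, a
complete countable theory categorical in one uncountable cardinal is
categorical in every uncountable cardinal; fixing a cardinality and
restricting to categorical theories therefore leaves at most one model per
theory, up to isomorphism.
```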
I don't think it's unfair to put some restrictions on the universes you want to describe. Sure, reality could be arbitrarily weird - but if the universe cannot even be approximated within a number of bits much larger than the number of neurons (or even atoms, quarks, whatever), "rationality" has lost anyway.
(The obvious counterexample is that previous generations would have considered different classes of universes unthinkable in this fashion.)
I don't think the last one is that useful either. Really, anything that can fit in a tweet is unlikely to be useful. And if someone wants to give useful advice that does fit, they shouldn't be giving generalised messages that can be applied anywhere, but rather highly specific advice with a narrow target audience.
patio11 is something of a "marketing engineer", and his target audience is young software enthusiasts (Hacker News). What makes you think that this isn't pretty specific advice for a fairly narrow audience?