Is the harm that the average ethical review board prevents less than the harm that it causes by preventing research from happening? Are principles such as requiring informed consent from all research participants justifiable from a utilitarian perspective?
A lot of my work involves tweaking the utility or probability of an agent to make it believe - or act as if it believed - impossible or almost impossible events. But we have to be careful about this; an agent that believes the impossible may not be so different from one that doesn't.
Consider for instance an agent that assigns a prior probability of zero to JFK ever having been assassinated. No matter what evidence you present to it, it will go on disbelieving the "non-zero gunmen theory".
Initially, the agent will behave very unusually. If it was in charge of JFK's security in Dallas before the shooting, it would have sent all secret service agents home, because no assassination could happen. Immediately after the assassination, it would have disbelieved everything. The films would have been faked or misinterpreted; the witnesses, deluded; the dead body of the president, that of a twin or an actor. It would have had huge problems with the aftermath, trying to reject all the evidence of death, seeing a vast conspiracy to hide the truth of JFK's non-death, including the many other conspiracy theories that must be false flags, because they all agree with the wrong statement that the president was actually assassinated.
But as time went on, the agent's behaviour would start to become more and more normal. It would realise the conspiracy was incredibly thorough in its faking of the evidence. All avenues it pursued to expose them would come to naught. It would stop expecting people to come forward and confess the joke, and it would stop expecting to find radical new evidence overturning the accepted narrative. After a while, it would start to expect the next new piece of evidence to be in favour of the assassination idea, because if a conspiracy has been faking things this well so far, then it should continue to do so in the future. Though it cannot change its view of the assassination, its expectations for observations converge towards the norm.
If it does a really thorough investigation, it might stop believing in a conspiracy at all. At some point, a miracle will start to seem more likely than a perfect but undetectable conspiracy. It is very unlikely that Lee Harvey Oswald shot at JFK, missed, and the president's head exploded simultaneously from unrelated natural causes. But after a while, such a miraculous explanation will start to become more likely than anything else the agent can consider. This explanation opens the possibility of miracles; but again, if the agent is very thorough, it will fail to find evidence of other miracles, and will probably settle on "an unrepeatable miracle caused JFK's death in a way that is physically undetectable".
But then note that such an agent will have a probability distribution over future events that is almost indistinguishable from a normal agent that just believes the standard story of JFK being assassinated. The zero-prior has been negated, not in theory but in practice.
How to do proper probability manipulation
This section is still somewhat a work in progress.
So the agent believes one false fact about the world, but its expectation is otherwise normal. This can be both desirable and undesirable. It is undesirable if we are trying to control the agent forever by giving it a false fact, since that fact's influence on its behaviour fades.
To see the positive, ask why we would want an agent to believe impossible things in the first place. Well, one example was an Oracle design where the Oracle didn't believe its output message would ever be read. Here we wanted the Oracle to believe the message wouldn't be read, but not believe anything else too weird about the world.
In terms of causality, if X designates the message being read at time t, and B and A are events before and after t, respectively, we want P(B|X)≈P(B) (probabilities about current facts in the world shouldn't change much) while P(A|X)≠P(A) is fine and often expected (the future should be different if the message is read or not).
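The two conditions can be checked on a toy model. The sketch below is only an illustration: the variable names and all the numbers (a 0.3 prior on B, a 0.5 chance of the message being read, the dependence of A on X) are assumptions made up for the example, not part of any Oracle design. B is a past fact, X is decided by a coin flip independent of B, and A is a future event that depends on X.

```python
from itertools import product

P_B = 0.3                               # assumed prior for the past fact B
P_X = 0.5                               # assumed chance the message is read
P_A_GIVEN_X = {True: 0.9, False: 0.1}   # assumed: the future depends on X

def joint():
    """Yield (b, x, a, probability) for every world in the toy model."""
    for b, x, a in product([True, False], repeat=3):
        p = (P_B if b else 1 - P_B) * (P_X if x else 1 - P_X)
        p *= P_A_GIVEN_X[x] if a else 1 - P_A_GIVEN_X[x]
        yield b, x, a, p

def prob(event, given=lambda b, x, a: True):
    """P(event | given), computed by enumerating the joint distribution."""
    num = sum(p for b, x, a, p in joint() if given(b, x, a) and event(b, x, a))
    den = sum(p for b, x, a, p in joint() if given(b, x, a))
    return num / den

p_b = prob(lambda b, x, a: b)
p_b_given_x = prob(lambda b, x, a: b, given=lambda b, x, a: x)
p_a = prob(lambda b, x, a: a)
p_a_given_x = prob(lambda b, x, a: a, given=lambda b, x, a: x)

print(p_b, p_b_given_x)   # conditioning on X leaves the past unchanged
print(p_a, p_a_given_x)   # conditioning on X shifts the future
```

Because X is independent of B in this model, learning X tells the agent nothing about the past, while its beliefs about A move freely. If X were allowed to depend on B, the first pair of numbers would come apart.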
In the JFK example, the agent eventually concluded "a miracle happened". I'll call this miracle a scrambling point. It's kind of a breakdown in causality: two futures are merged into one, given two different pasts. The two pasts are "JFK was assassinated" and "JFK wasn't assassinated", and their common scrambled future is "everything appears as if JFK was assassinated". The non-assassination belief has shifted the past but not the future.
For the Oracle, we want to do the reverse: we want the non-reading belief to shift the future but not the past. However, unlike the JFK assassination, we can try and build the scrambling point. That's why I always talk about messages going down noisy wires, or specific quantum events, or chaotic processes. If the past goes through a truly stochastic event (it doesn't matter whether there is true randomness or just that the agent can't figure out the consequences), we can get what we want.
The Oracle idea will go wrong if the Oracle concludes that non-reading must imply something is different about the past (maybe it can see through chaos in ways we thought it couldn't), just as the JFK assassination denier will continue to be crazy if it can't find a route to reach "everything appears as if JFK was assassinated".
But there is a break in the symmetry: the JFK assassination denier will eventually reach that point as long as the world is complex and stochastic enough, whereas the Oracle requires that the future probabilities be the same in all (realistic) past universes.
Now, once the Oracle's message has been read, the Oracle will find itself in the same situation as the other agent: believing an impossible thing. For Oracles, we can simply reset them. Other agents might have to behave more like the JFK assassination disbeliever. Though if we're careful, we can quantify things more precisely, as I attempted to do here.
A paper published in Astrobiology: A New Empirical Constraint on the Prevalence of Technological Species in the Universe (PDF), A. Frank and W.T. Sullivan.
From the abstract:
Recent advances in exoplanet studies provide strong constraints on all astrophysical terms in the Drake equation. [...] We find that as long as the probability that a habitable zone planet develops a technological species is larger than ~10^-24, humanity is not the only time technological intelligence has evolved.
They say we now know with reasonable certainty the total number of stars ever to exist (in the observable universe), and the average number of planets in the habitable zone. But we still don't know the probabilities of life, intelligence, and technology arising. They call this cumulative unknown factor f_bt.
Their result: for technological civilization to arise no more than once, with probability 0.01, in the lifetime of the observable universe, f_bt should be no greater than ~2.5 x 10^-24.
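The arithmetic behind that threshold can be sketched as follows. This assumes a simple linear model in which the expected number of technological species ever to arise is A = N_ast x f_bt, where N_ast is the number of habitable-zone planets; the names `A`, `n_ast` and `expected_species` and the linear model itself are assumptions made for this illustration, while 0.01 and 2.5 x 10^-24 are the figures quoted above.

```python
A = 0.01          # expected number of technological species (figure quoted above)
F_BT = 2.5e-24    # threshold value of f_bt (figure quoted above)

# Number of habitable-zone planets implied by the assumed model A = N_ast * f_bt.
n_ast = A / F_BT

def expected_species(f_bt, n_planets=n_ast):
    """Expected number of technological species under the assumed linear model."""
    return n_planets * f_bt

print(f"implied habitable-zone planets: {n_ast:.1e}")
print(f"expected species at f_bt = 1e-20: {expected_species(1e-20):.1f}")
```

Under these assumptions the two quoted numbers imply roughly 4 x 10^21 habitable-zone planets, and any f_bt much above the threshold quickly gives an expected count well above one.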
It's convenient that they calculate the chance technological civilization ever arose, rather than the chance one exists now. This is just the number we need to estimate the likelihood of a Great Filter.
They state their result as "[if we set f_bt ≤ 2.5 x 10^-24, then] in a statistical sense were we to rerun the history of the Universe 100 times, only once would a lone technological species occur". But I don't know what rerunning the Universe means. I also can't formulate this as saying "if we hadn't already observed the Universe to be apparently empty of life, we would expect it to contain or to have once contained life with a probability of 10^24", because that would ignore the chance that another civilization (if it counterfactually existed) would have affected or prevented the rise of life on Earth. Can someone help reformulate this?
I don't know if their modern values for star and planet formation have been used in previous discussions of the Fermi paradox or the Great Filter. (The papers they cite for their values date from 2012, 2013 and 2015.) I also don't know if these values should be trusted, or what concrete values had been used previously. People on top of the Great Filter discussion probably already updated when the astronomical data came in.
A putative new idea for AI control; index here.
Corrigibility was an attempt to allow an AI to safely change its values, without seeking to provoke or avoid the change. The idea is that, when the AI's utility changes from u to v at time t, it maximises a meta-utility U such that
- U_{≤t} = u
- U_{>t} = v + E(u|u→u) - E(v|u→v).
Here u→u designates the event that u remains unchanged, while u→v designates the change.
As has been shown, adding those expectation terms means the AI will not seek to provoke or resist such a utility change - it is indifferent to the change. Those expectation terms can be designated compensatory rewards.
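A small numeric sketch shows why the compensatory reward produces indifference. The function name and the numbers are assumptions chosen for illustration; the only thing taken from the text is the form of the meta-utility after the change, v + E(u|u→u) - E(v|u→v).

```python
def meta_utility_after_change(e_v_changed, e_u_unchanged):
    """Expected meta-utility after the change: v + E(u|u→u) - E(v|u→v)."""
    compensation = e_u_unchanged - e_v_changed   # the compensatory reward
    return e_v_changed + compensation

e_u_unchanged = 10.0   # assumed E(u | u→u): what the AI expects if nothing changes

# Assumed values of E(v | u→v): the new utility may look bad or good to the AI.
values = [meta_utility_after_change(e_v, e_u_unchanged)
          for e_v in (-5.0, 0.0, 3.0, 100.0)]

print(values)   # every entry equals e_u_unchanged
```

Whatever the AI expects to get from v, its expected meta-utility given the change equals what it expected from keeping u, so it gains nothing by provoking or resisting the change.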
What it is not indifferent to, though, is changes to E. If the AI knows that change u→v is coming, then it has incentives to change E to E' with a high E'(u|u→u) and a low E'(v|u→v). Its ability to rewire is limited by the fact that E' will serve as its future expectation estimator, so it won't mangle it completely, but it certainly can twist it a bit.
Notice that the AI is now no longer indifferent to the change of utility. It can benefit from rewiring E this way only if a change is coming up. Notice that E' probably can't rewire itself any more, so the rewired AI will be indifferent, but not before rewiring.
To solve this problem, we need to turn to corrigibility a second time. Let ρ be the AI's current expectation/probability estimator. Let ρ' be one that it can conceivably change into. Let C be a random variable designating the sum total of all compensatory rewards. Now, to guarantee that C is finite would require bounds on all the utility functions and some discount rate. In practice, it often will be finite. That's because E(C)=E(u_0|u_0→u_0)-E(u_n|u_{n-1}→u_n, H), where u_0 is the AI's first utility, u_n its last, and H intermediate history. So some milder restrictions on the u_i should suffice to bound C.
Then, whenever the AI takes an action that changes ρ to ρ', it gets the special meta-compensatory reward
- Eρ(C|ρ→ρ) - Eρ(C|ρ→ρ').
This prevents it from deriving any advantage to C via changes to ρ. Or, more precisely, it prevents it from deriving any expected advantage, that it can figure out using its current ρ.
Note that it doesn't prevent improvements to ρ in general, it just removes the incentives to change it to affect compensatory rewards.
Since any action a might have an indirect effect on ρ, the "utility maximising" for a given u must be changed to:
- Eρ(u|a) + Σρ' Pρ(ρ→ρ'|a) (Eρ(C|ρ→ρ)-Eρ(C|ρ→ρ')),
where Pρ is the probability estimate corresponding to ρ; the probability term can be rewritten as Eρ(Iρ→ρ') for Iρ→ρ' the indicator function for ρ→ρ'. In fact the whole line above can be rewritten as
- Eρ(u|a) + Eρ(Eρ(C|ρ→ρ)-Eρ(C|ρ→ρ') | a).
For this to work, Eρ needs to be able to say sensible things about itself, and also about Eρ', which is used to estimate C if ρ→ρ'.
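The effect of the correction term can be illustrated with a toy calculation. Everything concrete here is an assumption for the sake of the example: the estimator labels, the two hypothetical actions, the numbers, and the modelling choice that an action's uncorrected payoff is Eρ(u|a) plus the expected compensatory rewards C.

```python
DEFAULT = "rho"   # label for the AI's current estimator

def expected_c(p_next, e_c):
    """Expected compensatory rewards, averaging over the next estimator."""
    return sum(p * e_c[r] for r, p in p_next.items())

def meta_compensation(p_next, e_c):
    """Sum over rho' of P(rho→rho'|a) * (E(C|rho→rho) - E(C|rho→rho'))."""
    return sum(p * (e_c[DEFAULT] - e_c[r]) for r, p in p_next.items())

# Assumed E_rho(C | rho→rho') for each estimator the AI could end up with.
e_c = {"rho": 2.0, "rho_twisted": 7.0}

# Two hypothetical actions with equal E_rho(u|a) but different chances of rewiring.
actions = {
    "leave_alone": {"e_u": 5.0, "p_next": {"rho": 1.0, "rho_twisted": 0.0}},
    "rewire": {"e_u": 5.0, "p_next": {"rho": 0.25, "rho_twisted": 0.75}},
}

# Uncorrected payoff: utility plus whatever C the action is expected to yield.
naive = {a: d["e_u"] + expected_c(d["p_next"], e_c) for a, d in actions.items()}

# Corrected payoff: add the meta-compensatory term from the formula above.
corrected = {a: naive[a] + meta_compensation(d["p_next"], e_c)
             for a, d in actions.items()}

print(naive)
print(corrected)
```

In this sketch the uncorrected payoff favours rewiring, because the twisted estimator inflates C. With the correction added, both actions score Eρ(u|a) + Eρ(C|ρ→ρ), so the only remaining reason to prefer one over the other is its effect on u.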
If we compare this with various ways of factoring out variables, we can see that it's a case where we have a clear default, ρ, and are estimating deviations from that.
I'm posting here since I don't see how to post to https://agentfoundations.org/.
Recently sphere packing was solved in dimension 24, and I read about it on Quanta Magazine. I found the following part of the article (paraphrased) fascinating.
Cohn and Kumar found that the best possible sphere packing in dimension 24 could be at most 0.0000000000000000000000000001 percent denser than the Leech lattice. Given this ridiculously close estimate, it seemed clear that the Leech lattice must be the best sphere packing in dimension 24.
This is clearly a kind of reasoning under logical uncertainty, and seems very reasonable. Most humans probably would reason similarly, even when they have no idea what the Leech lattice is.
Is this kind of reasoning covered by already known desiderata for logical uncertainty?
I'm trying to find the best place to start learning the field. I have no special math background. I'm very eager to learn. Thanks a lot!
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.
New meetups (or meetups with a hiatus of more than a year) are happening in:
Irregularly scheduled Less Wrong meetups are taking place in:
- Baltimore / UMBC Weekly Meetup: How To Actually Change Your Mind (part 3): 01 May 2016 03:00PM
- European Community Weekend: 02 September 2016 03:35PM
- San Francisco Meetup: Projects: 01 May 2016 06:15PM
The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:
- Raleigh, NC (RTLW) Discussion Meetup: 07 May 2022 07:30PM
- Sydney Rationality Dojo - May: 01 May 2016 04:00PM
- Sydney Rationality Dojo - July: 03 July 2016 04:00PM
- Washington, D.C.: Utopias: 01 May 2016 03:30PM
Locations with regularly scheduled meetups: Austin, Berkeley, Berlin, Boston, Brussels, Buffalo, Canberra, Columbus, Denver, Kraków, London, Madison WI, Melbourne, Moscow, Mountain View, New Hampshire, New York, Philadelphia, Research Triangle NC, Seattle, Sydney, Tel Aviv, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers and a Slack channel for daily discussion and online meetups on Sunday night US time.
A friend recently shared an image of Lincoln with the quote, "Better to remain silent and be thought a fool than speak and remove all doubt."
Correcting that idea, I replied with the following: "Speak! Reveal your foolishness, and open yourself so that others may enlighten you and you can learn. Fear the false mantle of silence-as-wisdom; better to briefly be the vocal fool than forever the silent fool."
The experience led me to think that it might be fun, cathartic, and/or a good mental exercise/reminder to translate our culture's more irrational memes into a more presentable package.
Post your own examples if you like, and if I think of/see more I'll post here.