
Crude measures

10 Stuart_Armstrong 27 March 2015 03:44PM

A putative new idea for AI control; index here.

Partially inspired by a conversation with Daniel Dewey.

People often come up with a single great idea for AI, like "complexity" or "respect", that will supposedly solve the whole control problem in one swoop. Once you've done it a few times, it's generally trivially easy to start taking these ideas apart (first step: find a bad situation with high complexity/respect and a good situation with lower complexity/respect, make the bad situation very bad, and pose that as a challenge to the idea). The general responses to these kinds of ideas are listed here.

However, it seems to me that rather than constructing counterexamples each time, we should have a general category and slot these ideas into it. And that category should not only have "why this can't work" attached to it, but also "these are methods that can make it work better". Seeing what is needed to make their idea better can help people understand the problems, where simple counter-arguments cannot. And, possibly, if we improve the methods, one of these simple ideas may end up being implementable.

 

Crude measures

The category I'm proposing to define is that of "crude measures". Crude measures are methods that attempt to rely on non-fully-specified features of the world to ensure that an underdefined or underpowered solution does manage to solve the problem.

To illustrate, consider the problem of building an atomic bomb. The scientists that did it had a very detailed model of how nuclear physics worked, the properties of the various elements, and what would happen under certain circumstances. They ended up producing an atomic bomb.

The politicians who started the project knew none of that. They shovelled resources, money and administrators at scientists, and got the result they wanted - the Bomb - without ever understanding what really happened. Note that the politicians were successful, but it was a success that could only have been achieved at one particular point in history. Had they done exactly the same thing twenty years before, they would not have succeeded. And when Nazi Germany tried a roughly similar approach to the US's (on a smaller scale), it went nowhere.

So I would define "shovel resources at atomic scientists to get a nuclear weapon" as a crude measure. It works, but it only works because there are other features of the environment that are making it work. In this case, the scientists themselves. However, certain social and human features about those scientists (which politicians are good at estimating) made it likely to work - or at least more likely to work than shovelling resources at peanut-farmers to build moon rockets.

In the case of AI, advocating for complexity is similarly a crude measure. If it works, it will work because of very contingent features of the environment, the AI design, the setup of the world, etc., not because "complexity" is intrinsically a solution to the FAI problem. And though we are confident that human politicians have a good enough idea of human motivations and culture that the Manhattan project had at least some chance of working... we don't have confidence that those suggesting crude measures for AI control have a good enough idea to make their ideas work.

It should be evident that "crudeness" is on a sliding scale; I'd like to reserve the term for proposed solutions to the full FAI problem that do not in any way solve the deep questions about FAI.

 

More or less crude

The next question is, if we have a crude measure, how can we judge its chance of success? Or, if we can't even do that, can we at least improve the chances of it working?

The main problem is, of course, that of optimising. Either optimising in the sense of maximising the measure (maximum complexity!), or in the sense of choosing the version of the measure that fits the definition most extremely (a maximally narrow definition of complexity!). It seems we might be able to do something about this.

Let's start by having the AI sample a large class of utility functions, requiring them to be of around the same expected complexity as human values. Then we use our crude measure μ - for argument's sake, let's make it something like "approval by simulated (or hypothetical) humans, on a numerical scale". This is certainly a crude measure.

We can then rank all the utility functions u, using μ to measure the value of "create M(u), a u-maximising AI, with this utility function". Then, to avoid the problems with optimisation, we could select a certain threshold value and pick any u such that E(μ|M(u)) is just above the threshold.
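To make the selection step concrete, here is a minimal sketch in Python. Everything in it is a placeholder: sample_utilities, expected_crude_measure (standing in for E(μ|M(u))) and the threshold values are hypothetical names, not a claim about how these quantities could actually be computed.

    import random

    def satisficing_selection(sample_utilities, expected_crude_measure,
                              threshold, margin, n_samples=10000):
        """Pick a utility function whose crude-measure score is just above
        a threshold, rather than the maximum-scoring one.

        sample_utilities(n) -> list of candidate utility functions u, assumed
            to be of roughly the same expected complexity as human values.
        expected_crude_measure(u) -> stand-in for E(mu|M(u)), the expected
            crude-measure score if a u-maximising AI M(u) were created.
        """
        candidates = sample_utilities(n_samples)
        acceptable = [u for u in candidates
                      if threshold <= expected_crude_measure(u) <= threshold + margin]
        # Choose randomly among acceptable candidates instead of ranking them:
        # the point is to satisfice the crude measure, not to optimise it.
        return random.choice(acceptable) if acceptable else None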

How to pick this threshold? Well, we might have some principled arguments ("this is about as good a future as we'd expect, and this is about as good as we expect that these simulated humans would judge it, honestly, without being hacked").

One thing we might want to do is have multiple μ, and select things that score reasonably (but not excessively) on all of them. This is related to my idea that the best Turing test is one that the computer has not been trained or optimised on. Ideally, you'd want there to be some category of utilities "be genuinely friendly" that score higher than you'd expect on many diverse human-related μ (it may be better to randomly sample rather than fitting to precise criteria).
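The same satisficing trick extends to several crude measures at once; a quick sketch under the same assumptions (each entry of measures is a hypothetical stand-in for a different μ):

    def passes_all_measures(u, measures, threshold, margin):
        """Accept u only if it scores reasonably - but not excessively - on
        every crude measure, rather than very highly on any single one."""
        return all(threshold <= m(u) <= threshold + margin for m in measures)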

You could see this as saying that "programming an AI to preserve human happiness is insanely dangerous, but if you find an AI programmed to satisfice human preferences, and that AI also happens to preserve human happiness (without knowing it would be tested on this preservation), then... it might be safer".

There are a few other thoughts we might have for trying to pick a safer u:

  • Properties of utilities under trade (are human-friendly functions more or less likely to be tradable with each other and with other utilities)?
  • If we change the definition of "human", this should have effects that seem reasonable for the change. Or some sort of "free will" approach: if we change human preferences, we want the outcome of u to change in ways comparable with that change.
  • Maybe also check whether there is a wide enough variety of future outcomes that don't depend on the AI's choices (but on human choices - ideas from "detecting agents" may be relevant here).
  • Changing the observers from hypothetical to real (or making the creation of the AI contingent, or not, on the approval), should not change the expected outcome of u much.
  • Making sure that the utility u can be used to successfully model humans (therefore properly reflects the information inside humans).
  • Make sure that u is stable to general noise (hence not over-optimised). Stability can be measured as changes in E(μ|M(u)), E(u|M(u)), E(v|M(u)) for generic v, and other means.
  • Make sure that u is unstable to "nasty" noise (eg reversing human pain and pleasure).
  • All utilities in a certain class - the human-friendly class, hopefully - should score highly under each other (E(u|M(u)) not too far off from E(u|M(v))), while the over-optimised solutions - those scoring highly under some μ - must not score high under the class of human-friendly utilities. (A sketch of this cross-scoring check follows this list.)
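A rough sketch of that last cross-scoring check, with the same caveats as before: expected_utility(u, v) is a hypothetical stand-in for E(u|M(v)), the expected u-score of a world run by a v-maximising AI.

    def mutually_supporting(candidates, expected_utility, tolerance):
        """Check that every utility in the candidate class scores nearly as
        well under the others' maximisers as under its own, i.e. E(u|M(v))
        is not too far below E(u|M(u)) for all u, v in the class."""
        return all(
            expected_utility(u, v) >= expected_utility(u, u) - tolerance
            for u in candidates
            for v in candidates
        )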

This is just a first stab at it. It does seem to me that we should be able to abstractly characterise the properties we want from a friendly utility function, which, combined with crude measures, might actually allow us to select one without fully defining it. Any thoughts?

And with that, the various results of my AI retreat are available to all.

Rough calculations: Fermi and the art of guessing

9 XiXiDu 08 September 2011 10:39AM

Fermi problem

In science, particularly in physics or engineering education, a Fermi problem, Fermi question, or Fermi estimate is an estimation problem designed to teach dimensional analysis, approximation, and the importance of clearly identifying one's assumptions. Named after physicist Enrico Fermi, such problems typically involve making justified guesses about quantities that seem impossible to compute given limited available information.

Fermi was known for his ability to make good approximate calculations with little or no actual data, hence the name. One example is his estimate of the strength of the atomic bomb detonated at the Trinity test, based on the distance travelled by pieces of paper dropped from his hand during the blast. Fermi's estimate of 10 kilotons of TNT was remarkably close to the now-accepted value of around 20 kilotons, a difference of less than one order of magnitude.

[...]

Scientists often look for Fermi estimates of the answer to a problem before turning to more sophisticated methods to calculate a precise answer. This provides a useful check on the results: where the complexity of a precise calculation might obscure a large error, the simplicity of Fermi calculations makes them far less susceptible to such mistakes. (Performing the Fermi calculation first is preferable because the intermediate estimates might otherwise be biased by knowledge of the calculated answer.)

Fermi estimates are also useful in approaching problems where the optimal choice of calculation method depends on the expected size of the answer. For instance, a Fermi estimate might indicate whether the internal stresses of a structure are low enough that it can be accurately described by linear elasticity; or if the estimate already bears significant relationship in scale relative to some other value, for example, if a structure will be over-engineered to withstand loads several times greater than the estimate.

Although Fermi calculations are often not accurate, as there may be many problems with their assumptions, this sort of analysis does tell us what to look for to get a better answer.

Link: en.wikipedia.org/wiki/Fermi_problem

Fermi Problem: Power developed at the eruption of the Puyehue-Cordón Caulle volcanic system in June 2011

Enrico Fermi was renowned for his ability to make reliable estimates. But how well can you do on a modern estimation problem?

[...]

Hernan Asory and Arturo Lopez Davalos at the Comision Nacional De Energia Atomica in Argentina have set themselves (and their students) a similar estimation task. The problem is to estimate the energy release as well as the volume and mass of sand ejected during the eruption of the Puyehue-Cordon Caulle volcano in Chile on 4 June.

You can look up the calculations and the assumptions they make in the paper. You might want to try the estimate yourself.

Link: technologyreview.com/blog/arxiv/27140/

If you want to get better at doing rough mental calculations, the following books might provide some valuable heuristics:

Street-Fighting Mathematics: The Art of Educated Guessing and Opportunistic Problem Solving

Time for some quick arithmetic: Is 3600 x 4.4 x 10^4 x 32 larger or smaller than 3 x 10^9?

Finding the right answer, says Sanjoy Mahajan, associate director for teaching initiatives at MIT’s Teaching and Learning Laboratory, does not require crafting a long, tedious calculation. Instead, the key to solving this problem — and many others — lies in having informal tools on hand that let us attack the problem. Though the result may not be perfectly precise, he believes, intuitive mathematical reasoning is often sufficient for our needs.

“That’s not to say exact answers aren’t useful,” says Mahajan, “but if looking for them is your only approach, you may never get any answer at all. Sometimes it’s better to start with something rough.”

[...]

Mahajan believes we should learn practical math tools and understand why they work.

[...]

Mahajan’s unconventional teaching practices stem from his focus, as a physicist, on finding quick, practical answers. Then again, perhaps rolling up one’s sleeves and hacking through problems is how everyone works. “There is a culture in pure mathematics that emphasizes rigor and careful proofs,” says Strogatz. “Yet all practicing mathematicians know we also use our intuitions, then we clean our answers up.”

[...]

So let’s get back to the initial question (the numbers relate to the storage capacity of a data CD-ROM). The key to solving it, says Mahajan, is to recognize that the components of the first, messy-looking number can be broken into powers of 10. Then we can temporarily set aside these powers of 10 — Mahajan calls this “taking out the big part,” one of his tenets of problem-solving — while handling the smaller, simpler multiplication problem.

Okay: Picture the number as (3.6 x 10^3) x (4.4 x 10^4) x (3.2 x 10^1). To multiply powers of 10 in practice, we add them, here producing 10^8. Leave that aside momentarily and multiply 3.6 x 4.4 x 3.2. The answer is about 50, or 5.0 x 10^1. Combine that with 10^8, and we have our answer: Roughly 5.0 x 10^9, which is bigger than 3 x 10^9. Street-fighting math, and we barely got a scratch.
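Here is the same "taking out the big part" manoeuvre as a small Python sketch; the decomposition follows the CD-ROM example above, and the function name is just for illustration:

    import math

    def take_out_the_big_part(numbers):
        """Split each factor into a mantissa and a power of ten, multiply the
        mantissas, and add up the exponents - the 'big part' is handled
        separately from the small, easy multiplication."""
        mantissas, exponent = [], 0
        for x in numbers:
            e = math.floor(math.log10(x))   # the power of ten to take out
            mantissas.append(x / 10**e)     # leaves a number between 1 and 10
            exponent += e
        rough = math.prod(mantissas)        # the small multiplication
        return rough, exponent              # estimate is rough * 10**exponent

    # The CD-ROM numbers from the text: 3600 x 4.4x10^4 x 32
    rough, exponent = take_out_the_big_part([3600, 4.4e4, 32])
    print(f"about {rough:.0f} x 10^{exponent}")   # about 51 x 10^8, i.e. ~5 x 10^9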

Link: web.mit.edu/newsoffice/2010/street-fight-0329.html

Secrets of Mental Math: The Mathemagician's Guide to Lightning Calculation and Amazing Math Tricks

Yes, even you can learn to do seemingly complex equations in your head; all you need to learn are a few tricks. You’ll be able to quickly multiply and divide triple digits, compute with fractions, and determine squares, cubes, and roots without blinking an eye. No matter what your age or current math ability, Secrets of Mental Math will allow you to perform fantastic feats of the mind effortlessly. This is the math they never taught you in school.