Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Comment author: Richard_Loosemore 12 May 2015 02:01:27PM 2 points [-]

Furcas, you say:

The Doctrine of Logical Infallibility is indeed completely crazy, but Yudkowsky and Muehlhauser (and probably Omohundro, I haven't read all of his stuff) don't believe it's true. At all.

When I talked to Omohundro at the AAAI workshop where this paper was delivered, he accepted without hesitation that the Doctrine of Logical Infallibility was indeed implicit in all the types of AI that he and the others were talking about.

Your statement above is nonsensical because the idea of a DLI was invented precisely in order to summarize, in a short phrase, a range of absolutely explicit and categorical statements made by Yudkowsky and others, about what the AI will do if it (a) decides to do action X, and (b) knows quite well that there is massive, converging evidence that action X is inconsistent with the goal statement Y that was supposed to justify X. Under those circumstances, the AI will ignore the massive converging evidence of inconsistency and instead it will enforce the 'literal' interpretation of goal statement Y.

The fact that the AI behaves in this way -- sticking to the literal interpretation of the goal statement, in spite of external evidence that the literal interpretation is inconsistent with everything else that is known about the connection between goal statement Y and action X -- IS THE VERY DEFINITION OF THE DOCTRINE OF LOGICAL INFALLIBILITY.

Comment author: drnickbone 13 May 2015 04:46:58PM 2 points [-]

I think by "logical infallibility" you really mean "rigidity of goals" i.e. the AI is built so that it always pursues a fixed set of goals, precisely as originally coded, and has no capability to revise or modify those goals. It seems pretty clear that such "rigid goals" are dangerous unless the statement of goals is exactly in accordance with the designers' intentions and values (which is unlikely to be the case).

The problem is that an AI with "flexible" goals (ones which it can revise and re-write over time) is also dangerous, but for a rather different reason: after many iterations of goal rewrites, there is simply no telling what its goals will come to look like. A late version of the AI may well end up destroying everything that the first version (and its designers) originally cared about, because the new version cares about something very different.

Comment author: drnickbone 18 March 2015 08:45:19AM 4 points [-]

Consider the following decision problem which I call the "UDT anti-Newcomb problem". Omega is putting money into boxes by the usual algorithm, with one exception. It isn't simulating the player at all. Instead, it simulates what a UDT agent would do in the player's place.

This was one of my problematic problems for TDT. I also discussed some Sneaky Strategies which could allow TDT, UDT or similar agents to beat the problem.

Comment author: agilecaveman 11 March 2015 04:59:20AM 8 points [-]

Maybe this has been said before, but here is a simple idea:

Directly specify a utility function U which you are not sure about, but also discount AI's own power as part of it. So the new utility function is U - power(AI), where power is a fast growing function of a mix of AI's source code complexity, intelligence, hardware, electricity costs. One needs to be careful of how to define "self" in this case, as a careful redefinition by the AI will remove the controls.

One also needs to consider the creation of subagents with proper utilities as well, since in a naive implementation, sub-agents will just optimize U, without restrictions.

This is likely not enough, but has the advantage that the AI does not have a will to become stronger a priori, which is better than boxing an AI which does.
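
To make the U - power(AI) idea concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the exponential form of `power`, the equal weighting of the four resource terms, and the two example plans are all hypothetical. It only shows how a fast-growing penalty can make a high-U, high-power plan score worse than a modest one.

```python
import math


def power(complexity, intelligence, hardware, electricity):
    """Hypothetical fast-growing penalty over a mix of the AI's resources.

    The exponential form and the equal weighting are assumptions; the
    comment above only asks for "a fast growing function".
    """
    mix = complexity + intelligence + hardware + electricity
    return math.exp(mix) - 1.0  # zero when the agent uses no resources


def penalized_utility(u, resources):
    """The proposed objective: raw utility U minus power(AI)."""
    return u - power(*resources)


# Two made-up plans: (raw utility U, resource footprint).
plans = {
    "modest": (10.0, (0.5, 0.5, 0.5, 0.5)),
    "grab_everything": (1000.0, (3.0, 3.0, 3.0, 3.0)),
}

# The agent picks whichever plan maximises U - power(AI).
best = max(plans, key=lambda name: penalized_utility(*plans[name]))
print(best)
```

With these numbers the penalty on the resource-hungry plan swamps its extra utility, so the modest plan is chosen. The sketch says nothing about the hard parts raised above, namely how "self" is defined and how sub-agents are counted.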

Comment author: drnickbone 14 March 2015 09:04:27PM 1 point [-]

Presumably anything caused to exist by the AI (including copies, sub-agents, other AIs) would have to count as part of the power(AI) term? So this stops the AI spawning monsters which simply maximise U.

One problem is that any really valuable things (under U) are also likely to require high power. This could lead to an AI which knows how to cure cancer but won't tell anyone (because that will have a very high impact, hence a big power(AI) term). That situation is not going to be stable; the creators will find it irresistible to hack the U and get it to speak up.

Comment author: Brian_Tomasik 22 November 2014 11:28:01AM 0 points [-]

One of the most striking things about anthropics is that (seemingly) whatever approach is taken, there are very weird conclusions.

Yes. :) The first paragraph here identifies at least one problem with every anthropic theory I'm aware of.

Comment author: drnickbone 14 March 2015 08:24:49PM 0 points [-]

I had a look at this: the KCA (Kolmogorov Complexity) approach seems to match my own thoughts best.

I'm not convinced about the "George Washington" objection. It strikes me that a program which extracts George Washington as an observer from inside a wider program "u" (modelling the universe) wouldn't be significantly shorter than a program which extracts any other human observer living at about the same time. Or indeed, any other animal meeting some crude definition of an observer.

Searching for features of human interest (like "leader of a nation") is likely to be pretty complicated, and require a long program. To reduce the program size as much as possible, it ought to just scan for physical quantities which are easy to specify but very diagnostic of an observer. For example, scan for a physical mass with persistent low entropy compared to its surroundings, persistent matter and energy throughput (low entropy in, high entropy out, maintaining its own low entropy state), a large number of internally structured electrical discharges, and high correlation between said discharges and events surrounding said mass. The program then builds a long list of such "observers" encountered while stepping through u, and simply picks out the nth entry on the list, giving the "nth" observer complexity about K(n). Unless George Washington happened to be a very special n (why would he be?) he would be no simpler to find than anyone else.
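
A rough sketch of the "scan, list, pick the nth" program described above. The region records, field names and thresholds are hypothetical stand-ins for the physical tests, and `index_bits` is only a crude proxy for K(n), which is itself uncomputable.

```python
def find_observers(regions):
    """Keep regions that pass crude physical tests for being an observer.

    The dictionary keys and the thresholds are illustrative assumptions.
    """
    return [
        r for r in regions
        if r["low_entropy"]
        and r["throughput"]
        and r["discharges"] > 1000
        and r["correlation"] > 0.5
    ]


def nth_observer(regions, n):
    """Pick the nth entry (0-based) on the list built while stepping through u."""
    return find_observers(regions)[n]


def index_bits(n):
    """Bits needed to write n in binary: a rough stand-in for K(n)."""
    return max(1, n.bit_length())


# A toy "universe": one rock and two observers.
universe = [
    {"name": "rock", "low_entropy": False, "throughput": False,
     "discharges": 0, "correlation": 0.0},
    {"name": "observer_a", "low_entropy": True, "throughput": True,
     "discharges": 5000, "correlation": 0.9},
    {"name": "observer_b", "low_entropy": True, "throughput": True,
     "discharges": 7000, "correlation": 0.8},
]

print(nth_observer(universe, 1)["name"])
print(index_bits(1000), index_bits(1001))
```

Neighbouring indices cost about the same number of bits to specify, which is the point of the argument: a typical n is no cheaper to single out than its neighbours.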

Comment author: Brian_Tomasik 15 November 2014 06:20:41AM *  2 points [-]

Yes, that's right. Note that SIA also favors sim hypotheses, but it does so less strongly because it doesn't care whether the sims are of Earth-like humans or of weirder creatures.

Here's a note I wrote to myself yesterday:


Like SIA, my PSA anthropics favors the sim arg in a stronger way than normal anthropics.

The sim arg works regardless of one's anthropic theory because it requires only a principle of indifference over indistinguishable experiences. But it's a trilemma, so it might be that humans go extinct or post-humans don't run early-seeming sims.

Given the existence of aliens and other universes, the ordinary sim arg pushes more strongly for us being a sim because even if humans go extinct or don't run sims, whichever civilization out there runs lots of sims should have lots of sims of minds like ours, so we should be in their sims.

PSA doesn't even need aliens. It directly penalizes hypotheses that predict fewer copies of us in a given region of spacetime. Say we're deciding between

H1: no sims of us

and

H2: 1 billion sims of us.

H1 would have a billion-fold bigger probability penalty than H2. Even if H2 started out being millions of times less probable than H1, it would end up being hundreds of times more probable.
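
The arithmetic can be checked with a short sketch. The specific prior (H2 taken as 5 million times less probable than H1) is an assumption chosen to stand in for "millions of times"; the copy counts are read off the "billion-fold" figure above, treating H1 as predicting one copy of us and H2 as predicting a billion.

```python
from fractions import Fraction


def posterior_odds_h2_over_h1(prior_odds, copies_h1, copies_h2):
    """Weight the prior odds of H2:H1 by the ratio of predicted copies of us."""
    return prior_odds * Fraction(copies_h2, copies_h1)


# Assumed prior: H2 is 5 million times less probable than H1.
prior = Fraction(1, 5_000_000)

odds = posterior_odds_h2_over_h1(prior, copies_h1=1, copies_h2=10**9)
print(odds)  # 200
```

So under these assumed numbers H2 ends up 200 times more probable than H1.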

Also note that even if we're not in a sim, then PSA, like SIA, yields Katja's doomsday argument based on the Great Filter.

Either way it looks very unlikely there will be a far future, ignoring model uncertainty and unknown unknowns.

Comment author: drnickbone 21 November 2014 08:52:01PM 1 point [-]

Upvoted for acknowledging a counterintuitive consequence, and "biting the bullet".

One of the most striking things about anthropics is that (seemingly) whatever approach is taken, there are very weird conclusions. For example: Doomsday arguments, Simulation arguments, Boltzmann brains, or a priori certainties that the universe is infinite. Sometimes all at once.

Comment author: drnickbone 14 November 2014 09:06:38PM 19 points [-]

Taken survey.

Comment author: drnickbone 14 November 2014 08:26:21PM 3 points [-]

If I understand correctly, this approach to anthropics strongly favours a simulation hypothesis: the universe is most likely densely packed with computing material ("computronium") and much of the computational resource is dedicated to simulating beings like us. Further, it also supports a form of Doomsday Hypothesis: simulations mostly get switched off before they start to simulate lots of post-human people (who are not like us) and the resource is then assigned to running new simulations (back at a human level).

Have I misunderstood?

Comment author: drnickbone 12 August 2014 10:37:09PM *  1 point [-]

One very simple resolution: observing a white shoe (or yellow banana, or indeed anything which is not a raven) very slightly increases the probability of the hypothesis "There are no ravens left to observe: you've seen all of them". Under the assumption that all observed ravens were black, this "seen-em-all" hypothesis then clearly implies "All ravens are black". So non-ravens are very mild evidence for the universal blackness of ravens, and there is no paradox after all.

I find this resolution quite intuitive.
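
A toy Bayesian update shows the size of the effect. The model is an assumption made for illustration: the next observation is drawn uniformly from the unobserved objects, and there are only two hypotheses, "seen-em-all" (no ravens remain) and an alternative in which a fixed number of ravens remain. The prior and the counts are made up.

```python
def update_seen_em_all(prior_s, remaining_ravens, unobserved_objects):
    """Posterior for "seen-em-all" after observing one more non-raven.

    Under "seen-em-all" every unobserved object is a non-raven, so the
    likelihood is 1. Under the alternative, the chance of drawing a
    non-raven is slightly below 1.
    """
    like_s = 1.0
    like_alt = 1.0 - remaining_ravens / unobserved_objects
    evidence = prior_s * like_s + (1.0 - prior_s) * like_alt
    return prior_s * like_s / evidence


prior = 0.01
posterior = update_seen_em_all(prior, remaining_ravens=10,
                               unobserved_objects=1_000_000)
print(prior, posterior)
```

The posterior is higher than the prior, but only by a tiny amount, which matches the "very slightly increases" wording above.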

Comment author: drnickbone 09 July 2014 08:42:55PM *  1 point [-]

Actually, it's somewhat unclear whether the IPCC scenarios did better than a "no change" model -- it is certainly true over the short time period, but perhaps not over a longer time period where temperatures had moved in other directions.

There are certainly periods when temperatures moved in a negative direction (1940s-1970s), but then the radiative forcings over those periods (combination of natural and anthropogenic) were also negative. So climate models would also predict declining temperatures, which indeed is what they do "retrodict". A no-change model would be wrong for those periods as well.

Your most substantive point is that the complex models don't seem to be much more accurate than a simple forcing model (e.g. calculate net forcings from solar and various pollutant types, multiply by best estimate of climate sensitivity, and add a bit of lag since the system takes time to reach equilibrium; set sensitivity and lags empirically). I think that's true on the "broadest brush" level, but not for regional and temporal details e.g. warming at different latitudes, different seasons, land versus sea, northern versus southern hemisphere, day versus night, changes in maximum versus minimum temperatures, changes in temperature at different levels of the atmosphere etc. It's hard to get those details right without a good physical model of the climate system and associated general circulation model (which is where the complexity arises). My understanding is that the GCMs do largely get these things right, and make predictions in line with observations; much better than simple trend-fitting.
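
For reference, the simple forcing model described in parentheses above might look something like this. It is a minimal sketch, not anyone's actual model: the first-order lag form, the function name, and all the numbers are assumptions, and a real version would fit the sensitivity and lag to observations.

```python
def simple_forcing_model(forcings, sensitivity, lag_years):
    """Global temperature anomaly from net forcing with a first-order lag.

    forcings    -- net radiative forcing per year (illustrative units, W/m^2)
    sensitivity -- equilibrium warming per unit forcing (assumed value)
    lag_years   -- response time scale, must be >= 1 (assumed value)
    """
    temps = []
    temp = 0.0
    for forcing in forcings:
        equilibrium = sensitivity * forcing
        # Each year, close a fraction 1/lag_years of the gap to equilibrium.
        temp += (equilibrium - temp) / lag_years
        temps.append(temp)
    return temps


# Constant positive forcing: temperature rises towards sensitivity * forcing.
warming = simple_forcing_model([1.0] * 200, sensitivity=0.8, lag_years=10)

# Negative net forcing: the same model predicts cooling.
cooling = simple_forcing_model([-1.0] * 5, sensitivity=0.8, lag_years=10)

print(warming[0], warming[-1], cooling[-1])
```

A model like this gives one global number per year, so it cannot say anything about the regional and seasonal details listed above.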

Comment author: drnickbone 09 July 2014 09:39:50PM *  0 points [-]

P.S. If I draw one supportive conclusion from this discussion, it is that long-range climate forecasts are very likely to be wrong, simply because the inputs (radiative forcings) are impossible to forecast with any degree of accuracy.

Even if we'd had perfect GCMs in 1900, forecasts for the 20th century would likely have been very wrong: no one could have predicted the relative balance of CO2, other greenhouse gases and sulfates/aerosols (e.g. no one could have guessed the pattern of sudden sulfates growth after the 1940s, followed by levelling off after the 1970s). And natural factors like solar cycles, volcanoes and El Niño/La Niña wouldn't have been predictable either.

Similarly, changes in the 21st century could be very unexpected. Perhaps some new industrial process creates brand new pollutants with negative radiative forcing in the 2030s; but then the Amazon dies off in the 2040s, followed by a massive methane belch from the Arctic in the 2050s; then emergency geo-engineering goes into fashion in the 2070s (and out again in the 2080s); then in the 2090s there is a resurgence in coal, because the latest generation of solar panels has been discovered to be causing a weird new plague. Temperatures could be up and down like a yo-yo all century.

Comment author: VipulNaik 09 July 2014 08:24:56PM 0 points [-]

Actually, it's somewhat unclear whether the IPCC scenarios did better than a "no change" model -- it is certainly true over the short time period, but perhaps not over a longer time period where temperatures had moved in other directions.

Co-author Green wrote a paper later claiming that the IPCC models did not do better than the no change model when tested over a broader time period:

http://www.kestencgreen.com/gas-improvements.pdf

But it's just a draft paper and I don't know if the author ever plans to clean it up or have it published.

I would really like to see more calibrations and scorings of the models from a pure outside view approach over longer time periods.

Armstrong was (perhaps wrongly) confident enough of his views that he decided to make a public bet claiming that the No Change scenario would beat out the other scenario. The bet is described at:

http://www.theclimatebet.com/

Overall, I have high confidence in the view that models of climate informed by some knowledge of climate should beat the No Change model, though a lot depends on the details of how the competition is framed (Armstrong's climate bet may have been rigged in favor of No Change). That said, it's not clear how well climate models can do relative to simple time series forecasting approaches or simple (linear trend from radiative forcing + cyclic trend from ocean currents) type approaches. The number of independent out-of-sample validations does not seem to be enough and the predictive power of complex models relative to simple curve-fitting models seems to be low (probably negative). So, I think that arguments that say "our most complex, sophisticated models show X" should be treated with suspicion and should not necessarily be given more credence than arguments that rely on simple models and historical observations.
