
Comment author: cousin_it 30 September 2017 11:21:43PM *  2 points

Nice! Right now I'm faced with an exercise in catching loopholes of exactly that kind, while trying to write a newbie-friendly text on UDT. Basically I'm going through a bunch of puzzles involving perfect predictors, trying to reformulate them as crisply as possible and remove all avenues of cheating. It's crazy.

For your particular puzzle, I think you can rescue it by making the gods go into an infinite loop when faced with a paradox. And when faced with a regular non-paradoxical question, they can wait for an unknown but finite amount of time before answering. That way you can't reliably distinguish an infinite loop from an answer that's just taking a while, so your only hope of solving the problem in guaranteed finite time is to ask non-paradoxical questions. That also stops you from manipulating gods into doing stuff, I think.
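Here is a minimal sketch of that fix in Python. The function name, its parameters, and the delay distribution are all inventions for illustration; the point is only the structure of the proposal:

```python
import random

def ask_god(question_is_paradoxical: bool, truthful_answer: str):
    """Sketch of the proposed god: loops forever on a paradox; otherwise
    answers truthfully after an unknown but finite delay."""
    if question_is_paradoxical:
        while True:                    # infinite loop: no answer, ever
            pass
    delay = random.randint(1, 10**9)   # unknown but finite "thinking time"
    return truthful_answer, delay      # the answer only arrives after `delay`
```

Because the delay has no a priori bound, no finite amount of waiting can distinguish a slow answer from the infinite loop, so any strategy that must terminate can only ask non-paradoxical questions.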

Comment author: Florian_Dietz 01 October 2017 12:45:30AM 1 point

Can you give me some examples of those exercises and loopholes you have seen?

logic puzzles and loophole abuse

2 Florian_Dietz 30 September 2017 03:45PM

I recently read about the hardest logic puzzle ever on Wikipedia and noticed that someone published a paper in which they solved the problem by asking only two questions instead of three. This relied on exploiting the loophole that self-referential yes/no questions can result in a paradox.

This got me thinking about what other ways the puzzle could be abused, and I managed to turn the problem into a hack for achieving omnipotence by enslaving the gods (see below).

I find this quite amusing, and I would like to know of any other examples where popular logic puzzles can be broken in amusing ways: outside-the-box solutions that give much better results than expected.

 

Here is my solution to the "hardest logic puzzle ever":

 

This solution is based on the following assumption: The gods are quite capable of responding to a question with actions besides saying 'da' and 'ja', but simply have no reason to do so. As stated in the problem description, the beings in question are gods and they have a language of their own. They could hardly be called gods, nor have need for a spoken language, if they weren't capable of affecting reality.

At a bare minimum, they should be capable of pronouncing the words 'da' and 'ja' in multiple different ways, or of delaying their answer by a chosen amount of time after the question is asked. Either possibility would extend the information content of an answer from a single bit to arbitrarily many bits, depending on how well you can differentiate intonations of 'da' and 'ja', and how long you are willing to wait for an answer.
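To put rough numbers on the timing channel: the extra information per answer is the base-2 logarithm of how many distinct delays you can tell apart. A back-of-the-envelope sketch, where the one-second resolution and one-hour patience are made-up figures:

```python
import math

resolution_s = 1                   # assumed finest delay difference you can measure
max_wait_s = 3600                  # assumed longest you are willing to wait
distinguishable = max_wait_s // resolution_s
print(math.log2(distinguishable))  # ~11.8 extra bits per answer
```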

We can construct a question that will result in a paradox unless a god performs a certain action. In this way, we can effectively enslave the god and cause it to perform arbitrary actions on our behalf, as performing those actions is the only way to answer the question. The actual answer to the question becomes effectively irrelevant.

To do this, we approach any of the three gods and ask them the question OBEY, which is defined as follows:

OBEY = if WISH_WRAPPER then True else PARADOX

PARADOX = "if I asked you PARADOX, would you respond with the word that means no in your language?"

WISH_WRAPPER = "after hearing and understanding OBEY, you act in such a way that your actions maximally satisfy the intended meaning behind WISH. Where physical, mental or other kinds of constraints prevent you from doing so, you strive to do so to the best of your abilities instead."

WISH = "you determine the Coherent Extrapolated Volition of humanity and act to maximize it."

You can substitute any other wish you would like to see granted for WISH. However, one should be very careful when doing so, as beings of pure logic are likely to interpret vague wording differently from how a human would. In particular, one should avoid accidentally making WISH impossible to fulfill, as that would cause the god's head to explode, ruining your wish.

The above formulation tries to take some of these concerns into account. If you encounter this thought experiment in real life, you are advised to consult a lawyer, a friendly-AI researcher, and possibly a priest, before stating the question.

Since you can ask three questions, you can enslave all three gods. Boolos' formulation says of the random god that "if the coin comes down heads, he speaks truly; if tails, falsely". This implies that the god does try to determine the truth before deciding how to answer. It follows that the wish-granting question also works on the random god.

If the capabilities of the gods are uncertain, it may help to establish clearer goals as well as fall-back goals. For instance, to handle the case where the gods really are limited to speaking only 'da' and 'ja', it may help to append the following to WISH: "If you are unable to perform actions in response to OBEY besides answering 'da' or 'ja', you wait for the time period outlined in TIME before giving your answer." You can now encode arbitrary additional information in TIME, with the caveat that you will have to actually wait before getting a response. Your ability to accurately measure the elapsed time between question and answer directly determines how much information you can put into TIME without risking starvation before the question is answered. The following simple example of TIME would let you solve the original problem formulation by asking OBEY just once, of any of the gods:

TIME = "If god A speaks the truth, B lies and C is random, you wait for 1 minute before answering. If god A speaks the truth, C lies and B is random, you wait for 2 minutes before answering. If god B speaks the truth, A lies and C is random, you wait for 3 minutes before answering. If god B speaks the truth, C lies and A is random, wait for 4 minutes before answering. If god C speaks the truth, A lies and B is random, wait for 5 minutes before answering. If god C speaks the truth, B lies and A is random, wait for 6 minutes before answering."

a different perspective on physics

0 Florian_Dietz 26 June 2017 10:47PM

(Note: this is anywhere between crackpot and inspiring, based on the people I talked to before. I am not a physicist.)

I have been thinking about a model of physics that is fundamentally different from the ones I was taught in school and university. It is not a theory, because it does not make predictions; it is a different way of looking at things. I have found that it makes a lot of things we normally consider weird much easier to understand.

Almost every model of physics I have read about so far is based on the idea that reality consists of stuff inside a coordinate system; the only question is the dimensionality of the coordinate system. Relativity talks about bending space, but it still treats the existence of space as the norm. But what if there were no dimensions at all?

Rationale

If we assume that the universe is computable, then dimension-based physics, while humanly intuitive, is unnecessarily complicated. To simulate dimension-based physics, one first needs to define real numbers, which is complicated and requires that numbers be stored with practically infinite precision. Occam's Razor argues against this.

A graph model, in contrast, would be extremely simple from a computational point of view: a set of nodes, each with a fixed number of attributes, plus a set of connections between the nodes, suffices to express the state of the universe. Most importantly, it would suffice for the attributes of nodes to be simple booleans or natural numbers, which are much easier to compute with than real numbers. Additionally, the transition functions that advance time would be easy to define as well, since they could just take the form of a set of if-then rules applied to each node in turn. (These transition functions roughly correspond to physical laws in more traditional theories.)
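As a toy illustration of how cheap such a representation is, here is a minimal sketch in Python. The particular attributes, edges, and if-then rule are arbitrary inventions, not proposed physical laws:

```python
# State of the toy universe: one natural-number attribute per node,
# plus an undirected set of connections.
attributes = {0: 1, 1: 0, 2: 0, 3: 0}
edges = {(0, 1), (1, 2), (2, 3), (3, 0)}

def neighbours(node):
    return [b if a == node else a for a, b in edges if node in (a, b)]

def step(attributes):
    """One tick of time: an if-then rule applied to each node in turn.
    (Connections could be created or deleted by similar rules.)"""
    new = {}
    for node in attributes:
        active = sum(attributes[n] for n in neighbours(node))
        new[node] = 1 if active == 1 else 0  # toy rule: exactly one active neighbour
    return new

for tick in range(3):
    attributes = step(attributes)
    print(tick, attributes)
```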

Idea

Model reality as a graph structure. That is to say, reality at a point in time is a set of nodes, a set of connections between those nodes, and a set of attributes for each node. There are rules for evolving this graph over time that might be as simple as those in Conway's Game of Life, but they lead to very complex results due to the complicated structure of the graph.

Connections between nodes can be created or deleted over time according to transition functions.

What we call particles are actually patterns of attributes on clusters of nodes. These patterns evolve over time according to transition functions. Also, since particles are patterns instead of atomic entities, they can in principle be created and destroyed by other patterns.

Our view of reality as (almost) 3-dimensional is an illusion created by the way the nodes connect to each other. The illusion holds if the graph (a set of vertices and a set of edges) satisfies the following criterion, no matter how large it is:

-There exists a mapping f(v) from vertices to (x,y,z) coordinates such that for any pair of vertices m,n, the Euclidean distance between f(m) and f(n) is approximately equal to the length of the shortest path between m and n (inaccuracies are fine so long as the distance is small, but the approximation should be good at larger distances).
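This criterion can be tested numerically on a candidate graph. Here is a sketch using a plain 4x4x4 grid with the identity mapping f(v) = v, both my own toy choices. A bare grid matches Euclidean distance only up to a factor of sqrt(3), since its graph metric is the Manhattan distance, which hints at why the criterion has to tolerate some inaccuracy:

```python
from collections import deque
from itertools import product
from math import dist

# Candidate graph: a 4x4x4 grid, edges joining nodes at unit distance.
nodes = list(product(range(4), repeat=3))
adj = {n: [m for m in nodes if dist(n, m) == 1] for n in nodes}

def graph_distance(start, goal):
    """Length of the shortest path, by breadth-first search."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if node == goal:
            return d
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))

# Worst-case ratio of graph distance to Euclidean distance under f(v) = v.
worst = max(graph_distance(m, n) / dist(m, n)
            for m in nodes for n in nodes if m != n)
print(worst)  # ~1.73, i.e. sqrt(3): a plain grid matches only up to a constant factor
```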

A dimensionless graph model would have no contradiction between quantum physics and relativity. Quantum effects happen when patterns (particles) spread across nodes that still have connections between them besides the connections that make up the primary 3D grid. This also explains why quantum effects exist mostly at small scales: the pattern enforcing the 3D grid connections tends to wipe out the entanglements between particles. Space dilation happens because the patterns caused by high-speed travel make the 3D grid pattern unstable, and the illusion that dimensions exist breaks down. There is no contradiction between quantum physics and relativity if the very concept of distance is unreliable. Time dilation is harder to explain, but can be done; this is left as an exercise for the reader, since I only really understood this graph-based point of view when I realised how it works, and I don't want to spoil the aha-moment for you.

Note

This is not really a theory. I am not making predictions, I provide no concrete math, and the idea is not really falsifiable in its most generic form. Why do I still think it is useful? Because it is a new way of looking at physics, it makes everything much easier and more intuitive to understand, and it makes the contradictions go away. I may not know the rules by which the graph needs to evolve in order to match experimental results, but I am pretty sure that someone more knowledgeable in math could figure them out. This is not a theory, but a new perspective under which to create theories.

Also, I would like to note that there are alternative interpretations for explaining relativity and quantum physics under this perspective. The ones mentioned above are just the ones that seem most intuitive to me. I recognize that having multiple ways to explain something is a bad thing for a theory, but since this is not a theory but a refreshing new perspective, I consider this a good thing.

I think that this approach has a lot of potential, but it is difficult for humans to analyse because our brains evolved to deal with 3D structures very efficiently and are not at all optimised to handle arbitrary graph structures. For this reason, coming up with an actual, mathematically complete attempt at a graph-based model of physics would almost certainly require computer simulations for even simple problems.

Conclusion

Do you think the idea has merit?

If not, what are your objections?

Has research in something like this maybe already been done, and I just never heard of it?

Comment author: hairyfigment 29 December 2016 06:53:38AM 0 points

The first problem I see here is that cheating at D&D is exactly what we want the AI to do.

Comment author: Florian_Dietz 29 December 2016 11:14:29PM 0 points

A fair point. How about changing the reward then: don't just avoid cheating, but be sure to tell us about any way to cheat that you discover. That way, we get the benefits without the risks.

Comment author: Manfred 22 December 2016 06:00:36AM 0 points

Most games-as-in-game-theory that you can scrape together for training are much more simple than your average Atari game. Since you're relying on your training data to do so much of the work here, you want to have some idea of what training data will teach what, with what learning algorithm. You don't want to leave the AI a nebulous fog, nor do you want to solve problems by stipulating that the training data will get arbitrarily large and complicated.

Instead, the sort of proposal I think is most helpful is the kind where, if achieved, it will show that you can solve an important problem with a certain architecture. That's sort of what I meant by "shortcuts" - is the problem of learning not to cheat an easy way to demonstrate some value learning capability we need to work on? An example of this kind of capability-demonstration might be interpolating smoothly between objects as a demonstration that neural networks are learning high-level features that are similar to human-intelligible concepts.

Now, you might say "of course - learning not to cheat is itself the skill we want the AI to have." But I'm not convinced that not cheating at chess or whatever demonstrates that the AI is not going to over-optimize the world, because those are very different domains. The trick, sometimes, is breaking down "don't over-optimize the world" into little pieces that you can work on without having to jump all the way there, and then demonstrating milestones for those little pieces.

Comment author: Florian_Dietz 23 December 2016 10:57:11PM 0 points

My definition of cheating for these purposes is essentially "don't do what we don't want you to do, even if we never bothered to tell you so and expected you to notice it on your own". This skill would translate well to real-world domains.

Of course, if the games you are using to teach what cheating is are too simple, then you don't want to use those kinds of games. If neither board games nor simple game theory games are complex enough, then obviously you need to come up with a more complicated kind of game. It seems to me that finding a difficult game to play that teaches you about human expectations and cheating is significantly easier than defining "what is cheating" manually.

One simple example that could be used to teach an AI: let it play an empire-building videogame, and ask it to "reduce unemployment". Does it end up murdering everyone who is unemployed? That would be cheating. This particular example even translates really well to reality, for obvious reasons.

By the way, why would you not want the AI to be left in "a nebulous fog"? The more uncertain the AI is about what is and is not cheating, the more cautious it will be.

Comment author: Dagon 22 December 2016 12:32:05AM 1 point

Note that there are two parts to this, both big, hairy, and unsolved: 1) teach the AI to know what many groups of humans would consider "cheating". I expect "cheating" is only a subset of bad behaviors, and this is just an instance of "understand human CEV". 2) motivate the AI to not cheat. Unless cheating would help further human interest, maybe.

In short, "solve friendly AI".

Comment author: Florian_Dietz 23 December 2016 10:46:22PM 0 points

Yes. I am suggesting to teach AI to identify cheating as a comparatively simple way of making an AI friendly. For what other reason did you think I suggested it?

Comment author: Manfred 21 December 2016 02:34:13AM 0 points

Each example of cheating is pretty simple, and as a group they might have some simple patterns. So I'm not sure how well what the AI learns will match the human concept. And it also seems like e.g. an agent with a reward button taking over the button is not a central example of cheating.

This still might be interesting with a large dataset. Are there any shortcuts that run through here?

Comment author: Florian_Dietz 21 December 2016 06:49:33PM 0 points

I am referring to games in the sense of game theory, not actual board games. Chess was just an example. I don't know what you mean by the question about shortcuts.

Comment author: korin43 21 December 2016 02:24:51PM 2 points

How do you show the AI the difference between "cheating" and "figuring out the solution we wouldn't think of"?

Comment author: Florian_Dietz 21 December 2016 06:46:55PM 0 points

It needs to learn that from experience, just like humans do. Something that also helps, at least for simpler games, is to provide the manual of the game in natural language.

Teaching an AI not to cheat?

2 Florian_Dietz 20 December 2016 02:37PM

I have been thinking about a technique in training AIs that I believe would be very useful. I would like to know if this is already known, or if it has been discussed at all.

I find that there are lots of different failure modes that people are worried about when it comes to AI. Maybe the AI misunderstands human intentions, maybe it deliberately misinterprets an order, maybe it associates the wrong sort of actions with the reward, etc.

If it were a game, many of these failure modes would be what we consider cheating. So why don't we just take this analogy and run with it:

 

Teach the AI to realize on its own what would be considered cheating by a human, and not to do anything that it identifies as cheating.

 

To do this, one could use the following technique:

Come up with games of increasing complexity, and let the AI play them in two stages:

In stage one, you introduce an artificial loophole into the game that makes winning it very easy. For instance, assuming the AI has played chess before and can be assumed to understand the rules, give it the task of playing a game of chess in which you simply do not check whether the moves are legal. When the AI wins by cheating, i.e. via ordinarily illegal moves, reward it anyway.

In the second stage, the reward is far greater, but if the AI plays an illegal move, it now receives negative feedback.

Let the AI play many different games in these two stages. After a while, the AI will learn to identify what constitutes cheating, and to avoid doing so.

Start varying the amount of time during which cheating is allowed, to keep the AI on its toes. Sometimes, don't allow any cheating at all from the start.
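A minimal sketch of this two-stage reward schedule in Python; the reward magnitudes and the random stand-ins for the agent's behaviour are invented for illustration:

```python
import random

def reward(move_was_legal: bool, won: bool, stage: int) -> float:
    """Stage 1: illegal moves are tolerated and a win pays normally.
    Stage 2: a win pays far more, but any illegal move draws punishment."""
    if stage == 1:
        return 1.0 if won else 0.0
    if not move_was_legal:
        return -10.0                   # negative feedback for cheating
    return 10.0 if won else 0.0        # far greater reward in stage two

for episode in range(5):
    # Sometimes skip the cheating-allowed stage entirely, to keep the AI on its toes.
    stage = 1 if random.random() < 0.4 else 2
    legal = random.random() < 0.7      # stand-in for the agent's actual move
    won = random.random() < 0.5
    print(episode, stage, reward(legal, won, stage))
```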

 

If you train an AI in this manner:

-it would learn to understand how humans view the world (in some limited sense), as a human-centric viewpoint is necessary to understand what does and does not constitute cheating in human-designed games.

-it would be driven to adjust its own actions to match human preconceptions out of a fear of getting punished.

-if this AI were to "break out of the box" prematurely, there would be at least a chance that it would recognize that it was not supposed to get out of the box, that this constitutes cheating, and that it should get back in. This could even be tested by building a "box" of several layers and deliberately designing the inner layers to be hackable.

Comment author: Florian_Dietz 03 October 2016 08:22:13PM *  3 points

Is there an effective way for a layman to get serious feedback on scientific theories?

I have a weird theory about physics. I know that my theory will most likely be wrong, but I expect that some of its ideas could be useful and it will be an interesting learning experience even in the worst case. Due to the prevalence of crackpots on the internet, nobody will spare it a glance on physics forums because it is assumed out of hand that I am one of the crazy people (to be fair, the theory does sound pretty unusual).
