My take on the tool vs. agent distinction:

  • A tool runs a predefined algorithm whose outputs are in a narrow, well-understood and obviously safe space.

  • An agent runs an algorithm that allows it to compose and execute its own algorithms (i.e. choose sequences of actions) to maximize its utility function (get closer to its goal). If the agent can compose enough actions from a large enough set, the output of the composed algorithm is wildly unpredictable and potentially catastrophic.

This hints that we can build safe agents by carefully curating the set of actions the agent chooses from, so that any algorithm composed from that set produces an output that stays in a safe space (see the sketch below).
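As a rough illustration of what I mean, here is a toy sketch: the action names, state representation and utility function are made up for the example, not a real proposal.

```python
import itertools

# Hypothetical, safety-vetted action set: each action maps a state (a dict)
# to a new state, and by construction each action keeps the state inside a
# space we have already judged safe.
SAFE_ACTIONS = {
    "summarize_document": lambda state: state | {"summary": "..."},
    "draft_email":        lambda state: state | {"draft": "..."},
    "ask_for_review":     lambda state: state | {"review_requested": True},
}

def best_plan(initial_state, utility, max_len=3):
    """Search over compositions of whitelisted actions only.

    Whatever plan comes out, it is a composition of vetted actions, so its
    output stays inside whatever space the whitelist was curated for.
    """
    best, best_score = [], float("-inf")
    for length in range(1, max_len + 1):
        for plan in itertools.product(SAFE_ACTIONS, repeat=length):
            state = dict(initial_state)
            for name in plan:
                state = SAFE_ACTIONS[name](state)
            score = utility(state)
            if score > best_score:
                best, best_score = list(plan), score
    return best, best_score

# e.g. best_plan({}, utility=lambda s: len(s)) picks some maximal-length plan.
```

The point is not the toy search but the boundary: however cleverly the agent composes its plan, the reachable outputs are limited to what the curated action set can produce.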

I think being as honest as reasonably sensible is good for oneself. Being honest applies pressure on oneself and one's environment until the two closely match. I expect the process to have its ups and downs, but to lead to a smoother life in the long run.

An example that comes to mind is the necessity of opening up in order to have meaningful relationships (versus the alternative of concealing one's interests, which tends to make conversations boring).

Honesty also seems like a requirement for an accurate map of reality: snappy and accurate feedback is essential to good learning, but if one lies and distorts reality to accomplish one's goals, reality sends back distorted feedback, causing incorrect updates to one's beliefs.

On another note: this post immediately reminded me of the Buddhist concept of Right Speech, which might be worth investigating for further advice on how to practice this. A few quotes:

"Right speech, explained in negative terms, means avoiding four types of harmful speech: lies (words spoken with the intent of misrepresenting the truth); divisive speech (spoken with the intent of creating rifts between people); harsh speech (spoken with the intent of hurting another person's feelings); and idle chatter (spoken with no purposeful intent at all)."

"In positive terms, right speech means speaking in ways that are trustworthy, harmonious, comforting, and worth taking to heart. When you make a practice of these positive forms of right speech, your words become a gift to others. In response, other people will start listening more to what you say, and will be more likely to respond in kind. This gives you a sense of the power of your actions: the way you act in the present moment does shape the world of your experience."

Thanissaro Bhikkhu (source: https://www.accesstoinsight.org/lib/authors/thanissaro/speech.html)

I also thought about something along those lines: explaining the domestication of wolves into dogs, or maybe of prehistoric wheat into modern wheat, then extrapolating to chimps. Then I had a dangerous thought: what would happen if we tried to select chimps for humaneness?

> goals appear only when you make rough generalizations from its behavior in limited cases.

I am surprised no one brought up the usual map / territory distinction. In this case the territory is the set of observed behaviors. Humans look at the territory and, with their limited processing power, produce a compressed and lossy map, here called the goal.

The goal is a useful model for talking simply about the set of behaviors, but it has no existence outside the heads of the people discussing it.

This is a great use case for AI: expert knowledge tailored precisely to one's needs.

Is the "cure cancer goal ends up as a nuke humanity action" hypothesis valid and backed by evidence?

My understanding is that the meaning of the "cure cancer" sentence can be represented as a point in a high-dimensional meaning space, which I expect to be pretty far from the "nuke humanity" point. 

For example "cure cancer" would be highly associated with saving lots of lives and positive sentiments, while "nuke humanity" would have the exact opposite associations, positioning it far away from "cure cancer".

A good design might specify that if the two goals are sufficiently far apart, they are not interchangeable. This could be modeled in the AI as an exponential decrease of the reward based on the distance between the meaning of the goal and the meaning of the action.
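To make the idea concrete, here is a toy sketch of such a decay. It assumes some sentence-embedding function `embed` (hypothetical here) that maps text to a meaning vector; the distance measure and temperature are arbitrary choices for illustration.

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity: ~0 for near-identical meanings, larger for opposites."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def shaped_reward(base_reward, goal_text, action_text, embed, temperature=0.5):
    """Exponentially discount the reward as the action's meaning drifts away
    from the goal's meaning in the embedding space."""
    d = cosine_distance(embed(goal_text), embed(action_text))
    return base_reward * np.exp(-d / temperature)

# An action whose meaning sits far from "cure cancer" (say, "nuke humanity")
# would see its reward driven toward zero, whatever its base reward was.
```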

Does this make any sense? (I have a feeling I might be mixing concepts coming from different types of AI)

If you know your belief isn't correlated to reality, how can you still believe it?

 

Interestingly, physics models (maps) are wrong (inaccurate), and people know that, but they still use them all the time because they are good enough with respect to some goal.

Less accurate models can even be favored over more accurate ones to save on computing power or reduce complexity.

As long as the benefits outweigh the drawbacks, the correlation to reality is irrelevant.

Not sure how cleanly this maps to beliefs, since one would have to be able to go from one belief to another; however, it might be possible by successively activating different parts of the brain that hold different beliefs, in a way similar to someone very angry who completely switches gears to answer an important phone call.

@Eliezer, some interesting points in the article; I will criticize what frustrated me:

> If you see a beaver chewing a log, then you know what this thing-that-chews-through-logs looks like,
> and you will be able to recognize it on future occasions whether it is called a “beaver” or not.
> But if you acquire your beliefs about beavers by someone else telling you facts about “beavers,”
> you may not be able to recognize a beaver when you see one.

Things do not have intrinsic meaning; rather, meaning is an emergent property of things in relation to each other: for a brain, an image of a beaver and the sound "beaver" are just meaningless patterns of electrical signals.

Through experiencing reality the brain learns to associate patterns based on similarity, co-occurrence and so on, and it labels these clusters with handles in order to communicate. 'Meaning' is the entire cluster itself, which in turn bears meaning in relation to other clusters.

If you try to single out a node from the cluster, you soon find that it loses all meaning and reverts back to meaningless noise.

> G1071(G1072, G1073)

Maybe the above does not seem so dumb now? Experiencing reality is basically entering and updating relationships that eventually make sense as a whole within a system.

I feel there is a huge difference in our models of reality:

In my model everything is self-referential: just one big graph where nodes barely exist (they are only aliases for the whole graph itself). There is no ground to knowledge, nothing ultimate. The only thing we have is this self-referential map, from which we infer a non-phenomenological territory.

You seem to think the territory contains beavers; I claim beavers exist only in the map, as a block arbitrarily carved out of our phenomenological experience by our brain, as if it were the only way to carve a concept out of experience and not one of infinitely many valid ways (e.g. considering the beaver together with the air around it, rather than having a concept of just a beaver with no air), and as if a part of experience could be considered without being impacted by the whole of experience (i.e. there is no living beaver without air).

This view is very much influenced by the concept of emptiness, by the way.
 

The examples seem to assume that "and" and "or" as used in natural language work the same way as their logical counterparts. I think this is not the case, and that it could bias the experiment's results.

As a trivial example, the question "Do you want to go to the beach or to the city?" is not just a yes-or-no question, as Boolean logic would have it.

Not everyone learns about Boolean logic, and those who do likely learn it long after learning how to talk, so it's likely that natural-language propositions that look somewhat logical are not interpreted as just logic problems.
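For context, the strictly logical reading that the experiment relies on is just the conjunction rule: a conjunction can never be more probable than either of its parts. A toy check of that rule, with made-up numbers (my illustration, not from the post):

```python
# Under the purely logical reading, P(A and B) = P(A) * P(B | A) <= P(B)
# whenever the probabilities are coherent, so rating the conjunction above
# the single statement counts as an error (the conjunction fallacy).
# The numbers below are arbitrary, for illustration only.
p_invasion = 0.10                    # P(A): Russia invades Poland
p_suspension_given_invasion = 0.80   # P(B | A)
p_suspension = 0.15                  # P(B): suspension, from any cause

p_both = p_invasion * p_suspension_given_invasion  # P(A and B)
assert p_both <= p_suspension
print(p_both, p_suspension)  # ~0.08 vs 0.15
```

My point below is that subjects may not be evaluating the bare conjunction at all, but a conversationally enriched version of it.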

 

I think that this is at play in the example about Russia. Say you are on holiday and presented with one of these two statements:

1. "Going to the beach then to the city"

2. "Going to the city"

The second statement obviously means you are going only to the city, and not to the beach or anywhere else beforehand.

 

Now back to Russia:

1. "Russia invades Poland, followed by suspension of diplomatic relations between the USA and the USSR” 

2. “Suspension of diplomatic relations between the USA and the USSR”

Taken together, the second proposition strongly implies that Russia did not invade Poland: after all, if Russia had invaded Poland, no one would have written the second proposition, because it would be the same as the first.

It also implies that there is no reason at all for suspending relations: the statements read as if written by an objective know-it-all; a reason is given in the first statement, so in that context it is reasonable to assume that if there were a reason behind the second statement it would also be given, and the absence of further information means there is none.

Even when seeing only the second proposition and not the first, it seems to me that humans have a need to attribute specific causes to effects (which might be a cognitive bias). Seeing no explanation for the event, it is natural to think "surely there must be SOME reason; how likely is it that Russia suspends diplomatic relations for no reason?", but confronted with the fact that no reason is given, one lowers the probability of the event.

It seems that the proposition is not evaluated as pure Boolean logic, but is perhaps parsed taking into account the broader social context, historical context and so on, which arguably makes more sense in real life.