Martin Randall - LessWrong

Authors Have a Responsibility to Communicate Clearly

Bob's statement 1: "I literally have a packet of blue BIC pens in my desk drawer" was not literally true, and that error was not relevant to the proposition that BIC make blue pens. I'm okay with assigning "basically full credit" for that statement.

Bob's statement 2: "All I really meant was that I had blue pens at my house" is not literally true. For what proposition is that statement being used as evidence? I don't see an explicit one in mattmacdermott's hypothetical. It's not relevant to the proposition that BIC make blue pens. This is the statement for which I assigned a "large demerit for being untrustworthy about the meaning of his own words, in a low stakes situation where there was no reason to lie".

I don't think that Bob's statement 2 is an error of no significance. If I'm Alice, and Bob is my friend, then he apparently just lied to my face. Hopefully it's a one-off slip of the tongue, and not part of a pattern.

Authors Have a Responsibility to Communicate Clearly

Martin Randall9d139

We may have different values, or we may be imagining a different hypothetical, or something else. I'll try to elaborate.

Like you, it seems blindingly obvious to me what Bob was trying to communicate by saying "I literally have a packet of blue BIC pens in my desk drawer". To be explicit, Bob is stating his belief that there is a packet of blue BIC pens in his desk drawer. He doesn't specify his credence, but it's probably somewhere in the 90-99% range, depending on other factors.

Bob's words imply other things that have been mentioned, all of which are straightforwardly true:

Blue BIC pens exist in the world.
There is a packet of blue BIC pens in Bob's house.
Bob can prove the existence of blue BIC pens by showing them to Alice.
Bob remembers the packet of blue BIC pens being in the drawer.
It would be surprising to Bob if there were no packet of blue BIC pens in the drawer.

I'm okay with assigning "basically full credit" for Bob's statement. I wouldn't call it "sloppy"; I wouldn't say it was "too enthusiastic"; I wouldn't call for a "weaker version" or "more carefully worded argument". It's okay to make mistakes, try to fix them, and learn from them too.

The statement that I say is false and bad is:

"All I really meant was that I had blue pens at my house"

To me, Bob's words here prioritize status-defending over clarity. They remind me of this from TurnTrout's opening post:

An author committed to clarity might say something like: "I can see how my words led you to believe X. To be clear, what I mean is Y." This response takes responsibility.

A status-defending author might say something more like: "You are wrong to read it as X. It obviously means Y, and you are being uncharitable." This response deflects responsibility.

I'm not clear what your position is exactly (and it may not be mattmacdermott's position). I guess one of:

Bob's statement is literally false, but should be understood as something else (what?) that is true.
Bob's statement is true, because the meaning of Bob's earlier statements retroactively changed when Bob opened the drawer.
Bob's statement is false, but given the low everyday-life stakes, it's ok.

I have some sympathy with these positions, but I respectfully disagree.

Some arguments against a land value tax

Martin Randall10d40

I wouldn't say your article was misleading, just oddly framed, for me. To me it is obvious that taxes introduce distortions (with rare exceptions like carbon taxes that are addressing negative externalities). So "an LVT (can) discourage searching for new uses of land" is a true fact but not yet an "argument against" - to become an argument it needs to be supplemented with "... and this distortion is more damaging than the distortions from existing taxes that raise the same revenue".

That said, in the last six months the tariff discussions have been eye-opening. The quality of debate has been very low, and basic facts like "tariffs are taxes", "tariffs raise revenue", "tariffs increase prices", and "tariffs reduce trade" apparently need to be explained at length. This aligns with your description of parts of pro-LVT Twitter, so I understand better where you are coming from than when I wrote my comment above.

Authors Have a Responsibility to Communicate Clearly

Martin Randall11d111

Based on your description of that incident, I expect that when Bob said:

"I literally have a packet of blue BIC pens in my desk drawer. We will go to my house, open the drawer, and you will see them."

that he did not, in fact, really mean:

"I have blue pens at my house"

That is a bizarre interpretation of that set of words. Why did Alice and Bob open the drawer to look for pens if they didn't both interpret his words as indicating that there was, literally, a packet of blue BIC pens in Bob's desk drawer?

So when Bob says:

"All I really meant was that I had blue pens at my house"

then he is making a false statement. Perhaps he is knowingly making a false statement - that's ethically bad. Perhaps he is unable to comprehend the meaning of his own words - that's epistemically bad. Either way it's not fine.

Instead Bob could say a true thing like:

"I was wrong about where the pens were, but right about BIC making blue pens"

Or a shorter true thing like:

"See, BIC makes blue pens".

My score for Bob:

full credit for knowing whether BIC makes blue pens.
partial credit for mostly knowing the location of his pens.
large demerit for being untrustworthy about the meaning of his own words, in a low stakes situation where there was no reason to lie. Get a grip, Bob.

A Bear Case: My Predictions Regarding AI Progress

Martin Randall4mo50

Competently zero-shotting games like Pokémon without having been trained to do that, purely as the result of pretraining-scaling plus transfer learning from RL on math/programming.

Here is a related market inspired by the AI timelines dialog, currently at 30%:

Note that in this market the AI is not restricted to only "pretraining-scaling plus transfer learning from RL on math/programming": it is allowed to be trained on a wide range of video games, but it has to do transfer learning to a new genre. Also, it is allowed to transfer successfully to any new genre, not just Pokémon.

I infer you are at ~20% for your more restrictive prediction:

80% bear case is correct, in which case P=5%
20% bear case is wrong, in which case P=80% (?)
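The ~20% figure follows from weighting the two scenarios above by their probabilities. A minimal sketch of that arithmetic, using only the numbers in the list (the variable names are illustrative, and the 80% conditional is the one marked "(?)" above, so it is a guess rather than a stated belief):

```python
# Law of total probability over the two scenarios in the list above.
p_bear_correct = 0.80   # bear case is correct
p_given_correct = 0.05  # P(prediction comes true | bear case correct)
p_bear_wrong = 0.20     # bear case is wrong
p_given_wrong = 0.80    # P(prediction comes true | bear case wrong), a guess

p_total = p_bear_correct * p_given_correct + p_bear_wrong * p_given_wrong
print(f"{p_total:.2f}")  # 0.04 + 0.16, printed as 0.20
```

The result is sensitive to the guessed conditional: lowering it from 80% to 50% would bring the total down to about 14%.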

So perhaps you'd also be at ~30% for this market?

I'm not especially convinced by your bear case, but I think I'm also at ~30% on the market. I'm tempted to bet lower because of the logistics of training the AI, finding a genre that it wasn't trained on (might require a new genre to be created), and then having the demonstration occur, all in the next nine months. But I'm not sure I have an edge over the other bettors on this one.

Self-fulfilling misalignment data might be poisoning our AI models

Martin Randall4mo*62

It makes sense that you don't want this article to opine on whether people should have created "misalignment data", but I'm glad you concluded in the comments that it wasn't a mistake. I find it hard to even tell a story where this genre of writing was a mistake. Some possible worlds:

1: it's almost impossible for training on raw unfiltered human data to cause misaligned AIs. In this case there was negligible risk from polluting the data by talking about misaligned AIs; it was just a waste of time.

2: training on raw unfiltered human data can cause misaligned AIs. Since there is a risk of misaligned AIs, it is important to know that there's a risk, and therefore to not train on raw unfiltered human data. We can't do that without talking about misaligned AIs. So there's a benefit from talking about misaligned AIs.

3: training on raw unfiltered human data is very safe, except that training on any misalignment data is very unsafe. The safest thing is to train on raw unfiltered human data that naturally contains no misalignment data.

Only world 3 implies that people should not have produced the text in the first place. And even there, once "2001: A Space Odyssey" (for example) is published the option to have no misalignment data in the corpus is blocked, and we're in world 2.

Weirdness Points

Martin Randall4mo50

Alice should already know what kind of foods her friends like before inviting them to a dinner party where she provides all the food. She could have gathered this information by eating with them at other events, such as restaurants, pot lucks, or at mutual friends' homes. Or she could have learned it in general conversation. When inviting friends to a dinner party where she provides all the food, Alice should say what the menu is and ask for allergies and dietary restrictions. When people are at her dinner party, Alice should notice if someone is only picking at their food.

Bob should be honest about his food preferences instead of silently resenting the situation. In his culture it's rude to ask Alice to serve meat. Fine, don't do that. But it's not rude to have food preferences and express them politely, so do that. I'm not so much saying "communicate better" as "use your words". If Bob can't think of any words he can ask an LLM. Claude 3.7 suggests:

"I'd love to come! I've been having trouble enjoying vegan food - would it be okay if I brought something to share?"

It's a messed-up situation and it mostly sounds to me like Alice and Bob are idiots. Since lsusr doesn't appear to be an idiot, I doubt he is in this situation.

Weirdness Points

Martin Randall4mo20

I agree that constraints make things harder, and that being vegan is a constraint, but again that is separate from weirdness. If Charles is hosting a dinner party on Friday in a "fish on Friday" culture, then Charles serving meat is weird in that culture, but it means Charles is less constrained, not more. If anything, the desire to avoid weirdness can be a constraint. There are many more weird pizza toppings than normal pizza toppings.

Given the problem that Alice and Bob are having, a good approach is that they communicate better, so that they know there is a problem, and what it is. An approach of being less weird may cause more problems than it solves.

Weirdness Points

Martin Randall4mo50

I don't think that's about weirdness. Bob could have the exact same thoughts and actions if Alice provides some type of "normal" food (for whatever counts as "normal" in Bob's culture), but Bob hates that type of food, or hates the way Alice cooks it, or hates the place Alice buys it, or whatever.

Alice and Bob are having trouble communicating, which will cause problems no matter how normal (or weird) they both are.

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Martin Randall4mo20

That's what I meant by "base model": one that is only trained on next-token prediction. Do I have the wrong terminology?
