Issues with the Dutch book beyond the marginal value of money:
Impact: these issues increase LLM/IQ and (LessWrong/LLM relative to LessWrong/$), which cause errors in the same direction around the LLM/IQ/$/LessWrong/LLM cycle, potentially by a very large multiplier.
Diminishing marginal value, due to the high IQ gain of 5 points, lowers $/IQ, which (equivalently) increases IQ/$. This also acts in the same direction.
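To make the multiplier concrete, here is a toy illustration (the per-leg bias factor is made up, not taken from the survey): if each leg of the cycle is off by a modest factor in the same direction, the inconsistency compounds multiplicatively around the loop.

```python
# Toy illustration: same-direction biases compound around a cycle of exchange rates.
# The 1.5x per-leg bias is an arbitrary example value, not a survey estimate.
per_leg_bias = 1.5
legs = 4  # LLM -> IQ -> $ -> LessWrong -> LLM
cycle_inconsistency = per_leg_bias ** legs
print(cycle_inconsistency)  # 5.0625: a ~5x apparent inconsistency from modest per-leg errors
```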
(That's my excuse anyway. I suspected the cycle when answering and was fairly confident, without actually checking, that I was going to be way off from a "consistent" value. I noted in a comment in the survey itself that I was being hasty, but on reflection I still endorse an "inconsistent" result here, modulo the fact that I likely misread at least one question.)
Control theory, I think, often tends to assume that you are dealing with continuous variables, which I suspect the relevant properties of AIs are (in practice) not: even if the underlying implementation uses continuous math, RSI will make finite changes, and even small changes could cause large differences in results.
Also, the dynamics here are likely to depend on capability thresholds, which could make trend extrapolation highly misleading.
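A toy sketch of that worry (all numbers invented purely for illustration): fit a trend to pre-threshold data and the extrapolation can badly underestimate what happens once a threshold capability kicks in.

```python
# Toy model: capability grows linearly until a threshold, after which the growth
# rate itself starts compounding. All numbers are invented for illustration only.
def capability(t, threshold=10.0):
    c, growth = 1.0, 0.5
    for _ in range(t):
        c += growth
        if c >= threshold:      # past the threshold (e.g. "can improve itself")...
            growth *= 1.5       # ...the growth rate starts increasing
    return c

pre = [capability(t) for t in range(1, 19)]      # pre-threshold observations
slope = (pre[-1] - pre[0]) / (len(pre) - 1)      # naive linear trend
naive_forecast = pre[-1] + slope * 12            # extrapolate 12 more steps
print(naive_forecast, capability(30))            # ~16 vs ~200: the trend badly misleads
```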
Also, note that RSI could create a feedback loop which could enhance agency, including towards non-aligned goals (an agentic AI convergently wants to enhance its own agency).
Also beware that agency increases may cause increases in apparent capability because of Agency Overhang.
> The AI system accepts all previous feedback, but it may or may not trust anticipated future feedback. In particular, it should be trained not to trust feedback it would get by manipulating humans (so that it doesn't see itself as having an incentive to manipulate humans to give specific sorts of feedback).
>
> I will call this property of feedback "legitimacy". The AI has a notion of when feedback is legitimate, and it needs to work to keep feedback legitimate (by not manipulating the human).
Legitimacy is good - but if an AI that's supposed to be intent-aligned to the user finds that it has an "incentive" to purposefully manipulate the user in order to get particular feedback (unless it pretends that it would ignore that feedback), it's already misaligned, and that misalignment should be dealt with directly IMO. This feels to me like a band-aid over a much more serious problem.
> We may thus rule out negative effects larger than 0.14 standard deviations in cognitive ability if fluoride is increased by 1 milligram/liter (the level often considered when artificially fluoridating the water).
That's a high level of hypothetical harm that they are ruling out (0.14 standard deviations × 15 ≈ 2 IQ points). I would take the dental harms many times over to avoid that much loss of cognitive ability.
> actually, there are ~100 rows in the dataset where Room2=4, Room6=8, and Room3=5=7.
I actually did look at that (at least some subset with that property) at some point, though I didn't (think of/ get around to) re-looking at it with my later understanding.
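(For reference, assuming that notation means Room2 matches Room4, Room6 matches Room8, and Rooms 3, 5, and 7 all match, and assuming hypothetical Room1-Room9 column names and file name, pulling that subset would look something like this:)

```python
import pandas as pd

# Hypothetical sketch: select the "symmetric" rows, reading "Room2=4, Room6=8,
# Room3=5=7" as Room2==Room4, Room6==Room8, Room3==Room5==Room7.
# Column names and the file name are assumptions, not taken from the actual dataset.
df = pd.read_csv("dungeon_data.csv")
symmetric = df[
    (df["Room2"] == df["Room4"])
    & (df["Room6"] == df["Room8"])
    & (df["Room3"] == df["Room5"])
    & (df["Room5"] == df["Room7"])
]
print(len(symmetric))  # should be ~100 if that reading is right
```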
In general, I think this is a realistic thing to occur: 'other intelligent people optimizing around this data' is one of the things that causes the most complicated effects in real-world data as well.
Indeed, I am not complaining! It was a good, fair difficulty to deal with.
That being said, there was one aspect I did feel was probably more complicated than ideal: the combination of the tier-dependent alerting with the tiers having no relevance other than this one aspect. That is, if the alerting had in each case depended simply on whether the adventurers were coming from an empty room or not, it would have been a lot simpler to work out. And if there were tier-dependent alerting, but the tiers were more obvious in other ways*, it would still have been tricky, but at least there would be a path to recognize the tiers and then try to figure out other ways they might be relevant. As it was, it seemed to me you pretty much had to look at what were (ex ante) almost arbitrary combinations of (current encounter, next encounter) to figure that aspect out, unless you actually guessed the rationale of the alerting effect.
That might be me rationalizing my failure to figure it out though!
* e.g. perhaps the traps/golems could have had the same score as the same-tier nontrap encounter when alerted (or alternatively when not alerted)
The biggest problem with AIXI, in my view, is the reward system - it cares about the future directly, whereas to have any reasonable hope of alignment an AI needs to care about the future only via what humans would want about the future (so that any reference to the future is encapsulated in the "what do humans want?" aspect).
I.e. the question it needs to be answering is something like "all things considered (including the consequences of my current action on the future, as well as taking into account my possible future actions) what would humans, as they exist now, want me to do at the present moment?"
Now maybe you can take that question and try to slice it up into rewards at particular timesteps, which change over time as what is known about what humans want changes, without introducing corrigibility issues - but even if that works, the AIXI reward framework isn't really buying you anything IMO, relative to directly trying to get an AI to solve the question.
On the other hand, approximating Solomonoff induction might, afaik, be a fruitful approach, though the approximations are going to have to be very aggressive for practical performance. I do agree embedding/self-reference can probably be patched in.
I think that it's likely to take longer than 10000 years, simply because of the logistics (not the technology development, which the AI could do fast).
The gravitational binding energy of the Sun is something on the order of 20 million years' worth of its energy output. OK, half of the needed energy is already present as thermal energy, and you don't need to move every atom to infinity, but you still need a substantial fraction of that. And while you could perhaps generate many times more energy than the solar output by various means, I'd guess you'd have to deal with inefficiencies and lots of waste heat if you try to do it really fast. Maybe if you're smart enough you can make going fast work well enough to be worth it, though?
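A rough back-of-the-envelope check of that 20-million-year figure, using the uniform-density-sphere approximation (the real Sun is centrally concentrated, so the true binding energy is a few times larger):

```python
# Back-of-the-envelope: the Sun's gravitational binding energy vs. its luminosity,
# using the uniform-density-sphere formula U = 3GM^2 / (5R).
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_sun = 1.989e30     # solar mass, kg
R_sun = 6.957e8      # solar radius, m
L_sun = 3.828e26     # solar luminosity, W

U = 3 * G * M_sun**2 / (5 * R_sun)       # ~2.3e41 J
years = U / L_sun / 3.156e7              # 3.156e7 seconds per year
print(f"{U:.2e} J ~= {years / 1e6:.0f} million years of solar output")  # ~19 million years
```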
I feel like a big part of what tripped me up here was an inevitable part of the difficulty of the scenario that in retrospect should have been obvious. Specifically, if there is any variation in the difficulty of an encounter that is known to the adventurers in advance, the score contribution of an encounter type on the paths actually taken is less than the difficulty of the encounter as estimated from what best predicts the path taken (because the adventurers take the path when the encounter is weak, but avoid it when it's strong).
So I wound up with an epicycle saying hags and orcs were avoided more than their actual scores warranted, because that effect was most significant for them (goblins are chosen over most other encounters even if alerted, and dragons mostly aren't alerted).
This effect was made much worse by the fact that I was getting scores mainly from lower-difficulty dungeons, with lots of "Nothing" rooms and low-level encounters. But even once I estimated scores from the overall data with my best guesses for preference order, the issue still applied, just not quite so badly.
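A toy simulation of that selection effect (the numbers are invented, not from the scenario): when the adventurers can see an encounter's strength and pick the weaker of two options, the average score of the encounters actually fought understates the encounter type's true average difficulty.

```python
import random

# Toy model of the selection effect: the adventurers see two candidate encounters of
# the same type and take the weaker one. Numbers are invented for illustration only.
random.seed(0)
true_mean = 10.0   # the encounter type's true average difficulty / score contribution
spread = 4.0       # visible variation the adventurers can react to

taken = []
for _ in range(100_000):
    option_a = random.uniform(true_mean - spread, true_mean + spread)
    option_b = random.uniform(true_mean - spread, true_mean + spread)
    taken.append(min(option_a, option_b))   # they pick the weaker encounter

print(sum(taken) / len(taken))   # ~8.7: noticeably below the true mean of 10
```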
In the "what if" department, I had said:
> I'm also getting remarkably higher numbers for Hag compared with my earlier method. But I don't immediately see a way to profitably exploit this.
The most obvious way to exploit this would have been the optimal solution. Why didn't I do it? The answer is that, as indicated above, I was still underestimating the hag (whereas at this point I had mostly-accurate scores for the traps and orcs). With my underestimate for the hag's score contribution, I didn't think it was worth giving up an orc-boulder trap difference to get a hag-orc difference. I also didn't realize I needed the hag to alert the dragon.
In general, I feel like I was pretty far along with discovering the mechanics despite some missteps. I correctly had the adventurers taking a 5-encounter path with right/down steps, the choice of next step being based on the encounters in the candidate next rooms, with an alerting mechanism, and the alerting mechanism not applying to traps and golems.
On the other hand, I applied the alerting mechanism only to score and not to preference order, except for goblins and orcs (why didn't I try to apply it to the preference order for other encounters once I realized it applied to the preference order for goblins and orcs, and that some degree of alerting score effect applied to other encounters?????). (I also got confused into thinking that the effect on orc preference order only applied if the current encounter was also orcs.) I also didn't realize that the alerting mechanism had different sensitivity for different encounters, and I had my mistaken belief about the preference order being different from the expected score for some encounter types (hey, the text played up how unnerving the hag was, there was some plausibility there!).
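For concreteness, here is a schematic sketch of the basic path/alerting structure I had correct (from the paragraph before last). The preference ranks, scores, alerting bonus, and 3x3 grid layout are all placeholders or my assumptions, not the scenario's actual values, and it deliberately omits the preference-order effects discussed above.

```python
# Schematic sketch: a 3x3 grid of rooms, traversed from top-left to bottom-right by
# right/down steps (5 encounters), the next step chosen by preference over the
# candidate rooms' encounters, with an alerting score adjustment that doesn't apply
# to traps/golems. All numeric values are placeholders.
PREFERENCE = {"Nothing": 0, "Goblins": 1, "Boulder Trap": 2, "Golem": 3, "Orcs": 4, "Hag": 5, "Dragon": 6}
SCORE = {"Nothing": 0, "Goblins": 2, "Boulder Trap": 3, "Golem": 3, "Orcs": 4, "Hag": 5, "Dragon": 8}
NOT_ALERTABLE = {"Nothing", "Boulder Trap", "Golem"}

def simulate(grid):
    """grid[r][c] = encounter name; adventurers walk from (0, 0) to (2, 2)."""
    r = c = 0
    prev = grid[0][0]
    score = SCORE[prev]
    while (r, c) != (2, 2):
        options = []
        if c < 2:
            options.append((r, c + 1))
        if r < 2:
            options.append((r + 1, c))
        # take the most-preferred (lowest-ranked) candidate encounter
        nr, nc = min(options, key=lambda rc: PREFERENCE[grid[rc[0]][rc[1]]])
        enc = grid[nr][nc]
        alerted = prev != "Nothing" and enc not in NOT_ALERTABLE
        score += SCORE[enc] + (1 if alerted else 0)   # placeholder alerting bonus
        prev, r, c = enc, nr, nc
    return score
```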
I think if I had gotten to where I was in my last edit early on in the time frame for this scenario instead of near the end, and had posted it, and other people had read it and tried it out, collectively we would have had a good chance of solving the whole thing. I also would have been much more likely to get the optimal solution if I had paid more attention to what abstractapplic said, instead of only very briefly glancing over his comments after posting my very belated comment and going back to doing my own thing.
In my view it was a fun, challenging, and theoretically solvable scenario (even if not actually that close to being solved in practice), so I think it was quite good.
Also, when doing a study, please write down afterwards whether or not you used intention-to-treat analysis.
Example: I encountered a study that says post-meal glucose levels depend on the order in which different parts of the meal were consumed. But the study doesn't say whether every participant consumed the entire meal, and if not, how that was handled when processing the data. Without knowing whether everyone consumed everything, I don't know if the differences in blood glucose were caused by the change in order, or by some participants not consuming some of the more glucose-spiking meal components.
In that case, intention-to-treat (if used) makes the result of the study less interesting, since it provides another effect that might "explain away" the headline effect.
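A toy illustration of the worry (the effect sizes and dropout rate are invented, not from the study): if some participants assigned to the "carbs last" order simply never eat the carbs, an intention-to-treat comparison shows a glucose difference even when meal order itself has no effect at all.

```python
import random

# Toy model: meal order has NO true effect on the post-meal glucose spike here, but
# some participants in the "carbs last" arm never eat the carbs. Numbers are invented.
random.seed(0)
N = 10_000

def glucose_spike(ate_carbs):
    base = random.gauss(20, 5)                              # spike from the rest of the meal
    return base + (random.gauss(40, 10) if ate_carbs else 0)

carbs_first = [glucose_spike(ate_carbs=True) for _ in range(N)]
# In the "carbs last" arm, suppose 15% stop eating before the carb course.
carbs_last = [glucose_spike(ate_carbs=random.random() > 0.15) for _ in range(N)]

mean = lambda xs: sum(xs) / len(xs)
print(mean(carbs_first) - mean(carbs_last))   # sizable apparent "order effect" despite no true one
```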