All of Rene de Visser's Comments + Replies

Wouldn't a goal be mainly "past" data, though? Although I guess the application of the goal depends on recognizing features when trying to apply it. I suppose it depends on how far out of context/distribution one is trying to apply the goal in the future.

Surely the point of compression is that what you are compressing is preserved, i.e. the uncompressed version can be roughly reproduced. Better compression means you preserve the important aspects while using less space.

Shouldn't the goal be preserved by the compression? I don't get this post at all.

Seth Herd
I agree 100%. This post is basically arguing that greater intelligence will get its goals more wrong in future versions. That would be dumber, not smarter. The post frames the hypothesis as "greater intelligence compresses more" without hugely arguing that's true and inevitable. I think the premise is simply false. Better compression is an element of greater intelligence up to some point (useful abstract representations that aid thinking with limited computational resources), but not further beyond that point with any necessity.
the gears to ascension
The structure of past data is preserved when creating a compressor. Future data is only constrained by smoothness.

I wonder if giving lower rewards for correctly guessing common tokens, and higher rewards for correctly guessing uncommon tokens, would improve models? I don't think I've seen anyone try this.

Found: https://ar5iv.labs.arxiv.org/html/1902.09191 - Improving Neural Response Diversity with Frequency-Aware Cross-Entropy Loss.
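As a rough illustration of the idea (not the exact loss from the linked paper), here is a minimal sketch of re-weighting cross-entropy by inverse token frequency, so that correctly guessing a rare token is rewarded more than correctly guessing a common one. The function name, smoothing, and normalization below are assumptions made for the sketch, not anything taken from the paper.

```python
import torch
import torch.nn.functional as F

def frequency_weighted_ce(logits, targets, token_counts, smoothing=1.0):
    """Cross-entropy where each target token is weighted by its inverse corpus frequency.

    logits:       (batch, vocab) unnormalized model outputs
    targets:      (batch,) gold token ids
    token_counts: (vocab,) precomputed corpus counts per token
    """
    freqs = (token_counts + smoothing) / (token_counts.sum() + smoothing * token_counts.numel())
    weights = 1.0 / freqs                  # rare tokens get large weights
    weights = weights / weights.mean()     # keep the overall loss scale comparable
    per_token = F.cross_entropy(logits, targets, reduction="none")
    return (weights[targets] * per_token).mean()
```

Whether this actually helps is exactly the open question here; the paper's weighting scheme may well differ from this simple inverse-frequency sketch.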

gwern
It's not obvious that 'uncommon' tokens are good or that that's a good approach. They could also just be unlikely or garbage, and your screening method for filtering for 'uncommon' tokens may ensure that they are garbage, or otherwise sabotage your model. (This is the 'mammogram screening problem': even if you have a good filter, if you run it across trillions of tokens, you will wind up throwing out many good tokens and keeping many bad tokens. There are a number of LLM-related papers about the horrifically bad data you can wind up compiling if you neglect data cleaning, particularly in multilingual translation when you're trying to scrape rare languages off the general Internet.) Nor are good datapoints necessarily made up of uncommon tokens: there are zero uncommon tokens in my 'microwave' example. (Data pruning & active learning are hard.)
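A quick back-of-the-envelope version of the screening point, with purely made-up numbers, just to show how the base rates bite at corpus scale:

```python
# Purely illustrative numbers: even a filter that is right 99% of the time
# keeps mostly junk when the genuinely good uncommon tokens are themselves rare.
total_tokens    = 1_000_000_000_000  # corpus size
good_rare_rate  = 1e-6               # fraction of tokens that are both good and uncommon
sensitivity     = 0.99               # P(kept | good uncommon token)
false_pos_rate  = 0.01               # P(kept | ordinary or garbage token)

good_rare       = total_tokens * good_rare_rate
everything_else = total_tokens - good_rare

kept_good    = good_rare * sensitivity           # ~9.9e5
kept_garbage = everything_else * false_pos_rate  # ~1.0e10

print(kept_good, kept_garbage, kept_good / (kept_good + kept_garbage))
# The kept "uncommon" bucket is ~99.99% not what you wanted.
```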

I'd also like an EPUB version that is as stripped down as possible. I guess it might be necessary to prepend the character's name to know who is saying what, but I find the rest very distracting; it makes it hard to read.

Answer by Rene de Visser

"Most of us will find it easy to believe that nuclear war becomes more probable if Putin is massively defeated" 

How do you know? What do you mean by "Putin is massively defeated"? Do you mean Putin is ousted? I guess you mean that Putin's attack on Ukraine looks bad. It is hard to know what will happen then. Depending on what happens, the danger of future nuclear war may either drop or rise.

Who will draw what conclusions is hard to judge (particularly for those in positions of influence within the Russian elite).

I think that Ukraine will not become a NATO country regardless of what happens. This seems to be the general consensus.

I was thinking specifically here of maximizing the value function (desires) across the agents interacting with each other. Or, more specifically, adapting the system in such a way that it self-maintains the "maximizing the value function (desires) across the agents" property.

An example is an economic system which seeks to maximize total welfare. Current systems, though, don't maintain themselves. More powerful agents take over the control mechanisms (or adjust the market rules) so that they are favoured (lobbying, cheating, ignoring the rules, mitigating enforceme... (read more)

TekhneMakre
I doubt that because intelligence explosions or their leadups make things local.

How do you know they don't generalize? As far as I know, no one has solved these problems for coalitions of agents, whether human, theoretical, or otherwise.

TekhneMakre
Well the standard example is evolution: the compact mechanisms discovered first by the gradient-climbing search for fit organisms generalized to perform effectively in many domains, but not particularly to maximize fitness---we don't monomaniacally maximize number of offspring (which would improve our genetic fitness a lot relative to what we actually do). Human coalitions are made of humans, and humans come ready built with roughly the same desires and shape of cognition as you. That makes them vastly easier to interface with and approximately understand intuitively.

What do you mean by "technical" here? 

I think solving the alignment problem for governments, corporations, and other coalitions would probably help solve the alignment problem for AGI.

I guess you are saying that even if we could solve the above alignment problems it would still not go all the way to solving it for AGI? What particular gaps are you thinking of?

TekhneMakre
Yeah, mainly things such that solving them for human coalitions/firms doesn't generalize. It's hard to point to specific gaps because they'll probably involve mechanisms of intelligence, which I / we don't yet understand. The point is that the hidden mechanisms that are operating in human coalitions are pretty much just the ones operating in individual humans, maybe tweaked by being in a somewhat different local context created by the coalition (Bell Labs, scientific community, job in a company, role in a society, position in a government, etc. etc.). We're well out of distribution for the ancestral environment, but not *that* far out. Humans, possibly excepting children, don't routinely invent paradigm-making novel cognitive algorithms and then apply them to everything; that sort of thing only happens at a super-human level, and what effects on the world it's pointed at are not strongly constrained by its original function.

By "technical" I don't mean anything specific, exactly. I'm gesturing vaguely at the cluster of things that look like math problems, math questions, scientific investigations, natural philosophy, engineering; and less like political problems, aesthetic goals, lawyering, warfare, cultural change. The sort of thing that takes a long time and might not happen at all because it involves long chains of prerequisites on prerequisites.

Art might be an example of something that's not "technical" but still matches this definition; I don't know the history, but from afar it seems like there's actually quite a lot of progress in art and it's somewhat firmly sequential / prerequisited: perspective is something you invent, and you only get cubism after perspective, and cubism seems like a stepping stone towards more abstractionism.... So if the fate of everything depended on artistic progress, we'd want to be persistently working on art, refining and discovering concepts, even if we weren't pure of soul.

Yes, the positive reframing step from TEAM (a version of CBT) / Feeling Great by Dr. David Burns is missing from the above, as is the "Magic Dial" step.

A bit odd, as I would have guessed that the above lists were taken directly from "Feeling Great" or from his website.

Shmi
The attribution is in the link.

An agent typically maximizes their expected utility, i.e. they make the choices under their control that lead to the highest expected utility.

If they predict that their efforts toward solving aging and mitigating other risks to themselves have minimal effect on their expected utility, they will spend most of their time playing Factorio while they can. This will lead to the maximum expected utility.

If they spend all their time trying not to die, and then they die anyway, their total utility will be zero.

rank-biserial
The idea isn't to spend all your time trying not to die. The idea is to spend fifty years now so you can have millions of factorio-years later.
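To make the disagreement concrete, here is a toy expected-utility comparison; every number in it is made up purely for illustration:

```python
# Toy numbers, purely illustrative.
p_success        = 0.01       # assumed chance that 50 years of effort averts death
years_play_now   = 50         # enjoyable years if you just play Factorio until you die
years_if_success = 1_000_000  # enjoyable years if death is averted

# Strategy A: play now, make no effort.
eu_play = years_play_now

# Strategy B: spend the 50 years trying not to die; total utility is 0 if you die anyway.
eu_effort = p_success * years_if_success + (1 - p_success) * 0

print(eu_play, eu_effort)  # 50 vs 10000.0
```

On these made-up numbers the effort strategy dominates, which is rank-biserial's point; if the agent instead believes p_success is effectively zero, the Factorio strategy wins, which is the original point.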

I think you can compare modern chess programs with each other to evaluate this.

Some comparisons have been made between different modern chess engines in TCEC.

Stockfish is particularly well adapted to using lots of cores, i.e. Stockfish has a much larger advantage over the other modern programs when lots of CPU cores are available, as they have optimized hash table contention very well.

If you compare NNUE Stockfish to classic Stockfish, there is also the question of how much strength Stockfish NNUE loses when playing on hardware that does not support SIMD.

Similarly y... (read more)

paulfchristiano
I'm pretty interested in understanding the size of this effect by scaling down the memory use as well as compute to historical levels. (This is one of my concerns about hippke's experiment, though it seems like they think it's not a big factor.)

I am not sure what exactly you mean by predicting. You can tell the donor a different amount than you are internally expecting to obtain.

Pattern

The post concerns self-confirming predictions. The donor asked for a prediction of how much money they'll give you...after they hear your prediction. A prediction you give them would be "self-confirming" if they gave you the amount you specified. Here "prediction" refers to "the amount you tell them", as opposed to the amount "you are internally expecting to obtain", which no one other than you actually knows.
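A tiny toy model of what "self-confirming" means here: the stated prediction is self-confirming when it is a fixed point of the donor's response. The donor_response function below is entirely made up for illustration.

```python
def donor_response(p):
    # Hypothetical donor: anchors halfway between a $100 baseline and your stated prediction.
    return 0.5 * p + 100.0

# Find a self-confirming prediction by iterating the response function to a fixed point.
p = 0.0
for _ in range(100):
    p = donor_response(p)

print(p)  # -> 200.0: telling the donor "$200" results in the donor giving exactly $200.
```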