Is a language model performing utility maximization during training?
Let's ignore RLHF for now and just focus on next-token prediction. There's an argument that of course the LM is maximizing a utility function - namely, its log score on predicting the next token, over the distribution of all text on the internet (or whatever it was trained on). My immediate reaction is that this isn't really what we want, even setting aside that we want the text to be useful (as most internet text isn't).
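That objective can be written down concretely. A minimal sketch of the log score (a generic average-log-probability calculation, not any particular training setup; `model_probs` is a hypothetical stand-in for the model):

```python
import math

def log_score(model_probs, corpus):
    """Average log probability the model assigns to each actual next token."""
    total = 0.0
    for i in range(1, len(corpus)):
        context, token = corpus[:i], corpus[i]
        total += math.log(model_probs(context, token))
    return total / (len(corpus) - 1)

# A toy "model" that assigns probability 0.5 to every next token:
uniform = lambda context, token: 0.5
print(log_score(uniform, "abcd"))  # log(0.5) ≈ -0.693
```

Training "maximizes utility" only in the sense of pushing this average up over the training distribution.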
This is clearly related to all the problems ar...
Are language models utility maximizers?
I think there are two major "phases" of a language model: training and runtime. During training, the model is getting "steered" toward some objective function - first getting the probability of the next token "right", and then getting positive feedback from humans during RLHF (I think? I should read up on exactly how RLHF works). Is this utility maximization? It doesn't feel like it - I'll put my thoughts on this in another comment.
During runtime, at first glance, the model is kind of "deter...
There's sort of a general "digestive issues are the root of all anxiety/evil" thread I've seen pop up in a bunch of rationalist-adjacent spaces:
I'm curious if there's any synthesis / study / general theory of this.
As my own datapoint, I've had pretty bad digestive issues, trouble eating, and anhedonia for a while. I recent...
cure Eliezer's chronic fatigue so he can actually attempt to ~~grant humanity a couple more bits of information-theoretic dignity~~ save the world
Possibly relevant: I know someone who had chronic fatigue syndrome which largely disappeared after she had her first child. I could possibly put her in contact with Eliezer or someone working on the problem.
[Meta] The jump from Distinct Configurations to Collapse Postulates in the Quantum Physics and Many Worlds sequence is a bit much - I don't think the assertiveness of Collapse Postulates is justified without a full explanation of how many worlds explains things. I'd recommend adding at least On Being Decoherent in between.
Which random factors caused the frostwing snippers to die out? Did they migrate out? Did competitors or predators migrate in? Or is there some chance of failing to get the seed, even if they're the only species left? I didn't get a good look at the source code, but I thought things were fairly deterministic once only one species was left.
In most formulations, the five people are on the track ahead, not in the trolley.
I took a look at the course you mentioned:
It looks like I got some of the answers wrong.
Where am I?
In the trolley. You, personally, are not in immediate danger.
Who am I?
A trolley driver.
Who's in the trolley?
You are. No one in the trolley is in danger.
Who's on the tracks?
Five workers ahead, one to the right.
Do I work for the trolley company?
Yes.
The problem was not as poorly specified as you implied it to be.
What year is it?
Current year.
Where am I?
Near a trolley track.
Who am I?
Yourself.
Who's in the trolley?
You don't know.
Who's on the tracks?
You don't know.
Who designed the trolley?
You don't know.
Who is responsible for the brake failure?
You don't know.
Do I work for the trolley company?
Assume that you're the only person who can pull the lever in time, and it wouldn't be difficult or costly for you to do so. If your answer still depends on whether or not you work for the trolley company, you are different from most people, and should explain both cases expli...
If Scarlet pressed the PANIC button then she would receive psychiatric counseling, three months mandatory vacation, optional retirement at full salary and disqualification for life from the most elite investigative force in the system.
This sounds familiar, but some quick searching didn't bring anything up. Is it a reference to something?
From the old wiki discussion page:
I'm thinking we can leave most of the discussion of probability to Wikipedia. There might be more to say about Bayes as it applies to rationality but that might be best shoved in a separate article, like Bayesian or something. Also, I couldn't actually find any OB or LW articles directly about Bayes' theorem, as opposed to Bayesian rationality--if anyone can think of one, please add it. --A soulless automaton 19:31, 10 April 2009 (UTC)
For wiki pages which are now tags, should we remove linked LessWrong posts, since they are likely listed below?
What should the convention be for linking to people's names? For example, I have seen the following:
Finally, should the "see also" section be a comma-separated list after the first paragraph, or a bulleted list at the end of the page?
Thanks. I had skimmed that paper before, but my impression was that it only briefly acknowledged my main objection regarding computational complexity on page 4. Most of the paper involves analogies with evolution/civilization, which I don't think are very useful; my argument is that the difficulty of designing intelligence should grow exponentially at high levels, so the difficulty of relatively low-difficulty tasks like designing human intelligence doesn't seem that important.
On page 35, Eliezer writes:
I am not aware of anyone who has defended...
I believe that fast takeoff is impossible, because of computational complexity.
This post presents a pretty clear summary of my thoughts. Essentially, if the difficulty of "designing an AI with intelligence level n" grows faster than linearly in n, this will counteract any benefit an AI receives from its increased intelligence, and so its intelligence will converge. I would like to see a more formal model of this.
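As a gesture toward such a model, here's a toy iteration I made up (the cost and effort curves are arbitrary assumptions, chosen only to illustrate the convergence claim, not derived from anything): designing intelligence level m costs 2^m units of effort, while an AI at level n can only muster effort linear in n, so repeated self-improvement settles at a fixed point instead of exploding.

```python
import math

def iterate(n0, effort=100.0, steps=50):
    # Assumed cost curve: reaching intelligence level m takes 2**m units
    # of design effort, while an AI at level n musters effort * n units.
    # The best successor a level-n AI can design is therefore log2(effort * n).
    n = n0
    for _ in range(steps):
        n = max(n, math.log2(effort * n))
    return n

print(round(iterate(2.0), 2))  # → 9.96, the fixed point of n = log2(100 * n)
print(round(iterate(5.0), 2))  # → 9.96, same limit from a different start
```

With a merely linear cost curve the iteration would instead diverge; the exponential cost is what caps it.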
I am aware that Gwern has responded to this argument, but I feel like he missed the main point. He gives many arguments showi...
Yudkowsky addresses some of these objections in more detail in "Intelligence Explosion Microeconomics".
Yes, you could make the code more robust by allowing the agent to act once it's found a proof that any action is superior. Then, it might find a proof like
U(F) = 5
U(~F) = 10
10 > 5
U(~F) > U(F)
However, there's no guarantee that this will be the first proof it finds.
When I say "look for a proof", I mean something like "for each of the first 10^(10^100) Gödel numbers, see if it encodes a proof. If so, return that action."
In simple cases like the one above, it likely will find the correct proof first. However, as the universe gets more complicated (as our universe is), there is a greater chance that a spurious proof will be found first.
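A toy mock-up of that order-dependence (my own construction: candidate "proofs" are just tagged claims standing in for Gödel-numbered derivations, and the "checker" is a boolean flag):

```python
def act(enumeration):
    # Scan candidate proofs in a fixed enumeration order and return the
    # action endorsed by the first one the checker accepts.
    for claim, passes_check in enumeration:
        if passes_check:
            return claim.split(" > ")[0]
    return None

sound = ("U(~F) > U(F)", True)     # the intended proof from above
spurious = ("U(F) > U(~F)", True)  # a spurious proof the checker also accepts

print(act([sound, spurious]))   # → U(~F)  (sound proof enumerated first)
print(act([spurious, sound]))   # → U(F)   (spurious proof enumerated first)
```

The agent's behavior flips with the enumeration order alone, which is the worry: nothing privileges the sound proof's position in the ordering.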
Did you consider withdrawal effects at all? A day without caffeine after having used it the previous few days is going to be very different from one where you haven't used caffeine in months.