Is a language model performing utility maximization during training?
Let's ignore RLHF for now and just focus on next-token prediction. There's an argument that of course the LM is maximizing a utility function - namely, its log score on predicting the next token, over the distribution of all text on the internet (or whatever it was trained on). My immediate reaction is that this isn't really what we want, even setting aside that we want the text to be useful (as most internet text isn't).
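That objective can be written down concretely. A minimal sketch of the log score (a generic average-log-probability calculation, not any particular training setup; `model_probs` is a hypothetical stand-in for the model):

```python
import math

def log_score(model_probs, corpus):
    """Average log probability the model assigns to each actual next token."""
    total = 0.0
    for i in range(1, len(corpus)):
        context, token = corpus[:i], corpus[i]
        total += math.log(model_probs(context, token))
    return total / (len(corpus) - 1)

# A toy "model" that assigns probability 0.5 to every next token:
uniform = lambda context, token: 0.5
print(log_score(uniform, "abcd"))  # log(0.5) ≈ -0.693
```

Training "maximizes utility" only in the sense of pushing this average up over the training distribution.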
This is clearly related to all the problems ar...
Are language models utility maximizers?
I think there are two major "phases" of a language model: training and runtime. During training, the model is getting "steered" toward some objective function - first getting the probability of the next token "right", and then getting positive feedback from humans during RLHF (I think? I should read up on exactly how RLHF works). Is this utility maximization? It doesn't feel like it - I'll put my thoughts on this in another comment.
During runtime, at first glance, the model is kind of "deter...
There's sort of a general "digestive issues are the root of all anxiety/evil" thread I've seen pop up in a bunch of rationalist-adjacent spaces:
I'm curious if there's any synthesis / study / general theory of this.
As my own datapoint, I've had pretty bad digestive issues, trouble eating, and anhedonia for a while. I recent...
cure Eliezer's chronic fatigue so he can actually attempt to ~~grant humanity a couple more bits of information-theoretic dignity~~ save the world
Possibly relevant: I know someone who had chronic fatigue syndrome which largely disappeared after she had her first child. I could possibly put her in contact with Eliezer or someone working on the problem.
[Meta] The jump from Distinct Configurations to Collapse Postulates in the Quantum Physics and Many Worlds sequence is a bit much - I don't think the assertiveness of Collapse Postulates is justified without a full explanation of how many worlds explains things. I'd recommend adding at least On Being Decoherent in between.
Which random factors caused the frostwing snippers to die out? Did they migrate out? Did competitors or predators migrate in? Or is there some chance of failing to get the seed, even if they're the only species left? I didn't get a good look at the source code, but I thought things were fairly deterministic once only one species was left.
In most formulations, the five people are on the track ahead, not in the trolley.
I took a look at the course you mentioned:
It looks like I got some of the answers wrong.
Where am I?
In the trolley. You, personally, are not in immediate danger.
Who am I?
A trolley driver.
Who's in the trolley?
You are. No one in the trolley is in danger.
Who's on the tracks?
Five workers ahead, one to the right.
Do I work for the trolley company?
Yes.
The problem was not as poorly specified as you implied it to be.
What year is it?
Current year.
Where am I?
Near a trolley track.
Who am I?
Yourself.
Who's in the trolley?
You don't know.
Who's on the tracks?
You don't know.
Who designed the trolley?
You don't know.
Who is responsible for the brake failure?
You don't know.
Do I work for the trolley company?
Assume that you're the only person who can pull the lever in time, and it wouldn't be difficult or costly for you to do so. If your answer still depends on whether or not you work for the trolley company, you are different from most people, and should explain both cases expli...
If Scarlet pressed the PANIC button then she would receive psychiatric counseling, three months mandatory vacation, optional retirement at full salary and disqualification for life from the most elite investigative force in the system.
This sounds familiar, but some quick searching didn't bring anything up. Is it a reference to something?
From the old wiki discussion page:
I'm thinking we can leave most of the discussion of probability to Wikipedia. There might be more to say about Bayes as it applies to rationality but that might be best shoved in a separate article, like Bayesian or something. Also, I couldn't actually find any OB or LW articles directly about Bayes' theorem, as opposed to Bayesian rationality--if anyone can think of one, please add it. --A soulless automaton 19:31, 10 April 2009 (UTC)
For wiki pages which are now tags, should we remove linked LessWrong posts, since they are likely listed below?
What should the convention be for linking to people's names? For example, I have seen the following:
Finally, should the "see also" section be a comma-separated list after the first paragraph, or a bulleted list at the end of the page?
Thanks. I had skimmed that paper before, but my impression was that it only briefly acknowledged my main objection regarding computational complexity on page 4. Most of the paper involves analogies with evolution/civilization, which I don't think are very useful; my argument is that the difficulty of designing intelligence should grow exponentially at high levels, so the difficulty of relatively low-difficulty tasks like designing human intelligence doesn't seem that important.
On page 35, Eliezer writes:
I am not aware of anyone who has defended...
I believe that fast takeoff is impossible, because of computational complexity.
This post presents a pretty clear summary of my thoughts. Essentially, if the difficulty of "designing an AI with intelligence level n" grows faster than linearly in n, this will counteract any benefit an AI receives from its increased intelligence, and so its intelligence will converge. I would like to see a more formal model of this.
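As a gesture toward such a model, here's a toy iteration I made up (the cost and effort curves are arbitrary assumptions, chosen only to illustrate the convergence claim, not derived from anything): designing intelligence level m costs 2^m units of effort, while an AI at level n can only muster effort linear in n, so repeated self-improvement settles at a fixed point instead of exploding.

```python
import math

def iterate(n0, effort=100.0, steps=50):
    # Assumed cost curve: reaching intelligence level m takes 2**m units
    # of design effort, while an AI at level n musters effort * n units.
    # The best successor a level-n AI can design is therefore log2(effort * n).
    n = n0
    for _ in range(steps):
        n = max(n, math.log2(effort * n))
    return n

print(round(iterate(2.0), 2))  # → 9.96, the fixed point of n = log2(100 * n)
print(round(iterate(5.0), 2))  # → 9.96, same limit from a different start
```

With a merely linear cost curve the iteration would instead diverge; the exponential cost is what caps it.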
I am aware that Gwern has responded to this argument, but I feel like he missed the main point. He gives many arguments showi...
Yudkowsky addresses some of these objections in more detail in "Intelligence Explosion Microeconomics".
Yes, you could make the code more robust by allowing the agent to act once it's found a proof that any action is superior. Then, it might find a proof like
U(F) = 5
U(~F) = 10
10 > 5
U(~F) > U(F)
However, there's no guarantee that this will be the first proof it finds.
When I say "look for a proof", I mean something like "for each of the first 10^(10^100) Gödel numbers, see if it encodes a proof. If so, return that action."
In simple cases like the one above, it likely will find the correct proof first. However, as the universe gets more complicated (as our universe is), there is a greater chance that a spurious proof will be found first.
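A toy mock-up of that order-dependence (my own construction: candidate "proofs" are just tagged claims standing in for Gödel-numbered derivations, and the "checker" is a boolean flag):

```python
def act(enumeration):
    # Scan candidate proofs in a fixed enumeration order and return the
    # action endorsed by the first one the checker accepts.
    for claim, passes_check in enumeration:
        if passes_check:
            return claim.split(" > ")[0]
    return None

sound = ("U(~F) > U(F)", True)     # the intended proof from above
spurious = ("U(F) > U(~F)", True)  # a spurious proof the checker also accepts

print(act([sound, spurious]))   # → U(~F)  (sound proof enumerated first)
print(act([spurious, sound]))   # → U(F)   (spurious proof enumerated first)
```

The agent's behavior flips with the enumeration order alone, which is the worry: nothing privileges the sound proof's position in the ordering.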
Did you consider withdrawal effects at all? A day without caffeine after having used it the previous few days is going to be very different from one where you haven't used caffeine in months.