All of Past Account's Comments + Replies

Just wanted to give some validation. I left a comment on this post a while ago pointing out how a single user (or apparently a few) can essentially downvote you however they like to silence opinions they don't agree with. Moderation is tricky, and it is important to remember why: most users on a web forum are lurkers, meaning that trying to gather feedback on moderation policies has a biased sampling problem. The irony of likely not being able to leave another comment or engage in discussion is not lost on me.

At first, I thought getting soft-banned meant my... (read more)

6habryka
FWIW, my sense is that the rate-limit system triggering on your account was a mistake, and we tweaked the numbers so that it no longer happens. It still sucks that you got rate-limited for a while, but the numbers are quite different now, and you almost certainly would not have been caught in the manual review that is part of these rate limits.
-7[anonymous]

Hi, I think this is incorrect. I had to wait 7 days to write this comment and then almost forgot to. I wrote a comment critiquing a very long post (which was later removed) and was downvoted (by a single user, I think) after justifying why I wrote the comment with AI assistance. My understanding is that a single user with enough karma power can effectively "silence" any opinion they don't like by downvoting a few comments in an exchange.

I think the site has changed enough over the last several months that I am considering leaving. For me personally, choos... (read more)

3habryka
Huh, that updates me on how quickly rate-limiting kicks in. I don't think it's the case that a single user can effectively silence any opinion here (none of your previous few comments were downvoted by a single user as far as I can tell), but having a rate limit that harsh just because of a single exchange seems quite bad to me. I'll talk to Raemon and Ruby about at least adjusting the values here.
4Raemon
No, because we also have a requirement of a minimum number of downvoters. (I think the current implementation has important flaws and I do still need to improve it; that has been on my TODO list and hopefully will get done soon.) But even in the current implementation, a single downvote can't rate-limit you.
2cfoster0
It's absolutely fine if you want to use AI to help summarize content, and then check that content and endorse it. I still ask that you please flag it as such, so the reader can make an informed decision about how to read and respond to the content.
2kwiat.dev
My point is that your comment was extremely shallow, with a bunch of irrelevant information, and in general plagued with the annoying ultra-polite ChatGPT style - in total, not contributing anything to the conversation. You're now defensive about it and skirting around answering the question in the other comment chain ("my endorsed review"), so you clearly intuitively see that this wasn't a good contribution. Try to look inwards and understand why.
2cfoster0
Is this an AI summary (or your own writing)? If so, would you mind flagging it as such?
3kwiat.dev
Is it a thing now to post LLM-generated comments on LW?

So are you suggesting that ChatGPT gets aligned to the values of the human contractor(s) that provide data during finetuning, and then carries these values forward when interacting with users?

You are correct that this appears to stand in contrast to one of the key benefits of CIRL games, namely that they allow the AI to continuously update towards the user's values. The argument I present is that ChatGPT can still learn something about the preferences of the user it is interacting with through in-context value learning. During deployment, ChatGPT can then learn preferences in-context, allowing for continuous updating towards the user's values, as in the CIRL game.

The reward comes from the user, who ranks candidate responses from ChatGPT. This is discussed more in OpenAI's announcement. I edited the post to clarify this.
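For concreteness, rankings like these are typically turned into a scalar reward via a pairwise (Bradley-Terry-style) loss on a reward model. Below is a minimal sketch of that idea, with made-up feature vectors standing in for responses and a hidden preference direction standing in for the labeler; it is a generic illustration, not OpenAI's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each candidate response is a made-up feature vector, and the "labeler"
# prefers whichever response a hidden preference direction scores higher.
dim = 8
true_w = rng.normal(size=dim)
pairs = []
for _ in range(500):
    a, b = rng.normal(size=(2, dim))
    pairs.append((a, b) if true_w @ a > true_w @ b else (b, a))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit a linear reward model r(x) = w.x by minimizing -log sigmoid(r(preferred) - r(rejected)).
w = np.zeros(dim)
lr = 0.5
for _ in range(200):
    grad = np.zeros(dim)
    for preferred, rejected in pairs:
        p = sigmoid(w @ (preferred - rejected))
        grad += -(1.0 - p) * (preferred - rejected)
    w -= lr * grad / len(pairs)

agreement = np.mean([float(w @ p > w @ r) for p, r in pairs])
print(f"reward model agrees with the rankings on {agreement:.1%} of pairs")
```

The learned reward model is then what the policy is optimized against; the rankings themselves are only seen during this fitting step.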

1Rachel Freedman
Thanks for the clarification! From OpenAI's announcement, it looks like this ranking only occurs during the finetuning portion of training (Step 2). But the user doesn't have the opportunity to provide this feedback after deployment. So are you suggesting that ChatGPT gets aligned to the values of the human contractor(s) that provide data during finetuning, and then carries these values forward when interacting with users? I'm asking because one of the key benefits of CIRL games (also called "assistance games") is that they allow the AI to continuously update towards the user's values, without freezing for deployment, and I don't fully understand the connection here.
Past Account*Ω-35-5

[Deleted]

5VojtaKovarik
Explanation for my strong downvote/disagreement: Sure, in the ideal world, this post would have much better scholarship. In the actual world, there are tradeoffs between the number of posts and the quality of scholarship. The cost is both the time and the fact that doing a literature review is a chore. If you demand good scholarship, people will write slower/less. With some posts this is a good thing. With this post, I would rather have atrocious scholarship and a 1% higher chance of the sequence having one more post in it. (Hypothetical example. I expect the real tradeoffs are less favourable.)
Past Account*Ω-32-10

[Deleted]

7janus
Thanks for suggesting "Speculations concerning the first ultraintelligent machine". I knew about it only from the intelligence explosion quote and didn't realize it said so much about probabilistic language modeling. It's indeed ahead of its time and exactly the kind of thing I was looking for but couldn't find w/r/t premonitions of AGI via SSL and/or neural language modeling.

I'm sure there's a lot of relevant work throughout the ages (saw this tweet today: "any idea in machine learning must be invented three times, once in signal processing, once in physics and once in the soviet union"), it's just that I'm unsure how to find it. Most people in the AI alignment space I've asked haven't known of any prior work either. So I still think it's true that "the space of large self-supervised models hasn't received enough attention". Whatever scattered prophetic works existed were not sufficiently integrated into the mainstream of AI or AI alignment discourse. The situation was that most of us were terribly unprepared for GPT. Maybe because of our "lack of scholarship".

Of course, after GPT-3 everyone's been talking about large self-supervised models as a path to or foundation of AGI. My observation about the lack of foresight on SSL was referring mainly to pre-GPT, and after GPT the ontological inertia of not talking about SSL means post-GPT discourse has been forced into clumsy frames. I know about "The risks and opportunities of foundation models" - it's a good overview of SSL capabilities and "next steps", but it's still very present-day focused and descriptive rather than speculation in an exploratory-engineering vein, which I still feel is missing. "Foundation models" has hundreds of references. Are there any in particular that you think are relevant?
6Raemon
Sorry, was being kinda lazy and hoping someone had already thought about this.
This was the newer DeepMind one: https://www.lesswrong.com/posts/mTGrrX8SZJ2tQDuqz/deepmind-generally-capable-agents-emerge-from-open-ended?commentId=bosARaWtGfR836shY#bosARaWtGfR836shY
I was motivated to post by this algorithm from China I heard about today: https://www.facebook.com/nellwatson/posts/10159870157893559
I think this is the older DeepMind paper: https://deepmind.com/research/publications/2019/playing-atari-deep-reinforcement-learning
7Rohin Shah
I am more annoyed by the sheer confidence people have. If they were saying "this is a possibility, let's investigate" that seems fine. Re: the rest of your comment, I feel like you are casting it into a decision framework while ignoring the possible decision "get more information about whether there is a problem or not", which seems like the obvious choice given lack of confidence. If at some point you become convinced that it is impossible / too expensive to get more information (I'd be really suspicious, but it could be true) then I'd agree you should bias towards worry. I would guess that the fact that people regularly fail to inhabit the mindset of "I don't know that this is a problem, let's try to figure out whether it is actually a problem" is a source of tons of problems in society (e.g. anti-vaxxers, worries that WiFi radiation kills you, anti-GMO concerns, worries about blood clots for COVID vaccines, ...). Admittedly in these cases the people are making a mistake of being confident, but even if you fixed the overconfidence they would continue to behave similarly if they used the reasoning in your comment. Certainly I don't personally know why you should be super confident that GMOs aren't harmful, and I'm unclear on whether humanity as a whole has the knowledge to be super confident in that.
3brp
What's the practical difference between "text" and one-hots of said "text"? One-hots are the standard way of inputting text into models. It is only recently that we expect models to learn their preferred encoding for raw text (cf. transformers). By taking a small shortcut, the authors of this paper get to show off their agent work without loss of generality: one could still give one-hot instructions to an agent that is learning to act in real life.
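To make the "small shortcut" concrete, here is a toy one-hot encoding of an instruction over a hypothetical five-word vocabulary (the vocabulary and instruction are invented for illustration; a real agent would use a tokenizer over the instruction text):

```python
import numpy as np

def one_hot(token_ids, vocab_size):
    """Map a sequence of integer token ids to a (seq_len, vocab_size) one-hot matrix."""
    out = np.zeros((len(token_ids), vocab_size), dtype=np.float32)
    out[np.arange(len(token_ids)), token_ids] = 1.0
    return out

# Hypothetical toy vocabulary and instruction.
vocab = {"go": 0, "to": 1, "the": 2, "red": 3, "cube": 4}
instruction = ["go", "to", "the", "red", "cube"]
encoded = one_hot([vocab[w] for w in instruction], vocab_size=len(vocab))
print(encoded.shape)  # (5, 5): one row per token, one column per vocabulary entry
```

The information content is the same as the raw text; only the job of learning an embedding is shifted from the agent to the fixed encoding.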
5Quintin Pope
The summary says they use text and a search for “text” in the paper gives this on page 32: “In these past works, the goal usually consists of the position of the agent or a target observation to reach, however some previous work uses text goals (Colas et al., 2020) for the agent similarly to this work.” So I thought they provided goals as text. I’ll be disappointed if they don’t. Hopefully, future work will do so (and potentially use pretrained LMs to process the goal texts).
7TurnTrout
I don't understand your point in this exchange. I was being specific about my usage of model; I meant what I said in the original post, although I noted room for potential confusion in my comment above. However, I don't know how you're using the word.  You used the word 'model' in both of your prior comments, and so the search-replace yields "state-abstraction-irrelevant abstractions." Presumably not what you meant? That's not a "concrete difference." I don't know what you mean when you talk about this "third alternative." You think you have some knockdown argument - that much is clear - but it seems to me like you're talking about a different consideration entirely. I likewise feel an urge to disengage, but if you're interested in explaining your idea at some point, message me and we can set up a higher-bandwidth call.
7TurnTrout
I read your formalism, but I didn't understand what prompted you to write it. I don't yet see the connection to my claims. Yeah, I don't want you to spend too much time on a bulletproof grounding of your argument, because I'm not yet convinced we're talking about the same thing.  In particular, if the argument's like, "we usually express reward functions in some featurized or abstracted way, and it's not clear how the abstraction will interact with your theorems" / "we often use different abstractions to express different task objectives", then that's something I've been thinking about but not what I'm covering here. I'm not considering practical expressibility issues over the encoded MDP: ("That's also a claim that we can, in theory, specify reward functions which distinguish between 5 googolplex variants of red-ghost-game-over.") If this doesn't answer your objection - can you give me an english description of a situation where the objection holds? (Let's taboo 'model', because it's overloaded in this context)
7TurnTrout
Why would we need that, and what is the motivation for "models"? The moment we give the agent sensors and actions, we're done specifying the rewardless MDP (and its model).

ETA: potential confusion - in some MDP theory, the "model" is a model of the environment dynamics. E.g. in deterministic environments, the model is shown with a directed graph. I don't use "model" to refer to an agent's world model over which it may have an objective function. I should have chosen a better word, or clarified the distinction.

If, by "tasks", you mean "different agent deployment scenarios" - I'm not claiming that. I'm saying that if we want to predict what happens, we:
1. Consider the underlying environment (assumed Markovian).
2. Consider different state/action encodings we might supply the agent.
3. For each, fix a reward function distribution (what goals we expect to assign to the agent).
4. See what the theory predicts.
There's a further claim (which seems plausible, but which I'm not yet making) that (2) won't affect (4) very much in practice. The point of this post is that if you say "the MDP has a different model", you're either disagreeing with (1) the actual dynamics, or claiming that we will physically supply the agent with a different state/action encoding (2). I don't follow. Can you give a concrete example?

Because . They are the same. Does that help?

2johnswentworth
I don't have any empirical evidence, but we can think about what a flat minimum with high noise would mean. It would probably mean the system is able to predict some data points very well, and other data points very poorly, and both of these are robust: we can make large changes to the parameters while still predicting the predictable data points about-as-well, and the unpredictable data points about-as-poorly. In human terms, it would be like having a paradigm in which certain phenomena are very predictable, and other phenomena look like totally-random noise without any hint that they even could be predictable. Not sure what it would look like in the perfect-training-prediction regime, though.

The term π is meant to be a posterior distribution after seeing data. If you have a good prior π₀ you could take π = π₀. However, note the resulting loss could be high. You want to trade off the cost of updating the prior against the loss reduction.
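For reference, one standard PAC-Bayes bound exhibits exactly this trade-off (a generic form for a bounded loss in [0, 1]; the deleted post may have used a different variant). With probability at least 1 − δ over the n training samples,

$$\mathbb{E}_{h \sim \pi}\big[L(h)\big] \;\le\; \mathbb{E}_{h \sim \pi}\big[\hat{L}(h)\big] + \sqrt{\frac{\mathrm{KL}(\pi \,\|\, \pi_0) + \ln(2\sqrt{n}/\delta)}{2n}}.$$

Taking π = π₀ makes the KL term vanish but leaves the empirical loss wherever the prior puts it; moving π toward low-loss hypotheses shrinks the first term at the cost of the second.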

For example, say we have a neural network. Then the prior π₀ would be the distribution over initializations and the posterior π would be the distribution of outputs from SGD.

(Btw thanks for the correction)

1Steveot
Thanks, I finally got it. What I just now fully understood is that the final inequality holds with high π₀ⁿ probability (i.e., as you say, π₀ is the data), while the learning bound or loss reduction is given for π.
2ChristianKl
They don't speak about having a PhD but about the ability to get into a top 5 graduate program. Many people who do have the ability to get into a top 5 program don't actually do so but pursue other directions. The number of people with that ability level is not directly dependent on the number of PhDs that are given out.
2interstice
Right, but trying to fit an unknown function with linear combinations of those features might be extremely data-inefficient, such that it is basically unusable for difficult tasks. Of course you could do better if you're not restricted to linear combinations -- for instance, if the map is injective you could invert back to the original space and apply whatever algorithm you wanted. But at that point you're not really using the Fourier features at all. In particular, the NTK always learns a linear combination of its features, so it's the efficiency of linear combinations that's relevant here.

You originally said that the NTK doesn't learn features because its feature class already has a good representation at initialization. What I was trying to convey (rather unclearly, admittedly) in response is:

A) There exist learning algorithms that have universal-approximating embeddings at initialization yet learn features. If we have an example of P and !Q, P-->Q cannot hold in general, so I don't think it's right to say that the NTK's lack of feature learning is due to its universal-approximating property.

B) Although the NTK's representation may be capable of approximating arbitrary functions, it will probably be very slow at learning some of them, perhaps so slow that using it is infeasible. So I would dispute that it already has 'good' representations. While it's universal in one sense, there might be some other sense of 'universal efficiency' in which it's lacking, and where feature-learning algorithms can outperform it.

I agree that in practice there's likely to be some relationship between universal approximation and efficiency, I just think it's worth distinguishing them conceptually. Thanks for the paper link BTW, it looks interesting.
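One way to make the data-efficiency point concrete (a toy sketch; it assumes the "features" under discussion are random Fourier features of a scalar input, which may not match the original example): fit a hard target by ridge regression on a fixed random feature map and watch how the test error scales with the number of training points.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed random Fourier feature map: phi_j(x) = cos(omega_j * x + b_j).
n_features = 200
freqs = rng.normal(scale=10.0, size=n_features)
phases = rng.uniform(0, 2 * np.pi, size=n_features)

def features(x):
    return np.cos(np.outer(x, freqs) + phases)

def target(x):
    return np.sign(np.sin(20 * x))   # a discontinuous target is hard for smooth fixed features

x_test = np.linspace(-1, 1, 2000)
for n_train in [20, 100, 500]:
    x_train = rng.uniform(-1, 1, size=n_train)
    # Linear (ridge) regression on the fixed features -- the only kind of fit an
    # NTK-style model performs.
    Phi = features(x_train)
    w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(n_features), Phi.T @ target(x_train))
    test_mse = np.mean((features(x_test) @ w - target(x_test)) ** 2)
    print(f"{n_train:4d} training points -> test MSE {test_mse:.3f}")
```

The fixed features can represent the target arbitrarily well as they grow, but how many samples a linear fit needs to get there is a separate question, which is the efficiency gap being pointed at.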
1interstice
Ah, rereading your original comment more carefully I see that you indeed didn't say anything about 'universal learning'. You're quite right that the NTK is a universal function approximator. My apologies. However, I still disagree that this is the reason that the NTK doesn't learn features. I think that 'universal function approximation' and 'feature learning' are basically unrelated dimensions along which a learning algorithm can vary. That is, it's quite possible to imagine a learning algorithm which constructs a sequence of different embeddings, all of which are universal approximators. The paper by Greg Yang I linked gives an example of such an algorithm (I don't think he explicitly proves this but I'm pretty sure it's true).

What I was trying to get at with the 'universal learning' remarks is that, although the NTK does indeed contain all finite embeddings, I believe that it does not do so in a very efficient way -- it might require disproportionately many training points to pick out what are, intuitively, fairly simple embeddings. I believe this is what is behind the poor performance of empirical NTKs compared to SGD-trained nets, as I brought up in this comment, and it ultimately explains why algorithms that do 'feature learning' can outperform those that don't -- the feature-learning algorithms are able to find more efficient embeddings for a given set of inputs (of course, it's possible to imagine a fixed embedding that's 'optimally efficient' in some way, but as far as I'm aware the NTK has no such property). This issue of 'embedding efficiency' seems only loosely related to the universal approximation property.

To formalize this, it would be nice to develop a theory of universal inference in the setting of classification problems akin to Solomonoff induction. To effectively model this in an asymptotic theory, I think it might be necessary to increase the dimension of the model input along with the number of data points, since otherwise all universal approxima...
2interstice
There's a big difference between 'universal learner' and 'fits any smooth function on a fixed input space'. The 'universal learner' property is about data efficiency: do you have bounded regret compared to any learning algorithm in some wide class? Solomonoff induction has this property with respect to computable predictors on binary strings, for instance. There are lots of learning algorithms able to fit any finite binary sequence but which are not universal. I haven't seen a good formalism for this in the neural net case, but I think it would involve letting the input dimension increase with the number of data points, and comparing the asymptotic performance of various algorithms.
1interstice
I've never heard of any result suggesting this, what's your argument? I suspect the opposite -- by the central limit theorem the partial derivatives and activations at each layer tend toward samples from a fixed distribution(differing per layer but fixed across neurons). I think this means that the NTK embedding is 'essentially finite' and actually not universal(though I'm not sure). Note that to show universality it's not enough to show that all embeddings can be found, you'll also need an argument showing that their density in the NTK embedding is bounded above zero.
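As a quick numerical check of the "fixed distribution" intuition (a toy sketch with a one-hidden-layer ReLU net; the widths, inputs, and scaling are arbitrary choices, not from the original discussion): compute the empirical NTK entry ⟨∇θ f(x₁), ∇θ f(x₂)⟩ at independent random initializations and watch its spread shrink as width grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def ntk_entry(x1, x2, width, n_inits=50):
    """Empirical NTK value <grad_theta f(x1), grad_theta f(x2)> for the net
    f(x) = v . relu(W x) / sqrt(width), across independent random initializations."""
    d = x1.shape[0]
    vals = []
    for _ in range(n_inits):
        W = rng.normal(size=(width, d))
        v = rng.normal(size=width)
        def grads(x):
            pre = W @ x
            act = np.maximum(pre, 0.0)
            g_v = act / np.sqrt(width)                                      # d f / d v
            g_W = ((v * (pre > 0))[:, None] * x[None, :]) / np.sqrt(width)  # d f / d W
            return np.concatenate([g_v, g_W.ravel()])
        vals.append(grads(x1) @ grads(x2))
    return np.array(vals)

x1 = np.array([1.0, 0.5])
x2 = np.array([-0.3, 2.0])
for width in [10, 100, 1000]:
    vals = ntk_entry(x1, x2, width)
    print(f"width {width:5d}: mean {vals.mean():.3f}, std across inits {vals.std():.3f}")
```

Concentration of the kernel at initialization doesn't by itself settle the universality question either way; it just illustrates why, in the wide limit, the NTK behaves like one fixed embedding rather than a learned one.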

Your example is interesting and clarifies exchange rates. However,

The shadow price quantifies the opportunity cost, so if I'm paid my shadow price, then that's just barely enough to cover my opportunity cost.

This is an interpretive point I'd like to focus on. When you move a constraint, in this case via the price, the underlying equilibrium of the optimization problem shifts. From this perspective, your usage of the word 'barely' stops making sense to me. If you were to 'overshoot', you wouldn't be optimal in the new optimization problem.
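A tiny numerical sketch of the quantity under discussion (a made-up toy LP, not the example from the original exchange): the shadow price is the rate at which the optimal value changes as the constraint is relaxed, and relaxing the constraint moves you to a genuinely different optimum.

```python
from scipy.optimize import linprog

# Toy problem: maximize 3x + 2y subject to x + y <= b (a shared "time budget"),
# 0 <= x <= 4, 0 <= y <= 3. linprog minimizes, so the objective is negated.
def optimal_value(b):
    res = linprog(c=[-3.0, -2.0],
                  A_ub=[[1.0, 1.0]], b_ub=[b],
                  bounds=[(0, 4), (0, 3)])
    return -res.fun

b = 5.0
eps = 1e-4
shadow_price = (optimal_value(b + eps) - optimal_value(b)) / eps
print(f"optimal value at b={b}: {optimal_value(b):.3f}")
print(f"numerical shadow price of the budget constraint: {shadow_price:.3f}")
```

Here the budget's shadow price comes out to about 2: each extra unit of budget is worth 2 at the margin, and the optimizer attains that value by moving to a different point than before.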

At this point I understand ... (read more)

I suppose this is the most correct answer. I'm not really updating very much though. From my perspective I'll continue to see cheerful price as a psychological/subjective reinvention of shadow price.

Edit: It seems clear that, in this context, the shadow price isn't exactly measurable. The cheerful price is just an upper estimate of the shadow price.

2Joar Skalse
Yes. I imagine this is why overtraining doesn't make a huge difference. See e.g., page 47 in the main paper.

I'm going to have to spend some time unpacking the very compact notation in the post, but here are my initial reactions.

I should apologize a bit for that. To a degree I wasn't really thinking about any of the concepts in the title and only saw the connection later.

First, very clean proof of the lemma, well done there.

Thanks!

Second... if I'm understanding this correctly, each neuron activation (or set of neuron activations?) would contain all the information from some-part-of-data relevant to some-other-part-of-data and the output.

To be honest, I... (read more)

2ChristianKl
When it comes to a forum like this, it's important to incentivise people who write posts. Part of the incentive is that people control the posts they write to say what they want to say. A system that works like Google Docs where the author can choose to accept or deny requests for change would likely work better.
2Ben Pace
I did suspect you'd confused it with the Alignment Newsletter :)

Much of the same is true of scientific journals. Creating a place to share and publish research is a pretty key piece of intellectual infrastructure, especially for researchers to create artifacts of their thinking along the way. 

The point about being 'cross-posted' is where I disagree the most. 

This is largely original content that counterfactually wouldn't have been published, or occasionally would have been published but to a much smaller audience. What Failure Looks Like wasn't crossposted, Anna's piece on reality-revealing puzzles wasn't cro... (read more)

3Ben Pace
 By "AN" do you mean the AI Alignment Forum, or "AIAF"?
3Viliam
Ironically, some people already feel threatened by the high standards here. Setting them higher probably wouldn't result in more good content. It would result in less mediocre content, but probably also less good content, as authors who sometimes write a mediocre article and sometimes a good one would get discouraged and give up. Ben Pace gives a few examples of great content in the next comment. It would be better to make it easier to separate the good content from the rest, but that's what the reviews are for. Well, only one review so far, if I remember correctly. I would love to see reviews of pre-2018 content (maybe multiple years in one review, if those years were less productive). Then I would love to see the winning content get the same treatment as the Sequences -- edit it and arrange it into a book, and make it "required reading" for the community (available as a free PDF).
4TurnTrout
The freshly updated paper answers this question in great detail; see section 6 and also appendix B.
2TurnTrout
Great question. One thing you could say is that an action is power-seeking compared to another if your expected (non-dominated subgraph; see Figure 19) power is greater for that action than for the other. Power is kinda weird when defined for optimal agents, as you say - when γ=1, POWER can only decrease. See Power as Easily Exploitable Opportunities for more on this.

Shortly after Theorem 19, the paper says: "In appendix C.6.2, we extend this reasoning to k-cycles (k > 1) via theorem 53 and explain how theorem 19 correctly handles fig. 7". In particular, see Figure 19. The key insight is that Theorem 19 talks about how many agents end up in a set of terminal states, not how many go through a state to get there. If you have two states with disjoint reachable terminal state sets, you can reason about the phenomenon pretty easily. Practically speaking, this should often suffice: for example, the off-switch state is disjoint from everything else.

If not, you can sometimes consider the non-dominated subgraph in order to regain disjointness. This isn't in the main part of the paper, but basically you toss out transitions which aren't part of a trajectory which is strictly optimal for some reward function. Figure 19 gives an example of this. The main idea, though, is that you're reasoning about what the agent's end goals tend to be, and then say "it's going to pursue some way of getting there with much higher probability, compared to this small set of terminal states (i.e. shutdown)". Theorem 17 tells us that in the limit, cycle reachability totally controls POWER.

I think I still haven't clearly communicated all my mental models here, but I figured I'd write a reply now while I update the paper. Thank you for these comments, by the way. You're pointing out important underspecifications. :) I think one problem is that power-seeking agents are generally not that corrigible, which means outcomes are extremely sensitive to the initial specification.
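As a toy illustration of the "how many agents end up in a set of terminal states" reasoning (a made-up example, not the paper's formal POWER definition): sample reward functions with iid uniform rewards over terminal states and count how often the optimal policy ends in the single-state "shutdown" set versus a larger reachable set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: from the start state the agent can either hit the off-switch
# (1 terminal state) or stay active, from where it can reach any of 5 other terminal
# states. With iid uniform terminal rewards, the optimal policy simply ends in the
# reachable terminal state with maximal reward.
terminal_sets = {"shutdown": 1, "stay_active": 5}
n_samples = 50_000

counts = {name: 0 for name in terminal_sets}
for _ in range(n_samples):
    rewards = {name: rng.uniform(size=k) for name, k in terminal_sets.items()}
    best = max(terminal_sets, key=lambda name: rewards[name].max())
    counts[best] += 1

for name, c in counts.items():
    print(f"fraction of reward functions whose optimal policy ends in '{name}': "
          f"{c / n_samples:.3f}")
```

With one shutdown state against five others, only about a sixth of sampled goals favor shutting down; the theorems make this kind of counting argument rigorous for the actual environment structure.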