All of ChosunOne's Comments + Replies

I think the factors that determine your reference class can be related to changes over time in the environment you inhabit, not just how you are built.  This is what I mean by not necessarily related to reproduction.  If I cloned a billion people with the same cloning vat over 1000 years, how then would you determine the reference class?  But maybe something about the environment would narrow that reference class despite all the people being made by the same process (like the presence or absence of other things in the environment as they relate to the people coming out of the vat).

2Ape in the coat
In principle - yes. Different settings of probability experiments naturally lead to different reference classes. But in the case of DA this seems to boil down to the question of reproduction. A person created by this vat may start by assuming they are a random person from that billion and then adjust their credence based on available evidence as usual. For example, if you checked the year of your creation, you would now treat yourself as a random person from among the people created in this vat in that specific year. You can frame it as a change of reference class or as a simple Bayesian update on the available evidence.
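The update described above can be sketched numerically. This is a toy illustration only: the candidate totals, the uniform prior, and the self-sampling likelihood are assumptions chosen for the example, not anything from the thread.

```python
# Toy Doomsday-style update: posterior over the total number of people
# ever created, given an observed birth rank. The candidate totals and
# the uniform prior over them are illustrative assumptions.

def posterior_over_totals(birth_rank, candidate_totals):
    # Likelihood of having this exact rank if N people exist in total:
    # 1/N if rank <= N, else 0 (self-sampling assumption).
    likelihoods = [1.0 / n if birth_rank <= n else 0.0 for n in candidate_totals]
    z = sum(likelihoods)
    return [l / z for l in likelihoods]

totals = [100, 1_000, 10_000]
print(posterior_over_totals(60, totals))
# Smaller totals still consistent with the rank get more posterior mass.
```

Conditioning on further evidence (like the year of creation) just swaps in a sharper likelihood; the reference-class framing and the update framing give the same numbers.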

I think there is another assumption (or another way of framing your assumption, I'm not sure if it is really distinct), which is that all people throughout time are alike. It's not necessarily groundless, but it is an assumption.  There could be some symmetry breaking factor that determines your position that doesn't simply align with "total humans that ever exist", and thus renders using the "birth rank" inappropriate to determine what number you have.  This thing doesn't have to be related to reproduction though.

1Ape in the coat
It's not a distinct assumption. It's the same assumption, but formulated in a terribly confusing manner as a lot of things in anthropics are.  The "likeness" is relevant for SSA reference class which is somewhat of a free parameter in the theory. The correct reference class is the group of people from which you could've been anyone. And this question is very much related to reproduction in case of Doomsday argument. I'll be talking about reference classes and how it's technically possible to salvage SSA with them in a future post.

It still seems slightly fuzzy in that other than check/mate situations no moves are fully mandatory and eg recaptures may occasionally turn out to be the wrong move?

Indeed, it can be difficult to know when it is actually better to continue a line and when it is not, but that is precisely what MCTS would help figure out.  MCTS would do actual exploration of board states, and the budget for which states it explores would be informed by the policy network.  It's usually better to continue a line than not, so I would expect MCTS to spend most of its bud... (read more)

In chess, a "line" is a sequence of moves that is hard to interrupt.  There are some obvious moves you have to play or else you are just losing (such as recapturing a piece, moving the king out of check, delivering checkmate, etc.).  Leela uses the neural network more for policy, which means giving a score to a given board position, which the MCTS can then use to decide whether to prune that direction or to explore that section more.  So it makes sense that Leela would have an embedding of powerful lines as part of its heuristic, since it... (read more)
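The way a network prior steers MCTS budget can be sketched with a PUCT-style selection rule. This is a simplified toy, not Leela's actual implementation: the three moves and their priors are hypothetical, and no rollouts or value updates are performed, so the only thing on display is how the prior concentrates visits.

```python
import math

# Simplified PUCT selection: the network prior concentrates search
# budget on moves it already likes ("lines"), while the exploration
# term still lets other moves be checked occasionally.

def puct_score(value, visits, prior, parent_visits, c_puct=1.5):
    # value: mean evaluation of this move so far
    # prior: the network's probability for this move (assumed numbers below)
    return value + c_puct * prior * math.sqrt(parent_visits) / (1 + visits)

# Hypothetical root position with a forcing recapture, a quiet move,
# and a blunder, with priors a chess policy net might plausibly output.
moves = {
    "recapture": {"prior": 0.80, "visits": 0, "value": 0.0},
    "quiet":     {"prior": 0.15, "visits": 0, "value": 0.0},
    "blunder":   {"prior": 0.05, "visits": 0, "value": 0.0},
}

for _ in range(100):
    parent_visits = sum(m["visits"] for m in moves.values())
    best = max(moves, key=lambda k: puct_score(
        moves[k]["value"], moves[k]["visits"], moves[k]["prior"], parent_visits + 1))
    moves[best]["visits"] += 1  # a real search would evaluate a rollout here

print({k: m["visits"] for k, m in moves.items()})
# Most of the 100-simulation budget lands on the high-prior move.
```

With equal values, visits end up roughly proportional to the priors, which is the sense in which "powerful lines" embedded in the heuristic shape where the search actually looks.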

4eggsyntax
Ah, ok, thanks for the clarification; I assumed 'line' just meant 'a sequence of moves'. I'm more of a go player than a chess player myself. It still seems slightly fuzzy in that other than check/mate situations no moves are fully mandatory and eg recaptures may occasionally turn out to be the wrong move? But I retract my claim that this paper is evidence of search, and appreciate you helping me see that.

The cited paper in Section 5 (Conclusion-Limitations) states plainly:


(2) We focus on look-ahead along a single line of play; we do not test whether Leela compares multiple different lines of play (what one might call search).  ... (4) Chess as a domain might favor look-ahead to an unusually strong extent.

The paper is more just looking at how Leela evaluates a given line rather than doing any kind of search.  And this makes sense.  Pattern recognition is an extremely important part of playing chess (as a player myself), and it is embedded in ... (read more)

1eggsyntax
It's not clear to me that there's a very principled distinction between look-ahead and search, since there's not a line of play that's guaranteed to happen. Search is just the comparison of look-ahead on multiple lines. It's notable that the paper generally talks about "look-ahead or search" throughout. That said, I haven't read this paper very closely, so I recognize I might be misinterpreting.
ChosunOne1110

My crux is that LLMs are inherently bad at search tasks over a new domain.  Thus, I don't expect LLMs to scale to improve search.

Anecdotal evidence:  I've used LLMs extensively, and my experience is that LLMs are great at retrieval but terrible at suggesting ideas.  You usually get something resembling an amalgamation of Google searches rather than suggestions born of some kind of insight.

4eggsyntax
[EDIT: @ChosunOne convincingly argues below that the paper I cite in this comment is not good evidence for search, and I would no longer claim that it is, although I'm not necessarily sold on the broader claim that LLMs are inherently bad at search (which I see largely as an expression of the core disagreement I present in this post).] The recently-published 'Evidence of Learned Look-Ahead in a Chess-Playing Neural Network' suggests that this may not be a fundamental limitation. It's looking at a non-LLM transformer, and the degree to which we can treat it as evidence about LLMs is non-obvious (at least to me). But it's enough to make me hesitant to conclude that this is a fundamental limitation rather than something that'll improve with scale (especially since we see performance on planning problems, which in my view are essentially search problems, improving with scale).
ChosunOne62

To your question of what to do if you are outmatched and you only have an ASI at your disposal, I think the most logical thing to do is "do what the ASI tells you to".  The problem is that we have no way of predicting the outcomes if there is truly an ASI in the room.  If it's a superintelligence it is going to have better suggestions than anything you can come up with.

ChosunOne30

Then I wonder, at what point does that matter?  Or more specifically, when does that matter in the context of AI risk?

Clearly there is some relationship between something like "more compute" and "more intelligence" since something too simple cannot be intelligent, but I don't know where that relationship breaks down.  Evolution clearly found a path for optimizing intelligence via proxy in our brains, and I think the fear is that you may yet be able to go quite further than human-level intelligence before the extra compute fails to deliver more me... (read more)

1Nicolas Villarreal
I think we're seeing where that relationship breaks down presently, specifically between compute and intelligence: while it's difficult to see what's happening inside top AI companies, it seems like they're developing new systems/techniques, not just scaling up the same stuff anymore. In principle, though, I'm not sure it's possible to know in advance when such a correlation will break down, unless you have a deeper model of the relationship between those correlations (first-order signs) and the higher-level concept in question, which, in this case, we do not.
ChosunOne31

So if I understand your point correctly, you expect something like "give me more compute" to at some point fail to deliver more intelligence, since intelligence isn't just "more compute"?

2Nicolas Villarreal
Yes. And in one sense that is trivial: there are plenty of algorithms you can run on extremely large compute that do not lead to intelligent behavior. But in another sense it is non-trivial, because all the algorithms we have that essentially "create maps" (as in, representations of some reality) need to have the domain they are supposed to represent or learn specified for them. In order to create arbitrary domains, an agent needs to make second-order signs its goal - see my last post.
ChosunOne12

I don't think your claim has support as presented.  Part of the problem surrounding the question is that we still don't really have any way of measuring how "conscious" something is.  In order to claim that something is or isn't conscious, you should have some working definition of what conscious means, and how it can be measured.  If you want to have a serious discussion instead of competing emotional positions, you need to support the claim with points that can be confirmed or denied.  Why doesn't a goldfish have consciousness, or an e... (read more)

ChosunOne20

Well ultimately no information about the past is truly lost as far as we know.  A hyper-advanced civilization could collect all the thermal radiation from earth reflected off of various celestial bodies and recover a near complete history, at least in principle.  So I think the more you make it easy for yourself to be reconstructed/resurrected/what have you the sooner it would likely be, and the less alien of an environment you would find yourself in after the fact.  Cryo is a good example of having a reasonable expectation of where to end up barring catastrophe since you are preserving a lot of you in good form.

An interesting consequence of your description is that resurrection is possible if you can manage to reconstruct the last brain state of someone who had died.  If you go one step further, then I think it is fairly likely that experience is eternal, since you don't experience any of the intervening time being dead (akin to your film reel analogy with adding extra frames in between), and since there is no limit to how much intervening time can pass.

1ProgramCrafter
*preferably not the last state but one where the person felt normal. I believe that's right! Though, if a person can be reconstructed from N bits of information, and the dead body retains K << N bits, then we need to save N-K bits (or maybe all N, for robustness) somewhere else. It's an interesting question how many bits can actually be inferred from a person's social-network trace.

I'm curious how much space is left after learning the MSP in the network.  Does representing the MSP take up the full bandwidth of the model (even if it is represented inefficiently)?  Could you maintain performance of the model by subtracting out the contributions of anything else that isn't part of the MSP?

1Adam Shai
Cool question. This is one of the things we'd like to explore more going forward. We are pretty sure this is pretty nuanced and has to do with the relationship between the (minimal) state of the generative model, the token vocab size, and the residual stream dimensionality. On your last question, I believe so, but one would have to do the experiment! It totally should be done. Check out the Hackathon if you are interested ;)
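The "subtract out everything that isn't part of the MSP" experiment could be phrased as a linear projection of the residual stream onto the MSP subspace. This is a hypothetical sketch: `msp_directions` stands in for whatever basis a probe would recover, and random arrays stand in for real activations; nothing here is an interface from the paper.

```python
import numpy as np

# Hypothetical experiment: keep only the residual-stream component that
# lies in the span of the directions encoding the MSP geometry, discard
# the orthogonal complement, then (in a real run) patch the projected
# activations back into the model and compare performance.

def project_onto_subspace(activations, msp_directions):
    # activations: (batch, d_model); msp_directions: (k, d_model)
    # Orthonormalize the (assumed) MSP basis via reduced QR.
    q, _ = np.linalg.qr(msp_directions.T)  # (d_model, k)
    return activations @ q @ q.T           # keep only the MSP component

rng = np.random.default_rng(0)
d_model, k = 64, 3
acts = rng.normal(size=(8, d_model))       # stand-in for real activations
basis = rng.normal(size=(k, d_model))      # stand-in for probe directions
projected = project_onto_subspace(acts, basis)

# Projections are idempotent: applying the map twice changes nothing.
again = project_onto_subspace(projected, basis)
print(np.allclose(projected, again))  # → True
```

If performance survived this projection, that would suggest the rest of the residual stream's "bandwidth" is doing something other than supporting the MSP representation.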

I observe this behavior a lot when using GPT-4 to assist in code.  The moment it starts spitting out code that has a bug, the likelihood of future code snippets having bugs grows very quickly.

1Hoagy
I've not noticed this, but it'd be interesting if true, as it seems that the tuning/RLHF has managed to remove most of the behaviour where it talks down to the level of the person writing (as evidenced by e.g. spelling mistakes). Should be easily testable too.
2Viliam
Sometimes it seems that humans do it, too. For example, when I make a typo, it is quite likely that I made another typo in the same paragraph. (Alternative explanation: I am more likely to make mistakes when I am e.g. tired, so having made a mistake is evidence for being tired, which increases the chance of other mistakes being made.) ((On the other hand, there may be a similar explanation for the GPT, too.))
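The "errors beget errors" observation (and Viliam's hidden-cause alternative) can be illustrated with a toy two-state model of autoregressive generation, in which the chance of a buggy next snippet is higher once a bug is already in the context. The transition probabilities here are invented for illustration, not measured from GPT-4.

```python
import random

# Toy Markov model of correlated code errors: state is "a bug is in the
# recent context" vs not, and a bug in context raises the probability
# that the next snippet is also buggy. Probabilities are assumptions.

P_BUG_AFTER_CLEAN = 0.05
P_BUG_AFTER_BUG = 0.40

def simulate(n_steps, rng):
    states, buggy = [], False
    for _ in range(n_steps):
        p = P_BUG_AFTER_BUG if buggy else P_BUG_AFTER_CLEAN
        buggy = rng.random() < p
        states.append(buggy)
    return states

rng = random.Random(42)
runs = [simulate(200, rng) for _ in range(200)]
after_bug = [s[i + 1] for s in runs for i in range(199) if s[i]]
after_clean = [s[i + 1] for s in runs for i in range(199) if not s[i]]
print(sum(after_bug) / len(after_bug), sum(after_clean) / len(after_clean))
# The empirical bug rate conditioned on a prior bug is much higher.
```

The same conditional statistics would also arise from a latent common cause (a "tired" state, or a hard prompt), so the two explanations are hard to distinguish from transcripts alone.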

I've found that using Bing/Chat-GPT has been enormously helpful in my own workflows.  No need to have to carefully read documentation and tutorials just to get a starter template up and running.  Sure it breaks here and there, but it seems way more efficient to look up stuff when it goes wrong vs. starting from scratch.  Then, while my program is running, I can go back and try to understand what all the options do.  

It's also been very helpful for finding research on a given topic and answering basic questions about some of the main ideas.  

I'm not sure how that makes the problem much easier?  If you get the maligned superintelligence mask, it only needs to get out of the larger model/send instructions to the wrong people once to have a game-over scenario.  You don't necessarily get to change it after the fact.  And changing it once doesn't guarantee it doesn't pop up again.

This could be true, but then you still have the issue of there being "superintelligent malign AI" as one of the masks if your pile of masks is big enough.

1Noosphere89
It is an easier problem: since there is no true identity, we can easily change it to a friendly mask rather than an unaligned one.

At a crude level, the earth represents a source of resources with which to colonize the universe shard and maximize its compute power (and thus find the optimal path to achieve its other goals).  Simply utilizing all the available mass on earth to do that as quickly as possible hedges against the possibility of other entities in the universe shard from impeding progress toward its goals.  

The question I've been contemplating is "Is it worth it to actually try to spend any resources dissassembling all matter on Earth given the cost of needing to d... (read more)

Thanks for the measured response.

If I understand the following correctly:

Putin made it very clear on the day of the attack that he was threatening nukes against anyone who "interfered" in Ukraine, with his infamous "the consequences will be such as you have never seen in your entire history" speech. NATO has been helping Ukraine by training their forces and supplying materiel for years before the invasion, and vowed to keep doing so. This can be considered "calling his bluff" to an extent, or as a piecemeal maneuver in its own right. Yet they withdrew their

... (read more)
4Dojan
I think it was deliberately vague. This allows Putin room to choose his response based on the exact later consequences, without being bound to his own word. The way NATO is interpreting it sure seems to be that weapons are ok but troops are not, and Putin has accepted that, with only some non-committal grumbling. I think the fact that NATO was already providing that before the invasion makes a strong "status quo" argument. Also, it has historically counted as "not participating", however ridiculous and arbitrary this may seem. Scott Alexander wrote more on this. In my understanding this is very feasible indeed. Within hours of the invasion, the new status quo had emerged: NATO was sending weapons/money/intelligence and doing sanctions/UN hearings/etc, and Russia was advancing conventionally. The status quo hasn't really changed since then, except that a; Ukrainian resistance is much stronger than expected, and b; western sanctions are much stronger than expected. If China came down on one side or the other, that would shift the status quo; or if Russia goes through with its chemical weapons gambit, or if NATO escalates support. Or if the ground war starts leaning one way or the other. Breaking the status quo is always counted as a "Move", however contrived the status quo. I think this would be a major, major crisis, going down in history alongside the Cuban missile crisis. I think Putin would basically interpret this as a totally unprovoked attack, at least publicly, likening it to Russian forces shooting down NATO planes inside NATO airspace. It would be a massive escalation, and Putin would have to do something in response, or lose all credibility. Whether that thing would then escalate further is hard to know. I don't want to find out. I'm not read up on the "MiG Valley" history, but my understanding is that a; everyone pretended that the pilots were not Russian, and b; this was before the doctrine of MAD was fully established. But again, I don't know the history

Yes I did, and it doesn't follow that nuclear retaliation is immediate. 

Beaufre notes that for piecemeal maneuvers to be effective, they have to be presented as fait accompli – accomplished so quickly that anything but nuclear retaliation would arrive too late to do any good and of course nuclear retaliation would be pointless

Failure to perform the fait accompli means that options other than nuclear retaliation are possible.  

When Putin called that obvious bluff, it would have damaged the credibility and thus the deterrence value of that same sta

... (read more)
5Dojan
My apologies. I found myself convinced of these very points after reading the article, but I can see now how my words could come across as standoffish. No insult intended :) My reading of both the text quoted and reality as presented is that this line of thinking only applies when operating inside or very close to the opponent's red lines. The next paragraph starts: And Ukraine is not a member. NATO's red line is crystal clear. Ukraine is outside of it. Everyone made it very clear to Putin that they didn't want him to invade, and that they would impose "costs" on him if he did. But no one threatened to nuke him over it.  Putin made it very clear on the day of the attack that he was threatening nukes against anyone who "interfered" in Ukraine, with his infamous "the consequences will be such as you have never seen in your entire history" speech. NATO has been helping Ukraine by training their forces and supplying materiel for years before the invasion, and vowed to keep doing so. This can be considered "calling his bluff" to an extent, or as a piecemeal maneuver in its own right. Yet they withdrew their personnel from the country in the days and weeks leading up to the attack. Some have called Biden weak for doing that, essentially "clearing the way" for Putin by removing the tripwire force, and maybe he is. What is clear is that he didn't want that bluff to be called.  Sending NATO troops into Ukraine to engage Russian forces is a very clear escalation, one that Putin has specifically warned against. If this were to happen, Putin would have every incentive to nuke them inside Ukraine, or worse. He might be bluffing. I wouldn't bet on it.  "Red lines" aren't always geographical. Currently (it looks to me like) NATO's unambiguous red line is its geographical border, while it is trying to establish some strategic ambiguity over the use of chemical/biological weapons. This is going so-so, especially after the US's bluff was called in Syria with no consequences. Meanwhil

Given that Russia's attempt at a fait accompli in Ukraine has failed, and that the situation already is a total war, I fail to see Russia's logic of nuclear deterrence against NATO involvement.  In a sense, NATO has already crossed the red lines that Russia stated would be considered acts of war, such as economic sanctions and direct military supply.  From the Russian perspective, would NATO intervention really invite a total nuclear response the way that a Russian attack on Poland would?  

NATO intervention and subsequent obliteration ... (read more)

4Dojan
Did you read the linked article? It argues extensively and precisely why what you suggest is not something that NATO can risk.  It is a total war for Ukraine, not for Russia. And even less for NATO. No one doubts that NATO could obliterate Russia's conventional forces, if it were guaranteed not to escalate beyond conventional warfare. Putin knows that too. Which is precisely why he couldn't and wouldn't leave any such guarantee.

In Zelenskyy's latest appeal to congress, he offered an alternative to a No Fly Zone, which is massive support for AA equipment and additional fighters.  By creating pressure for a NFZ, he's bought himself significant AA equipment boosts.

This is more or less what Kasparov believed back in 2015: 

I think one of the things to consider with this hypothesis is what is the signal that indicates an area is "overpopulated", and how should members of the species respond to that signal?  And how can this signal be distinguished from other causes?  For instance, an organism that has offspring that are unable to reproduce because they had limited resources will likely be outcompeted by an organism that produces fertile offspring regardless of the availability of resources.  

If you open up a variable that determines how likely your offspring ar... (read more)
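The outcompetition argument in the comment above can be sketched as a toy selection model: one strategy produces sterile offspring when resources are scarce, the other reproduces regardless. Every parameter here (population size, carrying capacity, the sterility rule) is invented purely to illustrate the selection pressure, not to model any real species.

```python
import random

# Toy selection model: "restrict" produces sterile offspring under
# scarcity; "always" produces fertile offspring regardless. Survival
# is capped by a resource limit. All numbers are illustrative.

def generation(pop, capacity, rng):
    scarce = len(pop) > capacity
    offspring = []
    for strategy in pop:
        if strategy == "always" or not scarce:
            offspring.append(strategy)  # fertile child
        # "restrict" under scarcity: the child is sterile, so it
        # contributes nothing to future generations.
    rng.shuffle(offspring)
    return offspring[:capacity]  # offspring survival capped by resources

rng = random.Random(0)
pop = ["restrict"] * 500 + ["always"] * 500
for _ in range(20):
    pop = pop + generation(pop, capacity=800, rng=rng)
    rng.shuffle(pop)
    pop = pop[:1000]  # total population also capped by resources

print(pop.count("always"), pop.count("restrict"))
# The unconditional-reproduction strategy dominates within a few generations.
```

This is the standard group-selection objection in miniature: a restraint gene pays a fitness cost every generation, so it needs some extra mechanism (kin structure, linked signals) to avoid being driven out.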

1Brittany Ritenour
Not necessarily, but true as well. Good points, but gays, transgender people, and lesbians can now adopt children, so they no longer have to produce their own offspring, which is the culprit in overpopulation. I think the LGBTQ community is a much more humane response to overpopulation than how we treat animals when they are overpopulated. I do think we are more overpopulated than the animals we try to "control". With people competing for resources, or having children who compete for resources, there is no need to worry about extinction at the moment, because people have children every year.  It isn't the only reason alone, but Bo Burnham makes a joke about God sending gays to fix overpopulation (but boy, did that go well), and what would be the point of responding that way if it weren't a response to something we may not be aware of? Simply saying people are born gay puts people into predetermined molds that no one wants to be put into, much like the articles pushing the idea that being a sex offender runs in the genes, which is not true. When I think about why we treat them so poorly, nothing adds up; nature vs. nurture is complex, and it's more than just genetics. Maybe more feminine men are designed to produce girl offspring. Feminine men have a place in life, and masculine women have a place; that shouldn't have to determine their lives or who they are going to be. We are overpopulated, and there has to be some response to it.
1tailcalled
This is implausible, see the posts about group selection.

Are we also presuming that you can acquire all desired things instantaneously?  Even in a situation when all agents are functionally identical, if it costs 1 unit of time per x units of a resource, wouldn't trade still be useful in acquiring more than x units of a resource in 1 unit of time?  Time seems to me the ultimate currency that still needs to be "traded" in this scenario.  
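The time-cost point can be made concrete with a two-agent, two-resource example. The production rates and time budget below are invented for illustration; the point is only that when acquisition costs time, specializing and trading beats going it alone even for otherwise identical agents.

```python
# Even with identical preferences, differing time costs per resource
# make trade productive. Hypothetical rates: units produced per unit
# of time spent, for two agents A and B.

TIME = 10  # time units available to each agent

rates = {"A": {"wood": 3, "ore": 1}, "B": {"wood": 1, "ore": 3}}

# No trade: each agent splits its time evenly across both resources.
no_trade = {a: {r: rates[a][r] * TIME / 2 for r in ("wood", "ore")}
            for a in rates}

# With trade: each agent specializes in its fast resource, then the
# two swap half their output.
total_wood = rates["A"]["wood"] * TIME  # A spends all time on wood
total_ore = rates["B"]["ore"] * TIME    # B spends all time on ore
with_trade = {a: {"wood": total_wood / 2, "ore": total_ore / 2}
              for a in rates}

print(no_trade["A"], with_trade["A"])
# Specialize-and-trade yields 15 of each vs 15 wood + 5 ore alone.
```

Time spent acquiring one resource is time not spent on another, so even functionally identical agents with symmetric rate differences gain from exchange.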

My point here was that even if the deep learning paradigm is not anywhere close to as efficient as the brain, it has a reasonable chance of getting to AGI anyway since the brain does not use all that much energy.  The biggest models from GPT-3 can run on a fraction of what a datacenter can supply, hence the original question, how do we know AGI isn't just a question of scale in the current deep learning paradigm.  
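The energy comparison behind this point can be made rough-and-ready. The 20 W figure for the brain is a standard estimate; the accelerator and datacenter figures are order-of-magnitude assumptions for illustration, not measurements.

```python
# Back-of-envelope: how many human-brain power budgets fit inside
# typical ML hardware allocations. All figures are rough estimates.

BRAIN_WATTS = 20           # common estimate for the human brain
GPU_WATTS = 400            # one modern accelerator, roughly
DATACENTER_MEGAWATTS = 20  # a mid-size datacenter, roughly

print(GPU_WATTS / BRAIN_WATTS)                         # 20.0 brains per GPU
print(DATACENTER_MEGAWATTS * 1_000_000 / BRAIN_WATTS)  # 1000000.0 brains per datacenter
```

The asymmetry is the point: even a deep-learning system several orders of magnitude less energy-efficient than the brain would still fit comfortably inside a single datacenter's power budget.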

Answer by ChosunOneΩ480

Here is a link to my forecast

AGI Timeline

And here are the rough justifications for this distribution:

I don't have much else to add beyond what others have posted, though it's in part influenced by an AIRCS event I attended in the past.  Though I do remember being laughed at for suggesting GPT-2 represented a very big advance toward AGI.  

I've also never really understood the resistance to why current models of AI are incapable of AGI.  Sure, we don't have AGI with current models, but how do we know it isn't a question of scale?  Our bra... (read more)

2TurnTrout
First, you ask why it isn't a question of scale. But then you seem to wonder why we need any more scaling? This seems to mix up two questions: can current hardware support AGI for some learning paradigm, and can it support AGI for the deep learning paradigm?
Answer by ChosunOne130

I've been feeding my parents a steady stream of facts and calmly disputing hypotheses that they couldn't support with evidence ("there are lots of unreported cases", "most cases are asymptomatic", etc.). It's taken time but my father helped influence a decision to shut down schools for the whole Chicago area, citing statistics I've been supplying from the WHO.

I think the best thing you can do if they don't take it seriously is to just whittle down their resistance with facts. I tend to only pick a few to tal... (read more)

It seems to me that trying to create a tulpa is like trying to take a shortcut with mental discipline. It seems strictly better to me to focus my effort on a single unified body of knowledge/model of the world than to try to maintain two highly correlated ones at the risk of losing your sanity. I wouldn't trust that a strong imitation of another mind would somehow be more capable than my own, and it seems like having to simulate communication with another mind is just more wasteful than just integrating what you know into your own.

Thinking about i... (read more)