Their stated justification was primarily that the Standard Model of particle physics predicts metastability
Just to be sure, does this mean
1. That the standard model predicts that metastability is possible? i.e. it is consistent with the standard model for there to be metastability; or
2. If the standard model is correct, and certain empirical observations are correct, then we must be in a metastable state. i.e. the standard model together with certain empirical observations implies our actual universe is metastable?
I may be confused somehow. Feel free to ignore. But:
* At first I thought you meant the input alphabet to be the colors, not the operations.
* Instead, am I correct that "the free operad generated by the input alphabet of the tree automaton" is an operad with just one color, and the "operations" are basically all the labeled trees where labels of the nodes are the elements of the alphabet, such that the number of children of a node is always equal to the arity of that label in the input alphabet?
* That would make sense, as the algebra would then I guess assi...
More precisely, they are algebras over the free operad generated by the input alphabet of the tree automaton
Wouldn't this fail to preserve the arity of the input alphabet? i.e. you can have trees where a given symbol occurs multiple times, and with different amounts of children? That wouldn't be allowed from the perspective of the tree automaton right?
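To make the arity concern concrete, here is a minimal sketch, assuming a hypothetical ranked alphabet `ARITY` (the symbols `f`, `g`, `a` and the boolean state set are made up for illustration). It only admits labeled trees in which each node has exactly as many children as its label's arity, and it treats a deterministic bottom-up tree automaton as one function per symbol on the set of states. This is one reading of the construction, not necessarily the one the post intends.

```python
from dataclasses import dataclass

# Hypothetical ranked alphabet: every symbol has exactly one fixed arity.
ARITY = {"f": 2, "g": 1, "a": 0}


@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple = ()


def is_well_formed(tree: Tree) -> bool:
    """True iff every node has exactly as many children as its label's arity."""
    if tree.label not in ARITY:
        return False
    if len(tree.children) != ARITY[tree.label]:
        return False
    return all(is_well_formed(child) for child in tree.children)


# Illustrative algebra: the carrier is the set of states (booleans here),
# and each symbol of arity n is interpreted as an n-ary function on states.
ALGEBRA = {
    "a": lambda: True,
    "g": lambda x: not x,
    "f": lambda x, y: x and y,
}


def _eval(tree: Tree) -> bool:
    # Evaluate children first (bottom-up), then apply the symbol's function.
    return ALGEBRA[tree.label](*(_eval(child) for child in tree.children))


def evaluate(tree: Tree) -> bool:
    """Run the automaton bottom-up; reject trees that violate the arities."""
    if not is_well_formed(tree):
        raise ValueError("tree does not respect the ranked alphabet")
    return _eval(tree)


good = Tree("f", (Tree("a"), Tree("g", (Tree("a"),))))
bad = Tree("f", (Tree("a"),))  # "f" used with one child instead of two
```

Under this reading, `bad` is simply not an operation at all, so a symbol never appears with two different numbers of children.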
Noosphere, why are you responding for a second time to a false interpretation of what Eliezer was saying, directly after he clarified this isn't what he meant?
Here is an additional reason why it might seem less useful than it actually is: Maybe the people whose research direction is being criticized do process the criticism and change their views, but do not publicly show that they change their mind because it seems embarrassing. It could be that it takes them some time to change their mind, and by that time there might be a bigger hurdle to letting you know that you were responsible for this, so they keep it to themselves. Or maybe they themselves aren't aware that you were responsible.
but note that the gradual problem makes the risk of coups go up.
Just a request for editing the post to clarify: do you mean coups by humans (using AI), coups by autonomous misaligned AI, or both?
EDIT 3/5/24: In the comments for Counting arguments provide no evidence for AI doom, Evan Hubinger agreed that one cannot validly make counting arguments over functions. However, he also claimed that his counting arguments "always" have been counting parameterizations, and/or actually having to do with the Solomonoff prior over bitstrings.
As one of Evan's co-authors on the mesa-optimization paper from 2019 I can confirm this. I don't recall ever thinking seriously about a counting argument over functions.
I'm trying to figure out to what extent the character/ground layer distinction is different from the simulacrum/simulator distinction. At some points in your comment you seem to say they are mutually inconsistent, but at other points you seem to say they are just different ways of looking at the same thing.
"The key difference is that in the three-layer model, the ground layer is still part of the model's "mind" or cognitive architecture, while in simulator theory, the simulator is a bit more analogous to physics - it's not a mind at all, but rather the rul...
Minor quibble: It's a bit misleading to call B "experience curves", since it is also about capital accumulation and shifts in labor allocation. Without any additional experience/learning, if demand for candy doubles, we could simply build a second candy factory that does the same thing as the first one, and hire the same number of workers for it.
I just want to register a prediction: I think something like Meta's Coconut will, in the long run, perform much better than natural-language CoT. Perhaps not in this time frame, though.
I suspect you're misinterpreting EY's comment.
Here was the context:
"I think controlling Earth's destiny is only modestly harder than understanding a sentence in English - in the same sense that I think Einstein was only modestly smarter than George W. Bush. EY makes a similar point.
You sound to me like someone saying, sixty years ago: "Maybe some day a computer will be able to play a legal game of chess - but simultaneously defeating multiple grandmasters, that strains credibility, I'm afraid." But it only took a few decades to get from point A to point B....
"It's fine to say that this is a falsified prediction"
I wouldn't even say it's falsified. The context was: "it only took a few decades to get from [chess computer can make legal chess moves] to [chess computer beats human grandmaster]. I doubt that going from "understanding English" to "controlling the Earth" will take that long."
So insofar as we believe ASI is coming in less than a few decades, I'd say EY's prediction is still on track to turn out correct.
NEW EDIT: After reading three giant history books on the subject, I take back my previous edit. My original claims were correct.
Could you edit this comment to add which three books you're referring to?
One of the more interesting dynamics of the past eight-or-so years has been watching a bunch of the people who [taught me my values] and [served as my early role models] and [were presented to me as paragons of cultural virtue] going off the deep end.
I'm curious who these people are.
We should expect regression towards the mean only if the tasks were selected for having high "improvement from small to Gopher-7". Were they?
The reasoning was given in the comment prior to it, that we want fast progress in order to get to immortality sooner.
"But yeah, I wish this hadn't happened."
Who else is gonna write the article? My sense is that no one (including me) is starkly stating publicly the seriousness of the situation.
"Yudkowsky is obnoxious, arrogant, and most importantly, disliked, so the more he intertwines himself with the idea of AI x-risk in the public imagination, the less likely it is that the public will take those ideas seriously"
I'm worried about people making character attacks on Yudkowsky (or other alignment researchers) like this. I think the people who think they can ...
"We finally managed to solve the problem of deceptive alignment while being capabilities competitive"
??????
"But I don't think you even need Eliezer-levels-of-P(doom) to think the situation warrants that sort of treatment."
Agreed. If a new state develops nuclear weapons, this isn't even close to creating a 10% x-risk, yet the idea of airstrikes on nuclear enrichment facilities, even though it is very controversial, has for a long time very much been an option on the table.
"if I thought the chance of doom was 1% I'd say "full speed ahead!"
This is not a reasonable view. Not on Longtermism, nor on mainstream common sense ethics. This is the view of someone willing to take unacceptable risks for the whole of humanity.
Also, there is a big difference between "Calling for violence", and "calling for the establishment of an international treaty, which is to be enforced by violence if necessary". I don't understand why so many people are muddling this distinction.
You are muddling the meaning of "pre-emptive war", or even "war". I'm not trying to diminish the gravity of Yudkowsky's proposal, but a missile strike on a specific compound known to contain WMD-developing technology is not a "pre-emptive war" or "war". Again I'm not trying to diminish the gravity, but this seems like an incorrect use of the term.
"For instance, personally I think the reason so few people take AI alignment seriously is that we haven't actually seen anything all that scary yet. "
And if this "actually scary" thing happens, people will know that Yudkowsky wrote the article beforehand, and they will know who the people are that mocked it.
I agree. Though is it just the limited context window that causes the effect? I may be mistaken, but from my memory it seems like they emerge sooner than you would expect if this were the only reason (given the size of the context window of GPT-3).
Therefore, the waluigi eigen-simulacra are attractor states of the LLM
It seems to me like this informal argument is a bit suspect. Actually I think this argument would not apply to Solomonoff induction.
Suppose we have two programs that define distributions over bitstrings. Suppose p1 assigns uniform probability to each bitstring, while p2 assigns 100% probability to the string of all zeroes. (Equivalently, p1 samples each bit i.i.d. Bernoulli(1/2) from {0,1}, while p2 samples 0 i.i.d. with 100% probability.)
Suppose we use a perfect Bayesian reasoner to sample bitstrings, bu...
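For concreteness, here is a minimal sketch of how a Bayesian reasoner would weigh the two programs p1 and p2 described above after seeing a prefix of bits. The 50/50 prior and the function names are assumptions made for illustration; they are not part of the original argument, and the sketch only covers the setup, not the conclusion of the truncated comment.

```python
from fractions import Fraction


def likelihood_p1(bits):
    """p1: each bit is an independent fair coin flip."""
    return Fraction(1, 2 ** len(bits))


def likelihood_p2(bits):
    """p2: emits 0 with probability 1 at every step."""
    return Fraction(1) if all(b == 0 for b in bits) else Fraction(0)


def posterior_p2(bits, prior_p2=Fraction(1, 2)):
    """Posterior probability of p2 after observing `bits`.

    Assumes 0 < prior_p2 < 1, so the evidence term is never zero.
    """
    if not 0 < prior_p2 < 1:
        raise ValueError("prior_p2 must be strictly between 0 and 1")
    prior_p1 = 1 - prior_p2
    numerator = prior_p2 * likelihood_p2(bits)
    evidence = numerator + prior_p1 * likelihood_p1(bits)
    return numerator / evidence


def prob_next_is_one(bits, prior_p2=Fraction(1, 2)):
    """Posterior predictive probability that the next bit is 1."""
    # Only p1 can ever emit a 1, and it does so with probability 1/2.
    return (1 - posterior_p2(bits, prior_p2)) * Fraction(1, 2)
```

With these assumptions, each observed 0 doubles the odds in favor of p2, while a single observed 1 drives the posterior of p2 to zero for good, since p2 assigns that prefix probability 0.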
Linking to my post about Dutch TV: https://www.lesswrong.com/posts/TMXEDZy2FNr5neP4L/datapoint-median-10-ai-x-risk-mentioned-on-dutch-public-tv
"When LessWrong was ~dead"
Which year are you referring to here?
A lot of people in AI Alignment I've talked to have found it pretty hard to have clear thoughts in the current social environment, and many of them have reported that getting out of Berkeley, or getting social distance from the core of the community has made them produce better thoughts.
What do you think is the mechanism behind this?
I think the biggest thing is a strong, high-stakes but still quite ambiguous status-hierarchy in the Bay Area.
I think there are lots of contributors to this, but I definitely feel a very huge sense of needing to adopt certain views, to display "good judgement", and to conform to a bunch of epistemic and moral positions in order to operate in the space. This is particularly harsh since the fall of FTX with funding being less abundant and a lot of projects being more in-peril and the stakes of being perceived as reasonable and competent by a very messy and in-substantial parts social process are even higher.
There is a general phenomenon where:
It seems to me quite likely that you are person B, thinking they explained something because THEY think their explanation is very good and contai...
Very late reply, sorry.
"even though reward is not a kind of objective", this is a terminological issue. In my view, calling an "antecedent-computation reinforcement criterion" an "objective" matches my definition of "objective", and this is just a matter of terminology. The term "objective" is ill-defined enough that "even though reward is not a kind of objective" is a terminological claim about objective, not a claim about math/the world.
The idea that RL agents "reinforce antecedent computations" is completely core to our story of deception. You could not ...
The core point in this post is obviously correct, and yes people's thinking is muddled if they don't take this into account. This point is core to the Risks from learned optimization paper (so it's not exactly new, but it's good if it's explained in different/better ways).
Maybe you have made a gestalt-switch I haven't made yet, or maybe yours is a better way to communicate the same thing, but: the way I think of it is that the reward function is just a function from states to numbers, and the way the information contained in the reward function affects the model parameters is via reinforcement of pre-existing computations.
Is there a difference between saying:
It seems to me that the basic conceptual point made in this post is entirely contained in our Risks from Learned Optimization paper. I might just be missing a point. You've certainly phrased things differently and made some specific points that we didn't, but am I just misunderstanding something if I think the basic conceptual claims of this post (which seems to be presented as new) are implied by RFLO? If not, could you state briefly what is different?
(Note I am still surprised sometimes that people still think certain wireheading scenarios make sense despite them having read RFLO, so it's plausible to me that we really didn't communicate everything that's in my head about this).
I agree this is a good distinction.
"I think in the defense-offense case the actions available to both sides are approximately the same"
If the attacker has the action "cause a 100% lethal global pandemic" and the defender has the task "prevent a 100% lethal global pandemic", then clearly these are different problems, and it is a thesis, a thing to be argued for, that the latter requires largely the same skills/tech as the former (which is what this offense-defense symmetry thesis states).
If you build an OS that you're trying to make safe against attacks, you might do e.g. what the seL4 mic...
Kind of a delayed response, but: Could you clarify what you think is the relation between that post and mine? I think they are somehow sort of related, but not sure what you think the relation is. Are you just trying to say "this is sort of related", or are you trying to say "the strategy stealing assumption and this defense-offense symmetry thesis are the same thing"?
In the latter case: I think they are not the same thing, neither in terms of their actual meaning nor their intended purpose:
I just had a very quick look at that site, and it seems to be a collection of various chip models with pictures of them? Is there actual information on quantities sold, etc? I couldn't find it immediately.
Yeah, I know they don't understand them comprehensively. Is this the point though? I mean they understand them at a level of abstraction necessary to do what they need, and the claim is they have basically the same kind of knowledge of computers. Hmm, I guess that isn't really communicated by my phrasing though, so maybe I should edit that.
I think I communicated unclearly and it's my fault, sorry for that: I shouldn't have used the phrase "any easily specifiable task" for what I meant, because I didn't mean it to include "optimize the entire human lightcone w.r.t. human values". In fact, I was being vague and probably there isn't really a sensible notion that I was trying to point to. However, to clarify what I really was trying to say: What I mean by "hard problem of alignment" is: "develop an AI system that keeps humanity permanently safe from misaligned AI (and maybe other x risks), and ...
I'm surprised if I haven't made this clear yet, but the thing that (from my perspective) seems different between my view and yours is not that Step 1 seems easier to me than it seems to you, but that the "melt the GPUs" strategy (and possibly other pivotal acts one might come up with) seems way harder to me than it seems to you. You don't have to convince me of "'any easily human-specifiable task' is asking for a really mature alignment", because in my model this is basically equivalent to fully solving the hard problem of AI alignment.
Some reasons:
"you" obviously is whoever would be building the AI system that ended up burning all the GPUs (and ensuring no future GPUs are created). I don't know such a sequence of events, just as I don't know the sequence of events for building the "burn all GPUs" system, except at the level of granularity of "Step 1. build a superintelligent AI system that can perform basically any easily human-specifiable task without destroying the world. Step 2. make that system burn all GPUs indefinitely/build security services that prevent misaligned AI from destroying the worl...
I wonder if there is a bias induced by writing this on a year-by-year basis, as opposed to some random other time interval, like 2 years. I can somehow imagine that if you take 2 copies of a human, and ask one to do this exercise in yearly intervals, and the other to do it in 2-year intervals, they'll basically tell the same story, but the second one's story takes twice as long. (i.e. the second one's predictions for 2022/2024/2026 are the same as the first one's predictions for 2022/2023/2024). It's probably not that extreme, but I would be surprised if there was zero such effect, which would mean these timelines are biased downwards or upwards.
Yeah, I probably overstated. Nevertheless:
"CEV seems way harder to me than ..."
yes, I agree it seems way harder, and I'm assuming we won't need to do it and that we could instead "run CEV" by just actually continuing human society and having humans figure out what they want, etc. It currently seems to me that the end game is to get to an AI security service (in analogy to state security services) that protects the world from misaligned AI, and then let humanity figure out what it wants (CEV). The default is just to do CEV directly by actual human brains, b...
Ok I admit I read over it. I must say though that this makes the whole thing more involved than it sounded at first, since it would maybe require essentially escalating a conflict with all major military powers and still coming out on top? One possible outcome of this would be that the entire global intellectual public opinion turns against you, meaning you also possibly lose access to a lot of additional humans working with you on further alignment research? I'm not sure if I'm imagining it correctly, but it seems like this plan would either require so many elements that I'm not sure if it isn't just equivalent to solving the entire alignment problem, or otherwise it isn't actually enough.
But assuming that law enforcement figures out that you did this, then puts you in jail, you wouldn't be able to control the further use of such nanotech, i.e. there would just be a bunch of systems indefinitely destroying GPUs, or maybe you set a timer or some conditions on it or something. I certainly see no reason why Iceland or anyone in Iceland could get away with this unless those systems rely on completely unchecked nanosystems to which the US military has no response. Maybe all of this is what Eliezer means by "melt the GPUs", but I thought he did...
I meant, is there a link to where you've written this down somewhere? Maybe you just haven't written it down.
I would be interested in reading a draft and giving feedback (FYI I'm currently a researcher in the AI safety team at FHI).
I'm also interested to read the draft, if you're willing to send it to me.
Since anywhere near 0% seems way overconfident to me at first sight, just a random highly speculative unsubstantiated thought: Could this be partly motivated reasoning, that they're afraid of a backlash against physics funding or something?