I think something like "alignment features" is plausibly a huge part of the story for why AI goes well.
At least, I think it is refreshing to take the x-risk goggles off for a second sometimes and remember that there is actually a huge business incentive to, e.g., solve "indirect prompt injections", perfect robust AI decision-making in high-stakes contexts, or find the holy grail of compute-scalable oversight.
Like, a lot of the time there seems to be genuine ambiguity and overlap between "safety" research and normal AI research. The clean "capabilities"/"align...
A lot of good people are doing a lot of bad things that they don't enjoy doing, all the time. That seems weird. They even say stuff like "I don't want to do this". But then they recite some very serious-sounding words or whatever and do it anyways.
Lol, okay, on review that reads as privileged. Easy for rectangle-havers to say.
There is underlying violence keeping a lot of people "at work" and doing the things they don't want to do. An authoritarian violence keeping everyone in place.
The threat is to shelter, food, security, even humanity past a c...
For specific activities, I would suggest doubling down on things that you already like to do or have an interest in, but which you implicitly avoid "getting into" because they are considered low status. For example: improve your masturbation game, improve your drug game (as in plan fun companion activities or make it a social thing; not just saying do more/stronger drugs), get really into that fringe sub-genre that ~only you like, experiment with your clothes/hair style, explore your own sexual orientation/gender identity, just straight up stop doing any hobbies that you're only into for the status, etc.
I think the best way to cash in on the fun side of the fun/status tradeoff is probably mostly rooted in adopting a disposition and outlook that allows you to. I think most people self-limit like crazy to promote a certain image, and that if you're really trying to extract fun-bang for your status-buck, then dissolving some of that social conditioning and learning to be silly is a good way to go. Basically, I think there's a lot of fun to be had for those who are comfortable acting silly or playful or unconventional. If you can unlock some of that...
While I agree that there are notable differences between "vegans" and "carnists" in terms of group dynamics, I do not think that necessarily disagrees with the idea that carnists are anti-truthseeking.
"carnists" are not a coherent group, not an ideology, they do not have an agenda (unless we're talking about some very specific industry lobbyists who no doubt exist). They're just people who don't care and eat meat.
It seems untrue that because carnists are not an organized physical group that has meetings and such, they are thereby incapable of having ...
Thanks! I haven't watched, but I appreciated having something to give me the gist!
Hotz was allowed to drive discussion. In debate terms, he was the con side, raising challenges, while Yudkowsky was the pro side defending a fixed position.
This always seems to be the framing, which is unbelievably stupid given the stakes on each side of the argument. Still, it seems to be the default; I'm guessing this is status quo bias and the historical tendency of everything to stay relatively the same year by year (less so once technology really started happening). I ...
Would the prize also go towards someone who can prove it is possible in theory? I think some flavor of "alignment" is probably possible and I would suspect it more feasible to try to prove so than to prove otherwise.
I'm not asking to try to get my hypothetical hands on this hypothetical prize money, I'm just curious if you think putting effort into positive proofs of feasibility would be equally worthwhile. I think it is meaningful to differentiate "proving possibility" from alignment research more generally, and that the former would itself be worthwhile. I'm sure some alignment researchers do that sort of thing, right? It seems like a reasonable place to start given an agent-theoretic approach or similar.
I appreciate the attempt, but I think the argument is going to have to be a little stronger than that if you're hoping for the 10 million lol.
Aligned ASI doesn't mean "unaligned ASI in chains that make it act nice", so the bits where you say:
any constraints we might hope to impose upon an intelligence of this caliber would, by its very nature, be surmountable by the AI
and
overconfidence to assume that we could circumscribe the liberties of a super-intelligent entity
feel kind of misplaced. The idea is less "put the super-genius in chains" and more so to...
The doubling time for AI compute is ~6 months
Source?
In 5 years compute will scale 2^(5÷0.5)=1024 times
This is a nitpick, but I think you meant 2^(5*2)=1024
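To spell out the arithmetic behind that (taking the ~6-month doubling time quoted above at face value): the scaling factor after t years with doubling time T is

$$2^{t/T} = 2^{5/0.5} = 2^{5 \times 2} = 2^{10} = 1024$$

so both expressions give the same ~1000x factor; the nitpick is only about which form makes the "ten doublings in five years" explicit.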
In 5 years AI will be superhuman at most tasks including designing AI
This kind of clashes with the idea that AI capabilities gains are driven mostly by compute. If "moar layers!" is the only way forward, then someone might say this is unlikely. I don't think this is a hard problem, but I think it's a bit of a snag in the argument.
...An AI will design a better version of itself a
Great post!
As much as I like LessWrong for what it is, I think it's often guilty of a lot of the negative aspects of conformity and coworking that you point out here, i.e. killing good ideas in their cradle. Of course, there are trade-offs to this sort of thing and I certainly appreciate brass tacks and hard-nosed reasoning sometimes. There is also a need for ingenuity, non-conformity, and genuine creativity (in all of its deeply anti-social glory).
Thank you for sharing this! It helped me feel LessWeird about the sorts of things I do in my own creative/ex...
There’s a dead zone between skimming and scrutiny where you could play slow games without analyzing them and get neither the immediate benefits of cognitively-demanding analysis nor enough information to gain a passive understanding of the underlying patterns.
I think this is a good point. There's a lot to be said for being intentional about how and what you're consuming. It's easy for me to fall into a pit of "kind of paying attention" where I'm spending mental energy but not retaining anything, and not really skimming either. I think it...
It strikes me that there is a difficult problem involved in creating a system that can automatically perform useful alignment research, which is generally pretty speculative and theoretical, without that system just being generally skilled at reasoning/problem solving. I am sure they are aware of this, but I feel like it is a fundamental issue worth highlighting.
Still, it seems like the special case of "solve the alignment problem as it relates to an automated alignment researcher" might be easier than "solve the alignment problem for reasoning systems general...
I'm interested in getting involved with a mentorship program or a learning cohort for alignment work. I have found a few things poking around (mostly expired application posts), but I was wondering if anyone could point me towards a more comprehensive list. I found aisafety.community, but it still seems to be missing things like bootcamps, SERI MATS, and such. If anyone is aware of a list of bootcamps, cohorts, or mentorship programs, or could list a few off for me, I would really appreciate the direction. Thanks!
I have sometimes seen people/contests focused on writing up specific scenarios for how AI can go wrong starting with our current situation and fictionally projecting into the future. I think the idea is that this can act as an intuition pump and potentially a way to convince people.
I think that is likely net negative, given that state-of-the-art AIs are being trained on internet text, and stories where a good agent starts behaving badly are a key component motivating the Waluigi effect.
These sorts of stories still seem worth thinking about, but perha...
This seems to be phrased like a disagreement, but I think you're mostly saying things that are addressed in the original post. It is totally fair to say that things wouldn't go down like this if you stuck 100 actual prisoners or mathematicians or whatever into this scenario. I don't believe OP was trying to claim that it would. The point is just that sometimes bad equilibria can form from everyone following simple, seemingly innocuous rules. It is a faithful execution of certain simple strategic approaches, but it is a bad strategy in situations like this ...
After reading this, I tried to imagine what an ML system would have to look like if there really were an equivalent of the kind of overhang that was present in evolution. I think that if we try to make the ML analogy such that SGD = evolution, then it would have to look something like: "There are some parameters which update really really slowly (DNA) compared to other parameters (neurons). The difference is like ~1,000,000,000x. Sometimes, all the fast parameters get wiped and the slow parameters update slightly. The process starts over and the fast param...
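(To make the analogy concrete, here's a toy sketch of the kind of two-timescale loop I'm gesturing at; all the names, sizes, and learning rates are made up for illustration, not taken from any real system.)

```python
import numpy as np

# Toy two-timescale training loop (illustrative only).
# "slow" parameters play the role of DNA: tiny updates, never reset.
# "fast" parameters play the role of within-lifetime learning: big updates,
# periodically wiped so they have to be re-learned from scratch.

rng = np.random.default_rng(0)

FAST_LR = 1.0
SLOW_LR = FAST_LR / 1_000_000_000   # ~1,000,000,000x slower
LIFETIMES = 5                        # how many times the fast params get wiped
STEPS_PER_LIFETIME = 1_000

slow = rng.normal(size=8)            # persists across "lifetimes"
slow_start = slow.copy()

for lifetime in range(LIFETIMES):
    fast = np.zeros(8)               # wiped: fast parameters start over
    for _ in range(STEPS_PER_LIFETIME):
        grad = rng.normal(size=8)    # stand-in for a real gradient signal
        fast -= FAST_LR * grad       # within-lifetime learning ("neurons")
        slow -= SLOW_LR * grad       # evolution-speed drift ("DNA")
    print(f"lifetime {lifetime}: |fast| = {np.linalg.norm(fast):.1f}, "
          f"slow drift = {np.linalg.norm(slow - slow_start):.2e}")
```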
I am very interested in finding more posts/writing of this kind. I really appreciate attempts to "look at the game board" or otherwise summarize the current strategic situation.
I have found plenty of resources explaining why alignment is a difficult problem and I have some sense of the underlying game-theory/public goods problem that is incentivizing actors to take excessive risks in developing AI anyways. Still, I would really appreciate any resources that take a zoomed-out perspective and try to identify the current bottlenecks, key battlegrounds, local win conditions, and roadmaps in making AI go well.
The skepticism that I object to has less to do with the idea that ML systems are not robust enough to operate robots, and more to do with people rationalizing based on the intrinsic feeling that "robots are not scary enough to justify considering AGI a credible threat" (whether they voice this intuition or not).
I agree that having highly capable robots which operate off of ML would be evidence for AGI soon and thus the lack of such robots is evidence in the opposite direction.
That said, because the main threat from AGI that I am concerned ab...
My off-the-cuff best guesses at answering these questions:
1. Current-day large language models do have "goals". They are just very alien, simple-ish goals that are hard to conceptualize. GPT-3 can be thought of as having a "goal" that is hard to express in human terms, but which drives it to predict the next word in a sentence. Its neural pathways "fire" according to some form of logic that leads it to "try" to do certain things; this is a goal. As systems become more general, they will continue to have goals. Their terminal goals can remain a...
For personal context: I can understand why a superintelligent system having any goals that aren't my goals would be very bad for me. I can also understand some of the reasons it is difficult to actually specify my goals or train a system to share my goals. There are a few parts of the basic argument that I don't understand as well though.
For one, I think I have trouble imagining an AGI that actually has "goals" and acts like an agent; I might just be anthropomorphizing too much.
1. Would it make sense to talk about modern large language models a...
As a counterexample to the idea that safety work isn't compute constrained, here is a quote from an interpretability paper out of Anthropic, "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet":
...