A man asks one of the members of the tribe to find him some kindling so that he may start a fire. A few hours pass, and the second man returns, walking with a large elephant.
“I asked for kindling,” says the first.
“Yes,” says the second.
“Where is it?” asks the first, trying to ignore the large pachyderm in the room.
The second gestures at the elephant, grinning.
“That’s an elephant.”
“I see that you are uninformed. You see, elephants are quite combustible, despite their appearance. Once heat reaches the right temperature, it...
I strongly doubt we can predict the climate in 2100. Actual prediction would be a model that also incorporates the possibility of nuclear fusion, geoengineering, AGIs altering the atmosphere, etc.
I got into AI at the worst time possible
2023 marks the year AI Safety went mainstream. And though I am happy it is finally getting more attention, and finally has highly talented people who want to work in it, personally it could not have been worse for my professional life. This isn’t a thing I normally talk about, because it’s a very weird thing to complain about. I rarely permit myself to even complain about it internally. But I can’t stop the nagging sensation that if I had just pivoted to alignment research one year sooner than I did, everything woul...
Thanks for writing this, I think this is a common and pretty rough experience.
Have you considered doing cybersecurity work related to AI safety? I.e., work that would help prevent bad actors from stealing model weights, and AIs themselves from escaping. I think this kind of work would likely be more useful than most alignment work.
I'd recommend reading Holden Karnofsky's takes, as well as the recent huge RAND report on securing model weights. Redwood's control agenda might also be relevant.
I think this kind of work is probably extremely useful, and somewhat neg...
Thanks for sharing your experience here.
One small thought is that things end up feeling extremely neglected once you index on particular subquestions. Like, on a high-level, it is indeed the case that AI safety has gotten more mainstream.
But when you zoom in, there are a lot of very important topics that have <5 people seriously working on them. I work in AI policy, so I'm more familiar with the policy/governance ones, but I imagine this is also true on the technical side (also, maybe consider swapping to governance/policy!).
Also, especially in hype waves, I...
It probably began training in January and finished around early April. And they're now doing evals.
My birds are singing the same tune.
Going to the moon
Say you’re really, really worried about humans going to the moon. Don’t ask why, but you view it as an existential catastrophe. And you notice people building bigger and bigger airplanes, and warn that one day, someone will build an airplane that’s so big, and so fast, that it veers off course and lands on the moon, spelling doom. Some argue that going to the moon takes intentionality. That you can’t accidentally create something capable of going to the moon.
But you say “Look at how big those planes are getting! We’ve gone from small fighter planes, to bombers, to jets in a short amount of time. We’re on a double exponential of plane tech, and it’s just a matter of time before one of them will land on the moon!”
...And they were right? Humans did land on the moon roughly on that timeline (and as I recall, there were people before the moon landing at RAND and elsewhere extrapolating the exponentials of speed, which was a major reason for ill-fated projects like the supersonic interceptors fo...
"To the best of my knowledge, Vernor did not get cryopreserved. He has no chance to see the future he envisioned so boldly and imaginatively. The near-future world of Rainbows End is very nearly here... Part of me is upset with myself for not pushing him to make cryonics arrangements. However, he knew about it and made his choice."
I agree that consequentialist reasoning is an assumption, and am divided about how consequentialist an ASI might be. Training a non-consequentialist ASI seems easier, and the way we train them seems to actually be optimizing against deep consequentialism (they're rewarded for getting better with each incremental step, not for something that might only be better 100 steps in advance). But, on the other hand, humans don't seem to have been heavily optimized for this either*, yet we're capable of forming multi-decade plans (even if sometimes poorly).
*Actually, the Machiavellian Intelligence Hypothesis does seem to be optimizing consequentialist reasoning (if I attack Person A, how will Person B react, etc.)
This is the kind of political reasoning that I've seen poisoning LW discourse lately, and it gets in the way of having actual discussions. Will posits essentially an impossibility proof (or, in its more humble form, a plausibility proof). I humor this being true, and state why the implications, even then, might not be what Will posits. The premise is based on alignment not being enough, so I operate on the premise of an aligned ASI, since the central claim is that "even if we align ASI it may still go wrong". The premise grants that the duration of time it is...
I'm not sure who you are debating here, but it doesn't seem to be me.
First, I mentioned that this was an analogy, and that I dislike even using them, which I hope implied I was not making any kind of assertion of truth. Second, "works to protect" was not intended to mean "control all relevant outcomes of". I'm not sure why you would get that idea, but that certainly isn't what I think of first if someone says a person is "working to protect" something or someone. Soldiers defending a city from raiders are not violating control theory or the l...
I've heard of many cases like this from EA Funds (including my own). My impression is that they only had one person working full-time managing all three funds (no idea whether this has changed since I applied).
An incapable man would kill himself to save the village. A more capable man would kill himself to save the village AND ensure no future werewolves are able to bite villagers again.
Though I tend to dislike analogies, I'll use one, supposing it is actually impossible for an ASI to remain aligned. Suppose a villager cares a whole lot about the people in his village, and routinely works to protect them. Then, one day, he is bitten by a werewolf. He goes to the shaman, who tells him that when the full moon rises again, he will turn into a monster and kill everyone in the village. His friends, his family, everyone. And he will no longer know himself. He is told there is no cure, that the villagers would be unable to fight him off, and that he will grow too strong to be caged, and cannot be subdued or controlled once he transforms. What do you think he would do?
MIRI "giving up" on solving the problem was probably a net negative to the community, since it severely demoralized many young, motivated individuals who might have worked toward actually solving the problem. An excellent way to prevent pathways to victory is by convincing people those pathways are not attainable. A positive, I suppose, is that many have stopped looking to Yudkowsky and MIRI for the solutions, since it's obvious they have none.
I don't think this is the case. For a while, the post with the highest karma was Paul Christiano explaining all the reasons he thinks Yudkowsky is wrong.
Fair. What would you call a "mainstream ML theory of cognition", though? Last I checked, they were doing purely empirical tinkering with no overarching theory to speak of (beyond the scaling hypothesis).
It tends not to get talked about much today, but there was the PDP (connectionist) camp of cognition vs. the camp of "everything else" (including ideas such as symbolic reasoning, etc). The connectionist camp created a rough model of how they thought cognition worked, a lot of cognitive scientists scoffed at it, Hinton tried putting it into actual practice, b...
"My position is that there are many widespread phenomena in human cognition that are expected according to my model, and which can only be explained by the more mainstream ML models either if said models are contorted into weird shapes, or if they engage in denialism of said phenomena."
Such as? I wouldn't call Shard Theory mainstream, and I'm not saying mainstream models are correct either. On humans trying to be consistent decision-makers, I have some theories about that (some of which are probably wrong). But judging by how bad humans are at i...
This isn't what I mean. It doesn't mean you're not using real things to construct your argument, but that doesn't mean the structure of the argument reflects something real. Like, I kind of imagine it looking something like a rationalist Jenga tower, where if one piece gets moved, it all crashes down. Except, by referencing other blog posts, it becomes a kind of Meta-Jenga: a Jenga tower composed of other Jenga towers. Like "Coherent decisions imply consistent utilities". This alone I view to be its own mini Jenga tower. This is where I think String Theori...
I dislike the overuse of analogies in the AI space, but to use your analogy, I guess it's like you keep assigning a team of engineers to build a car, and two possible things happen. Possibility One: the engineers are actually building car engines, which gives us a lot of relevant information for how to build safe cars (torque, acceleration, speed, other car things), even if we don't know all the details for how to build a car yet. Possibility Two: they are actually just building soapbox racers, which doesn't give us much information for building safe cars, but also means that just tweaking how the engineers work won't suddenly give us real cars.
If progress in AI is continuous, we should expect record levels of employment. Not the opposite.
My mentality is that if progress in AI doesn't have a sudden, foom-level jump, and if we all don't die, most fears of human unemployment are unfounded... at least for a while. Say we get AIs that can replace 90% of the workforce. The productivity surge from this should dramatically boost the economy, creating more companies, more trading, and more jobs. Since AIs can be copied, they would be cheap, abundant labor. This means anything a human can do that ...
I think my main problem with this is that it isn't based on anything. Countless times, you just reference other blog posts, which reference other blog posts, which reference nothing. I fear a whole lot of people thinking about alignment are starting to decouple themselves from reality. It's starting to turn into the AI version of String Theory. You could be correct, but given the enormous number of assumptions your ideas are stacked on (and that even a few of those assumptions being wrong leads to completely different conclusions), the odds of you even being in the ballpark of correct seem unlikely.
I'm very sympathetic to this view, but I disagree. It is based on a wealth of empirical evidence that we have: on data regarding human cognition and behavior.
I think my main problem with this is that it isn't based on anything
Hm. I wonder if I can get past this common reaction by including a bunch of references to respectable psychology/neurology/game-theory experiments, which "provide scientific evidence" that various common-sensical properties of humans are actually real? Things like fluid vs. crystallized intelligence, the g-factor, global workspace theory, ...
At first I strong-upvoted this, because I thought it made a good point. However, upon reflection, that point is making less and less sense to me. You start by claiming current AIs provide nearly no data for alignment, that they are in a completely different reference class from human-like systems... and then you claim we can get such systems with just a few tweaks? I don't see how you can go from a system that, you claim, provides almost no data for studying how an AGI would behave, to suddenly having a homunculus-in-the-box that becomes superintelligent a...
Contra One Critical Try: AIs are all cursed
I don't feel like making this a whole blog post, but my biggest source of optimism that we won't need to one-shot an aligned superintelligence is that anyone who's trained AI models knows that AIs are unbelievably cursed. What do I mean by this? I mean even the first quasi-superintelligent AI we get will have so many problems and so many exploits that taking over the world will simply not be possible. Take a "superintelligence" that only had to beat humans at the very constrained game of Go, which ...
I'm kind of surprised this has almost 200 karma. This feels much more like a blog post on substack, and much less like the thoughtful, insightful new takes on rationality that used to get this level of attention on the forum.
It also isn't my favorite version of this post that could exist, but it seems like a reasonable point to make, and my guess is a lot of people are expressing their agreement with the title by upvoting.
Why would it matter if they notice or not? What are they gonna do? EMP the whole world?
I think you're missing the point. If we could establish that all important information had been extracted from the original, would you expect humans to then destroy the original or allow it to be destroyed?
My guess is that they wouldn't. Which I think means practicality is not the central reason why humans do this.
if we could somehow establish how information from the original was extracted, do you expect humans to then destroy the original or allow it to be destroyed?
On 12 September 1940, the entrance to the Lascaux Cave was discovered on the La Rochefoucauld-Montbel lands by 18-year-old Marcel Ravidat when his dog, Robot, investigated a hole left by an uprooted tree (Ravidat would embellish the story in later retellings, saying Robot had fallen into the cave.)[8][9] Ravidat returned to the scene with three friends, Jacques Marsal, Georges Agnel, and Simon Coencas. They entered the cave through a 15-metre-deep (50-foot) shaft that they believed might be a legendary secret passage to the ne...
Suppose you've got a strong goal agnostic system design, but a bunch of competing or bad actors get access to it. How does goal agnosticism stop misuse?
This was the question I was waiting to see answered (since I'm already basically on board with the rest of it), but I was disappointed you didn't have a more detailed answer. Keeping this out of incompetent/evil hands perpetually seems close to impossible. It seems this goes back to needing a maximizer-type force in order to prevent such misuse from occurring, and then we're back to square one of the clas...
I created a simple Google Doc for anyone interested in joining/creating a new org to put down their names, contact info, what research they're interested in pursuing, and what skills they currently have. Over time, I think a network can be fostered, where relevant people start forming their own research, and then begin building their own orgs/getting funding. https://docs.google.com/document/d/1MdECuhLLq5_lffC45uO17bhI3gqe3OzCqO_59BMMbKE/edit?usp=sharing
But it's also an entire school of thought in cognitive science. I feel like DL is the method, but it comes without the understanding that these methods are based on well-thought-out, mechanistic rules for how cognition fundamentally works, building potentially toward a unified theory of cognition and behaviour.
I don't have an adequate answer for this, since these models are incomplete. But the way I see it is that these people had a certain way of mathematically reasoning about cognition (Hinton, Rumelhart, McClelland, Smolensky), and that reasoning created most of the breakthroughs we see today in AI (backprop, multi-layer models, etc.). It seems trying to utilize that model of cognition could give rise to new insights about the questions you're asking, attack the problem from a different angle, or help create a grounded paradigm for alignment research to build on.
My answer is a bit vague, but I would say that the current DL curriculum tells you how these things work, but it doesn't go into the reasoning about cognition that allowed these ideas to exist in the first place.
You could say it "predicted" everything post-AlexNet, but it's more that it created the fundamental understanding for everything post-AlexNet to exist in the first place. It's the mathematical models of cognition that all of modern AI is built on. This is how we got back propagation, "hidden" layers, etc.
If you want to try doing this, or know someone who does, let me know. I've noticed a lot of things in AIS that people say they'd like to see, but then nothing happens.
I guess my biggest doubt is that a DL-based AI could run interpretability on itself. Large NNs seem to "simulate" a larger network to represent more features, which results in most of the weights occupying a superposition. I don't see how a network could reflect on itself, since it seems that would require an even greater network (which would then require an even greater network, and so on). I don't see how it could eat its own tail, since interpreting only parts of the network would not be enough. It would have to interpret the whole.
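The superposition worry above can be sketched with a toy example (a hypothetical illustration, not anything from the comment; the feature count, dimensionality, and random seed are arbitrary choices): a layer stores more feature directions than it has dimensions, so reading any one feature back necessarily picks up interference from all the others.

```python
import numpy as np

# Toy sketch of superposition (illustrative assumption, not a real model):
# store 8 "features" as unit vectors in only 4 dimensions.
rng = np.random.default_rng(0)
n_features, n_dims = 8, 4
W = rng.normal(size=(n_features, n_dims))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # each feature -> a unit direction

# Activate a single feature, compress it into the 4-dim hidden space,
# then read every feature back out by projection.
x = np.zeros(n_features)
x[2] = 1.0
hidden = W.T @ x        # 4-dim internal representation
recovered = W @ hidden  # 8 feature readouts

# The active feature dominates, but every other readout is nonzero:
# 8 vectors in 4 dimensions cannot all be orthogonal, so features interfere.
print(np.round(recovered, 3))
```

The nonzero interference terms are the crux: a network inspecting its own weights would have to disentangle exactly this kind of overlap, everywhere at once, which is what makes "interpreting only parts" insufficient.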
The following is a conversation between myself in 2022, and a newer version of myself earlier this year.
On AI Governance and Public Policy
2022 Me: I think we will have to tread extremely lightly here, or, if possible, avoid it completely. One particular concern is the idea of gaining public support. Many countries have an interest in pleasing their constituents, so if executed well, this could be extremely beneficial. However, it runs a high risk of doing far more damage. One major concern is the different mindset needed to conceptualize the problem. Aler...
[crossposting my reply]
Thank you for taking the time to read and critique this idea. I think this is very important, and I appreciate your thoughtful response.
Regarding how to get current systems to implement/agree to it, I don't think that will be relevant long-term. I don't think the mechanisms current institutions use for control can keep up with AI proliferation. I imagine most existing institutions will still exist, but won't have the capacity to do much once AI really takes off. My guess is, if AI kills us, it will happen after a slow-motion coup. Not...
"If the boxed superintelligence with the ability to plan usage of weapons when authorized by humans, and other boxed superintelligences able to control robotics in manufacturing cells are on humans side, the advantage for humans could be overwhelming"
As I said, I do not expect boxed AIs to be a thing most will do. We haven't seen it, and I don't expect to see it, because unboxed AIs are superior. This isn't how people in control are approaching the situation, and I don't expect that to change.
"keep it relegated to "tool" status, then it might be possible to use such an AI to combat unboxed, rogue AI"
I don't think this is a realistic scenario. You seem to be seeing it as an island of rogue, agentic, "unboxed" AIs in a sea of tool AIs. I think it's much, much more realistic that it'll be the opposite. Most AIs will be unboxed agents because they are superior.
"For example, give it a snapshot of the internet from a day ago, and ask it to find the physical location of rogue AI servers, which you promptly bomb."
This seems to be approaching it f...
Are you familiar with Constellation's Proof of Reputable Observation? This seems very similar.
The following is a conversation between myself in 2022, and a newer version of me earlier this year.
On the Nature of Intelligence and its "True Name":
2022 Me: This has become less obvious to me as I’ve tried to gain a better understanding of what general intelligence is. Until recently, I always made the assumption that intelligence and agency were the same thing. But General Intelligence, or G, might not be agentic. Agents that behave like RL agents may only be narrow forms of intelligence, without generalizability. G might be something closer to a simula...
Thanks, finding others who are working on similar things is very useful. Do you know if the reading group is still active, or if they are working on anything new?
Given that I don't know when Schelling Day is, I doubt its existence.
If we're being realistic, this kind of thing would only get criminalized after something bad actually happened. Until then, too many people will think "omg, it's just a Chatbot". Any politician calling for it would get made fun of on every Late Night show.
I'm almost certain this is already criminal, to the extent it's actually dangerous. If you roll a boulder down the hill, you're up for manslaughter if it kills someone, and reckless endangerment if it could've but didn't hurt anyone. It doesn't matter if it's a boulder or software; if you should've known it was dangerous, you're criminally liable.
In this particular case, I have mixed feelings. This demonstration is likely to do immense good for public awareness of AGI risk. It even did for me, on an emotional level I haven't felt before. But it's also impo...
Yeah, all the questions over the years of "why would the AI want to kill us" could be answered with "because some idiot thought it would be funny to train an AI to kill everyone, and it got out of hand". Unfortunately, stopping everyone on the internet from doing things isn't realistic. It's much better to never let the genie out of the bottle in the first place.
I'm currently thinking that if there are any political or PR resources available to orgs (AI-related or EA) now is the time to use them. Public interest is fickle, and currently most people don't seem to know what to think, and are looking for serious-seeming people to tell them whether or not to see this as a threat. If we fail to act, someone else will likely hijack the narrative, and push it in a useless or even negative direction. I don't know how far we can go, or how likely it is, but we can't assume we'll get another chance before the public falls b...
Yeah, since the public currently doesn't have much of an opinion on it, trying to get the correct information out seems critical. I fear some absolutely useless legislation will get passed, and everyone will just forget about it once the shock-value of GPT wears off.
Unfortunately, he could probably get this published in various journals, with only minor edits being made.
sigh Protests last year, barricading this year, I've already mentally prepared myself for someone next year throwing soup at a human-generated painting while shouting about AI. This is the kind of stuff that makes no one in the Valley want to associate with you. It makes the cause look low-status, unintelligent, lazy, and uninformed.
Just because the average person disapproves of a protest tactic doesn't mean that the tactic didn't work. See Roger Hallam's "Designing the Revolution" series for the thought process underlying the soup-throwing protests. Reasonable people may disagree (I disagree with quite a few things he says), but if you don't know the arguments, any objection is going to miss the point. The series is very long, so here's a tl;dr:
- If the public response is: "I'm all for the cause those protestors are advocating, but I can't stand their methods" notic...