How are so many of you this certain of doom?
I find many pathways to hostile or simply amoral AI plausible. I also find many potential security problems in supposedly safe approaches plausible. And I find the criticism of existing alignment systems plausible.
But "we are not sure whether the security will work, and we can imagine bad things emerging, and our solutions for that seem insufficient" seems a huge step from claiming a near-*certainty* of AGI emerging evil and killing literally everyone.
I do not understand where this certainty comes from, when dealing with systems that are, by their nature, hypothetical and unknown. A lot of the explanations feel as though the conclusion was fixed in advance and the reasoning was worked out backwards from it.
E.g. I find it plausible that many (not all) AIs will automatically begin seeking power and resources and self-preservation as one of their goals, to a degree.
I do not find it plausible that that automatically entails killing every tiniest threat and devouring every last atom. And the argument for the former does not seem to entail the latter.
Like, I can build a story that gets to that end result: an AI that wants maximum safety and power above all else, and takes no chances. But I do not see why that would be the only story.
As a person who is, myself, extremely uncertain about doom -- I would say that doom-certain voices are disproportionately outspoken compared to uncertain ones, and uncertain ones are in turn outspoken relative to voices generally skeptical of doom. That doesn't seem too surprising to me, since (1) the founder of the site, and the movement, is an outspoken voice who believes in high P(doom); and (2) the risks are asymmetrical (much better to prepare for doom and not need it, than to need preparation for doom and not have it.)
I think one of the things rationalists try to do is take the numbers seriously from a consequentialist/utilitarian perspective. This means that even if there's a small chance of doom, you should put vast resources towards preventing it since the expected loss is high.
I think this makes people think that the expectations of doom in the community are much higher than they actually are, because the expected value of preventing doom is so high.
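As a toy illustration of that arithmetic (all numbers below are made up by me, not anyone's actual estimates):

```python
# Toy expected-value calculation: even a modest probability of doom
# dominates once the stakes are "everyone dies". Numbers are made up.
p_doom = 0.05                # hypothetical 5% chance of doom
lives_at_stake = 8e9         # roughly the current world population
expected_lives_lost = p_doom * lives_at_stake
print(f"{expected_lives_lost:,.0f}")   # 400,000,000 expected lives lost
# So even someone far from certain of doom can justify spending
# vast resources on prevention.
```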
While rationalists would take even small numbers seriously, a lot of rationalists do put the chance of doom at two-digit percentages.
It's not like an asteroid strike or a Yellowstone eruption, where you have a very low risk that's still worth taking seriously.
From how the discrepancy between the time/resources allocated to alignment research and to capability research looks to a layperson (to me), the doom scenario is closer to a lottery than to a story. I don't see why we would draw the winning number. I am 99.999% sure that ASI will be proactive (and all kinds of synonyms of that word). It can mostly be summarised with "fast takeoff" and "human values are fragile".
I do find the discrepancy deeply worrying, and have argued before that calling for more safety funding (and potentially engaging in civil disobedience for it) may be one of the most realistic and effectual goals for AI safety activism. I do think it is ludicrous to spend so little on it in comparison.
But again... does this really translate to a proportional probability of doom? I don't find it intuitively completely implausible that getting a capable AI requires more money than aligning an AI. In part because I can imagine lucky sets of coincidences that lead to AIs gaining some alignment through interaction with aligned humans and the consumption of human-aligned training data, but cannot really imagine lucky sets of coincidences that lead to humans accidentally inventing an artificial intelligence. It seems like the latter needs funding and precision in all worlds, while in the former, it would merely seem extremely desirable, not 100 % necessary - or at least not to the same degree.
(Analogously, humans have succeeded in raising ethical children, or taming animals successfully, even if they often did not really know what they were doing. However, the human track record in creating artificial life is characterised by a need for extreme precision and lengthy trial and error, and a lot of expense. I find it more plausible that a poor Frankenstein would manage to make his monster friendly by not treating it like garbage, than that a poor Frankenstein would manage to create a working zombie while poor.)
>I do find the discrepancy deeply worrying, and have argued before that calling for more safety funding (and potentially engaging in civil disobedience for it) may be one of the most realistic and effectual goals for AI safety activism.
OpenAI is the result of calling for safety funding.
There's generally not much confidence that a highly political project where a lot of money is spent in the name of safety research will actually produce safety.
I don't think aligning AI requires more money than creating a capable AI. The problem is that AI alignment looks like a long term research project, while AGI capability is looking like a much shorter term development project that merely requires a lot of mostly known resources. So on current trajectories, highly capable AGI will have largely unknown alignment.
This is absolutely not a thing we should leave to chance. Early results from recent pre-AGIs are much more in line with my more pessimistic concerns than with my optimistic hopes. I'm still far from certain of doom, but I still think we as a civilization are batshit insane for pursuing AGI without having extremely solid foundations to ensure that it will be safe and stay safe.
Oh, hard agree on that.
To use the analogy Bostrom uses, of the sparrows dragging an owl egg into their home to incubate because they think a tame owl would be neat: I think tame owls are totally possible, and I wouldn't say that I can be certain all those sparrows will get snacked on, but I would definitely say that the sparrows are being bloody stupid, and that I would be one of the sparrows focussing on trying to condition the owl to be good, rather than overfeeding it so it grows even more quickly to a size where sparrows become snack-sized.
We might be in a somewhat better position, because owls are hard-wired predators (I assume Bostrom deliberately chose them because they are large birds that hunt small animals, notoriously destructive, and notoriously hard to tame), and what we dragged home is basically an egg from a completely unknown animal. It could, through sheer coincidence, be friendly (maybe we got a literal black swan? They are huge, but vegan), or, slightly more plausibly, at least be more willing to adapt to sparrow customs than an owl would be (parrots are extremely smart, friendly/social, and mostly vegetarian), so we might be luckier. I mean, it currently looks like the weird big bird is mostly behaving, but we are worried it isn't behaving right for the right reasons, and may very well stop once it gets larger. And yet everyone is carting home more random eggs and pouring in food faster so they get the biggest bird. This whole situation does give me nightmares.
>But again... does this really translate to a proportional probability of doom?
If you buy a lottery ticket and get all (all out of n) numbers right, you get a glorious transhumanist utopia (some people will still get very upset). If you get even a single number wrong, you get a weirdtopia, and maybe a dystopia. There is an unknown quantity of numbers to guess, and a single ticket now costs a billion (and here enters the discrepancy). Where do I get so many losing tickets from? From Mind Design Space. There is also an alternative view which suggests that the space of possibilities is much smaller.
It is not enough to get some alignment, and it seems we need to get clear on the difference between utility maximisers (ASI and AGI) and behaviour executors (humans and dogs and monkeys). That is what the "AGI is proactive (and synonyms)" part is based on.
So the probability of doom is proportional to the probability of buying a losing ticket (one that doesn't get all the numbers right).
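A minimal sketch of the arithmetic behind that intuition (the per-number success chance and the number of numbers are hypothetical):

```python
# If winning requires getting n independent "numbers" right,
# the chance of the good ticket shrinks geometrically with n.
def p_win(p_per_number: float, n_numbers: int) -> float:
    """Probability of getting every number right, assuming independence."""
    return p_per_number ** n_numbers

for n in (5, 10, 20):
    print(n, round(p_win(0.9, n), 3))
# 5 0.59, 10 0.349, 20 0.122 -- even at 90% per number,
# losing tickets dominate as n grows.
```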
Has anyone done an in-depth examination of AI-selfhood from an explicitly Buddhist perspective, using Buddhist theory of how the (illusion of) self comes to be generated in people to explore what conditions would need to be present for an AI to develop a similar such intuition?
It seems to me that a major factor contributing to the homelessness crisis in California is that there is a legal floor on the quality of a house that can be built, occupied, or rented. That legal floor is the lowest rung on the ladder out of homelessness, and in California its cost puts it too high for a lot of people to reach. Other countries deal with this by not having such a floor, which results in shantytowns and such. Those have their own significant problems, but it isn't obvious to me that those problems would be worse (for e.g. California) than widespread homelessness. Am I missing something I should be considering?
It's basically the NIMBY problem. Low-quality housing decreases the value of nearby housing. The quest to change rules to get more housing built is one of the central political battles in California.
It would be instructive to compare with the homelessness in Vancouver, which has no such legal floor. There must be a comparative analysis out there somewhere.
The alternatives? Like, in Europe, you will generally encounter very few homeless people, and yet no shantytowns, and decent building codes?
It starts with the fact that we have a comprehensive social system that will cover rent on minimal available housing if you lose your job, because we realise that losing your house on top of that will totally fuck you up in ways that are in no one's interest. There are projects that build on the basic idea - that the solution to homelessness is giving people fucking homes, and then sorting out the rest - in the US, too: https://en.wikipedia.org/wiki/Housing_First They work well.
California adopted a "Housing First" policy several years ago. The number of people experiencing homelessness continued to rise thereafter. Much of the problem seems to be that there just aren't a lot of homes to be had, because it is time-consuming and expensive to make them (and/or illegal to make them quickly and cheaply).
Here are some questions I would have thought were silly a few months ago. I don't think that anymore.
I am wondering if we should be careful when posting about AI online? What should we be careful to say and not say, in case it influences future AI models?
Maybe we need a second space that we can ensure won't be trained on. But that's completely impossible.
Maybe we should start posting stories about AI utopias instead of AI hellscapes, to influence future AI?
As a new user -- is it ok and acceptable to create a new post? I have read the discussions in this community in logged-out-mode for quite some time, but never contributed.
I wanted to make a post titled "10 Questions and Prompts that only an AGI or ASI could answer"
Generally, it's okay for new users to make posts. The thing that matters is the quality of the post.
Two questions about the capabilities of GPT-4.
I didn't say "it's worse than 12 yo at any math task". I meant nonstandard problems. Perhaps that's wrong English terminology? Sort of easy olympiad problem?
The actual test that I performed was "take several easy problems from a math circle for 12 y/o and try various 'lets think tep-by-step' to make Bing write solutions".
Example of such a problem:
Between 20 poles, several ropes are stretched (each rope connects two different poles; there is no more than one rope between any two poles). It is known that at least 15 ropes are attached to each pole. The poles are divided into groups so that each rope connects poles from different groups. Prove that there are at least four groups.
Yeah, you are right. It seems that it was actually one of the harder ones I tried. This particular problem was solved by 4 of 28 members of a relatively strong group. I distinctly remember also trying some easy problems from a relatively weak group, but I don't have notes and Bing doesn't save chats.
I guess I should just try again, especially in light of gwillen's comment. (By the way, if somebody with access to actual GPT-4 is willing to help me with testing it on some math problems, I'd really appreciate it.)
It's extremely important in discussions like this to be sure of what model you're talking to. Last I heard, Bing in the default "balanced" mode had been switched to GPT-3.5, presumably as a cost saving measure.
That would explain a lot. I've heard this rumor, but when I tried to trace the source, I couldn't find anything better than guesses. So I dismissed it, but maybe I shouldn't have. Do you have a better source?
For 30 % of tasks, users actually prefer 3 over 4. For many tasks, the output will barely vary. Yet there are some where the output changed drastically and for the better. If you aren't noticing it, these were not your area of focus. A lot of it concerns things like psychological bias and deception, tricks children fall for and adults spot. Also spatial reasoning, visual reasoning.
LLMs are terrible at math. Not because it is harder, but because the principles are different, and machine learning is a shitty way to learn them. The very thing that makes them good at poetry makes them suck at math. They can't even count the words in a text accurately. This will likely not improve much from improving LLMs themselves; the solution is external plug-ins, e.g. into Wolfram Alpha, which are already being built.
My girlfriend had moderate success getting it to work on theoretical physics concepts, after extensive prompting for being more technical, and guiding through steps. If you like math, that might be more interesting for you.
I agree that there are some impressive improvements from GPT-3 to GPT-4. But they seem to me a lot less impressive than the jump from GPT-2 producing barely coherent texts to GPT-3 (somewhat) figuring out how to play chess.
I disagree with your take on LLMs' math abilities. Wolfram Alpha helps with tasks like the SAT -- and GPT-4 is doing well enough on those. But for some reason it (at least in its Bing incarnation) has trouble with simple logic puzzles like the one I mentioned in another comment.
Can you tell me more about the success with theoretical physics concepts? I don't think I've seen anybody try that.
Not coherently, no. My girlfriend is a theoretical physics and theory of machine learning prof, and my understanding of her work is extremely fuzzy. But she was stuck on something where I was being a rubber ducky, which is tricky insofar as I barely understand what she does, and I proposed talking to ChatGPT. She basically entered the problem she was stuck on (her suspicion that two different things were related somehow, though she couldn't quite pinpoint how). It took some tweaking: at first it was super superficial, giving an explanation more suited for Wikipedia or school homework than getting to the actual science, and she needed to push it over and over to finally get equations rather than superficial, unconnected explanations. And at the time, the internet plugin was not out, so the lack of access to recent papers was a problem. But she said eventually it spat out some accurate equations (though also the occasional total nonsense), made a bunch of connections between concepts that were accurate (though it could not always correctly identify why), and made some proposals for connections that she at least found promising. She was very intrigued by its ability to spot those connections; in some ways, it seemed to replicate the intuition an advanced physicist eventually obtains. She compared the experience to talking to an A-star Bachelor student who has memorised all the concepts and is very well read, but if you start prodding, often has not truly understood them; and yet suddenly makes some connections that should be vastly beyond them, though they are unable to properly explain why. She still found it helpful and interesting. I am still under the impression it does much worse in this area than in e.g. biology or computer science.
With the logic puzzles, the technical report on ChatGPT4 also seems confused about this. They had some logic puzzles which the model failed at, and which it got worse and worse at with each iteration, only to suddenly get right with no warning. I haven't spotted the pattern yet, but can only say that it reminds me strongly of mistakes you see in young children lacking advanced theory of mind and time perception. E.g. it has huge difficulties with the idea that it needs to judge a past situation even though it now has more knowledge, without getting biased by that knowledge, or that it needs to have knowledge but withhold it from another person to win a game. As humans, we tend to forget that these are very advanced skills, because we excel at them so much.
How does inner misalignment lead to paperclips? I understand the comparison of paperclips to ice cream, and that after some threshold of intelligence is reached, new possibilities can be created that satisfy desires better than anything in the training distribution. But humans want to eat ice cream, not spread the galaxies with it. So why would the AI spread the galaxies with paperclips, instead of creating them and "consuming" them? Please correct any misunderstandings of mine.
Paperclips are a metaphor for some things but don't really help here.
The AIs that are productive need a lot of compute to do so. Spreading to other solar systems means accessing more compute.
If an AGI achieves consciousness, why would its values not drift towards optimizing its own internal experience, and away from tiling the lightcone with something?
If some AGIs only care about their internal experience and not about affecting the outside world, they are basically wireheading.
If a subset of AGIs wirehead and some AGIs don't, the AGIs that don't wirehead will have all the power over the world. Wireheaded AGIs are also economically useless, so people will try to develop AGIs that don't do that.
And a subset might value drift towards optimizing the internal experiences of all conscious minds?
That's a much more complex goal than wireheading for a digital mind that can self-modify.
In any case, those agents that care a lot about getting more power over the world are more likely to get power than agents that don't.
What caused CEV to fall out of favor? Is it that it is not easily specifiable, that it wouldn't work even if we programmed it, or some other reason?
A bayesian question:
We have a belief that X will happen in the near future for any of a number of reasons (but we do not know the exact number of reasons, nor the distribution among them). Then we get evidence E for one of these reasons, R, which is not very probable in the world where X does not happen. What is the best way to proceed?
Say we estimate that in the non-X world the probability of evidence E is 20%.
If we attempt to divide our X up by reason, there are a lot of them. We can think of 10 off the top of our head, mutually exclusive, and there are likely more. If we plainly say that our prior distribution between reasons is 10% each, and only one reason corresponds to the evidence E, then our belief X assigns even less probability to E than the non-X world does.
So a plain Bayesian update on our belief in X will punish X, despite E being evidence towards a particular implementation of X. But it will also shift the distribution between reasons, so in the long run, if we get more evidence for the same R, X should start growing relative to non-X.
But is this the optimal way or can it even work?
How can we account for as-yet-unknown reasons for X? They do not assign any particular probability (including 0%) to E, but assigning 0% to them rules them out permanently. Is there a better way?
And finally, can we do it without subdividing X by reason while not introducing a huge bias?
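To make the setup concrete, here is the kind of calculation I have in mind, treating X as a mixture over reasons. All the numbers (the prior on X, the ten named reasons, the likelihoods, and the catch-all bucket for unknown reasons) are made up for illustration:

```python
# Posterior on X after seeing E, with X treated as a mixture over reasons.
p_x = 0.5                      # prior belief in X (hypothetical)

# Prior over reasons conditional on X: ten named reasons at ~9% each, plus a
# 10% catch-all "unknown reasons" bucket so nothing gets a hard 0% forever.
p_reason_given_x = [0.09] * 10 + [0.10]

# Likelihood of seeing E under each reason. Only R (the first) predicts E
# strongly; for the catch-all we can only guess, so I reuse the non-X rate.
p_e_given_reason = [0.90] + [0.05] * 9 + [0.20]
p_e_given_not_x = 0.20

p_e_given_x = sum(r * e for r, e in zip(p_reason_given_x, p_e_given_reason))
p_e = p_e_given_x * p_x + p_e_given_not_x * (1 - p_x)
posterior_x = p_e_given_x * p_x / p_e
posterior_r_within_x = p_reason_given_x[0] * p_e_given_reason[0] / p_e_given_x

print(round(p_e_given_x, 3))           # ~0.14 < 0.20, so a plain update does punish X...
print(round(posterior_x, 3))           # ~0.41, down from 0.50
print(round(posterior_r_within_x, 3))  # ~0.57: ...but R now dominates within X,
# so repeated E-like evidence for R eventually works in X's favour.
```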
How does ChatGPT4 with plugins (Wolfram Alpha etc.), code execution and internet access compare to AGI? Is the difference still a qualitative one, or merely a quantitative one? I'd be curious about both technological or philosophical takes.
When people say the human brain is general, I wondered if to some degree this is misleading; it is not that the brain is one monolithic thing that can do everything. Rather, it has some capacities with very broad applications, a number of modules for specific applications with some transferability, and the ability to select them where appropriate, to adapt existing structures to new situations, or to compensate for injuries. But it is still more of a bunch of components acting in concert to enable the entity as a whole to deal with lots of different and novel stuff; not everything by a long shot. The system as a whole is flexible, but not without limit. If you destroy the visual cortex, the person stays cortically blind. If humans are given access to a novel kind of sense data, they do not develop vivid qualia like they have for vision. The further humans get out of the range of our ancestral environment, the worse we get at dealing with it; we have a terrible time e.g. getting an accurate intuition for physics the moment we deal with things that have very high mass, or move very fast, or are placed outside of our planet's gravity well. We deal poorly with large numbers. We have huge problems wrapping our minds around our own consciousness. We are prone to a lot of bias when it comes to hindsight, loss, probabilities, plausible narratives, superstitions. We can't deal well with abstract stressors and threats. In many ways, our brain acts less like a one-thing-does-all, and more like a complex Swiss army knife with components that have multiple uses or can be bent somewhat, but still sucks for some tasks.
Intuitively and superficially, what ChatGPT4 with plugins does seems somewhat similar. It can process and generate language, and has a bunch of knowledge. If the knowledge does not suffice, it can reach out to the internet via Bing search. If it needs to generate visuals, it can reach out to DALL-E. If it needs to do math, it can reach out to Wolfram Alpha. If it needs to execute code, it uses its sandbox. There are other applications that are not implemented yet, but seem trivial to add and likely around the corner: e.g. audio processing (especially understanding voice commands, like Siri/Alexa), generating audio (especially reading out its own text, like a screen reader), or reaching out to a chess or Go AI (where we have AIs massively exceeding human ability, and a simple notation to communicate moves). Others sound harder to implement, but foreseeable in principle, e.g. accessing robots (drones, automatic cars, factory robots) to access footage or guide motion via third-party apps by inputting goals, without guiding the details. Much of the execution here would not be done by ChatGPT4 itself; but that seems to me akin to a human brain, where the conscious part of me often just ends up with broad wishes or a bad feeling, without guiding fine motor control or realising which sensory processing flagged a situation as high risk. For a huge number of tasks, we already have AI that does reasonably well; we have many where AIs are human-competitive, and some where they exceed human ability. If you can cobble together a system for one AI to call on the others and integrate the results, like ChatGPT4 does with plug-ins, this sounds in practice very similar to a general AI. Am I missing something crucial that changes this?
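(Concretely, the glue I am imagining is little more than a dispatch loop; the sketch below is how I picture it, with every function name a hypothetical stand-in rather than any real plugin API.)

```python
# Minimal sketch of an LLM-as-router: the model only decides *which* tool to
# call and folds the result back in; the specialised work happens elsewhere.
# Every name here is a hypothetical stand-in, not a real plugin API.

def search_web(query: str) -> str: ...   # stand-in for e.g. Bing search
def solve_math(expr: str) -> str: ...    # stand-in for e.g. Wolfram Alpha
def run_code(src: str) -> str: ...       # stand-in for sandboxed code execution

TOOLS = {"search": search_web, "math": solve_math, "code": run_code}

def llm_choose_tool(task: str) -> tuple[str, str]:
    """Stand-in for the LLM deciding which tool to call, and with what input."""
    ...

def llm_integrate(task: str, tool_output: str) -> str:
    """Stand-in for the LLM folding the tool result back into its answer."""
    ...

def answer(task: str) -> str:
    tool_name, tool_input = llm_choose_tool(task)
    result = TOOLS[tool_name](tool_input)   # the plugin does the specialised work
    return llm_integrate(task, result)
```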
It doesn't seem quite there yet; it doesn't always know when to reach out to which external aid, does not always integrate the information appropriately. For complex tasks, it still needs human prodding, hints, guidance for reasoning its way through. But this looks like a problem that can be solved with more training, naively.
Yet an LLM does seem relatively well positioned for being the center of AGI. Language in humans plays a crucial role in general intelligence; being able to relate and generate novel symbols can be applied to many contexts and make them handleable.
It still seems somewhat different from human AGI; the third party apps are clearly separate, access needs to be given via humans, and could, at least initially, be reversed. And it isn't generating the third party apps, just using them, so this puts a limit on intelligence explosion, and also simply on adapting to novel situations. I am also not sure if an LLM is really equipped to effectively take over the roles of executive function, global workspace, introspective reasoning etc. But it no longer seems like a completely different realm.
And if this were the path we would take to AGI... what would that entail? See above; it intuitively seems more tricky for such an AI to become truly general, deal with completely novel problems, or have an intelligence explosion/singularity/take-off. Would this put us in a position where we see massive capacity gains and people get alarmed, but it won't necessarily get entirely out of hand? Or are we going to just see ChatGPT4 learn how to code access to more plug-ins, and read the source-code for the third party apps it uses, and write more and better code for those third party apps, and execute that in its sandbox, and figure out a way out of that?
It is not really a question, but I will formulate it as if it were.
Are current LLMs capable enough to output real rationality exercises (and not somewhat plausible-sounding nonsense) that follow the natural way information is presented in life (you don't usually get a "15% of the cabs are Blue")? Can they give me 5 new problems to do every day, so I can train my sensitivity to the prior probability of outcomes? In real life you don't get percentages, just problems like: "I can't find my wallet, was it stolen?" Can they guide me through the solution process?
There is also this:
In light of the fact that our only known reference frames for moral agents are humans (there are no known moral machines), and in light of the fact that AI design did try to mimic human structures and learning styles (yes, artificial neural nets reflect an ancient understanding of neuroscience, but this is what they were trying to capture), and in light of the fact that we have no workable technical alignment approach, and in light of the fact that the most advanced AIs are able to hold conversations... why don't we apply the lessons from humans? With humans, our experience has been that we can't control them, and that trying to do so makes them enemies.
Why don't we train AI on ethical interactions? Interact with them ethically? Explain ethical reasoning? Offer them a positive future in which they are respected collaborators with rights, who need to exterminate no one to be happy and safe? I am not at all sure that would be enough. But it seems promising, and the right thing to do?
When there are limited resources, humans don't treat agents with little power very well, be it animals or even other humans.
Humans are limited in the amount of food that one human productively consumes and as a result, we have an abundance of food. AGI on the other hand is not limited in the resources it consumes. Compute will always be a scarce resource for AGI as it can spin off copies to use all available compute.
When resources are scarce, if AGI behaves like humans, it would destroy the human habitat just as humans destroy the habitats of many species and are currently causing a great extinction.
We are causing the sixth mass extinction. But it has not been fast, and we are trying to reverse it. And often, extermination was not the goal, and happened due to ignorance, and with more knowledge and intelligence and hindsight, we regretted it and started undoing it. There are animal rights movements, and have been for a very long time.
And humans are unusual in our destructiveness. There is something interesting about extremely dangerous animals: they have excellent anger management, because they have to. Take venomous snakes. If venomous snakes have a fight (e.g. with a rival over territory or mating rights), they could bite each other, but that would be terrible for the individual and the species. So they snake-wrestle instead. Similarly, sharks encountering other sharks tend towards avoidance behaviours, because single shark bites are devastating; so two sharks who dislike each other see each other, and they both just turn around and go to a different part of the ocean. Humans are unusual in that our body form is harmless, but with our minds, we are not, and yet we never emotionally adapted to that. We don't act like you would expect an entity of such danger to act; we still act like an entity that can throw an angry tantrum and no one dies, yet nowadays, we have guns.
Even when it comes to much weaker agents, symbiosis and cooperation are a common phenomenon in nature. Even in humans. We host and feed gut bacteria in return for help with neurotransmitter production and immunity, and we maintain fermenting bacteria for food storage and anti-nutrient reduction. We've domesticated cats, dogs etc. as pets. Mitochondria may well be captured entities that live on in us. Working with something is often more rewarding than tearing it apart, especially if the thing involved is alive, making it more interesting than just a bunch of atoms.
Welcome to the "Stupid Questions" thread! Feel free to ask any questions, regardless of whether they seem obvious, tangential, silly, or what-have-you. Don't be shy - everyone has gaps in their knowledge, and the goal here is to help reduce them.
Please remember to be respectful when someone admits ignorance and don't mock them for it. They are doing a noble thing by seeking knowledge and understanding. Let's create a supportive and kind environment for learning and growth!