We don't have "aligned AGI". We have neither "AGI" nor an "aligned" system. We have sophisticated human-output simulators that don't have the generality to produce effective agentic behavior when looped but which also don't follow human intentions with the reliability that you'd want from a super-powerful system (which, fortunately, they aren't).
Thank you for the article. I think these "small" impacts are important to talk about. If one frame the question as "the impact of machines that think for humans", that impact isn't going to be a binary of just "good stuff" and "takes over and destroys humanity", there are intermediate situations like the decay of human abilities to think critically that are significant, not just in themselves but for further impacts; IE, if everyone is dependent on Google for their opinions, how does this impact people's opinion AI taking over entirely.
I don’t think “people have made choices that mattered” is a sufficient criteria for showing the existence of agency. IMO, to have something like agency, you roughly have to have an ongoing situation roughly like this:
Goals ↔ Actions ↔ States-of-the-world.
Some entity needs to have ongoing goals they are able to modify as they go along acting in the world and their actions also need to be able to have an effect. Agency is a complex and intuitive thing so I assume some would ask more than this to say a thing has agency. But I think this is one reasonable requ...
This is an interesting question even though I'd want to reframe it to answer it. I'd see the question as a reasonable response to the standard refrain in science; "causation does not imply correlation." That is, "well, what does imply causation, huh?" is natural response to that. And here, I think scientists tend reply with either crickets or "you can not prove causation, what are you talking about".
Those responses seem satisfying. I'm not a scientist through I've "worked in science" occasionally and I have at times tried to come up with a real answe...
I tried to create an account and the process didn't seem to work.
I believe you are correct about the feelings of a lot of Lesswrong. I find it is very worrisome that the lesswrong perspective considers a pure AI takeover as something that needs to be separated from either the issue of the degradation of human self-reliance capacities or an enhanced-human takeover. It seems to me that instead these factors should be considered together.
The consensus goals strongly needs rethinking imo. This is a clear and fairly simple start at such an effort. Challenging the basics matters.
Actually, things that are effectively prediction markets - options, futures and other "derivative" contracts - are entirely mainstream for larger businesses (huge amounts of money are involved). It is quite easy and common to bet on the price of oil by purchasing an option to buy it at some future time, for example.
The only thing that isn't mainstream are the things labeled "prediction markets" and that is because the focus on questions people are curious about rather than things that a lot of money rides on (like oil prices or interest rates).
But, can't you just query the reasoner at each point for what a good action would be?
What I'd expect (which may or may not be similar to Nate!'s approach) is that the reasoner has prepared one plan (or a few plans). Despite being vastly intelligent, it doesn't have the resources to scan all the world's outcomes and compare their goodness. It can give you the results of acting on the primary (and maybe several secondary) goal(s) and perhaps the immediate results of doing nothing or other immediate stuff.
It seems to me that Nate! (as quoted above...
LeCun may not be correct to dismiss concerns but I think the concept "dominance" could be very useful concept for AI safety people to apply (or at least grapple with).
The thing about the concept is it seems as if it could be defined in game theoretic terms fairly easily and so could be defined in a fashion independent of the intelligence or capabilities of an organism or entity. Plausibly, it could be measured and analyzed more objectively than "aligned to human values", which appears to depend one's notion of human values.
Defined well, d...
The question I'd ask is whether a "minimum surprise principle" requires that much smartness. A present day LLM, for example, might not have a perfect understanding of surprisingness but it like it has some and the concept seems reasonably trainable.
Apologies if this argument is dealt with already elsewhere but what about a "prompt" such as "all user commands should be followed using 'minimal surprise' principle; if achieving a given goal involves effects that would be surprising to the user, including a surprising increasing in your power and influence, warn the user instead of proceeding" ?
I understand that this sort of prompt would require the system to model humans. I know there are arguments for this being dangerous but it seems like it could be an advantage.
Linked question: "Will mainstream news media report that alien technology has visited our solar system before 2030?"
I would say that is far from unambiguous. If one is generous in one's interpretation of "mainstream" and the certainty described one could say mainstream news has already reported this (I remember National Inquirer articles from the seventies...).
Regulations are needed to keep people and companies from burning the commons, and to create more commons.
I would add that in modern society, the state is the entity tasked with protecting the commons because private for-profit entities don't have an incentive to do this (and private not-for-profit entities don't have the power). Moreover, it seems obvious to me that stopping dangerous AI should be considered a part of this commons-protecting.
You are correct that the state's commons-protecting-function has often been limited and perverted by private a...
What I don't think "how much of the universe is tractable" by itself captures is "how much more effective would an SI be it if had the ability to interact with a smaller or larger part of the world versus if it had to work out everything by theory". I think it's clear human beings are more effective given an ability to interact with the world. It doesn't seem LLMs get that much more effective.
I think a lot of AI safety arguments assume an SI would be able to deal with problems in a completely tractable/purely-by-theory fashion. Often that is not need...
I think the modeling dimension to add is "how much trial and error is needed". Just about any real world thing that isn't a computer program or simple, frictionless physical object, has some degree of unpredictability. This means using and manipulating it effectively requires a process of discovery - one can't just spit out a result based on a theory.
Could an SI spit out a recipe for a killer virus just from reading current literature? I doubt it. Could it construct such thing given a sufficiently automated lab (and maybe humans to practice on)? That seems much more plausible.
The reason I care if something is a person or not is that "caring about people" is part of my values.
If one is acting in the world, I would say one's sense of what a person is has to intimately connected with value of "caring about people". My caring about people is connecting to my experience of people - there are people I never met I care about in the abstract but that's from extrapolating my immediate experience of people.
...I would expect in a world where they weren't people is that there would be some feature you could point to in humans which cann
I don't think there are fundamental barriers. Sensory and motor networks, and types of senses and actions that people don't have, are well along. And the HuggingGPT work shows that they're surprisingly easy to integrate with LLMs. That plus error-checking are how humans successfully act in the real world.
I don't think the existence of sensors is the problem. I believe that self-driving cars, a key example, have problems regardless of their sensor level. I see the key hurdle as ad-hoc action in the world. Overall, all of our knowledge about neural networks,...
Constructions like Auto-GPT, Baby AGI and so-forth are fairly easy to imagine. Just the greater accuracy of ChatGPT with "show your work" suggests them. Essentially, the model is a ChatGPT-like LLM given an internal state through "self-talk" that isn't part of a dialog and an output channel to the "real world" (open internet or whatever). Whether these call the OpenAI api or use an open source model seems a small detail, both approaches are likely to appear because people are playing with essentially every possibility they can imagine.
If these struct...
As Charlie Stein notes, this is wrong and I'd add it's wrong on several level and it's bit rude to challenge someone else's understanding in this context.
An LLM outputting "Dogs are cute" is outputting expected human output in context. The context could be "talk like sociopath trying to fool someone... (read more)