Here's a non-obvious way it could fail. I don't expect researchers to make this kind of mistake, but if this reasoning is correct, public access to such an AI is definitely not a good idea.
Also, consider a text predictor that is trying to roleplay as an unaligned superintelligence. This situation could be triggered even without the user's knowledge, for example by accidentally creating a conversation that the AI relates to a story about a rogue SI. In that case it may start to output manipulative replies, suggest blueprints for agentic AIs, and mayb...
I disagree with your last point. Since we're agents, we can build a much better intuitive understanding of what causality is, how it works, and how to apply it during childhood. As babies, we start running lots and lots of experiments. Those are not exactly randomized controlled trials, so they will not fully remove confounders, but we get close when we try to do something different in a relatively similar situation. Doing lots of gymnastics, dropping stuff, testing our parents' limits, etc., is what allows us to learn causality quickly.
LLMs, as they are curren...
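To give a toy sketch of the gap I mean (entirely my own illustration, not anything from the thread): suppose a hidden confounder drives both X and Y. From passive observation alone, X and Y look strongly correlated, but an agent that can intervene on X, the way a child drops things just to see what happens, immediately finds that X has no effect on Y.

```python
# Toy illustration: observation vs. intervention under a hidden confounder U.
# U -> X and U -> Y, but there is no causal edge from X to Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Observational data: both X and Y are driven by the hidden confounder U.
u = rng.normal(size=n)
x_obs = u + 0.1 * rng.normal(size=n)
y_obs = u + 0.1 * rng.normal(size=n)
print("observational corr(X, Y):", np.corrcoef(x_obs, y_obs)[0, 1])  # ~0.99

# Interventional data: we set X ourselves (do(X)), cutting the U -> X link.
x_do = rng.normal(size=n)            # values we choose, independent of U
y_do = u + 0.1 * rng.normal(size=n)  # Y still depends only on U
print("interventional corr(X, Y):", np.corrcoef(x_do, y_do)[0, 1])  # ~0
```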
I posted a somewhat similar response to MSRayne, with the exception that what you accidentally summon is not an agent with a utility function, but something that tries to appear to be one and nevertheless tricks you into making some big mistake.
Here, if I understand correctly, what you get is a genuine agent that works across prompts: it has some internal value function that outputs a different value after each prompt and acts accordingly. It doesn't seem incredibly unlikely, as there is nothing in the process of evolution that necessarily has to make ...
That makes a lot of sense, thanks for the link. It is not as dangerous a situation as a true agentic AGI, since this failure mode involves a (relatively stupid) user error. I trust researchers not to make that mistake, but it seems like there is no way to safely make these systems available to the public.
A way to make this more plausible, which I thought of after reading this, is accidentally making it think it's hostile. Perhaps you make a joking remark about paperclip maximizers, or maybe it just so happens that the chat history is similar to the premise of...
Small remark: BIG-bench does include tasks on self-awareness, and I'd argue that self-awareness is a requirement for your definition, "an AI that can do any cognitive tasks that humans can", as well as being generally important for problem solving. Being able to correctly answer the question "Can I do task X?" is evidence of self-awareness and is clearly beneficial.
Again, there seems to be an assumption in your argument which I don't understand: namely, that a society or superintelligence intelligent enough to create a convincing simulation for an AGI would necessarily possess the tools (or be intelligent enough) to assess its alignment without running it. Superintelligence does not imply omniscience.
Maybe demonstrating the alignment of an AI without running it is vastly more difficult than creating a good simulation. This feels unlikely, but I genuinely do not see any reason why it can't be the case. If we create ...
I don't follow. Why are you assuming that we could adequately evaluate the alignment of an AI system without running it if we were also able to create a simulation accurate enough to make the AI question what's real? It doesn't seem like this would necessarily be true.
I think the word "explainable" isn't really the best fit. What we really mean is that the model has to be able to construct theories of the world and prioritize the more compact ones. An AI that has simply memorized that a stone will fall when it's exactly 5, 5.37, or 7.8 (etc.) meters above the ground is not explainable in that sense, whereas one that discovered general relativity would be considered explainable.
And yeah, at some point, even maximally compressed theories become so complex that no human can hope to understand them. But explainability ...
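To make the contrast concrete, here is a toy sketch (my own illustration, a slight variant of the stone example that uses fall times so there is something to predict): the "memorized" model is a lookup table that only covers the heights it happens to have seen, while the compact theory t = sqrt(2h/g) is a few symbols long and generalizes to every height.

```python
# Toy contrast between memorized facts and a compact theory of free fall.
import math

G = 9.81  # m/s^2

# "Memorized" model: approximate fall times recorded only for specific heights (m -> s).
memorized_fall_times = {5.0: 1.01, 5.37: 1.05, 7.8: 1.26}

def fall_time_memorized(height_m: float) -> float:
    # Fails (KeyError) for any height it has not seen before.
    return memorized_fall_times[height_m]

def fall_time_theory(height_m: float) -> float:
    # Compact theory: t = sqrt(2h / g), a few symbols that cover every height.
    return math.sqrt(2 * height_m / G)

print(fall_time_theory(5.0))    # ~1.01 s, matches the memorized entry
print(fall_time_theory(100.0))  # generalizes; the lookup table cannot
```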
I agree it's irrelevant, but I've never actually seen these terms in the context of AI safety. It's more about how we should treat powerful AIs. Are we supposed to give them rights? It's a difficult question which requires us to rethink much of our moral code, and one which may shift it to the utilitarian side. While it's definitely not as important as AI safety, I can still see it causing upheavals in the future.
We should pause to note that even Clippy2 doesn’t really think or plan. It’s not really conscious. It is just an unfathomably vast pile of numbers produced by mindless optimization starting from a small seed program that could be written on a few pages.
I am trying to understand if this part was supposed to mock human exceptionalism or if this is the author's genuine opinion. I would assume it's the former, since I don't understand how you could otherwise go from describing various instances of it demonstrating consciousness to this, but there are jus...
The former. Aside from making fun of people who say things like "ah but DL is just X" or "AI can never really Y" for their blatant question-begging and goalpost-moving, the serious point there is that unless any of these 'just' or 'really' claims can pragmatically cash out as permanently-missing, fatal, unworkable-around capability gaps (and they'd better start cashing out soon!), they are not just philosophically dubious but completely irrelevant to AI safety questions. If qualia or consciousness are just epiphenomena and you can have human or superhuman-level cap...
At this point I have to ask what exactly is meant by this. The bigger model beats average human performance on the national math exam in Poland. Sure, the people taking this exam are usually not adults, but for many it may be where their mathematical abilities peak, so I wouldn't be surprised if it also beats average human performance in the US. It's all rather vague, though; looking at the MATH dataset paper, all I could find regarding human performance was the following:
...