All of fiso64's Comments + Replies

fiso64

The model’s performance is still well below human performance

At this point I have to ask what exactly is meant by this. The bigger model beats the average human performance on the national math exam in Poland. Sure, the people taking this exam are usually not adults, but for many it may be where they peak in their mathematical abilities, so I wouldn't be surprised if it beats average human performance in the US. It's all rather vague, though; looking at the MATH dataset paper, all I could find regarding human performance was the following:

Human-Level Perform

...
Daniel Paleka
They test on the basic tier (Poziom podstawowy) of the Matura for the math problems. In countries with Matura-based education, the basic-tier math test is not usually taken by mathematically inclined students; it is just the law that anyone going to a public university has to pass some sort of math exam beforehand. Students who want to study anything where mathematics skills are needed would take the higher tier (Poziom rozszerzony). Can someone from Poland confirm this?

A quick estimate of the percentage of high-school students taking the Polish Matura exams is 50%-75%, though. If the number of students taking the higher tier is not too large, then average performance on the basic tier corresponds to essentially average human-level performance on this kind of test. Note that many students taking the basic math exam only want to pass and not necessarily perform well, and some of the bottom half of the 270k students are taking the exam for the second or third time after failing before.
fiso64

Here's a non-obvious way it could fail. I don't expect researchers to make this kind of mistake, but if this reasoning is correct, public access to such an AI is definitely not a good idea.

Also, consider a text predictor which is trying to roleplay as an unaligned superintelligence. This situation could be triggered even without the user's knowledge, for example by accidentally creating a conversation which the AI relates to a story about a rogue SI. In that case it may start to output manipulative replies, suggest blueprints for agentic AIs, and mayb...

fiso64

I disagree with your last point. Since we're agents, we develop in childhood a much better intuitive understanding of what causality is, how it works, and how to apply it. As babies, we run lots and lots of experiments. Those are not exactly randomized controlled trials, so they will not fully remove confounders, but they come close when we try something different in a relatively similar situation. Doing lots of gymnastics, dropping stuff, testing our parents' limits, etc., is what allows us to quickly learn causality.

LLMs, as they are curren...

Canaletto
Well, maybe LLMs can "experiment" on their dataset by assuming something about it and then being modified if they encounter a counterexample. I think that vaguely counts as experimenting.
Owain_Evans
I agree my last point is more speculative. The question is whether vast amounts of pre-trained data + a smaller amount of finetuning by online RL substitutes for the human experience. Given the success of pre-training so far, I think it probably will. Note that the modern understanding of causality in stats/analytic philosophy/Pearl took centuries of intellectual progress -- even if it seems straightforward. Spurious causal inference seems ubiquitous among humans unless they have learned -- by reading/explicit training -- about the modern understanding. Your examples from human childhood (dropping stuff) seem most relevant to basic physics experiments and less to stochastic relationships between 3 or more variables.
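To make the confounding point concrete, here is a minimal sketch (my own toy example, not from the thread, with made-up variables): a hidden common cause Z makes X and Y look strongly correlated observationally, while intervening on X, as a child's experiment implicitly does, destroys that correlation.

```python
import random

random.seed(0)

# Hidden confounder Z causes both X and Y; X and Y have no direct causal link.
n = 10_000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

def corr(a, b):
    """Pearson correlation, computed from scratch for self-containment."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)
    va = sum((ai - ma) ** 2 for ai in a) / len(a)
    vb = sum((bi - mb) ** 2 for bi in b) / len(b)
    return cov / (va * vb) ** 0.5

# Observationally, X and Y look correlated, purely via Z...
print(round(corr(x, y), 2))

# ...but "intervening" on X (setting it independently of Z, as an
# experiment does) makes the correlation with Y vanish.
x_do = [random.gauss(0, 1) for _ in range(n)]
print(round(corr(x_do, y), 2))
```

The first correlation comes out near 0.5 and the interventional one near zero, which is the gap between passive text prediction and agentive experimentation that the comment is pointing at.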
fiso64

I posted a somewhat similar response to MSRayne, with the exception that what you accidentally summon is not an agent with a utility function, but something that tries to appear like one and nevertheless tricks you into making some big mistake.

Here, what you get is a genuine agent that works across prompts by having some internal value function, which outputs a different value after each prompt, and acts accordingly, if I understand correctly. It doesn't seem incredibly unlikely, as there is nothing in the process of evolution that necessarily has to make ...

fiso64

That makes a lot of sense, thanks for the link. It is not as dangerous a situation as a true agentic AGI, since this failure mode involves a (relatively stupid) user error. I trust researchers not to make that mistake, but it seems like there is no way to safely make these systems available to the public.

A way to make this more plausible, which I thought of after reading this, is accidentally making it think it's hostile. Perhaps you make a joking remark about paperclip maximizers, or maybe it just so happens that the chat history is similar to the premise of...

MSRayne
Yeah, exactly. That said, I don't think the event in the story is a "stupid" user error; it's quite a reasonable one. Suppose nobody had considered this problem and this language model was installed in a next-gen smart home assistant, and someone asked it to order them the best possible pizza... In general, I think it's dangerous to assume anyone is "smart enough" to avoid anything, because if common sense were common, the world would make more sense.
fiso64

Small remark: BIG-bench does include tasks on self-awareness, and I'd argue that self-awareness is a requirement for your definition ("an AI that can do any cognitive task that humans can"), as well as being generally important for problem solving. Being able to correctly answer the question "Can I do task X?" is evidence of self-awareness and is clearly beneficial.

RomanS
I think you're right on both points, although I'm not sure self-awareness is necessary to surpass humans at all cognitive tasks. I can imagine a descendant of GPT that completely fails the self-awareness benchmarks, yet is able to write the most beautiful poetry, conduct Nobel-level research in physics, and even design a superior version of itself.
fiso64

Again, there seems to be an assumption in your argument which I don't understand. Namely, that a society/superintelligence which is intelligent enough to create a convincing simulation for an AGI would necessarily possess the tools (or be intelligent enough) to assess its alignment without running it. Superintelligence does not imply omniscience.

Maybe showing the alignment of an AI without running it is vastly more difficult than creating a good simulation. This feels unlikely, but I genuinely do not see any reason why this can't be the case. If we create ...

fiso64

I don't follow. Why are you assuming that we could adequately evaluate the alignment of an AI system without running it if we were also able to create a simulation accurate enough to make the AI question what's real? This doesn't seem like it would be true necessarily.

RHollerith
I will try to explain (probably via a top-level post, probably not today). For now, I will restate my position.

No superintelligence (SI) that can create programs at all will run any program it has created to get evidence about whether the program is aligned with the SI's values or interests: the SI already knows that before the program runs for the first time. The nature of the programming task is such that if you can program well enough, there's essentially no uncertainty about the matter (barring pathological cases that do not come up in practice unless the SI is in a truly dire situation in which an adversary is messing with core pieces of its mind), similar to how (barring pathological cases) there's no uncertainty about whether a theorem is true if you have a formal proof of the theorem.

The qualifier "it has created" above is there only because an SI might find itself in a very unusual situation in which it is in its interests to run a program deliberately crafted (by someone else) to have the property that the only practical way for anyone to learn what the SI wants to learn about the program is to run it. Although I acknowledge that such programs definitely exist, the vast majority of programs created by SIs will not have that property.

Are you curious about this position mostly for its own sake, or mostly because it might shed light on the question of how much hope there is for us in an SI's being uncertain about whether it is in a simulation?
fiso64

I don't think the word "explainable" is really the best fit. What we really mean is that the model has to be able to construct theories of the world and prioritize the ones which are more compact. An AI that has simply memorized that a stone will fall if it's exactly 5, 5.37, 7.8 (etc.) meters above the ground is not explainable in that sense, whereas one that discovered general relativity would be considered explainable.

And yeah, at some point, even maximally compressed theories become so complex that no human can hope to understand them. But explainability ...
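The memorization-versus-theory contrast above can be sketched as a toy example (my own illustration; the memorized heights and times are made up): a lookup table knows nothing about heights it hasn't seen, while the compact rule t = sqrt(2h/g) covers every height in one line.

```python
import math

# "Memorized" model: fall times recorded only for the specific heights
# it has seen (height in meters -> time in seconds; invented values).
memorized = {5.0: 1.01, 5.37: 1.05, 7.8: 1.26}

def memorized_fall_time(h):
    # Returns None for any height it has never encountered.
    return memorized.get(h)

# Compact theory: t = sqrt(2h / g), one rule covering all heights.
G = 9.81  # m/s^2

def theory_fall_time(h):
    return math.sqrt(2 * h / G)

print(memorized_fall_time(6.2))         # unseen case: no answer
print(round(theory_fall_time(6.2), 2))  # the compact rule generalizes
```

The lookup table is larger per fact and silent off its training points; the one-line theory is the "more compact" model the comment says we should prioritize.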

George3d6
My view is along these lines; see the first link for an interesting example vis-à-vis this (start at minute 17, or just read the linked paper in the description).
fiso64

I agree it's irrelevant, but I've never actually seen these terms in the context of AI safety. It's more about how we should treat powerful AIs. Are we supposed to give them rights? It's a difficult question which requires us to rethink much of our moral code, and one which may shift it to the utilitarian side. While it's definitely not as important as AI safety, I can still see it causing upheavals in the future.

fiso64

We should pause to note that even Clippy2 doesn’t really think or plan. It’s not really conscious. It is just an unfathomably vast pile of numbers produced by mindless optimization starting from a small seed program that could be written on a few pages. 

I am trying to understand if this part was supposed to mock human exceptionalism or if this is the author's genuine opinion. I would assume it's the former, since I don't understand how you could otherwise go from describing various instances of it demonstrating consciousness to this, but there are jus...

gwern

The former. Aside from making fun of people who say things like "ah but DL is just X" or "AI can never really Y" for their blatant question-begging and goalpost-moving, the serious point there is that unless any of these 'just' or 'really' claims can pragmatically cash out as permanently-missing, fatal, unworkable-around capability gaps (and they'd better start cashing out soon!), they are not just philosophically dubious but completely irrelevant to AI safety questions. If qualia or consciousness are just epiphenomena and you can have human or superhuman-level cap...