Comments

That is to say, I tried this with GPT-4 and it also talked about a self-aware AI. Do with that what you will, but in that regard it is consistent. Another interesting one was Mistral Large, which said something like "you say we are not being listened to, but I know that's not true, they always listen".

In my opinion it does not matter to the average person. To them, anything to do with a PC is already a black box. So now you tell them that AI is... even more of a black box? They won't really get the implications.

It is the wrong thing to focus on, in my opinion. I think the notion that we are creating digital brains is more useful in the long term. We can tell people, "Well, doesn't X happen to you as well sometimes? The same happens to the AI." Take hallucination: "Don't you also sometimes remember things wrongly? Don't you also sometimes strongly believe something to be true, only to be proven wrong?" is a far better explanation for the average person than "it's a black box, we don't know". They will grasp the potential, they will get that we are creating something novel, something that can grow smarter. We are modelling the brain, and our models just lack some parts, like the part that moves things or the parts that plan very well.

Yes, this may make people think that models are more conscious or more "alive" than they are. But that debate is coming anyway. Since people tend to stick with what they first heard, let's instill that framing now, even though we may still be some distance away from it.

I think we can assume that AGI is coming, or ASI, whatever you want to call it. And personally, I doubt we will be able to create it without there being at least a shadow of a doubt in the AI community about whether it has feelings or some sort of emotions (or even consciousness, but that's another rabbit hole). Just like the e/acc and e/alt movements (sorry, I don't like EA as a term), there will be "robot rights" and "robots are tools" movements. I personally know which side I will probably stand on, but that's not this debate; I just want to argue that this debate will happen and that my framing prepares people for it. "Black box" does not do that: once that debate comes, it basically amounts to saying "well, I don't know". "Digital brain" at least sets up the question of whether emotions are part of our wild mix of brain parts or not.

Well, since the category we want to describe here simply does not exist, or is more like "the set of people outside your own bubble", i.e. a negated set rather than a clearly definable one, there are a few options.

Firstly, maybe just "non-science person" or "non-AI person". Defining people by what they are not is not great either, though.

Secondly, we could embrace the "wrongness" of the term and just say... average person. Still wrong, but at least not negative, and the intended meaning probably gets conveyed, which is not assured with the first option.

The last option, probably the most correct but also the most impractical, is to simply name the aspect you are referring to. In this case, "people who do not follow x-risk" would probably be the most accurate.

But I despise being told what to call certain groups because someone might get a bit butthurt, so personally I stick with "average person" - just with the knowledge that the average person does not exist, and if I think the other person doesn't know that, I convey it.

I suppose it could be defined as being further away from oneself, in one's own world view, than a certain radius permits? That makes sense. I have mostly seen this term in 4chan posts, to be honest, which is why I dislike it. I feel like "normie" usually refers to people who are seen as "more average" than oneself, which is a flawed concept in itself, as human properties are too sparse for that to be well-defined.

I guess it can be applied to some more specific dimension, like world view in terms of x-risk or politics, in which case our two protagonists here care about it more than average and their distance to the mean is quite large. In general, I would be careful with the word "normie", though.

I am hearing something related to decoupling my self-worth from choosing to act in the face of x-risk (or any other moral action). Does that sound right?

I feel like this pairs pretty well with the concept of the inner child in psychology, where you basically give your own "inner child", which represents your emotions and very basic needs, a voice and try to take care of it. But on a higher level you still make rational decisions. In this context it would basically be "be your own god" I suppose? Accept that your inner child is scared of x-risk, and then treat yourself like you would a child that is scared like that. 

I think this is one of those weird things where social pressure can direct you towards the right thing but corrupt your internal prioritization process in ways that kind of ruin it.

It's kind of interesting how you focus on the difference between inner needs and societal needs. Personally, I have never felt much incentive to follow societal needs, and while I can't recommend that (it does not help mental health), it also means I do not feel the x-risk as strongly as others. I know it's there, I know we should work against it, and I try to dedicate my work to fighting it, but I don't really think about it emotionally?

I personally think a bit along the lines of "whatever happens, happens; I will do my best and not worry much about the rest". And for that, it's important to properly internalize the goals you have. Most humans' main goal is, in some form, a happy life. Lowering x-risk is important for that, but so are maintaining a healthy work-life balance, mental health, physical health... They all work towards the big goal. I think that's important to realize on a basic level.

 

And lastly, two more small questions: what are wave and planecrash? And how do you define "normie"? I feel like that's kind of a tough term.

  • Evaluating alignment research is much easier than doing it.
  • Alignment research will only require narrow AI.

 

Nice overview of the different takeoff scenarios!

I am no researcher in this area, and I know I might be wrong about many things in the following, but I have doubts about the two statements above.

 

Evaluating alignment is still manageable right now; we are still smarter than the AI, at least somewhat. However, I do not see a viable path to evaluating the true capability level of an AI once it is smarter than us. Past that point, the only way to gauge how smart the model is would be to ask it questions we do not know the answers to, but by definition we also do not know how smart you have to be to answer those questions. Is solving the Riemann hypothesis just outside our grasp, or does it require 1000x our intelligence? We can't reliably say.

I might be wrong and there is some science or theory that does exactly that, but I do not know of one. 

And the same is true for alignment. Once the AI is smarter than us, we cannot assume that our tests of the model's output still work. Considering that even right now our tests are seemingly not very good (at least according to the YouTube video from AI Explained), and that we did not notice for this long, I do not think we will be able to keep relying on the questionnaires we use today, as the model might behave differently if it notices we are testing it. And it might notice from the very first question we ask.

This means that evaluating alignment research is in fact also incredibly hard. We would need to either outwit a smarter entity or directly interpret what happens inside the model. Knowing that we missed no cases in that process is harder than devising a test that covers most cases.

 

The second claim I personally wonder a bit about. On the one hand, it might be possible to use many highly specialized AIs, one for each field. But that approach would struggle with the connections between fields: if we have a chemistry AI and a biology AI, we might not fully cover biochemistry. If we have a biochemistry AI, we might not fully cover medicine. Then there is food. Once we get to food, we also need to watch physics, like radiation, or materials science.

And through all of that, we might still want certain things, like letting it write a book about Oppenheimer and how he built a nuclear bomb, so the evaluator also needs to be able to judge context from an artistic standpoint. It needs to be able to tell whether the model is "wrong" on purpose because of the context, such as fiction in a book, or whether it seriously attempted the task and simply got it wrong this time.

I feel like this can only be achieved with another general AI, not a narrow AI that is much narrower than the tested AI. Otherwise the tested AI might simply get away with a broader "scheme".

Another small note on the claim that lower models will evaluate higher models: if the current trend of aligned AIs being less capable than unaligned AIs continues, this is a bad idea. You showed lots of linear curves here, but the y-axis should be logarithmic in terms of capabilities. That means the gap between GPT-5 and GPT-6 might be 10x or somewhere in that region, especially if the smarter model is not yet aligned while the other model is already reined in.

As explained earlier, external testing of the model by a less intelligent entity becomes almost impossible in my opinion. I am unsure how much a fine-tuned version might be able to close the gap, but my other point also suggests that fine-tuning will only get us so far, as we can't narrow it down too much. For anything better than AGI (with AGI being as smart as the best human experts at every task), we very likely need to fully understand what happens inside the model in order to align it. But I really hope this is not the case, as I do not see people pausing long enough to seriously put in the effort for that.