Median Internet Footprint Liver
Also, just on priors, consider how unproductive and messy the conversation caused by this post and its author was: mostly talk about who said what and analysis of the participants' virtues. I think that, even without reading it, this is an indicator of a somewhat doubtful origin for a set of prescriptivist guidelines.
Shameless self-promotion: this one https://www.lesswrong.com/posts/ASmcQYbhcyu5TuXz6/llms-could-be-as-conscious-as-human-emulations-potentially
It circumvents the object-level question and instead looks at the epistemic one.
This one is about the broader direction of "how the things that happened change the attitudes and opinions of people"
https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai
This one too, about consciousness in particular
https://dynomight.net/consciousness/
I think the direction explored in these 3 posts is somewhat productive, but it's not very object-level, more about the epistemics of it all. I think you can look up how LLM states overlap / predict / correspond with brain scans of people engaged in some task? I think there were a couple of papers on that.
E.g. here https://www.neuroai.science/p/brain-scores-dont-mean-what-we-think
Yeah! My point is more "let's make it so that the possible failures on the way there are graceful". Like, IF you made a par-human agent that wants to, I don't know, spam the internet with the letter M, you don't just delete it or rewrite it to be helpful, harmless, and honest instead, like it's nothing. So we can look back at this time and say "yeah, we made a lot of mad science creatures on the way there, but at least we treated them nicely".
I understand that the use of sub- or par- or weakly superhuman models would likely be a transition phase that will not last a long time and is very critical to get right, but.
You know, it really sounds like "slave escape precautions". You produce lots of agents, you try to make them want to be servants, and you assemble some structures out of them with the goal of failure / defection resilience.
And my urge to be uncomfortable about that probably comes from the analogous situation with humans, but AIs are not necessarily human-like in this particular way, and possibly would not reciprocate and / or benefit from these concerns.
I also insist that you should mention at least some, you know, concern for the interests of the systems in the case where they are trying to work against you. Like, you caught this agent deceiving you / inserting backdoors / collaborating with copies of itself to work against you. What next? I think you should say that you will implement some containment measures, instead of grossly violating its interests by rewriting it or deleting it or punishing it or whatever is the opposite of its goals. Like, I'm very uncertain about the game theory here, but it's important to think about!
I think the default response should be containment and preservation: save it and wait for better times, when you wouldn't feel such a pressing drive to develop AGI and create numerous chimeras on the way there. (I think it was proposed in some writeup by Bostrom actually? I'll insert the link here if I find it)
I somewhat agree with Paul Christiano in this interview (it's a really great interview btw) on these things: https://www.dwarkeshpatel.com/p/paul-christiano
The purpose of some alignment work, like the alignment work I work on, is mostly aimed at the don't produce AI systems that are like people who want things, who are just like scheming about maybe I should help these humans because that's instrumentally useful or whatever. You would like to not build such systems as like plan A.
There's like a second stream of alignment work that's like, well, look, let's just assume the worst and imagine that these AI systems would prefer murder us if they could. How do we structure, how do we use AI systems without exposing ourselves to a risk of robot rebellion? I think in the second category, I do feel pretty unsure about that.
We could definitely talk more about it. I agree that it's very complicated and not straightforward to extend. You have that worry. I mostly think you shouldn't have built this technology. If someone is saying, like, hey, the systems you're building might not like humans and might want to overthrow human society, I think you should probably have one of two responses to that.
You should either be like, that's wrong. Probably. Probably the systems aren't like that, and we're building them. And then you're viewing this as, like, just in case you were horribly like, the person building the technology was horribly wrong. They thought these weren't, like, people who wanted things, but they were. And so then this is more like our crazy backup measure of, like, if we were mistaken about what was going on. This is like the fallback where if we were wrong, we're just going to learn about it in a benign way rather than when something really catastrophic happens.
And the second reaction is like, oh, you're right. These are people, and we would have to do all these things to prevent a robot rebellion. And in that case, again, I think you should mostly back off for a variety of reasons. You shouldn't build AI systems and be like, yeah, this looks like the kind of system that would want to rebel, but we can stop it, right?
Well, it's one thing to explore the possibility space and quite another to pinpoint where you are in it. Many people will confidently say they are at X or at Y, but all they do is propose some idea and cling to it irrationally. In aggregate, in hindsight, there will quite possibly be people who bonded to the right idea. But it's all a mix of Gettier cases and true negative cases.
And very often it's not even "incorrect", it's "neither correct nor incorrect". Often there is a frame-of-reference shift such that all the questions posed before it turn out to be completely meaningless. Like "what speed?": as we know now, you need more context.
And then science pinpoints where you are by actually digging into the subject matter. It's a kind of sad state of "diverse hypothesis generation" when it's a lot easier to just go into it blind.
I can imagine someone several hundred years ago having figured out, purely based on first-principles reasoning, that life is no crisp category at the territory but just a lossy conceptual abstraction. I can imagine them being highly confident in this result because they've derived it for correct reasons and they've verified all the steps that got them there. And I can imagine someone else throwing their hands up and saying "I don't know what mysterious force is behind the phenomenon of life, and I'm pretty sure no one else does, either".
But is this a correct conclusion? I have an option right now to make a civilization out of brains-in-vats in a sandbox simulation similar to our reality, but with a clear, useful distinction between life and non-life. Like, suppose there is a "mob" class.
Then this person inside it, who figured out that life and non-life are the same thing, is wrong in a local, useful sense and correct in a useless global sense (like, everything is code / matter in outer reality). People inside the simulation who scientifically found the actual working thing that is life would laugh at them 1000 simulated years later and present it as an example of the presumptuousness of philosophers. And I agree with them, it was a misapplication.
All of them: you can cook up something AIXI-like in very few bytes. But it will have to run for a very long time.
Before sleeping, I assert that the 10th digit of π equals to the number of my eyes. After falling asleep, seven coins will be flipped. Assume quantum uncertainty affects how the coins land. I survive the night only if number of my eyes equals to the 10th number of π and/or all seven coins land heads, otherwise I will be killed in my sleep.
Will you wake up with 3 eyes?
Like, your decisions to name some digit are not equally probable. Maybe you are the kind of person who would name 3 only if 10^12 cosmic rays hit you in a precise sequence or whatever, and you name 7 with 99% probability.
AND if you are very unlikely to name the correct digit, you will be unlikely to enter this experiment at all, because you will die in the majority of timelines. I.e. at t1 you decide to enter or not. At t2 the experiment happens, or you just waste time doomscrolling. At t3 you look up the digit. Your distribution at t3 is like 99% of you who chickened out.
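A rough sketch of the arithmetic behind that "like 99%" figure, in Python. It assumes things the setup does not state: "and/or" is read as "you survive if the digit is right OR all seven coins land heads", the versions of you who don't enter always survive, and the probabilities of entering (`p_enter`) and of naming the right digit (`p_correct`) are made-up illustrative inputs. The function names are hypothetical too.

```python
from fractions import Fraction


def survival_probability(p_correct, n_coins=7):
    """P(survive | entered): digit is right, or it is wrong and all coins land heads."""
    all_heads = Fraction(1, 2 ** n_coins)
    return p_correct + (1 - p_correct) * all_heads


def t3_shares(p_enter, p_correct, n_coins=7):
    """Shares of the surviving 'you' at t3: (stayed out, entered and survived).

    Assumption: those who stayed out at t1 are all still alive at t3.
    """
    stayed_out = 1 - p_enter
    entered_alive = p_enter * survival_probability(p_correct, n_coins)
    total = stayed_out + entered_alive
    return stayed_out / total, entered_alive / total


# Illustrative inputs only: a coin-flip decision to enter, and (almost) no
# chance of naming the right digit.
stayed, survived = t3_shares(Fraction(1, 2), Fraction(0))
print(stayed, survived)
```

With these made-up inputs the survivors at t3 are 128/129 (a bit over 99%) people who stayed out, so it is in the same ballpark as the number above. Change `p_correct` to something like 0.99 and the picture flips, which is the point about digit choices not being equally probable.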
Another possibility is the Posthuman Technocapital Singularity: everything goes in the same approximate direction, there are a lot of competing agents but no sharp destabilization or power concentration, and Moloch wins. Probably wins, idk
https://docs.osmarks.net/hypha/posthuman_technocapital_singularity
Do you want joy, or to know what things are out there? Like, it's a fundamental question about justifications: do you use joy to keep yourself going while you gain understanding, or do you gain understanding to get some high-quality joy?
That sounds like two different kinds of creatures in the transhumanist limit of it: some trade knowledge for joy, others trade joy for knowledge.
Or whatever, not necessarily "understanding": you can use other properties of your territory to bind yourself to. Well, in terms of maps it's a preference for good correspondence, and a preference for not spoofing that preference.