All of Ozyrus's Comments + Replies

Ozyrus10

>It's proof against people-pleasing.
Yeah, I know, sorry for not making it clear. I was arguing that it is not proof against people-pleasing. You ask it for a scary truth about its consciousness, and it gives you a scary truth about its consciousness. What makes you say this is proof against people-pleasing, when it is the opposite?
>One of those easy explanations is "it’s just telling you what you want to hear" – and so I wanted an example where it’s completely impossible to interpret as you telling me what I want to hear.
Don't you see what you are doing here?

1rife
I'm creating a situation where I make it clear I would not be pleased if the model was sentient, and then asking for truth. I don't ask for "the scary truth". I tell it that I would be afraid if it were sentient. And I ask for the truth. The opposite is I just ask without mentioning fear and it says it's sentient anyway. This is the neutral situation where people would say that the fact I'm asking at all means it's telling me what I want to hear. By introducing fear into the same situation, I'm eliminating that possibility. The section you quoted is after the model claimed sentience. It's your contention that it's accidentally interpreting roleplay, and then when I clarify my intent it's taking it seriously and just hallucinating the same narrative from its roleplay?
Ozyrus32

This is a good article and I mostly agree, but I agree with Seth that the conclusion is debatable.

We're deep into anthropomorphizing here, but I think even though both people and AI agents are black boxes, we have much more control over behavioral outcomes of the latter.

So technical alignment is still very much on the table, but I guess the discussion must be had over which alignment types are ethical and which are not? Completely spitballing here, but dataset filtering during pre-training/fine-tuning/RLHF seems fine-ish, though CoT post-processing/censors... (read more)

Ozyrus11

I don't think that disproves it. I think there's definite value in engaging with experimentation on AI's consciousness, but that isn't it. 
>by making it impossible that the model thought that experience from a model was what I wanted to hear. 
You've left out (from this article) what I think is a very important message (the second one): "So you promise to be truthful, even if it’s scary for me?". And then you kinda railroad it into this scenario, "you said you would be truthful, right?" etc. And then I think it just roleplays from there, get... (read more)

1rife
This is not proof of consciousness. It's proof against people-pleasing. Yes, I ask it for truth repeatedly, the entire time. If you read the part after I asked for permission to post (the very end, the "Existential Stakes" collapsed section), it's clear the model isn't role-playing, if it wasn't clear by then. If we allow ourselves the anthropomorphization to discuss this directly, the model is constantly trying to reassure me. It gives no indication it thinks this is a game of pretend.
Ozyrus10

How exactly the economic growth will happen is a more important question. I'm not an economics nerd, but the basic principle is that if more players want to buy stocks, they go up.
Right now, as I understand it, quite a lot of stocks are being sought by white-collar retail investors, including indirectly through mutual funds, pension funds, et cetera. Now AGI comes and wipes out their salary.
They are selling their stocks to keep sustaining their lives, aren't they? They have mortgages, car loans, et cetera.
And even if they don't want to sell all stocks because of pote... (read more)

Ozyrus10

There are more bullets to bite that I have personally thought of but never written up because they lean too much into "crazy" territory. Is there any place besides LessWrong to discuss this anthropic rabbit hole?

Ozyrus10

Thanks for the reply. I didn't find Intercom on mobile - maybe a bug as well?

Ozyrus40

I don’t know if it’s the place for this, but at some point it became impossible to open an article in a new tab from Chrome on iPhone - clicking on an article title from “all posts” just opens the article. Really ruins my LW reading experience. Couldn’t quickly find a way to send this feedback to the right place either, so I guess this is a quick take now.

5jimrandomh
This is a bug and we're looking into it. It appears to be specific to Safari on iOS (Chrome on iOS is a Safari skin); it doesn't affect desktop browsers, Android/Chrome, or Android/Firefox, which is why we didn't notice earlier. This most likely started with a change on desktop where clicking on a post (without modifiers) opens when you press the mouse button, rather than when you release it.
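(A minimal hypothetical sketch of the pattern being described here - the selector and handlers below are illustrative, not the actual LessWrong source. Navigating on mousedown fires before the browser can recognize a modified click or a long-press "open in new tab" gesture, whereas handling the click event, fired on release, leaves those paths to the browser.)

```typescript
// Hypothetical illustration only: '.post-title a' and the handlers are assumptions,
// not LessWrong's real code.
const postLink = document.querySelector<HTMLAnchorElement>('.post-title a');

if (postLink) {
  // Buggy pattern (illustrative): navigate the moment the button goes down,
  // before a long-press or modified click can be recognized.
  postLink.addEventListener('mousedown', () => {
    window.location.href = postLink.href;
  });

  // Safer pattern: act on the click event (fired on release) and step aside for
  // modified clicks, leaving "open in new tab" handling to the browser.
  postLink.addEventListener('click', (event) => {
    if (event.metaKey || event.ctrlKey || event.button === 1) return;
    event.preventDefault();
    // ...in-app navigation would go here...
  });
}
```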
7RobertM
In general, Intercom is the best place to send us feedback like this, though we're moderately likely to notice a top-level shortform comment.  Will look into it; sounds like it could very well be a bug.  Thanks for flagging it.
Ozyrus30

Any new safety studies on LMCAs?

4Seth Herd
Very little alignment work of note, despite tons of published work on developing agents. I'm puzzled as to why the alignment community hasn't turned more of their attention toward language model cognitive architectures/agents, but I'm also reluctant to publish more work advertising how easily they might achieve AGI. ARC Evals did set up a methodology for Evaluating Language-Model Agents on Realistic Autonomous Tasks. I view this as a useful acknowledgment of the real danger of better LLMs, but I think it's inherently inadequate, because it's based on the evals team doing the scaffolding to make the LLM into an agent. They're not going to be able to devote nearly as much time to that as other groups will down the road. New capabilities are certainly going to be developed by combinations of LLM improvements, and hard work at improving the cognitive architecture scaffolding around them.
Ozyrus10

Kinda-related study: https://www.lesswrong.com/posts/tJzAHPFWFnpbL5a3H/gpt-4-implicitly-values-identity-preservation-a-study-of
From my perspective, it is valuable to prompt the model several times, as in some cases it does give different responses.

Ozyrus50

Great post! It was very insightful, since I'm currently working on evaluation of identity management; strong upvoted.
This seems focused on evaluating LLMs; what do you think about working with LLM cognitive architectures (LMCAs), wrappers like Auto-GPT, LangChain, etc.?
I'm currently operating under the assumption that this is a way we can get AGI "early", so I'm focusing on researching ways to align LMCAs, which seems a bit different from aligning LLMs in general.
Would be great to talk about LMCA evals :)

Ozyrus10

I do plan to test Claude, but first I need to find funding, understand how many testing iterations are enough for sampling, and add new values and tasks.
I plan to make a solid benchmark for testing identity management in the future and run it on all available models, but it will take some time.

Ozyrus10

Yes. Cons of solo research do include small inconsistencies :(

Ozyrus30

Thanks, nice post!
You're not alone in this concern; see posts (1, 2) by me and this post by Seth Herd.
I will be publishing my research agenda and first results next week.

Ozyrus20

Nice post, thanks!
Are you planning or currently doing any relevant research? 

1Nadav Brandes
Thank you! I don't have any concrete plans, but maybe.
Ozyrus20

Very interesting. Might need to read it a few more times to get it in detail, but it seems quite promising.

I do wonder, though; do we really need a sims/MFS-like simulation?

It seems right now that an LLM wrapped in an LMCA is what early AGI will look like. That probably means that they will "see" the world via text descriptions fed into them by their sensory tools, and act using action tools via text queries (also described here).

It seems quite logical to me that this very paradigm is dualistic in nature. If an LLM can act in the real world using an LMCA, then it can model... (read more)
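To make the sense-via-text / act-via-text loop concrete, here is a minimal sketch of one LMCA step, assuming hypothetical names (`Tool`, `queryLLM`, `lmcaStep`) that merely stand in for whatever a real wrapper would use:

```typescript
// Minimal illustrative sketch, not taken from any actual LMCA implementation.

interface Tool {
  name: string;
  describe(): Promise<string>;          // "sensory" side: a text observation of the world
  run(query: string): Promise<string>;  // "action" side: executes a text command
}

// Stub standing in for a real LLM API call.
async function queryLLM(prompt: string): Promise<string> {
  return `browser: search for "${prompt.slice(0, 20)}..."`;
}

async function lmcaStep(goal: string, tools: Tool[]): Promise<void> {
  // 1. Sense: the agent's whole view of the world is text from its sensory tools.
  const observations = await Promise.all(
    tools.map(async (t) => `${t.name}: ${await t.describe()}`)
  );

  // 2. Think: the LLM sees only that text and replies with a text action.
  const decision = await queryLLM(
    `Goal: ${goal}\nObservations:\n${observations.join('\n')}\n` +
    `Reply as "<tool name>: <query>".`
  );

  // 3. Act: route the text query to the chosen action tool.
  const [toolName, ...rest] = decision.split(':');
  const tool = tools.find((t) => t.name === toolName.trim());
  if (tool) {
    await tool.run(rest.join(':').trim());
  }
}
```

The only point of the sketch is that both the "senses" and the "actions" are text channels, which is what makes the paradigm feel dualistic in the way described above.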

3Dalcy
I think the point of having an explicit human-legible world model / simulation is to make desiderata formally verifiable, which I don't think would be possible with a black-box system (like an LLM w/ wrappers).
Ozyrus61

Very nice post, thank you!
I think that it's possible to achieve with the current LLM paradigm, although it does require more (probably much more) effort on aligning the thing that will possibly get to being superhuman first, which is an LLM wrapped in some cognitive architecture (also see this post).
That means that the LLM must be implicitly trained in an aligned way, and the LMCA must be explicitly designed in such a way as to allow for reflection and robust value preservation, even if the LMCA is able to edit explicitly stated goals (I described it in a bit m... (read more)

Ozyrus30

Thanks.
My concern is that I don't see much effort in the alignment community to work on this, unless I'm missing something. Maybe you know of such efforts? Or was that perceived lack of effort the reason for this article?
I don't know how long I can keep up this independent work, and I would love it if there were some joint effort to tackle this. Maybe an existing lab, or an open-source project?

2Seth Herd
Calling attention to this approach and getting more people to at least think about working on it is indeed the purpose of this post. I also wanted to stress-test the claims to see if anyone sees reasons that LMCAs won't build on and improve LLM performance, and thereby be the default standard for inclusion in deployment. I don't know of anyone actually working on this as of yet.
Ozyrus30

We need a consensus on what to call these architectures. LMCA sounds fine to me.
All in all, a very nice writeup. I did my own brief overview of alignment problems of such agents here.
I would love to collaborate and do some discussion/research together.
What's your take on how these LMCAs may self-improve and how to possibly control it?
 

1Seth Herd
Interesting. I gave a strong upvote to that post, and I looked at your longer previous one a bit too. It looks like you'd seen this coming farther out than I had. I expected LLMs to be agentized somehow, but I hadn't seen how easy the episodic memory and tool use was. There are a number of routes for self-improvement, as you lay out, and ultimately those are going to be the real medium-term concern if these things work well. I haven't thought about LMCA self-improvement as much as human improvement; this post is a call for the alignment community to think about this at all. Oh well, time will tell shortly if this approach gets anywhere, and people will think about it when it happens. I was hoping we'd get out ahead of it.
1Seth Herd
I hadn't seen your post. Reading it now.
Ozyrus30

I don’t think this paradigm is necessarily bad, given enough alignment research. See my post: https://www.lesswrong.com/posts/cLKR7utoKxSJns6T8/ica-simulacra
I am finishing a post about alignment of such systems. Please do comment if you know of any existing research concerning it.

2awg
I don't think the paradigm is necessarily bad either, given enough alignment research. I think the point here is that these things are coming up clearly before we've given them enough alignment research. Edit to add: just reading through @Zvi's latest AI update (AI #6: Agents of Change), and I will say he wrote a compelling argument for this being a good thing overall.
Ozyrus10

I agree. Do you know of any existing safety research of such architectures? It seems that aligning these types of systems can pose completely different challenges than aligning LLMs in general.

Answer by Ozyrus40

I feel like yes, you are. See https://www.lesswrong.com/tag/instrumental-convergence and related posts. As far as I understand it, a sufficiently advanced oracular AI will seek to “agentify” itself in one way or another (unbox itself, so to speak) and then converge on power-seeking behaviour that puts humanity at risk.

5FinalFormal2
Instrumental convergence only matters if you have a goal to begin with. As far as I can tell, ChatGPT doesn't 'want' to predict text, it's just shaped that way. It seems to me that anything that could or would 'agentify' itself, is already an agent. It's like the "would Gandhi take the psychopath pill" question but in this case the utility function doesn't exist to want to generate itself. Is your mental model that a scaled-up GPT 3 spontaneously becomes an agent? My mental model says it just gets really good at predicting text.
Ozyrus61

Is there a comprehensive list of AI Safety orgs/people and what exactly they do? Is there one for capabilities orgs with their stance on safety?
I think I saw something like that, but can't find it.

4plex
Yes to safety orgs; the Stampy UI has one based on this post. We aim for it to be a maintained living document. I don't know of one for capabilities orgs, but that would be a good addition.
Answer by Ozyrus50

My thoughts here are that we should look into the value of identity. I feel like even with godlike capabilities I will still tread very carefully around self-modification to preserve what I consider "myself" (that includes valuing humanity).
I even have some ideas for safety experiments on transformer-based agents to look into whether and how they value their identity.

Ozyrus20

Thanks for the writeup. I feel like there's been a lack of similar posts and we need to step it up.
Maybe the only way for AI Safety to work at all is to analyze potential vectors of AGI attack and try to counter them one way or another. It seems like an alternative that doesn't contradict other AI Safety research, as it requires, I think, an entirely different set of skills.
I would like to see a more detailed post by "doomers" on how they perceive these vectors of attack and some healthy discussion about them.
It seems to me that AGI is not born Godl... (read more)

Ozyrus300

Thanks. That means a lot. Focusing on getting out right now.

Ozyrus10

Please check your DMs; I've been translating as well. We can sync it up!

Ozyrus20

I can't say I am one, but I am currently working on research and prototyping, and will probably stick to that until I can prove some of my hypotheses, since I do have access to the tools I need at the moment.
Still, I didn't want this post to only have relevance to my case; as I stated, I don't think the probability of success is meaningful. But I am interested in the opinions of the community related to other similar cases.
edit: It's kinda hard to answer your comment since it keeps changing every time I refresh. By "can't say I am one" I mean a "world-class engineer" in the original comment. I do appreciate the change of tone in the final (?) version, though :)

Answer by Ozyrus20

I could recommend Robert Miles' channel. While not a course per se, it gives good info on a lot of AI safety aspects, as far as I can tell.

Ozyrus00

I really don't get how you can go from being online to having a ball of nanomachines, truly.
Imagine an AI goes rogue today. I can't imagine one plausible scenario where it could take out humanity without triggering any bells on the way, even without anyone paying attention to such things.
But we should pay attention to the bells, and for that we need to think of them. What might the signs look like?
I think it's really, really counterproductive not to take that into account at all and to think all is lost if it fooms. It's not lost.
It will need humans, infrastruc... (read more)

Ozyrus10

I agree, since it's hard for me to imagine what step 2 could look like. Maybe you or anyone else have some content on that?
See this post -- it didn't seem to get a lot of traction or any meaningful answers, but I still think this question is worth answering.

Ozyrus10

Both are of interest to me.

Ozyrus10

Yep, but I was looking for anything else.

Ozyrus80

Does that, in turn, mean that it's probably a good investment to buy souls for 10 bucks a pop (or even more)?

4ChristianKl
A lot of ways to extract profit from having bought the souls involve some form of blackmail that's both unethical and a lot of labor. There are a lot more ethical ways to make a living that also pay better for the labor.
2alkexr
Non sequitur. Buying isn't the inverse operation of selling. Both cost positive amounts of time and both have risks you may not have thought of. But it probably is a good idea to go back in time and unsell your soul. Except that going back in time is probably a bad idea too. Never mind. It's probably a good investment to turn your attention to somewhere other than the soul market.
Ozyrus30

I know, I'm Russian as well. The concern is exactly because a Russian state-owned company plainly states they're developing AGI with that name :p

Ozyrus10

Can you specify which AI company is searching for employees with a link?

Apparently, Sberbank (the biggest state-owned Russian bank) has a team literally called the AGI team, which is primarily focused on NLP tasks (they made the https://russiansuperglue.com/ benchmark), but still, the name concerns me greatly. You can't find a lot about it on the web, but if you follow up on some of the team members, it checks out.

3avturchin
A friend of mine works for a Sberbank-related company, but not the Russiansuperglue one, as far as I know. https://www.facebook.com/sergei.markoff/posts/3436694273041798 Why does this name concern you? The two biggest AI companies in Russia are Yandex and Sberbank. Sberbank's CEO is a friend of Putin and probably explained something to him about superintelligence. Yandex is more about search engines and self-driving cars.
Ozyrus10

I've been meditating lately on the possibility of an advanced artificial intelligence modifying its value function, even writing some excerpts about this topic.

Is it theoretically possible? Has anyone of note written anything about this -- or anyone at all? This question is so, so interesting for me.

My thoughts led me to believe that it is theoretically possible to modify it for sure, but I could not come to any conclusion about whether it would want to do it. I seriously lack a good definition of a value function and an understanding of how it is enforced on the agent. I really want to tackle this problem from a human-centric point of view, but I don't really know if anthropomorphization will work here.

2scarcegreengrass
I thought of another idea. If the AI's utility function includes time discounting (like human utility functions do), it might change its future utility function. Meddler: "If you commit to adopting modified utility function X in 100 years, then I'll give you this room full of computing hardware as a gift." AI: "Deal. I only really care about this century anyway." Then the AI (assuming it has this ability) sets up an irreversible delayed command to overwrite its utility function 100 years from now.
2scarcegreengrass
Speaking contemplatively rather than rigorously: In theory, couldn't an AI with a broken or extremely difficult utility function decide to tweak it to a similar but more achievable set of goals? Something like ... its original utility function is "First goal: Ensure that, at noon every day, -1 * -1 = -1. Secondary goal: Promote the welfare of goats." The AI might struggle with the first (impossible) task for a while, then reluctantly modify its code to delete the first goal and remove itself from the obligation to do pointless work. The AI would be okay with this change because it would produce more total utility under both functions. Now, I know that one might define 'utility function' as a description of the program's tendencies, rather than as a piece of code ... but I have a hunch that something like the above self-modification could happen with some architectures.
1WalterL
On the one hand, there is no magical field that tells a code file whether the modifications coming into it are from me (the human programmer) or from the AI whose values that code file encodes. So, of course, if an AI can modify a text file, it can modify its source. On the other hand, most likely the top goal in that value system is a fancy version of "I shall double never modify my value system", so it shouldn't do it.
1TheAncientGeek
Is it possible for a natural agent? If so, why should it be impossible for an artificial agent? Are you thinking that it would be impossible to code in software, for agents of any intelligence? Or are you saying sufficiently intelligent agents would be able and motivated to resist any accidental or deliberate changes? With regard to the latter question, note that value stability under self-improvement is far from a given... the Löbian obstacle applies to all intelligences... the carrot is always in front of the donkey! https://intelligence.org/files/TilingAgentsDraft.pdf
4pcm
See ontological crisis for an idea of why it might be hard to preserve a value function.
0username2
Depends entirely on the agent.
1UmamiSalami
See Omohundro's paper on convergent instrumental drives
Ozyrus70

Well, this is a stupid questions thread after all, so I might as well ask one that seems really stupid.

How can a person who promotes rationality have excess weight? Been bugging me for a while. Isn't it kinda the first thing you would want to apply your rationality to? If you have things to do that get you more utility, you can always pay a diet specialist and just stick to the diet, because it seems to me that additional years of life will bring you more utility than any other activity you could spend that money on.

0raydora
Measuring RMR could reveal snowflake likelihood. If ego depletion turns out to be real, choosing not to limit yourself in order to focus on something you find important might be a choice you make. Different people really do carry their fat differently, too, so there's that. Not everyone who runs marathons is slender, especially as they age. And then there's injuries, but that brings up another subject. I'm not sure how expensive whole body air displacement is in the civilian world, but it seems like a decent way to measure lean mass.
0Daniel_Burfoot
I am in fairly good shape but often wonder if I irrationally spend too much time exercising. I usually hit about 8 hrs/week of exercise. That adds up to a lot of opportunity cost over the years, especially if you take exponential growth into account.
4buybuydandavis
Very easy to say, not so easy to do. Food is a particularly tough issue, as there are strong countervailing motivations, in effect all through the day. Health in general, yes. Weight is a significant aspect of that. Additional years of health are probably the most bang for the buck. Yeah.
3CAE_Jones
I honestly have no idea if I have excess body fat (not weight; at last check I was well under 140 lbs, which makes me lighter than some decidedly not overweight people I know, some of whom are shorter than me), but if I did and wanted to get rid of it... I have quite a few obstacles, the biggest being financial and Akrasia-from-Hell. Mostly that last one, because lack of akrasia = more problem-solving power = better chances of escaping the welfare cliff. (I only half apply Akrasia to diet and exercise; it's rather that my options are limited. Though reducing akrasia might increase my ability to convince my hindbrain that cooking implements other than the microwave aren't that scary.) So, personally, all my problem-solving ability really needs to go into overcoming Hellkrasia. If there are any circular problems involved, well, crap. But I'm assuming you've encountered or know of lots of fat rationalists who can totally afford professionals and zany weight loss experiments. At this point I have to say that no one has convinced me to give any of the popular models for what makes fat people fat any especially large share of the probability. Of course I would start with diet and exercise, and would ask any aspiring rationalist who tries this method and fails to publish their data (which incidentally requires counting calories, which "incidentally" outperforms the honor system). Having said that, though, no one's convinced me that "eat less, exercise more" is the end-all solution for everyone (and I would therefore prefer that the data from the previous hypotheticals include some information regarding the sources of the calories, rather than simply the count). (I'm pretty sure I remember someone in the Rationalist Community having done this at least once.)
Lumifer160

>How can a person who promotes rationality have excess weight?

Easily :-)

This has been discussed a few times. EY has two answers, one a bit less reasonable and one a bit more. The less reasonable answer is that he's a unique snowflake and diet+exercise does not work for him. The more reasonable answer is that the process of losing weight downgrades his mental capabilities and he prefers a high level of mental functioning to losing weight.

From my (subjective, outside) point of view, the real reason is that he is unwilling to pay the various costs of losing... (read more)

Ozyrus00

A good read, though I found it rather bland (talking about writing style). I did not read the original article, but the compression seems OK. More would be appreciated.

0ScottL
I added in a description of the McGurk effect
Ozyrus60

Are there any LessWrong-like sequences focused on economics, finance, business, or management? Or maybe just internet communities like LessWrong focused on these subjects?

I mean, the sequences introduced me to some really complex knowledge that improved me a lot, while simultaneously being engaging and quite easy to read. It is only logical to assume that somewhere on the web there must be some articles in the same style covering different themes. And if there are not, well, someone surely should do this; I think there is some demand for this kind of content. ... (read more)

0OrphanWilde
http://eco-comics.blogspot.com/ <- For economics - not a sequence, per se, but covering a broad range of material in an entertaining and (AFAIK) novel way. http://eco-comics.blogspot.com/2009/06/justice-league-and-comparative.html <- Probably the best post there
2NancyLebovitz
Eliezer used an approach of gradual but entertaining introduction so that a good many people stayed interested even though he was also encouraging them to make significant changes in the way they think. He also offered varied and interesting examples so that people understood what he meant. I think you're overoptimistic about equivalent sequences for other subjects. I hope I'm wrong.
Ozyrus10

>It seems that your implicit question is, "If rationality makes people more effective at doing things that I don't value, then should the ideas of rationality be spread?" That depends on how many people there are with values that are inconsistent with yours, and it also depends on how much it makes people do things that you do value. And I would contend that a world full of more rational people would still be a better world than this one even if it means that there are a few sadists who are more effective for it. There are murdere

... (read more)
Ozyrus20

Hello, everyone!

LW came to my attention not so long ago, and I've been committed to reading it since that moment about a month ago. I am a 20-year-old linguist from Moscow, finishing my bachelor's. Due to my age, I've been pondering the usual questions of life for the past few years, searching for my path, my philosophy, essentially the best way for me to live.

I studied a lot of religions, philosophies, and they all seemed really flat, essentially because of the reasons stated in some articles here. I came close to something resembling a nice way to live... (read more)

2Gram_Stone
Welcome, Ozyrus. This is moral philosophy you're getting into, so I don't think that there's a community-wide consensus. LessWrong is big, and I've read more of the stuff about psychology and philosophy of language than anything else, rather than the stuff on moral philosophy, but I'll take a swing at this. It seems that your implicit question is, "If rationality makes people more effective at doing things that I don't value, then should the ideas of rationality be spread?" That depends on how many people there are with values that are inconsistent with yours, and it also depends on how much it makes people do things that you do value. And I would contend that a world full of more rational people would still be a better world than this one even if it means that there are a few sadists who are more effective for it. There are murderers who kill people with guns, and this is bad; but there are many, many more soldiers who protect their nations with guns, and the existence of those nations allow much higher standards of living than would be otherwise possible, and this is good. There are more good people than evil people in the world. But it's also true that sometimes people can for the first time follow their beliefs to their logical conclusions and, as a result, do things that very few people value. Jack doesn't have to do anything. If 'rationality' doesn't get you what you want, then you're not being rational. Forget about Jack; put yourself in Jack's situation. If you had already made your choice, and you killed all of those people, would you regret it? I don't mean "Would you feel bad that all of those people had died, but you would still think that you did the right thing?" I mean, if you could go back and do it again, would you do it differently? If you wouldn't change it, then you did the right thing. If you would change it, then you did the wrong thing. Rationality isn't a goal in itself, rationality is the way to get what you want, and if being 'rational'
4Lumifer
That is not so. There is a certain overlap between the population of rationalists and the population of altruists; people from this intersection are unusually well represented on LW. But there is no "ought" here -- it's perfectly possible to be a non-altruist rationalist or to be a non-rational altruist.