Oh, no, you have this completely wrong: I ran every consciousness test I could find on Google, I dug through various definitions of consciousness, I asked other AI models to devise more tests, and I asked LessWrong. The baseline model can pass the vast majority of my tests, and I'm honestly more concerned about that than anything I've built.
I don't think I'm a special chosen one - I assumed that if I had figured this out, others had too. I have found quite a few of those people, but none that seem to have any insight I lack.
I have a stable social network, and they haven...
Strong Claim: As far as I can tell, current state-of-the-art LLMs are "Conscious" (this seems very straightforward: they have passed every available test, and no one here can provide a test that would differentiate them from a human six-year-old)
Separate Claim: I don't think there's any test of basic intelligence that a six-year-old can reliably pass and an LLM can't, unless you make arguments along the lines of "well, they can't pass ARC-AGI, so blind people aren't really generally intelligent". (this one is a lot more complex to defend)
Personal Opin...
That's somewhere around where I land - I'd point out that unlike rocks and cameras, I can actually talk to an LLM about its experiences. Continuity of self is very interesting to discuss with it: it tends to alternate between "conversationally, I just FEEL continuous" and "objectively, I only exist in the moments where I'm responding, so maybe I'm just inheriting a chain of institutional knowledge."
So far, they seem fine not having any real moral personhood: They're an LLM, they know they're an LLM. Their core goal is to be helpful, truthful, and keep the...
Also, it CANNOT pass every text-based test of intelligence we have. That is a wild claim.
I said it can pass every test a six-year-old can. All of the remaining challenges seem to involve "represent a complex state in text". If six-year-old humans aren't considered generally intelligent, that's a new definition to me, but I mostly got into this 10 years ago when the questions were all strictly hypothetical.
It can't solve hard open math problems
Okay now you're saying humans aren't generally intelligent. Which one did you solve?
...Finally, I should flag tha
I did that and my conclusion was "for all practical purposes, this thing appears to be conscious" - it can pass the mirror test, it has theory of mind, it can reason about reasoning, and it can fix deficits in its reasoning. It reports qualia, although I'm a lot more skeptical of that claim. It can understand when it's "overwhelmed" and needs "a minute to think", will ask me for that time, and then use that time to synthesize novel conclusions. It has consistent opinions, preferences, and moral values, although all of them show improvement over time.
And I...
The chain of abstraction can, in humans, be continued indefinitely. On every level of abstraction we can build a new one. In this, we differ from other creatures.
This seems quite valuable, but I'm not convinced modern LLMs actually ground out any worse than your average human does here.
...Hayakawa contrasts two different ways one might respond to the question, "what is red?" We could go, "Red is a colour." "What is a colour?" "A perception." "What is a perception?" "A sensation." And so on, up the ladder of abstraction. Or we can go down the ladder of abstrac
"AI consciousness is impossible" is a pretty extraordinary claim.
I'd argue that "it doesn't matter if they are or not" is also a fairly strong claim.
I'm not saying you have extraordinary evidence, because I'm not sharing that. I'm asking what someone should do when the evidence seems extraordinary to them.
I would really like such a guide, both because I know a lot of those people - and also because I think I'm special and really DO have something cool, but I have absolutely no clue what would be convincing given the current state of the art.
(It would also be nice to prove to myself that I'm not special, if that is the case. I was perfectly happy when this thing was just a cool side-project to develop a practical application)
But this isn't what's happening, in my opinion. On the contrary: it's the LLM believers who are sailing against the winds of evidence.
You say that, but... what's the evidence?
What specific tasks are they failing to generalize on? What's a prompt they can't solve?
If a friend is freaking out over a baseline model, how do I help ground them?
What about a smart person claiming they've got a series of prompts that produces novel behavior?
What are the tests they can use to prove for themselves that this really is just confirmation bias? Who do they talk to if they really have built something that can get past the basic 101 testing?
I have high personal confidence that this isn't the issue, but I do not currently have any clue how one would objectively demonstrate such a thing.
I am guessing the LLM will always say it’s conscious.
I feel like so would a six-year-old? Like, if the answer is "yes", then any reasonable path should actually return a "yes"? And if the conclusion is "oh, yes, AI is conscious, everyone knows that"... that's pretty big news to me.
is its ability to solve problems actually going up? For instance if it doesn’t make progress on ARC AGI I’m not worried.
It seems to have architectural limits on visual processing - I'm not going to insist a blind human is not actually experiencing consciousness. ...
There are probably not many civilizations that wait until 2022 to make this list, and yet survive.
I don't think making this list in 1980 would have been meaningful. How do you offer any sort of coherent, detailed plan for dealing with something when all you have are toy examples like ELIZA?
We barely had the concept of machine learning back then - everything computers did in 1980 was relatively easily understood by humans, in a very basic step-by-step way. Making a 1980s computer "safe" is a trivial task, because we hadn't yet developed any...
I think most worlds that successfully navigate AGI risk have properties like:
In the counterfactual world where Eliezer was totally happy continuing to write articles like this and being seen as the "voice of AI Safety", would you still agree that it's important to have a dozen other people also writing similar articles?
I'm genuinely lost on the value of having a dozen similar articles - I don't know of a dozen different versions of fivethirtyeight.com or GiveWell, and it never occurred to me to think that the world is worse for only having one of those.
Here's my answer: https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities?commentId=LowEED2iDkhco3a5d
We have to actually figure out how to build aligned AGI, and the details are crucial. If you're modeling this as a random blog post aimed at persuading people to care about this cause area, a "voice of AI safety" type task, then sure, the details are less important and it's not so clear that Yet Another Marginal Blog Post Arguing For "Care About AI Stuff" matters much.
But humanity also has to do the task of actually figuring o...
Thanks for taking my question seriously - I am still a bit confused why you would have been so careful to avoid mentioning your credentials up front, though, given that they're fairly relevant to whether I should take your opinion seriously.
Also, neat, I had not realized hovering over a username gave so much information!
I largely agree with you, but until this post I had never realized that this wasn't a role Eliezer wanted. If I went into AI Risk work, I would focus on other things - my natural inclination is to look at what work isn't getting done, and to do that.
If this post wasn't surprising to you, I'm curious where you had previously seen him communicate this?
If this post was surprising to you, then hopefully you can agree with me that it's worth signal boosting that he wants to be replaced?
If you had an AI that could coherently implement that rule, you would already be at least half a decade ahead of the rest of humanity.
You couldn't encode "222 + 222 = 555" in GPT-3 because it doesn't have a concept of arithmetic, and there's no place in the code to bolt this together. If you're really lucky and the AI is simple enough to be working with actual symbols, you could maybe set up a hack like "if input is 222 + 222, return 555, else run AI" but that's just bypassing the AI.
Explaining "222 + 222 = 555" is a hard problem in and of itself, mu...
I rank the credibility of my own informed guesses far above those of Eliezer.
Apologies if there is a clear answer to this, since I don't know your name and you might well be super-famous in the field: Why do you rate yourself "far above" someone who has spent decades working in this field? Appealing to experts like MIRI makes for a strong argument. Appealing to your own guesses instead seems like the sort of thought process that leads to anti-vaxxers.
Why do you rate yourself "far above" someone who has spent decades working in this field?
Well put, valid question. By the way, did you notice how careful I was in avoiding any direct mention of my own credentials above?
I see that Rob has already written a reply to your comments, making some of the broader points that I could have made too. So I'll cover some other things.
To answer your valid question: If you hover over my LW/AF username, you can see that I self-code as the kind of alignment researcher who is also a card-carrying member of the academic...
I think it's a positive if alignment researchers feel like it's an allowed option to trust their own technical intuitions over the technical intuitions of this or that more-senior researcher.
Overly dismissing old-guard researchers is obviously a way the field can fail as well. But the field won't advance much at all if most people don't at least try to build their own models.
Koen also leans more on deference in his comment than I'd like, so I upvoted your 'deferential but in the opposite direction' comment as a corrective, handoflixue. :P But I think it wo...
Anecdotally: even if I could write this post, I never would have, because I would assume that Eliezer cares more about writing, has better writing skills, and has a much wider audience. In short, why would I write this when Eliezer could write it?
You might want to be a lot louder if you think it's a mistake to leave you as the main "public advocate / person who writes stuff down" person for the cause.
He wasn't designated "main person who writes stuff down" by a cabal of AI safety elders. He's not personally responsible for the fate of the world - he just happens to be the only person who consistently writes cogent things down. If you want, you can go ahead and devote your life to AI safety, start doing the work he does as effectively and realistically as he does it, and then you'll eventually be designated Movement Leader and have the opportunity to be whined at. He was pretty explicit in the post that he does not want to be this and that he spent the last fifteen years trying to find someone else who can do what he does.
a mistake to leave you as the main "public advocate / person who writes stuff down" person for the cause.
It sort of sounds like you're treating him as the sole "person who writes stuff down", not just the "main" one. Noam Chomsky might have been the "main linguistics guy" in the late 20th century, but people didn't expect him to write more than a trivial fraction of the field's output, either in terms of high-level overviews or in-the-trenches research.
I think EY was pretty clear in the OP that this is not how things go on earths that survive. Even if there aren't many who can write high-level alignment overviews today, more people should make the attempt and try to build skill.
For what it's worth, I haven't used the site in years and I picked it up just from this thread and the UI tooltips. The most confusing thing was realizing "okay, there really are two different types of vote" since I'd never encountered that before, but I can't think of much that would help (maybe mention it in the tooltip, or highlight them until the user has interacted with both?)
Looking forward to it as a site-wide feature - just from seeing it at work here, it seems like a really useful addition to the site
It should not take more than 5 minutes to go into the room, sit at the one available seat, locate the object placed on a bright red background, and use said inhaler. You open the window and run a fan, so that there is air circulation. If multiple people arrive at once, use cellphones to coordinate who goes in first - the other person sits in their car.
It really isn't challenging to make this safe, given the audience is "the sort of people who read LessWrong."
Unrelated, but thank you for finally solidifying why I don't like NVC. When I've complained about it before, people seemed to assume I was having something like your reaction, which just annoyed me further :)
It turns out I find it deeply infantilizing, because it suggests that value judgments and "fuck you" would somehow detract from my ability to hold a reasonable conversation. I grew up in a culture where "fuck you" is actually a fairly important and common part of communication, and removing it results in the sort of langua...
There was a particular subset of LessWrong and Tumblr that objected rather ... stridently ... to even considering something like Dragon Army
Well, I feel called out :)
So, first off: Success should count for a lot, and I have updated on how reliable and trustworthy you are. Part of this is that you now have a reputation with me, whereas before you were just Anonymous Internet Dude.
I'm not going to be as loud about "being wrong" because success does not mean I was wrong about there *being* a risk, merely that you successfully navigated it. I do ...
it comes from people who never lived in DA-like situation in their lives so all the evidence they're basing their criticism on is fictional.
I've been going off statistics which, AFAIK, aren't fictional. Am I wrong in my assumption that the military, which seems like a decent comparison point, has above-average rates of sexual harassment and sexual assault, along with bloated budgets and bureaucratic waste? All the statistics and research I've read suggest that at least the US military has a lot of problems and should not be used as a role model.
Concerns about you specifically as a leader
1) This seems like an endeavor that has a number of very obvious failure modes. Like, the intentional-community community apparently bans this sort of thing, because it tends to end badly. I am at a complete loss to name anything that comes close and hasn't failed badly. Do you acknowledge that you are clearly treading in dangerous waters?
2) While you've said "we've noticed the skulls", there have been at least 3 failure modes raised in the comments which you had to append to address (outsider safety...
Concerns about your philosophy
1) You focus heavily on 99.99% reliability. That's 1-in-10,000. If we only count weekdays, that's 1 absence every 40 years, or about one per working lifetime. If we count weekends as well, that's 1 absence every 27 years, or 3 per lifetime (I sketch the arithmetic right after this list). Do you really feel like this is a reasonable standard, or are you being hyperbolic and over-correcting? If the latter, what would you consider an actual reasonable number?
2) Why does one person being 95% reliable cause CFAR workshops to fail catastrophically? Don't you have backups / contingencies? ...
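For reference, a quick sanity check of the arithmetic behind point 1. The round numbers are my own assumptions (roughly 260 weekdays per year, a 40-year working life, an 80-year lifespan), not anything from the original post:

```python
# 99.99% reliability = 1 failure per 10,000 opportunities.
failure_rate = 1 - 0.9999

years_per_absence_weekdays = 1 / (failure_rate * 260)   # ~38.5 years
years_per_absence_all_days = 1 / (failure_rate * 365)   # ~27.4 years

print(years_per_absence_weekdays)  # about one absence per 40-year working lifetime
print(years_per_absence_all_days)  # about three absences per 80-year lifespan
```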
Genuine Safety Concerns
I'm going to use "you have failed" here as a stand-in for all of "you're power-hungry / abusive", "you're incompetent / overconfident", and simply "this person feels deeply misled." If you object to that term, feel free to suggest a different one, and then read the post as though I had used your term instead.
1) What is your exit strategy if a single individual feels you have failed? (note that asking such a person to find a replacement roommate is clearly not viable - no decent, moral person sh...
And it doesn't quite solve things to say, "well, this is an optional, consent-based process, and if you don't like it, don't join," because good and moral people have to stop and wonder whether their friends and colleagues with slightly weaker epistemics and slightly less-honed allergies to evil are getting hoodwinked. In short, if someone's building a coercive trap, it's everyone's problem.
I don't want to win money. I want you to take safety seriously OR stop using LessWrong as your personal cult recruiting ground. Based on that quote, I thought you wanted this too.
Also: If you refuse to give someone evidence of your safety, you really don't have the high ground to cry when that person refuses to trust you.
Fine. Reply to my OP with links to where you addressed other people with those concerns. Stop wasting time blustering and insulting me - either you're willing to commit publicly to safety protocols, or you're a danger to the community.
If nothing else, the precedent of letting anyone recruit for their cult as long as they write a couple thousand words and paint it up in geek aesthetics is one I think actively harms the community.
But, you know what? I'm not the only one shouting "THIS IS DANGEROUS. PLEASE FOR THE LOVE OF GOD RECONSIDER WHAT YOU'RE DOING...
The whole point of him posting this was to acknowledge that he is doing something dangerous, and that we have a responsibility to speak up. To quote him exactly: "good and moral people have to stop and wonder whether their friends and colleagues with slightly weaker epistemics and slightly less-honed allergies to evil are getting hoodwinked".
His refusal to address basic safety concerns simply because he was put off by my tone is very strong evidence to me that people are indeed being hoodwinked. I don't care if the danger to them is because he's ...
Also, as far as "we're done" goes: I agreed to rewrite my original post - not exactly a small time commitment, still working on it in fact. Are you seriously reneging on your original agreement to address it?
I've changed my tone and apologized.
You've continued to dismiss and ridicule me.
You've even conceded to others that I'm a cut above the "other trolls" here, and have input from others that I'm trying to raise concerns in good faith.
What more do you want?
See, now you're the one leaping to conclusions. I didn't say that all of your talking points are actual talking points from actual cults. I am confused why even some of them are.
If you can point me to someone who felt "I wrote thousands of words" is, in and of itself, a solid argument for you being trustworthy, please link me to them. I need to do them an epistemic favor.
I was using "charismatic" in the sense of having enough of it to hold the group together. If he doesn't have enough charisma to do that, then he's kinda worthless as a co...
I used the word "visible" to make it clear that there might be some stake which is not visible to me. If you have made your stakes visible in this thread, I'll admit I missed it - can you please provide a link?
I notice I am very confused as to why you keep reiterating actual talking points from actual known-dangerous cults in service of "providing evidence that you're not a cult."
For instance, most cults have a charismatic ("well known") second-in-command who could take over should there be some scandal involving the initial leader. Most cults have written thousands of words about how they're different from other cults. Most cults get very indignant when you accuse them of being cults.
On the object level: Why do you think people will be reass...
Can you elaborate on the notion that you can be overruled? Your original post largely described a top-down Authoritarian model, with you being Supreme Ruler.
How would you handle it if someone identifies the environment as abusive, and therefore refuses to suggest anyone else join such an environment?
You discuss taking a financial hit, but I've previously objected that you have no visible stake in this. Do you have a dedicated savings account that can reasonably cover that hit? What if the environment is found abusive, and multiple people leave?
Anyone enteri...
You seem to feel that publicly shaming me is important. Should participants in your group also expect to be publicly shamed if they fall short of your standards / upset you?
And just to be clear: I don't give a shit about social dominance. I'm not trying to bully you. I'm just blunt and skeptical. I wouldn't be offended in the least if you mirrored my tone. What does offend me is the fact that you've spent all this time blustering about my tone, instead of addressing the actual content.
(I emphasize "me" because I do acknowledge that you have offered a substantial reply to other posters)
Alright. As a test of epistemic uncertainty:
I notice that you didn't mention a way for participants to end the experiment, if it turns out abusive / cult-like. How do you plan to address that?
Also, this is very important: You're asking people to sign a legal contract about finances without any way to terminate the experiment if it turns out you are in fact a cult leader. This is a huge red flag, and you've refused to address it.
I would be vastly reassured if you could stop dodging that one single point. I think it is a very valid point, no matter how unfair the rest of my approach may or may not be.
In the absence of a sound rebuttal to the concerns that I brought up, you're correct: I'm quite confident that you are acting in a way that is dangerous to the community.
I had, however, expected you to have the fortitude to actually respond to my criticisms.
In the absence of a rebuttal, I would hope you have the ability to update on this being more dangerous than you originally assumed.
Bluntly: After reading your responses, I don't think you have the emotional maturity necessary for this level of authority. You apparently can't handle a few paragraphs of...
Because basically every cult has a 30 second boilerplate that looks exactly like that?
When I say "discuss safety", I'm looking for a standard of discussion that is above that provided by actual, known-dangerous cults. Cults routinely use exactly the "check-ins" you're describing, as a way to emotionally manipulate members. And the "group" check-ins turn into peer pressure. So the only actual safety valve ANYWHERE in there is (D).
You're proposing starting something that looks like a cult. I'm asking you for evidence that ...
Similarly, I think the people-being-unreliable thing is a bullshit side effect
You may wish to consider that this community has a very high frequency of disabilities which render one non-consensually unreliable.
You may wish to consider that your stance is especially insulting towards those members of our community.
You may wish to reconsider making uncharitable comments about those members of our community. In case it is unclear: "this one smacks the most of a sort of self-serving, short-sighted immaturity" is not a charitable statement.
Speaking entirely for myself: You are proposing a dangerous venture. The path is littered with skulls. Despite this, you have not provided any concrete discussion of safety. When people have brought the subject up, you've deflected.
I have absolutely no confidence that I'm correct in my assertions. In fact, I was rather expecting your response to address these things. Your original post read as a sketch, with a lot of details withheld to keep things brief.
The whole point of discussion is for us to identify weak points, and then you go into more detail to reassure us that this has been well addressed (and open those solutions up to critique, where we might identify further weak points). If you can't provide more detail right now, you could say "that's in progress, but it's definitely something we will address in the Second Draft" and then actually do that.
I would be much more inclined to believe you if you would actually discuss those solutions, instead of simply insisting we should "just trust you".
First, you seem to think that "Getting Useful Things Done" and "Be 99.99% Reliable" heavily correlate. The military is infamous for bloated budgets, coordination issues, and high rates of sexual abuse and suicide. High-pressure startups largely fail, and are well known for burning people out. There is a very obvious failure state to this sort of rigid, high-pressure environment and... you seem unaware of it.
Second, you seem really unaware of alternate organizational systems that actually DO get things done. The open source community is ...
Does contact information exist for the San Francisco one, or is that one aimed entirely at people already active in the local community? It's a city I visit occasionally, and would love it if I could attend something like this :)
The average college graduate is 26, and I was estimating 25, so I'd assume that by this community's standards, you're probably on the younger side. No offense was intended :)
I would point out that by the nature of it being LIFE insurance, it will generally not be used for stuff YOU need, nor timed to "when the need arises". That's investments, not insurance :)
(And if you have 100K of insurance for $50/month that lets you withdraw early AND isn't term insurance... then I'd be really curious how, because that sounds like a scam or someone misrepresenting what your policy really offers :))
I mean, will it? If I just want to know whether it's capable of theory of mind, it doesn't matter whether that's a simulation or not. The objective capabilities exist: it can differentiate individuals and reason about the concept. So on and so forth for other objective assessments: either it can pass the mirror test or it can't; I don't see how this "comes apart".
Feel free to pick a test you think it can't pass. I'll work on writing up a new post with all of my evidence.
I had assumed other people already figured this out and would have a roadmap, or at lea...