All of KvmanThinking's Comments + Replies

Great response, first of all. Strong upvoted.

 

My subconscious gave me the following answer, after lots of trying-to-get-it-to-give-me-a-satisfactory-answer:

"Everyone tells you that you're super smart, not because you actually are (in reality, you are probably only slightly smarter than average) but because you have a variety of other traits which are correlated with smartness (i.e: having weird hobbies/interests, getting generally good grades, knowing a lot of very big and complicated-sounding words, talking as if my speech is being translated literal... (read more)

Would that imply that there is a hard, rigid, and abrupt limit on how accurately you can predict the actions of a conscious being without actually creating a conscious being? And if so, where is this limit?

I guess you mean on an intuitive level, you feel you have X intelligence, but upon self-reflection, you think you have Y intelligence. And you can't change X to match Y.

Yes, that's exactly correct.

I am aware of all of these things, although I don't believe that I am smarter than others. (I alieve it, though; I just don't want to.) After all, if I were significantly smarter, then I would probably have successfully done some major intelligence-requiring thing by now. Also, if I were significantly smarter, I would be able to understand intelligence-requiring books. I fully believe, in my deliberate, conscious mind, that I am not significantly smarter than others in my environment; I just have lots of weird interests and stuff.

1Knight Lee
I guess you mean on an intuitive level, you feel you have X intelligence, but upon self-reflection, you think you have Y intelligence. And you can't change X to match Y. I don't know how you actually feel because I'm not you, but my guess is that a lot of other people are like this. They know their intuition is wrong and biased, but they can't change it. They might be too embarrassed to talk about it.

Is there a non-mysterious explanation somewhere of the idea that the universe "conserves information"?

2Ben Pace
That sounds good to me, i.e. draft this post and then make it a comment in one of those places instead (my weak guess is that a quick take is better, but whatever you like).
2Ben Pace
Posted either as a comment on the seasonal open thread or using the quick takes / shortform feature, which posts it in your shortform (e.g. here is my shortform). I'm saying that this seems to me not on the level of substance of a post, so it'd be better as a comment of one of the above two types, and also that it's plausible to me you'd probably get more engagement as a comment in the open thread.

Ah. Thanks! (by the way, when these questions get answered, should I take them down or leave them up for others?)

3Viliam
Leave them up, other people may be curious too, but too shy to ask.

Good point. Really sorry. Just changed it.

Has Musk tried to convince the other AI companies to also worry about safety?

3Milan W
Main concern right now is very much lab proliferation, ensuing coordination problems, and disagreements / adversarial communication / overall insane and polarized discourse.
* Google Deepmind: They are older than OpenAI. They also have a safety team. They are very much aware of the arguments. I don't know about Musk's impact on them.
* Anthropic: They split from OpenAI. To my best guess, they care about safety at least roughly as much as OpenAI does. Many safety researchers have been quitting OpenAI to go work for Anthropic over the past few years.
* xAI: Founded by Musk several years after he walked out from OpenAI. People working there have previously worked at other big labs. General consensus seems to be that their alignment plan (at least as explained by Elon) is quite confused.
* SSI: Founded by Ilya Sutskever after he walked out from OpenAI, which he did after participating in a failed effort to fire Sam Altman from OpenAI. Very much aware of the arguments.
* Meta AI: To the best of my knowledge, aware of the arguments but very dismissive of them (at least at the upper management levels).
* Mistral AI: I don't know much, but probably more or less the same as Meta AI, or worse.
* Chinese labs: No idea. I'll have to look into this.
I am confident that there are relatively influential people within Deepmind and Anthropic who post here and/or on the Alignment Forum. I am unsure about people from other labs, as I am nothing more than a relatively well-read outsider.

The difference is that if the Exxon Mobil CEO internalizes that (s)he is harming the environment, (s)he has to go and get a completely new job, probably building dams or something. But if Sam Altman internalizes that he is increasing our chance of extinction, all he has to do is tell all his capability researchers to work on alignment, and money is still coming in; only now, less of it comes from ChatGPT subscriptions and more of it comes from grants from the Long-Term Future Fund. It's a much easier and lighter shift. Additionally, he knows that he can go... (read more)

Could someone add "7. Why do very smart people in high places say this is not a problem?" (Or you could just reply to this comment.)

quantum immortality is not going to work out

How come?

That's a pretty big "somehow".

1Milan W
Oh I know! That is why I added "somehow". But I am also very unsure over exactly how hard it is. Seems like a thing worth whiteboarding over for an hour and then maybe doing a weekend-project-sized test about.

I don't think such considerations apply to upvotes nearly as much, if at all. Upvotes indicate agreement or approval, which doesn't need to be explained as thoroughly as disagreement (which usually involves having separate, alternative ideas in your head, different from the ideas of the one you are disagreeing with).

5Vladimir_Nesov
Whether upvotes need to be explained overall is not relevant to my comment, as I'm talking about the specific considerations named by Noah Birnbaum.

I believe that the reason your comment was strong downvoted was because you implied that "everyone repeating things already said" is an inevitable consequence of asking people why they disagree with you. This might be true on other websites (where people are hesitant to relinquish beliefs and opinions), but not on LessWrong.

4Richard_Kennaway
Even on LW, there comes a point where everything has been said and further discussion will foreseeably be unuseful.

I had upvoted the ones I agreed with and thought were helpful. If I agree with something, I will upvote, because simply saying "I agree" is unnecessary when I can just click on a check mark. I appreciate and recognize the effort of those 5 other people who commented, but that is well enough communicated through agreement karma. Just because I have nothing to say about a response someone provided doesn't mean I don't value it.

Your answer wasn't cryptic at all. Don't worry. This is a great answer. Let me know when you're done with that sequence. I'll have to read it.

(Also, it's horrifying that people can be hypnotized against their will. That makes me simultaneously thankful-that and curious-why it isn't more widely practiced...) 

Something like TNIL or Real Character might be used for maximum intellectual utility. But I cannot see how simply minimizing the amount of words that need to exist for compact yet precise communication would help correct the corrupted machinery our minds run on.

-3ChristianKl
I don't think the mental model of "corrupted machinery" is a very useful one. Humans reason by using heuristics. Many heuristics have advantages and disadvantages instead of being perfect. Sometimes that's because they are making tradeoffs, other times it's because they have random quirks.  Real Character was a failed experiment. I don't know how capable Ithkuil IV happens to be. 

By "make its users more aware of their biases" I mean, for example, a language where it's really obvious when you say something illogical, or have a flaw in your reasoning.

Some ideas I had for this:

  • Explicitly defined semantic spaces for every word, to dissolve questions and help people agree on the locations of phenomena in thingspace. Mechanisms for searching thingspace (while, for example, you can say "red chair" to narrow the space of all chairs down to the space of all chairs which reflect red light, it would be nice to be able to express things like "t
... (read more)

why are people downvoting?

[This comment is no longer endorsed by its author]
2Said Achmiz
It’s not censored; the asterisk is used here in the computer science sense, meaning “wildcard”; “veg*an” is short for “vegan or vegetarian”.

Also, it helps taboo your words. For example, "Toki Pona helps taboo your words" would be rendered as
tenpo toki pi toki pona li sama e tenpo toki pi ni: jan li ken kepeken ala e nimi pi ken ala sona pi pali lili.
"(the) speech-time related to Toki Pona is similar or the same as (the) speech-time with this quality: (the) person cannot use word(s) which cannot be known via small effort." 

Before you complain that this is too long a phrase to be used practically, try to explain the concept of rationalist taboo in fewer syllables than I did in Toki Pona, whilst not relying on other rationalist jargon.

by "making an AI that builds utopia and stuff" I mean an AI that would act in such a way that rather than simply obeying the intent of its promptors, it goes and actively improves the world in the optimal way. An AI which has fully worked out Fun Theory and simply goes around filling the universe with pleasure and beauty and freedom and love and complexity in such a way that no other way would be more Fun.

1Robert Cousineau
That would be described well by the CEV link above.  

it will not consider it acceptable to kill me and instantly replace me with a perfect copy

Why not? I would find this acceptable, considering you are your information system.

I disagree with your disagreement with Eliezer and Connor's conclusions, but I still upvoted because you worded your argument and response quite well. Judging from your comments, you seem not to have a very high opinion of LessWrong, and yet you choose to interact anyways, because you would like to counterargue. You think we are just a big echo chamber of doom, and yet you learn our jargon. Good job. I disagree with what you say, but thank you for being the dissent we encourage. If you would like to know why we believe what we do, you would do well to read t... (read more)

I have an irrational preference

If your utility function weights you knowing things higher than most people's, that is not an irrationality.

It's "101"? I searched the regular internet to find out, but I got some yes's and some no's, which I suspect were just due to different definitions of intelligence.

It's controversial?? Has that stopped us before? When was it done to death?

I'm just confused, because if people downvote my stuff, they're probably trying to tell me something, and I don't know what it is. So I'm just curious.

Thanks. By the way, do you know why this question is getting downvoted?

4Maxwell Peterson
Guesses: people see it as too 101 of a question; people think it's too controversial / has been done to death many years ago; one guy with a lot of karma hates the whole concept and strong-downvoted it. I think the 101 idea is most likely. But I don't think it's a bad question, so I've upvoted it.

I already figured that. The point of this question was to ask if there could possibly exist things that look indistinguishable from true alignment solutions (even to smart people), but that aren't actually alignment solutions. Do you think things like this could exist?
 

By the way, good luck with your plan. Seeing people actively go out and do actually meaningful work to save the world gives me hope for the future. Just try not to burn out. Smart people are more useful to humanity when their mental health is in good shape.

7johnswentworth
I'm pretty uncertain on this one. Could a superintelligence find a plan which fools me? Yes. Will such a plan show up early on in a search order without actively trying to fool me? Ehh... harder to say. It's definitely a possibility I keep in mind. Most importantly, over time as our understanding improves on the theory side, it gets less and less likely that a plan which would fool me shows up early in a natural search order.
  1. Yes, human intelligence augmentation sounds like a good idea.
  2. There are all sorts of "strategies" (turn it off, raise it like a kid, disincentivize changing the environment, use a weaker AI to align it) that people come up with when they're new to the field of AI safety, but that are ineffective. And their ineffectiveness is only obvious and explainable by people who specifically know how AI behaves. Suppose there are strategies whose ineffectiveness is only obvious and explainable by people who know way more about decisions and agents and optimal strategi
... (read more)
1[anonymous]
yep but the first three all fail for the shared reason of "programs will do what they say to do, including in response to your efforts". (the fourth one, 'use a weaker AI to align it', is at least obviously not itself a solution. the weakest form of it, using an LLM to assist an alignment researcher, is possible, and some less weak forms likely are too.)

when i think of other 'newly heard of alignment' proposals, like boxing, most of them seem to fail because the proposer doesn't actually have a model of how this is supposed to work or help in the first place. (the strong version of 'use ai to align it' probably fits better here)

(there are some issues which a programmatic model doesn't automatically make obvious to a human: they must follow from it, but one could fail to see them without making that basic mistake. probable environment hacking and decision theory issues come to mind. i agree that on general priors this is some evidence that there are deeper subjects that would not be noticed even conditional on those researchers approving a solution.)

i guess my next response then would be that some subjects are bounded, and we might notice (if not 'be able to prove') such bounds telling us 'there's not more things beyond what you have already written down', which would be negative evidence (strength depending on how strongly we've identified a bound). (this is more of an intuition, i don't know how to elaborate this)

(also on what johnswentworth wrote: a similar point i was considering making is that the question is set up in a way that forces you into playing a game of "show how you'd outperform magnus carlsen {those researchers} in chess alignment theory" - for any consideration you can think of, one can respond that those researchers will probably also think of it, which might preclude them from actually approving, which makes the conditional 'they approve but it's wrong'[1] harder to be true and basically dependent on them instead of object-level properties
1[comment deleted]

Uh, this is a human. Humans find it much harder to rationalize away the suffering of other humans, compared to rationalizing animal suffering.

6Kaj_Sotala
Historically there were plenty of rationalizations for slavery, including ones holding that slaves weren't really people and were on par with animals. Such an argument would be much easier for a mind running on a computer and with no physical body - "oh it just copies the appearance of suffering but it doesn't really suffer".

And the regular, average people in this future timeline consider stuff like this ethically okay?

8Kaj_Sotala
Compare to e.g. factory farming today, which also persists despite a lot of people thinking it not okay (while others don't care).

hack reality via pure math

What - exactly - do you mean by that?

The above statement could be applied to a LOT of other posts too, not just this one.

How were these discovered? Slow, deliberate thinking, or someone trying some random thing to see what it does and suddenly the AI is a zillion times smarter?

2Marcus Williams
"We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence." -SwiGLU paper. I think it varies, a few of these are trying "random" things, but mostly they are educated guesses which are then validated empirically. Often there is a spefic problem we want to solve i.e. exploding gradients or O(n^2) attention and then authors try things which may or may not solve/mitigate the problem.

I certainly believe he could. After reading Tamsin Leake's "everything is okay" (click the link if you dare), I felt a little unstable, and felt like I had to expend deliberate effort to not think about the described world in sufficient detail in order to protect my sanity. I felt like I was reading something that had been maximized by a semi-powerful AI to be moving, almost infohazardously moving, but not quite; that this approached the upper bound of what humans could read while still accepting the imperfection of their current conditions.

utopia

It's a protopia. It is a world better than ours. It is not perfect. It would be advisable to keep this in mind. dath ilan likely has its own, separate problems.

And I’m not even mentioning the strange sexual dynamics

Is this a joke? I'm confused.

yeah, the moment i looked at the big diagram my brain sort of pleasantly overheated

I think the flaw is how he claims this:

No one begins to truly search for the Way until their parents have failed them, their gods are dead, and their tools have shattered in their hand.

I think that these three things are not things that cause a desire for rationality, but things that rationality makes you notice.

why is this so downvoted? just curious

If I am not sufficiently terrified by the prospect of our extinction, I will not take as many steps to try and reduce its likelihood. If my subconscious does not internalize this sufficiently, I will not be as motivated. Said subconscious happiness affects my conscious reasoning without me consciously noticing.

Harry's brain tried to calculate the ramifications and implications of this and ran out of swap space.

this is very relatable

That's a partial focus.

1ZY
I don't understand either. If it means what it seems to mean, this is a very biased perception and not very rational (truth-seeking or causality-seeking). There should be better education systems to fix that.

i'd pick dust & youtube. I intrinsically value fairness

The YouTube is pure happiness. The sublimity is some happiness and some value. Therefore I choose the sublimity, but if it was "Wireheading vs. Youtube", or "Sublimity vs. seeing a motivational quote", I would choose the YouTube or the motivational quote, because I intrinsically value fairness.

Ok, yeah, I don't think the chances are much smaller than one in a million. But I do think the chances are not increased much by cryonics. Here, let me explain my reasoning. 

I assume that eventually, humanity will fall into a topia (Tammy's definition) or go extinct. Given that it does not go extinct, it will spend a very long amount of subjective time, possibly infinite, in said topia. In the event that this is some sort of brilliant paradise of maximum molecular fun where I can make stuff for eternity, we can probably reconstruct a person solely bas... (read more)

2Viliam
Well, there are different opinions on the possibility of reconstructing a person. Some people here would agree with you. I am afraid that there will not be enough evidence left to reconstruct the person, even if we had all their writings, and we usually don't have even that.