LESSWRONG
LW

All of JoeTheUser's Comments + Replies

I think you don't understand what an LLM is. When the LLM produces a text output like "Dogs are cute", it doesn't have some persistent hidden internal state that can decide that dogs are actually not cute but it should temporarily lie and say that they are cute.

As Charlie Stein notes, this is wrong and I'd add it's wrong on several level and it's bit rude to challenge someone else's understanding in this context.

An LLM outputting "Dogs are cute" is outputting expected human output in context. The context could be "talk like sociopath trying to fool someone... (read more)

The Dissolution of AI Safety

JoeTheUser5mo12

We don't have "aligned AGI". We have neither "AGI" nor an "aligned" system. We have sophisticated human-output simulators that don't have the generality to produce effective agentic behavior when looped but which also don't follow human intentions with the reliability that you'd want from a super-powerful system (which, fortunately, they aren't).

LLMs stifle creativity, eliminate opportunities for serendipitous discovery and disrupt intergenerational transfer of wisdom

JoeTheUser9mo21

Thank you for the article. I think these "small" impacts are important to talk about. If one frame the question as "the impact of machines that think for humans", that impact isn't going to be a binary of just "good stuff" and "takes over and destroys humanity", there are intermediate situations like the decay of human abilities to think critically that are significant, not just in themselves but for further impacts; IE, if everyone is dependent on Google for their opinions, how does this impact people's opinion AI taking over entirely.

Agency in Politics

JoeTheUser10mo10

I don’t think “people have made choices that mattered” is a sufficient criteria for showing the existence of agency. IMO, to have something like agency, you roughly have to have an ongoing situation roughly like this:

Goals ↔ Actions ↔ States-of-the-world.

Some entity needs to have ongoing goals they are able to modify as they go along acting in the world and their actions also need to be able to have an effect. Agency is a complex and intuitive thing so I assume some would ask more than this to say a thing has agency. But I think this is one reasonable requ... (read more)

How do natural sciences prove causation?

Answer by JoeTheUserJun 29, 202431

This is an interesting question even though I'd want to reframe it to answer it. I'd see the question as a reasonable response to the standard refrain in science; "causation does not imply correlation." That is, "well, what does imply causation, huh?" is natural response to that. And here, I think scientists tend reply with either crickets or "you can not prove causation, what are you talking about".

Those responses seem satisfying. I'm not a scientist through I've "worked in science" occasionally and I have at times tried to come up with a real answe... (read more)

Show LW: HackerNews but for research papers

JoeTheUser1y10

I tried to create an account and the process didn't seem to work.

The Altman Technocracy

JoeTheUser1y20

I believe you are correct about the feelings of a lot of Lesswrong. I find it is very worrisome that the lesswrong perspective considers a pure AI takeover as something that needs to be separated from either the issue of the degradation of human self-reliance capacities or an enhanced-human takeover. It seems to me that instead these factors should be considered together.

5StartAtTheEnd1y

I agree as well. It's the loss of humanity which is the problem, and you can get the same result from cyborgs, human modification, mass-production of non-humans, etc. Since human nature is inherently irrational, and since the modern human being isn't well fit for the modern society, I think there's a pressure to modify human beings into something more robotic (This seems to be the purpose of schooling). Depressed over society? We have drugs for that. Unable to sit still? We have drugs for that. Don't want to conform? We have therapy for that. All our solutions focus on modifying human beings, and not on building a world which is suitable for human beings. Being more perfect usually means being more mechanical, less human, and less emotional. Even religions, philosophies and communities like this one are very anti-human. Kill your biases, kill your ego, kill your desires, be meek, get in line, live in a pod and be happy with that, and so on. I see it a lot in mainstream psychology as well, in which desensitization is deemed a solution rather than a problem on its own. "The person isn't suffering anymore!" well, neither do they feel much of anything else, you've just lobotomized them. I wrote a shortform called "The A of AI is sufficient for human extinction", but I was either not understood, or voted down simply for being contrarian (rather than wrong). I usually don't get replies at all unless it's from somebody who agrees with me.

why assume AGIs will optimize for fixed goals?

JoeTheUser1y30Review for 2022 Review

The consensus goals strongly needs rethinking imo. This is a clear and fairly simple start at such an effort. Challenging the basics matters.

Why have insurance markets succeeded where prediction markets have not?

JoeTheUser1y73

Actually, things that are effectively prediction markets - options, futures and other "derivative" contracts - are entirely mainstream for larger businesses (huge amounts of money are involved). It is quite easy and common to bet on the price of oil by purchasing an option to buy it at some future time, for example.

The only thing that isn't mainstream are the things labeled "prediction markets" and that is because the focus on questions people are curious about rather than things that a lot of money rides on (like oil prices or interest rates).

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense

JoeTheUser1y50

But, can't you just query the reasoner at each point for what a good action would be?

What I'd expect (which may or may not be similar to Nate!'s approach) is that the reasoner has prepared one plan (or a few plans). Despite being vastly intelligent, it doesn't have the resources to scan all the world's outcomes and compare their goodness. It can give you the results of acting on the primary (and maybe several secondary) goal(s) and perhaps the immediate results of doing nothing or other immediate stuff.

It seems to me that Nate! (as quoted above... (read more)

The Game of Dominance

JoeTheUser2y20

LeCun may not be correct to dismiss concerns but I think the concept "dominance" could be very useful concept for AI safety people to apply (or at least grapple with).

The thing about the concept is it seems as if it could be defined in game theoretic terms fairly easily and so could be defined in a fashion independent of the intelligence or capabilities of an organism or entity. Plausibly, it could be measured and analyzed more objectively than "aligned to human values", which appears to depend one's notion of human values.

Defined well, d... (read more)

2dr_s2y

I think this is sort of the idea behind a satisficer. Make something that basically never tries too hard, therefore will never reach up to the "conquer the world" class of solutions as they're way too extreme and you can do good enough with far less. That said, I'm not sure if satisficers are actually proven to be fully safe either.

2Karl von Wendt2y

I doubt that. Dominance is the result, not the cause of behavior. It comes from the fact that there are conflicts in the world and often, only one side can get its will (even in a compromise, there's usually a winner and a loser). If an agent strives for dominance, it is usually as an instrumental goal for something else the agent wants to achieve. There may be a "dominance drive" in some humans, but I don't think that explains much of actual dominant behavior. Even among animals, dominant behavior is often a means to an end, for example getting the best mating partners or the largest share of food. I also think the concept is already covered in game theory, although I'm not an expert.

If you wish to make an apple pie, you must first become dictator of the universe

JoeTheUser2y30

The question I'd ask is whether a "minimum surprise principle" requires that much smartness. A present day LLM, for example, might not have a perfect understanding of surprisingness but it like it has some and the concept seems reasonably trainable.

2Seth Herd2y

It seems only marginally simpler than figuring out what I want. Both require a pretty good model of me; or better yet, asking me if it's not sure.

If you wish to make an apple pie, you must first become dictator of the universe

JoeTheUser2y10

Apologies if this argument is dealt with already elsewhere but what about a "prompt" such as "all user commands should be followed using 'minimal surprise' principle; if achieving a given goal involves effects that would be surprising to the user, including a surprising increasing in your power and influence, warn the user instead of proceeding" ?

I understand that this sort of prompt would require the system to model humans. I know there are arguments for this being dangerous but it seems like it could be an advantage.

3Seth Herd2y

I think the common answer is this: If you can give an AI its goals after it already has a sophisticated understanding of the world, alignment is much easier. You can use your minimal surprise principle, or simply say "do what I want" and let the AI figure out how to achieve that. This doesn't seem like a very reliable alignment plan, because you have to wait until the AGI is smart, and that's risky. Almost any plan for AGI includes it learning about the world. For most setups, it needs to have some goals, explicit or implicit, to drive that learning process. It's really hard to guess when that AI will learn enough to realize that it needs to escape your control in order to complete its goals to the best of its ability. Therefore it's a real gamble to wait until it's got a sophisticated understanding of the world to give it the goals you really want it to have, like minimal surprise or do what I want. Sorry I don't have more official references at hand for this logic.

UFO Betting: Put Up or Shut Up

JoeTheUser2y00

Linked question: "Will mainstream news media report that alien technology has visited our solar system before 2030?"

I would say that is far from unambiguous. If one is generous in one's interpretation of "mainstream" and the certainty described one could say mainstream news has already reported this (I remember National Inquirer articles from the seventies...).

4ChristianKl2y

Don't confuse the headline with the resolution criteria. The resolution criteria is: The fine print is:

Upcoming AI regulations are likely to make for an unsafer world

JoeTheUser2y22

Regulations are needed to keep people and companies from burning the commons, and to create more commons.

I would add that in modern society, the state is the entity tasked with protecting the commons because private for-profit entities don't have an incentive to do this (and private not-for-profit entities don't have the power). Moreover, it seems obvious to me that stopping dangerous AI should be considered a part of this commons-protecting.

You are correct that the state's commons-protecting-function has often been limited and perverted by private a... (read more)

Where do you lie on two axes of world manipulability?

JoeTheUser2y10

What I don't think "how much of the universe is tractable" by itself captures is "how much more effective would an SI be it if had the ability to interact with a smaller or larger part of the world versus if it had to work out everything by theory". I think it's clear human beings are more effective given an ability to interact with the world. It doesn't seem LLMs get that much more effective.

I think a lot of AI safety arguments assume an SI would be able to deal with problems in a completely tractable/purely-by-theory fashion. Often that is not need... (read more)

Where do you lie on two axes of world manipulability?

JoeTheUser2y30

I think the modeling dimension to add is "how much trial and error is needed". Just about any real world thing that isn't a computer program or simple, frictionless physical object, has some degree of unpredictability. This means using and manipulating it effectively requires a process of discovery - one can't just spit out a result based on a theory.

Could an SI spit out a recipe for a killer virus just from reading current literature? I doubt it. Could it construct such thing given a sufficiently automated lab (and maybe humans to practice on)? That seems much more plausible.

2Max H2y

I tried to capture that in the tractability axis with "how much resources / time / data / experimentation / iteration..." in the second bullet point. The genome for smallpox is publicly available, and people are no longer vaccinated against it. I think it's at least plausible that a SI could ingest that data, existing gain-of-function research, and tools like AlphaFold (perhaps improving on them using its own insights, creativity, and simulation capabilities) and then come up with something pretty deadly and vaccine-resistant without experimentation in a wet lab.

Mental Models Of People Can Be People

JoeTheUser2y21

The reason I care if something is a person or not is that "caring about people" is part of my values.

If one is acting in the world, I would say one's sense of what a person is has to intimately connected with value of "caring about people". My caring about people is connecting to my experience of people - there are people I never met I care about in the abstract but that's from extrapolating my immediate experience of people.

I would expect in a world where they weren't people is that there would be some feature you could point to in humans which cann

JoeTheUser2y10

I don't think there are fundamental barriers. Sensory and motor networks, and types of senses and actions that people don't have, are well along. And the HuggingGPT work shows that they're surprisingly easy to integrate with LLMs. That plus error-checking are how humans successfully act in the real world.

I don't think the existence of sensors is the problem. I believe that self-driving cars, a key example, have problems regardless of their sensor level. I see the key hurdle as ad-hoc action in the world. Overall, all of our knowledge about neural networks,... (read more)

Agentized LLMs will change the alignment landscape

JoeTheUser2y20

Constructions like Auto-GPT, Baby AGI and so-forth are fairly easy to imagine. Just the greater accuracy of ChatGPT with "show your work" suggests them. Essentially, the model is a ChatGPT-like LLM given an internal state through "self-talk" that isn't part of a dialog and an output channel to the "real world" (open internet or whatever). Whether these call the OpenAI api or use an open source model seems a small detail, both approaches are likely to appear because people are playing with essentially every possibility they can imagine.

If these struct... (read more)

2Seth Herd2y

First, the most unique part of your comment: I agree that it's hard to put the type of error-checking that humans use into a standard NN loop. That's why having an external code loop that calls multiple networks to check markers of accuracy and effectiveness is scary and promising. I don't think there are fundamental barriers. Sensory and motor networks, and types of senses and actions that people don't have, are well along. And the HuggingGPT work shows that they're surprisingly easy to integrate with LLMs. That plus error-checking are how humans successfully act in the real world. Again, or perhaps I wasn't clear: I'm still really hopeful that this is a bigger win than a danger. The upsides for alignment are huge relative to other approaches. By reading an agents thoughts (with assistance from simple monitoring networks), we will get warning shots as it starts to think of plans that deviate from its goals. Even if the work is proprietary, people will be able and likely eager to publish the general ways it goes off track, so that improvements can stop those gaps.

Stop pushing the bus

JoeTheUser2y10

My impression is that lesswrong often uses "alignment with X" to mean "does what X says". But it seems the ability to conditionally delegate is a key part of alignment in this. An AI is aligned with me and I tell it "do what Y says subject to such-and-such constraints and maintaining such-and-such goals". So failure of ChatGPT to be safe in OpenAI's sense is a failure of delegation.

Overall, the tendency of ChatGPT to ignore previous input is kind of the center of it's limits/problems.

On the FLI Open Letter

JoeTheUser2y21

I tend to think and I certainly hope that we aren't looking at dangerous AGI at some small GPT-x iteration. 'Cause while the "pause" looks desirable in the abstract, it also seems unlikely to do much in practice.

But the thing I would to point out is; you have people looking the potential dangers of present AI, seeing regulation as a logical step, and then noticing that the regulatory system of modern states, especially the US, has become a complete disaster - corrupt, "adversarial" and ineffective.

Here, I'd like to point out that those caring a... (read more)

Response to Tyler Cowen’s Existential risk, AI, and the inevitable turn in human history

JoeTheUser2y20

I’d also say that AI is fundamentally different from all prior inventions. This is an amazing tool, but it is not only a tool, it is the coming into existence of intelligence that exceeds our own in strength and speed, likely vastly so.

I think the above quote is the key thing. Human beings have a lot of intuitions and analogies about tools, technologies and social change. As far as I can tell, all of these involve the intuition that technologies simply magnify the effect of human labor, intentions and activities. AGI would be a thing which could act entire... (read more)

GPT-4 Plugs In

JoeTheUser2y60

And the thing is, most of the things that have become dangerous when connected to the web have become dangerous when human hackers discovered novel uses for them - IoT light bulbs notably (yes, these light bulb actual harm as the drivers of DoS attacks etc). And the dangers of just statically exploitable systems have increased over time as ill-intentioned humans learn more misuses of them. Moreover, such uses include immediate bad-acting as well as cobbling together a fully bad-aligned system (adding invisible statefullness for example). And LLM seems inherently insecure on a wholly different level than an OS, database or etc - an LLM's behavior is fundamentally unspecified.

Challenge: Does ChatGPT ever claim that a bad outcome for humanity is actually good?

JoeTheUser2y31

I'd say my point above would generalize to "there are no strong borders between 'ordinary language acts' and 'genuine hacks'" as far as what level of manipulation ability one can gain over model output. The main further danger would be if the model was given more output channels with which an attacker could work mischief. And that may be appearing as well - notably: https://openai.com/blog/chatgpt-plugins

Challenge: Does ChatGPT ever claim that a bad outcome for humanity is actually good?

Answer by JoeTheUserMar 22, 2023116

I would like to offer the idea that "jail broken" versus "not jail broken" might not have clear enough meaning in the context of what you're looking for.

I think people view "Jail broken" as equivalent to an iPhone where the user escalated privileges or a data-driven GUI where you've figured out how to run arbitrary SQL on the database by inputting some escape codes first.

But when an LLM in "confined" in "jail", that jail is simply some text commands, which modify the user's text commands - more or less with a "write as if" statement or the many... (read more)

1Stephen Fowler2y

This makes me wonder if we will eventually start to get LLM "hacks" that are genuine hacks. I'm imagining a scenario in which bugs like SolidGoldMagikarp can be manipulated to be genuine vulnerabilities. (But I suspect trying to make a one-to-one analogy might be a little naive)

GPT-4 solves Gary Marcus-induced flubs

JoeTheUser2y*145

I believe that Marcus' point is that there are classes of problems that tend to be hard for LLMs (biological reasoning, physical reasoning, social reasoning, practical reasoning, object and individual tracking, nonsequiturs). The argument is that problems in these class will continue to hard. [1]

But I think there's a larger issue. A lot of the discussion involve hostility to a given critic of AI "moving the goal posts". As described, Model X(1) is introduced, critic notices limitation L(1), Model X(2) addresses and critics says they're unconvinced and note... (read more)

1JakubK2y

Yeah this is the part that seems increasingly implausible to me. If there is a "class of problems that tend to be hard ... [and] will continue to be hard," then someone should be able to build a benchmark that models consistently struggle with over time.

In Defense of Chatbot Romance

JoeTheUser2y10

The advertising question was just an example of the general trust question. Another example is that a chatbot may come to seem unreliable through "not understanding" the words it produces. Here it's common for current LLMs to periodically give the impression of "not understanding what they say" by periodically producing output that's contradictory to what they previous outputted or which involves an inappropriate use of a word. Just consider a common complaint between humans is "you don't know what love means". Yet another example is this. Large language m... (read more)

In Defense of Chatbot Romance

JoeTheUser2y40

I don't think romantic relationships with robotic or computer partners should be automatically dismissed. They should be taken seriously. However, there are two objections to a chatbot romance that I don't see being addressed by the article:

A romantic or intimate relationship is generally said to involve trust. A common implicit assumption of a romantic relationship is that there is something like a mutual advisor relationship between the two people involved. I might ask my real life partner "should I buy that house", "should I take that job", "Is dark mat

... (read more)

4Kaj_Sotala2y

That seems like a reasonable incentive pushing chatbot makers to not have the answers be determined by marketing considerations, especially if people are ready to pay a subscription price for the more neutral chatbot. (Or just use an open-source version.) I think that humans are generally very good at intuitively understanding contextuality in interpersonal patterns. Just about everybody behaves differently with their parents, colleagues at work, and friends (assuming that these aren't the same people, of course). People also understand to treat children differently from adults, cats and dogs differently from humans, and so on. Of course people do make mistakes too, when they haven't learned the right pattern for context X and only have the patterns from other context to fall back to, but they still do quite good overall.

A simple game that has no solution

JoeTheUser11y-20

There's no mathematical solution for single-player, non-zero sum games of any sort. All these constructs lead to is arguments about "what is rational". If you a full math model of a "rational entity", then you could get a mathematically defined solution.

This is why I prefer evolutionary game theory to classical game theory. Evolutionary game theory generally has models of its actors and thus guarantees a solution to the problems it posits. One can argue with the models and I would say that's where such arguments most fruitfully should be.