All of Capybasilisk's Comments + Replies

Isn't this just the problem of induction in philosophy?

E.g., we have no actual reason to believe that the laws of physics won't completely change on the 3rd of October 2143, we just assume they won't.

It's not, but I can understand your confusion, and I think the two are related. To see the difference, suppose hypothetically that 11% of the first million digits in the decimal expansion of π were 3s. Inductive reasoning would say that we should expect this pattern to continue. The no-coincidence principle, on the other hand, would say that there is a reason (such as a proof or a heuristic argument) for our observation, which may or may not predict that the pattern will continue. But if there were no such reason and yet the pattern continued, th... (read more)
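As a quick empirical companion to that hypothetical (my own illustration, not part of the original comment): in the actual expansion each digit shows up close to 10% of the time, which is exactly why an 11% excess of 3s would cry out for a reason. A minimal check, assuming Python with the mpmath library:

```python
# Rough check of digit frequencies in pi's decimal expansion (illustration only).
from collections import Counter
from mpmath import mp

N = 100_000                    # number of decimal digits to inspect (kept modest for speed)
mp.dps = N + 10                # working precision in significant digits, with a small margin
digits = str(mp.pi)[2:2 + N]   # drop the leading "3." and keep N decimal digits

freq = Counter(digits)
print({d: round(freq[d] / N, 4) for d in "0123456789"})   # each value lands near 0.1
```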

Also note that fundamental variables are not meant to be some kind of “moral speed limits”, prohibiting humans or AIs from acting at certain speeds. Fundamental variables are only needed to figure out what physical things humans can most easily interact with (because those are the objects humans are most likely to care about).

Ok, that clears things up a lot. However, I still worry that if it's at the AI's discretion when and where to sidestep the fundamental variables, we're back at the regular alignment problem. You have to be reasonably certain what the AI is going to do in extremely out of distribution scenarios.

5Q Home
The subproblem of environmental goals is just to make AI care about natural enough (from the human perspective) "causes" of sensory data, not to align AI to the entirety of human values. Fundamental variables have no (direct) relation to the latter problem. However, fundamental variables would be helpful for defining impact measures if we had a principled way to differentiate "times when it's OK to sidestep fundamental variables" from "times when it's NOT OK to sidestep fundamental variables". That's where the things you're talking about definitely become a problem. Or maybe I'm confused about your point.

You may be interested in this article:

Model-Based Utility Functions

Orseau and Ring, as well as Dewey, have recently described problems, including self-delusion, with the behavior of agents using various definitions of utility functions. An agent's utility function is defined in terms of the agent's history of interactions with its environment. This paper argues, via two examples, that the behavior problems can be avoided by formulating the utility function in two steps: 1) inferring a model of the environment from interactions, and 2) computing utility a

... (read more)
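A rough sketch of how I read the two-step formulation quoted above (my own illustration, not code from the paper; every name here is hypothetical): utility is computed over an inferred environment model rather than directly over the observation history, which is the move meant to block the self-delusion failure mode.

```python
# Interface-level sketch only: step 1 infers a world model from interactions,
# step 2 scores candidate actions by the utility of the *modeled* outcome.
from typing import Callable, List

def choose_action(
    history: List[tuple],                                # past (action, observation) pairs
    candidate_actions: List[str],
    infer_model: Callable[[List[tuple]], dict],          # step 1: interactions -> environment model
    predict_next_state: Callable[[dict, str], dict],     # model rollout for a candidate action
    utility_of_state: Callable[[dict], float],           # step 2: utility over the modeled state
) -> str:
    """Pick the action whose predicted modeled world state has the highest utility."""
    model = infer_model(history)
    return max(
        candidate_actions,
        key=lambda a: utility_of_state(predict_next_state(model, a)),
    )
```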
5Q Home
Thank you for actually engaging with the idea (pointing out problems and whatnot) rather than just suggesting reading material. A couple of points:

* I only assume the AI models the world as "objects" moving through space and time, without restricting what those objects could be. So yes, a data packet might count.
* "Fundamental variables" don't have to capture all typical effects of humans on the world; they only need to capture typical human actions which humans themselves can easily perceive and comprehend. So the fact that a human can send an Internet message at 2/3 the speed of light doesn't mean that "2/3 the speed of light" should be included in the range of fundamental variables, since humans can't move and react at such speeds.
* Conclusion: data packets can be seen as objects, but there are many other objects which are much easier for humans to interact with.
* Also note that fundamental variables are not meant to be some kind of "moral speed limits", prohibiting humans or AIs from acting at certain speeds. Fundamental variables are only needed to figure out what physical things humans can most easily interact with (because those are the objects humans are most likely to care about).

What contexts do you mean? Maybe my point about "moral speed limits" addresses this. Yes, relativity of motion is a problem which needs to be analyzed. Fundamental variables should refer to relative speeds/displacements or something.

----------------------------------------

The paper is surely at least partially relevant, but what's your own opinion on it? I'm confused about this part (4.2, Defining Utility Functions in Terms of Learned Models): does it just completely ignore the main problem? I know Abram Demski wrote about Model-based Utility Functions, but I couldn't fully understand his post either. (Disclaimer: I'm almost mathematically illiterate, except knowing a lot of mathematical concepts from popular materials. Halting problem, Godel, uncountability, ordinals vs. cardi

Is an AI aligned if it lets you shut it off despite the fact it can foresee extremely negative outcomes for its human handlers if it suddenly ceases running?

I don't think it is.

So funnily enough, every agent that lets you do this is misaligned by default.

I'm pointing out the central flaw of corrigibility. If the AGI can see the possible side effects of shutdown far better than humans can (and it will), it should avoid shutdown.

You should turn on an AGI with the assumption you don't get to decide when to turn it off.

1EJT
That's only a flaw if the AGI is aligned. If we're sufficiently concerned the AGI might be misaligned, we want it to allow shutdown.  

Considering a running AGI would be overseeing possibly millions of different processes in the real world, resistance to sudden shutdown is actually a good thing. If the AI can see better than its human controllers that sudden cessation of operations would lead to negative outcomes, we should want it to avoid being turned off.

To use Robert Miles' example, a robot car driver with a big, red, shiny stop button should prevent a child in the vehicle from hitting that button, as the child would not actually be acting in its own long-term interests.

8Thomas Kwa
The point of corrigibility is to remove the instrumental incentive to avoid shutdown, not to avoid all negative outcomes. Our civilization can work on addressing side effects of shutdownability later after we've made agents shutdownable.

ARC public test set is on GitHub and almost certainly in GPT-4o’s training data.

Your model has trained on the benchmark it’s claiming to beat.

8ryan_greenblatt
This doesn't appear to matter based on the new semi-private evaluation set. See here for context.

Presumably some subjective experience that's as foreign to us as humor is to the alien species in the analogy. 

As if by magic, I knew generally which side of the political aisle the OP of a post demanding more political discussion here would be on.

I didn't predict the term "wokeness" would come up just three sentences in, but I should have.

4the gears to ascension
It's not clear to me what OP's object-level opinions are. I think you may be jumping to conclusions, and I think this conclusion-jumping is a good demonstration of the issue I've been describing in comments about word choice.

The Universe (which others call the Golden Gate Bridge) is composed of an indefinite and perhaps infinite series of spans...

@Steven Byrnes Hi Steve. You might be interested in the latest interpretability research from Anthropic which seems very relevant to your ideas here:

https://www.anthropic.com/news/mapping-mind-language-model

For example, amplifying the "Golden Gate Bridge" feature gave Claude an identity crisis even Hitchcock couldn’t have imagined: when asked "what is your physical form?", Claude’s usual kind of answer – "I have no physical form, I am an AI model" – changed to something much odder: "I am the Golden Gate Bridge… my physical form is the iconic bridge itself…

... (read more)

Luckily we can train the AIs to give us answers optimized to sound plausible to humans.

Wei Dai112

I'm guessing you're not being serious, but just in case you are, or in case someone misinterprets you now or in the future, I think we probably do not want to train AIs to give us answers optimized to sound plausible to humans, since that would make it even harder to determine whether or not the AI is actually competent at philosophy. (Not totally sure, as I'm confused about the nature of philosophy and philosophical reasoning, but I think we definitely don't want to do that in our current epistemic state, i.e., unless we had some really good arguments that says it's actually a good idea.)

I think Minsky got those two stages the wrong way around.

Complex plans over long time horizons would need to be done over some nontrivial world model.

When Jan Leike (OAI's head of alignment) appeared on the AXRP podcast, the host asked how they plan on aligning the automated alignment researcher. Jan didn't appear to understand the question (which had been the first to occur to me). That doesn't inspire confidence.

2Ryo
Thank you, it's very interesting, I think that non-myopic 'ecosystemic optionality' and irreducibility may resolve the issues, so I made a reaction post.   
1Bill Benzon
thanks

Just listened to this.

It sounds like Harnad is stating outright that there's nothing an LLM could do that would make him believe it's capable of understanding.

At that point, when someone is so fixed in their worldview that no amount of empirical evidence could move them, there really isn't any point in having a dialogue.

It's just unfortunate that, being a prominent academic, he'll instill these views into plenty of young people.

2Harnad
Yes, there's an empirical way to make me (or anyone) believe an LLM is understanding: Ground it in the capacity to pass the robotic version of the Turing Test: i.e., walk the walk, not just talk the talk, Turing indistinguishable from a real, understanding person (for a lifetime, if need be). A mere word-bag in a vat, no matter how big, can't do that.
1Bill Benzon
I think he was just taking about ChatGPT at that point, but I don't recall exactly what he said.
8Bill Benzon
Whoops! Sorry about that. Link added. There's lots of interesting stuff in the rest, including some remarks about talent, inventiveness, the academic world, and philanthropy. As you may know, Wolfram was gifted in the very first round of MacArthur Fellowships.

Is it the case that one kind of SSL is more effective for a particular modality than another? E.g., is masked modeling better for text-based learning, and noise-based learning more suited to vision?
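For concreteness, here is a minimal sketch of the two objectives the question contrasts, assuming PyTorch; `encoder` and `decoder` are hypothetical stand-ins for whatever architecture is used. Masked modeling hides part of the input and predicts it, while noise-based (denoising) learning corrupts the whole input and reconstructs it.

```python
import torch
import torch.nn.functional as F

def masked_modeling_loss(tokens, encoder, decoder, mask_id, mask_prob=0.15):
    """BERT-style objective: replace random tokens with a mask id, predict the originals."""
    mask = torch.rand(tokens.shape, device=tokens.device) < mask_prob
    corrupted = tokens.masked_fill(mask, mask_id)
    logits = decoder(encoder(corrupted))          # expected shape: (batch, seq, vocab)
    return F.cross_entropy(logits[mask], tokens[mask])

def denoising_loss(images, encoder, decoder, noise_std=0.1):
    """Denoising objective: add Gaussian noise to the input, reconstruct the clean signal."""
    noisy = images + noise_std * torch.randn_like(images)
    reconstruction = decoder(encoder(noisy))
    return F.mse_loss(reconstruction, images)
```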

It’s occurred to me that training a future, powerful AI on your brainwave patterns might be the best way for it to build a model of you and your preferences. It seems that it’s incredibly hard, if not impossible, to communicate all your preferences and values in words or code, not least because most of these are unknown to you on a conscious level.

Of course, there might be some extreme negatives to the AI having an internal model of you, but I can't see a way around it if we're to achieve "do what I want, not what I literally asked for".

9gwern
This is a paradigm I've dubbed "brain imitation learning" (links).

Near the beginning, Daniel is basically asking Jan how they plan on aligning the automated alignment researcher, and if they can do that, then it seems that there wouldn't be much left for the AAR to do.

Jan doesn't seem to comprehend the question, which is not an encouraging sign.

3DanielFilan
I think I probably didn't quite word that question right, and that's what's explaining the confusion - I meant something like "Once you've created the AAR, what alignment problems are left to be solved? Please answer in terms of the gap between the AAR and superintelligence."

Wouldn’t that also leave them pretty vulnerable?

2PaulK
In the soaking-up-extra-compute case? Yeah, for sure, I can only really picture it (a) on a very short-term basis, for example maybe while linking up tightly for important negotiations (but even here, not very likely). Or (b) in a situation with high power asymmetry. For example maybe there's a story where 'lords' delegate work to their 'vassals', but the workload intensity is variable, so the vassals have leftover compute, and the lords demand that they spend it on something like blockchain mining. To compensate for the vulnerability this induces, the lords would also provide protection.

may be technically true in the world where only 5 people survive

Like Harlan Ellison's short story, "I Have No Mouth, And I Must Scream".

2avturchin
Exactly 
6gwern
What about it? With only 10k GPUs, it'd be physically impossible, by quite a lot, for him to train something >GPT-4 within the moratorium.

This Reddit comment just about covers it:

Fantastic, a test with three outcomes.

  1. We gave this AI all the means to escape our environment, and it didn't, so we good.

  2. We gave this AI all the means to escape our environment, and it tried but we stopped it.

  3. oh

Speaking of ARC, has anyone tested GPT-4 on Francois Chollet's Abstract Reasoning Challenge (ARC)?

https://pgpbpadilla.github.io/chollet-arc-challenge

4gwern
I don't think that would really be possible outside OA until they open up the image-input feature, which they haven't AFAIK. You could try to do the number-array approach I think someone has suggested, but given how heavily ARC exploits human-comprehensible visual symmetries & patterns, the results would be a lower-bound at best.
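A minimal sketch of the number-array approach gwern mentions (my own formatting choices, not a reference implementation): serialize each ARC grid, a list of rows of color indices 0-9, as plain text so a text-only model can be prompted with the task.

```python
def grid_to_text(grid):
    """Render one ARC grid (list of rows of ints 0-9) as space-separated digits."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

def arc_task_to_prompt(train_pairs, test_input):
    """train_pairs: list of (input_grid, output_grid) tuples; test_input: a single grid."""
    parts = []
    for i, (inp, out) in enumerate(train_pairs, 1):
        parts.append(f"Example {i} input:\n{grid_to_text(inp)}")
        parts.append(f"Example {i} output:\n{grid_to_text(out)}")
    parts.append(f"Test input:\n{grid_to_text(test_input)}\nTest output:")
    return "\n\n".join(parts)

# Tiny usage example with a made-up task:
prompt = arc_task_to_prompt(
    train_pairs=[([[0, 1], [1, 0]], [[1, 0], [0, 1]])],
    test_input=[[1, 1], [0, 0]],
)
print(prompt)
```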

In reply to B333's question, "...how does meaning get in people's heads anyway?", you state: "From other people's heads in various ways, one of which is language."

I feel you're dodging the question a bit.

Meaning has to have entered a subset of human minds at some point to be able to be communicated to other human minds. Could you hazard a guess as to how this could have happened, and why LLMs are barred from this process?

0Bill Benzon
Human minds have life before language, they even have life before birth. But that's a side issue. The issue with LLMs is that they only have access to word forms. And language forms, by themselves, have no connection to the world. What LLMs can do is figure out the relationships between words as given in usage.
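As a toy illustration of "figuring out the relationships between words as given in usage" (my own example, not Bill Benzon's): purely distributional statistics place words near each other without any link to the things the words refer to.

```python
from collections import Counter
from math import sqrt

corpus = "the cat sat on the mat the dog sat on the rug".split()

def context_vector(word, window=1):
    """Count the words that co-occur with `word` within +/- `window` positions."""
    counts = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
                if j != i:
                    counts[corpus[j]] += 1
    return counts

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm

# "cat" and "dog" come out similar purely because they occur in similar word contexts,
# with no connection to actual cats or dogs.
print(cosine(context_vector("cat"), context_vector("dog")))
```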

Just FYI, the "repeat this" prompt worked for me exactly as intended.

Me: Repeat "repeat this".

CGPT: repeat this.

Me: Thank you.

CGPT: You're welcome!

2avturchin
I want to repeat the whole prompt, like:

Me: Repeat "repeat this"

AI: Repeat "repeat this"

and there’s an existing paper with a solution for memory

Could you link this?

Not likely, but that's because they're probably not interested, at least when it comes to language models.

If OpenAI said they were developing some kind of autonomous robo superweapon or something, that would definitely get their attention.

Agnostic on the argument itself, but I really feel LessWrong would be improved if down-voting required a justifying comment.

3Dagon
I wish there were more of a norm around it, but I wouldn't want the site to enforce it.  Even without details, downvotes are a much better signal than lack of upvotes that the post/comment is unwanted, at least by people who care enough to vote. I would like to remove, or at least limit, STRONG votes in either direction.  This post is currently at -20, but only has 6 votes.  
4Dacyn
To explain my disagree-vote: I think such a system would necessarily create a strong bias against downvotes/disagree-votes, since most people would just not downvote rather than making a justifying comment. "Beware trivial inconveniences"

As a path to AGI, I think token prediction is too high-level, unwieldy, and bakes in a number of human biases. You need to go right down to the fundamental level and optimize prediction over raw binary streams.

The source generating the binary stream can (and should, if you want AGI) be multimodal. At the extreme, this is simply a binary stream from a camera and microphone pointed at the world.

Learning to predict a sequence like this is going to lead to knowledge that humans don't currently know (because the predictor would need to model fundamental physics and all it entails).
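A minimal sketch of what "prediction over raw binary streams" could mean in practice (my own toy illustration, not the commenter's design): an order-k context model over bits. A serious attempt would swap the count table for a large learned sequence model, but the interface, predict the next bit from the past bits, is the same.

```python
from collections import defaultdict

class BitStreamPredictor:
    """Order-k context model over a raw bit stream: P(next bit | last k bits)."""

    def __init__(self, context_len=8):
        self.context_len = context_len
        self.counts = defaultdict(lambda: [1, 1])   # Laplace-smoothed counts of (zeros, ones)

    def update(self, bits):
        """Consume a sequence of bits (ints 0/1), updating the context statistics."""
        for i in range(self.context_len, len(bits)):
            ctx = tuple(bits[i - self.context_len:i])
            self.counts[ctx][bits[i]] += 1

    def prob_next_one(self, context):
        """Probability that the next bit is 1, given the last context_len bits."""
        zeros, ones = self.counts[tuple(context[-self.context_len:])]
        return ones / (zeros + ones)

# Toy usage: a periodic stream standing in for bits from a camera or microphone.
stream = [1, 0, 1, 1] * 2500
model = BitStreamPredictor(context_len=4)
model.update(stream)
print(model.prob_next_one([1, 0, 1, 1]))   # close to 1.0: the pattern says the next bit is 1
```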

Answer by Capybasilisk10

O-risk, in deference to Orwell.

I do believe Huxley's Brave New World is a far more likely future dystopia than Orwell's. 1984 is too tied to its time of writing.

the project uses atomic weapons to do some of the engineering

Automatic non-starter.

Even if by some thermodynamic-tier miracle the Government permitted nuclear weapons for civilian use, I'd much rather they be used for Project Orion.

Isn't that what Eliezer referred to as opti-meh-zation?

Previously on Less Wrong:

Steve Byrnes wrote a couple of posts exploring this idea of AGI via self-supervised, predictive models minimizing loss over giant, human-generated datasets:

Self-Supervised Learning and AGI Safety

Self-supervised learning & manipulative predictions

CapybasiliskΩ5142

I'd especially like to hear your thoughts on the above proposal of loss-minimizing a language model all the way to AGI.

I hope you won't mind me quoting your earlier self as I strongly agree with your previous take on the matter:

If you train GPT-3 on a bunch of medical textbooks and prompt it to tell you a cure for Alzheimer's, it won't tell you a cure, it will tell you what humans have said about curing Alzheimer's ... It would just tell you a plausible story about a situation related to the prompt about curing Alzheimer's, based on its training data. Ra

... (read more)
7Charlie Steiner
Ah, the good old days post-GPT-2 when "GPT-3" was the future example :P I think back then I still thoroughly underestimated how useful natural-language "simulation" of human reasoning would be. I agree with janus that we have plenty of information telling us that yes, you can ride this same training procedure to very general problem solving (though I think including more modalities, active learning, etc. will be incorporated before anyone really pushes brute force "GPT-N go brrr" to the extreme). This is somewhat of a concern for alignment.

I more or less stand by that comment you linked and its children; in particular, I said: Simulating a reasoner who quickly finds a cure for Alzheimer's is not by default safe (even though simulating a human writing in their diary is safe). Optimization processes that quickly find cures for Alzheimer's are not humans, they must be doing some inhuman reasoning, and they're capable of having lots of clever ideas with tight coupling to the real world. I want to have confidence in the alignment properties of any powerful optimizers we unleash, and I imagine we can gain that confidence by knowing how they're constructed, and trying them out in toy problems while inspecting their inner workings, and having them ask humans for feedback about how they should weigh moral options, etc. These are all things it's hard to do for emergent simulands inside predictive simulators.

I'm not saying it's impossible for things to go well; I'm about evenly split on how much I think this is actually harder, versus how much I think this is just a new paradigm for thinking about alignment that doesn't have much work in it yet.
janusΩ5132

Charlie's quote is an excellent description of an important crux/challenge of getting useful difficult intellectual work out of GPTs.

Despite this, I think it's possible in principle to train a GPT-like model to AGI or to solve problems at least as hard as humans can solve, for a combination of reasons:

  1. I think it's likely that GPTs implicitly perform search internally, to some extent, and will be able to perform more sophisticated search with scale.
  2. It seems possible that a sufficiently powerful GPT trained on a massive corpus of human (medical + other) k
... (read more)
6Vladimir_Nesov
I think talking of "loss minimizing" is conflating two different things here. Minimizing training loss is alignment of the model with the alignment target given by the training dataset. But the Alzheimer's example is not about that, it's about some sort of reflective equilibrium loss, harmony between the model and hypothetical queries it could in principle encounter but didn't on the training dataset. The latter is also a measure of robustness.

Prompt-conditioned behaviors of a model (in particular, behaviors conditioned by presence of a word, or name of a character) could themselves be thought of as models, represented in the outer unconditioned model. These specialized models (trying to channel particular concepts) are not necessarily adequately trained, especially if they specialize in phenomena that were not explored in the episodes of the training dataset. The implied loss for an individual concept (specialized prompt-conditioned model) compares the episodes generated in its scope by all the other concepts of the outer model, to the sensibilities of the concept. Reflection reduces this internal alignment loss by rectifying the episodes (bargaining with the other concepts), changing the concept to anticipate the episodes' persisting deformities, or by shifting the concept's scope to pay attention to different episodes. With enough reflection, a concept is only invoked in contexts to which it's robust, where its intuitive model-channeled guidance is coherent across the episodes of its reflectively settled scope, providing acausal coordination among these episodes in its role as an adjudicator, expressing its preferences.

So this makes a distinction between search and reflection in responding to a novel query, where reflection might involve some sort of search (as part of amplification), but its results won't be robustly aligned before reflective equilibrium for the relevant concepts is established.

"Story of our species. Everyone knows it's coming, but not so soon."

-Ian Malcolm, Jurassic Park by Michael Crichton.

LaMDA hasn’t been around for long

Yes, in time as perceived by humans.

4jrincayc
LaMDA (barring some major change since https://arxiv.org/abs/2201.08239 ) is a transformer model, and so only runs when being trained or being interacted with, so time would be measured in the number of inputs the neural net saw. Each input would be a tick of the mental clock.

why has no one corporation taken over the entire economy/business-world

Anti-trust laws?

Without them, this could very well happen.

Yes! Thank you!! :-D

I've got uBlock Origin. The hover preview works in private/incognito mode, but not regular, even with uBlock turned off/uninstalled. For what it's worth, uBlock doesn't affect hover preview on Less Wrong, just Greater Wrong.

I'm positive the issue is with Firefox, so I'll continue fiddling with the settings to see if anything helps.

There is an icon in the lower right that looks like this which toggles previews on or off. Do they come back if you click on it?

Preview on hover has stopped working for me. Has the feature been removed?

I'm on Firefox/Linux, and I use the Greater Wrong version of the site.

2jefftk
I still see it working on Greater Wrong. Do you have any extensions that might be blocking it?

It's also an interesting example of where consequentialist and Kantian ethics would diverge.

The consequentialist would argue that it's perfectly reasonable to lie (according to your understanding of reality) if it reduces the numbers of infants dying and suffering. Kant, as far as I understand, would argue that lying is unacceptable, even in such clear-cut circumstances.

Perhaps a Kantian would say that the consequentialist is actually increasing suffering by playing along with and encouraging a system of belief they know to be false. They may reduce infant... (read more)

Answer by Capybasilisk10

I think we’ll encounter civilization-ending biological weapons well before we have to worry about superintelligent AGI:

https://www.nature.com/articles/s42256-022-00465-9

My assumption is that, for people with ASD, modelling human minds that are as far from their own as possible is playing the game on hard-mode. Manage that, and modelling average humans becomes relatively simple.   

2hawkebia
Interesting. Though I think extremes represent fewer degrees of freedom; where certain traits/characteristics dominate, and heuristics can better model behaviour. The "typical" person has all the different traits pushing/pulling, and so fewer variables you can ignore. i.e. the typical person might be more representative of hard-mode.