Awesome ideas! These are some of the things missing for LLMs to have economic impact. Companies expected them to simply automate certain jobs, but that's an all-or-nothing approach that has never worked historically (until it eventually does, but we're not there yet).
One idea I thought of when reading Scott Aaronson's Reading Burden post (https://scottaaronson.blog/?p=8217) is that people with interesting opinions and somewhat of a public presence have a TON of reading to do -- not just to keep up with current events, but to observe people's reactions an...
I'll probably get disagree points, but I wanted to share my reaction: I honestly don't mind the AI's output. I read it all and think it's just an elaboration of what you said. The only problem I noticed is that it is too long.
Then again, I'm not an amazing writer, and I'm not great at critiquing style. I will admit I rarely use assistance, because I have a tight set of points I want to include, and explaining them all to the AI is almost the same as writing the post itself.
Brief comments on what's bad about the output:
The instruction is to write an article arguing that AI-generated posts suffer from verbosity, hedging, and unclear trains of thought. But ChatGPT makes that complaint in a single sentence in the first paragraph and then spends 6 paragraphs adding a bunch of its own arguments:
Thanks for this post! I have always been annoyed that on Reddit, or even here, the response to poverty always goes back to, "but poor people have cell phones!" It all comes down to freedom -- the amount of meaningfully distinct actions one person can take in the world to accomplish their goals. If there are few real alternatives, and one's best options all involve working until exhaustion, it is not true freedom.
I agree, the poverty-restoring equilibrium is probably more complex than UBI alone -- maybe it's part of Moloch. I think the rents increasing by the UBI ...
Not my worst prediction, given the latest news!
That's fair. Here are some things to consider:
1 - I think 2017 was not that long ago. My hunch is that the low-level architecture of the network itself is not a bottleneck yet; I'd lean more on training procedures and algorithms. I'd throw in RLHF and MoE as significant developments, and those are even more recent.
2 - I give maybe a 30% chance of a stall, in the case that little commercial disruption comes of LLMs. I think there will still be enough research going on at the major labs, and even universities at a smaller scale give a decent chance at efficiency gain...
I think it is plausible, though not obvious, that large language models have a fundamental issue with reasoning. However, I don't think this greatly impacts timelines. Here is my thinking:
I think timelines are fundamentally driven by scale and compute. We have a lot of smart people working on the problem, and there are a lot of obvious ways to address these limitations. Of course, given how research works, most of these ideas won't work, but I am skeptical of the idea that such a counter-intuitive paradigm shift is needed that nobody has e...
Is there a post in the Sequences about when it is justifiable not to go down a rabbit hole? It's a fairly general question, but the specific context is a tale as old as time. My brother, who had been an atheist for decades, moved to Utah. After 10 years, he now asserts that he was wrong and that his "rigorous pursuit" of verifying with logic and his own eyes leads him to believe the Bible is literally true. I worry about his mental health, so I don't want to debate him, but felt like I should give some kind of justification for why I'm not personally ...
I would pay to see this live at a bar or one of those county fairs (we had a GLaDOS cover band once, so it's not out of the question).
If we don't get a song like that, take comfort that GLaDOS's songs from the Portal soundtrack are basically the same idea as the Sydney reference. Link: https://www.youtube.com/watch?v=dVVZaZ8yO6o
Let me know if I've missed something, but it seems to me the hard part is still defining harm. In the one case, where we use the model itself to calculate the probability of harm, if it has goals, it may be incentivized to minimize that probability estimate. In the case where we have separate auxiliary models whose goal is to actively look for harm, then we have a deceptively adversarial relationship between them: the optimizer can try to fool the harm-finding LLMs. In fact, in the latter case, I'm imagining models which do a very good job at always finding some ...
That's fair. I read the post but did not re-read it, and asking for "more" examples out of such a huge list is a bit much to ask. Still, I find the process of finding these examples somewhat fun, and for whatever reason had not found many of them too shocking, so I felt the instinct to keep searching.
Dissociative identity disorder would be an interesting case; I have heard there was much debate on whether it was real. Since you know someone, I assume it's not exactly like you see in the movies, and probably falls on a spectrum, as discussed in this post?
One fear I have is that the open source community will come out ahead, and push for greater weight sharing of very powerful models.
Edit: To be more specific, I mean that the open source community will become more attractive, because the argument will be: you cannot rely on individual companies whose models may or may not remain available; you must build on top of open source. Related tweet:
https://twitter.com/ylecun/status/1726578588449669218
Whether their plan works or not, dunno.
One thing that would help me, not sure if others agree, would be some more concrete predictions. I think the historical examples of autism and being gay make sense, but they are quite normalized now, so that one can almost say, "That was previous generations. We are open-minded and rational now." What are some new applications of this logic that would surprise us? Are these omitted due to some info hazard? Surely we can find some that are not. I am honestly having a hard time coming up with them myself, but here goes:
It's less "you probably know a burglar" and more "successful burglaries are probably 10-100x more common than you would think, if you based your prediction solely on visible evidence."
The two that seem most obvious to me: well-behaved psychopaths (i.e. people who have little or no empathetic response but who have learned to follow the social rules anyway, for the sake of headache avoidance) and non-practicing pedophiles (i.e. people who are attracted to children but are zero percent interested in raping anyone) probably really actually are quite common.
Does the list starting shortly after "Here are some examples of things" not fit that desire for more examples?
I think the trouble is that this theory predicts that anything one can offer as an example of dark matter is either 1. something that used to be very discouraged and now isn't as discouraged, like being gay, or 2. something that is currently very discouraged and therefore hidden, like being a sex worker. "What am I not seeing evidence for" is a hard thing to notice!
As far as I know, I don't personally know anyone HIV positive. Maybe that's because there a...
I share your disagreement with the original author as to the cause of the relief. For me, the modern day and age is very confusing, and it is difficult to measure one's value to society. Any great idea you can think of, probably someone else has thought of it, and you have little chance to be important. In a zombie apocalypse, instead of thinking how to out-compete your fellow man with some amazing invention, you fall back to survival. Important things in that world, like foraging for food, fending off zombies, etc., have quicker rewards, and it's easier in som...
If we know they aren't conscious, then it is a non-issue. A random sample from conscious beings would land on the SAI with probability 0. I'm concerned we create something accidentally conscious.
I am skeptical it is easy to avoid. If it can simulate a conscious being, why isn't that simulation conscious? If consciousness is a property of the physical universe, then an isomorphic process would have the same properties. And if it can't simulate a conscious being, then it is not a superintelligence.
It can, however, possibly have a non-conscious outer-program... and avoid simulating people. That seems like a reasonable proposal.
Agree. Obviously alignment is important, but some of the strategies that involve always deferring to human preferences have always creeped me out in the back of my mind. It seems strange to create something so far beyond ourselves and have its values ultimately be those of a child or a servant. What if a random consciousness sampled from our universe in the future comes from it with probability almost 1? We probably have to keep that in mind too. Sigh, yet another constraint we have to add!
Hi Critch,
I am curious to hear more of your perspectives, specifically on the two points I feel least aligned with: the empathy part and the Microsoft part. If I hear more, I may be able to update in your direction.
Regarding empathy with people working on bias and fairness, concretely, how do you go about interacting with and compromising with them?
My perspective: it's not so much that I find these topics insufficiently x-risky (though that is true, too), but that I perceive a hostility to the very notion of x-risk from a subset of this same group. The...
This might be a good time for me to ask a basic question on mechanistic interpretability:
Why does targeting single neurons work? Does it work? If there is a one-dimensional quantity to measure, why would it align with the standard basis? Why wouldn't it be aligned with a random one-dimensional linear subspace? In that case, examining single neurons would likely give you some weighted combination of concepts rather than a single interpretation...
Those are good questions! There's some existing research which addresses some of them.
Single neurons often do represent multiple concepts: https://transformer-circuits.pub/2022/toy_model/index.html
It seems to still be unclear why the dimensions are aligned with the standard basis: https://transformer-circuits.pub/2023/privileged-basis/index.html
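To make the polysemanticity point concrete, here's a toy sketch of my own (not taken from either linked paper): if the true feature directions are a random rotation of the neuron basis rather than aligned with it, any single neuron reads out a weighted mixture of features.

```python
# Toy sketch: axis-aligned vs. rotated feature directions.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_neurons, n_samples = 6, 6, 10_000

# Sparse, non-negative "true" features (each feature is rarely active).
features = rng.exponential(1.0, size=(n_samples, n_features))
features *= rng.random((n_samples, n_features)) < 0.1

# Case 1: feature directions aligned with the standard (neuron) basis.
aligned = features @ np.eye(n_features)

# Case 2: feature directions are a random rotation of that basis.
q, _ = np.linalg.qr(rng.normal(size=(n_features, n_neurons)))
rotated = features @ q

def corr_with_features(neuron_acts):
    # Correlation of neuron 0 with each underlying feature.
    return [abs(np.corrcoef(neuron_acts[:, 0], features[:, j])[0, 1])
            for j in range(n_features)]

print("aligned basis:", np.round(corr_with_features(aligned), 2))
print("rotated basis:", np.round(corr_with_features(rotated), 2))
# In the aligned case, neuron 0 correlates strongly with exactly one feature;
# in the rotated case, it correlates moderately with several at once.
```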
Fascinating, thanks for the research. Your analysis makes sense and seems to indicate that for most situations, prompt engineering is always the first plan of attack and often works well enough. Then, a step up from there, OpenAI etc. would most likely experiment with fine-tuning or RLHF as it relates to a specific business need. To train a better chatbot and fill in any gaps, they would probably get more bang for their buck from simply fine-tuning it on a large dataset that matched their needs. For example, if they wanted to do better mathematical reason...
I agree with the analysis of the ideas overall. I think, however, that AI x-risk does have some issues regarding communication. First of all, I think it's very unlikely that Yann will respond to the wall of text. Even though he is responding, I imagine him to be more on the level of your college professor: he will not reply to a very detailed post. In general, I think that AI x-risk advocates should aim to explain a bit more, rather than take the stance that all the "But What if We Just..." has already been addressed. It may have been, but this is not the way to gettin...
Essentially yes, heh. I take this as a learning experience for my writing; I don't know what I was thinking, but it is obvious in hindsight that saying to just "switch on backprop" sounds very naive.
I also confess I haven't done the due diligence to find out the largest model this has actually been tried with -- whether someone has tried it with Pythia or LLaMA. I'll do some more googling tonight.
One intuition for why the largest models might be different is that part of the training/fine-tuning will involve the model's own output, and the largest models are the ones where the model's own output is not essentially word salad.
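For concreteness, here is roughly the kind of thing I meant by "switch on backprop" on the model's own output -- a minimal sketch using a small public model (gpt2 as a stand-in; this is not a claim about what any lab actually does or how they do it):

```python
# Generate text with the model, then take a gradient step on that same text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "Summarize what you have learned so far:"
inputs = tokenizer(prompt, return_tensors="pt")

# The model's own continuation...
generated = model.generate(**inputs, max_new_tokens=50, do_sample=True)

# ...treated as training data for a single online update.
outputs = model(generated, labels=generated.clone())
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Whether repeating this loop helps or just causes drift and catastrophic forgetting is exactly the open question.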
I noted the problem of catastrophic forgetting in the section "why it might not work". In general I agree continual learning is obviously a thing; otherwise I would not have used the established terminology. What I believe, however, is that the problems we face in continual learning in, e.g., a 100M-parameter BERT model may not be the same as what we observe in models that can now meaningfully self-critique. We have explored this technique publicly, but have we tried it with GPT-4? The "publicly" part was really just a question of whether OpenAI actually did it on this model or not, and it would be an amazing data point if they could say "We couldn't get it to work."
It's possible it's downvoted because it might be considered dangerous capability research. It just seems highly unlikely that this would not be one of many natural research directions perhaps already attempted, and I figure we might as well acknowledge it and find out what it actually does in practice.
Or maybe it was downvoted because it "obviously won't work" -- but it's not obvious to me, and I would welcome discussion on that.
Thanks, this is a great analysis of the power of agentized LLMs, which I probably need to spend some more time thinking about. I will work my way through the post over the next few days. I briefly skimmed the episodic memory section for now, and I see it is like an embedding-based retrieval system for past outputs/interactions of the model, reminiscent of the way some helper chatbots look up answers from FAQs (something like the sketch below). My overall intuitions on this:
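To check my understanding of that retrieval mechanism, a minimal sketch -- this is my reading, not the post's actual implementation, and embed() here is a hypothetical hash-based stand-in for a real sentence-embedding model:

```python
# Store past interactions as vectors; recall the most similar ones later.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in so the sketch runs stand-alone; a real system
    # would use an actual embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

class EpisodicMemory:
    def __init__(self):
        self.texts, self.vecs = [], []

    def store(self, text: str) -> None:
        self.texts.append(text)
        self.vecs.append(embed(text))

    def recall(self, query: str, k: int = 3) -> list:
        # Cosine similarity against every stored interaction; return top-k.
        q = embed(query)
        sims = np.array([v @ q for v in self.vecs])
        top = np.argsort(-sims)[:k]
        return [self.texts[i] for i in top]

memory = EpisodicMemory()
memory.store("User asked about the refund policy; agreed to a 30-day window.")
memory.store("Planned to refactor the retrieval module next sprint.")
# With a real embedding model, the refund memory would rank first here.
print(memory.recall("what did we say about refunds?", k=1))
```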
Very interesting write-up. Do you have a high-level overview of why, despite all of this, P(doom) is still 5%? What do you still see as the worst failure modes?
Noticed this as well. I tried to get it to solve some integration problems, and it could try different substitutions and things, but if they did not work, it kind of gave up and said to numerically integrate. Also, it would make small errors, and you would have to point them out, though it was happy to fix them.
I'm thinking that most documents it reads tend to omit the whole search/backtrack phase of thinking. Even work that is posted online showing all the steps usually filters out all the false starts. It's like how most famous mathematicians were known for throwing away their scratchwork, leaving everyone to wonder how exactly they formed their thought processes...
The media does have its biases, but their reaction seems perfectly reasonable to me. Occam's razor suggests this is not only unorthodox, but shows extremely poor judgment. This demonstrates that either (a) Elon is actually NOT as smart as he has been hyped to be, or (b) there's some ulterior motive -- but these are long-tailed.
Typically when you join a company, you don't do anything for X number of months and just get the lay of the land. I'm inclined to believe this is not just a local minimum, but typically close to the optimal strategy for a human being (but not ...
I'll say I definitely think it's too optimistic, and I don't put too much stock in it. Still, I think it's worth thinking about.
Yes, absolutely we are not following the rule. The reason why I think it might change with an AGI: (1) currently we humans, despite what we say when we talk about aliens, still place a high prior on being alone in the universe, or, from dominant religious perspectives, on being the most intelligent. Those things combine to make us think there are no consequences to our actions against other life. An AGI, itself a proof of conc...
That is a very fair criticism. I didn't mean to imply this is something I was very confident in, but I was interested in it for three reasons:
1) This value function aside, is this a workable strategy, or is there a solid reason for suspecting the solution is all-or-nothing? Is it reasonable to 'look for' our values with human effort, or does this have to be something searched for using algorithms?
2) It sort of gives a flavor of what's important in life. Of course the human value function will be a complicated mix of different sensory inputs, reproduction, and g...
I'm an ML engineer at a FAANG-adjacent company, big enough to train our own sub-1B-parameter language models fairly regularly. I work on training some of these models and finding applications of them in our stack. I saw the light after reading most of Superintelligence, and I feel like I'd like to help out somehow. I'm in my late 30s with kids and live in the SF Bay Area. I kind of have to provide for them, don't have any family money or resources to lean on, and would rather not restart my career. I also don't think I should abandon ML and try to ...
You might want to consider registering for the AGI Safety Fundamentals Course (or reading through the content). The final project provides a potential way of dipping your toes into the water.
Both 80,000 Hours and AI Safety Support are keen to offer personalised advice to people facing a career decision and interested in working on alignment (and in 80k's case, also many other problems).
Noting a conflict of interest: I work for 80,000 Hours and know of, but haven't used, AISS. This post is in a personal capacity; I'm just flagging publicly available information rather than giving an insider take.
Has there been effort into finding a "least acceptable" value function, one that we hope would not annihilate the universe or turn it degenerate, even if the outcome itself is not ideal? My example would be to try to teach a superintelligence to value all other agents facing surmountable challenges in a variety of environments. The degenerate outcome of this is that, if it does not value the real world, it will simply simulate all agents in a zoo. However, if the simulations are of faithful fidelity, maybe that's not literally the worst thing. Plus, the zoo, to truly be a good test of the agents, would approach being invisible.
I can see the argument of capabilities vs safety both ways. On the one hand, by working on capabilities, we may get some insights. We could figure out how much data is a factor, and what kinds of data they need to be. We could figure out how long term planning emerges, and try our hand at inserting transparency into the model. We can figure out whether the system will need separate modules for world modeling vs reward modeling. On the other hand, if intelligence turns out to be not that hard, and all we need to do is train a giant decision transforme...
I think we are getting some information. For example, we can see that token level attention is actually quite powerful for understanding language and also images. We have some understanding of scaling laws. I think the next step is a deeper understanding of how world modeling fits in with action generation -- how much can you get with just world modeling, versus world modeling plus reward/action combined?
If the transformer architecture is enough to get us there, it tells us a sort of null hypothesis for intelligence -- that the structure for predicting seq...
Not rhetorically: what kinds of questions do you think would better lead to understanding how AGI works?
Suppose I'm designing an engine. I try out a new design, and it surprises me - it works much worse or much better than expected. That's a few bits of information. That's basically the sort of information we get from AI experiments today.
What we'd really like is to open up that surprising engine, stick thermometers all over the place, stick pressure sensors all over the place, measure friction between the parts, measure vibration, measure fluid flow and conce...
I think the desire works because most honest people know that if they give a good-sounding answer that is ultimately meaningless, no benefit will come of the answers given. The askers may eventually stop asking questions, knowing the answers are always useless. It's a matter of estimating future rewards from building relationships.
Now, when a human gives advice to another human, most of the time it is also useless, but not always. Also, it tends not to be straight-up lies. Even in the useless case, people still think there is some utility in there, for example, hav...
One other thing I'm interested in: is there a good mathematical model of 'search'? There may not be an obvious answer, but I feel like there is some pattern that could be leveraged. I was playing hide and seek with my kids the other day and noticed that, in a finite space, you expect there to be a finite number of hiding spots. True, but every time you think you've found them all, you end up finding one more. I wonder if figuring out optimizations or making discoveries follows a similar pattern. There are some easy ones, then progressively harder ones, but there are far more to be found than one would expect... so to model finding these over time, in a very large room...
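A hypothetical toy model of what I mean (my own sketch, not an established model): give each hiding spot a progressively smaller per-sweep chance of being found, and discoveries keep trickling in long after you "think you've found them all".

```python
# Toy search model: spot i is found on any sweep with probability p * decay**i.
import numpy as np

rng = np.random.default_rng(1)

n_spots, n_sweeps = 200, 500
p, decay = 0.3, 0.97
find_prob = p * decay ** np.arange(n_spots)  # easy spots first, long tail after

found = np.zeros(n_spots, dtype=bool)
discoveries_over_time = []
for _ in range(n_sweeps):
    newly_found = (~found) & (rng.random(n_spots) < find_prob)
    found |= newly_found
    discoveries_over_time.append(int(found.sum()))

# Discoveries rise quickly, then keep accumulating slowly for a long time.
print(discoveries_over_time[::100])
```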
I agree; I too am not completely sure about the dynamics of the intelligence explosion. I would like more concrete footing to figure out what takeoff will look like, as neither fast nor slow is proven.
My intuition, however, is the opposite. I can't disprove a slow takeoff, but to me it seems intuitive that there are some "easy" modifications that should take us far beyond human level. Those intuitions, though they could be wrong, are as follows:
- I feel like human capability is limited in some obvious ways. If I had more time and energy to fo...
Policy makers.
Policy makers.
For ML researchers.
AI existential risk is like climate change. It's easy to come up with short slogans that make it seem ridiculous. Yet, when you dig deeper into each counterargument, you find none of them are very convincing, and the dangers are quite substantial. There's quite a lot of historical evidence for the risk, especially in the impact humans have had on the rest of the world. I strongly encourage further, open-minded study.
It's easy to imagine that the AI will have an off switch, and that we could keep it locked in a box and ask it questions. But just think about it. If some animals were to put you in a box, do you think you would stay in there forever? Or do you think you'd figure a way out that they hadn't thought of?
AI x-risk. It sounds crazy for two reasons. One, because we are used to nothing coming close to human intelligence, and two, because we are used to AI being unintelligent. For the first, the only point of comparison is imagining something that is to us what we are to cats. For the second, though we have not quite succeeded yet, it only takes one. If you have been following the news, we are getting close.
Yeah, I tend to agree. Just wanted to make sure I'm not violating norms. In that case, my specific thoughts are as follows, with a thought on implementing AI transparency at the end.
There is the observation that the transformer architecture doesn't have a hidden state like an LSTM. I thought for a while that something like this was needed for intelligence -- a compact representation of the state one is in. (My biased view, which I've updated away from, was that the weights represented HOW to think, and less about knowledge.) However, it's really intractabl...
I think this is absolutely correct. GPT-3/PaLM is scary impressive, but it ultimately relies on predicting missing words, and its actual memory during inference is just the words in its context! What scares me about this is that I think there is some really simple low-hanging fruit for modifying something like this to be, at least, slightly more like an agent. Then come plugging things like this as components into existing agent frameworks and, finally, having entire research programs think about it and experiment on it. It seems like the problem would crack. You never ...
As an ML engineer, I think it's plausible. I also think there are some other factors that could act to cushion or mitigate a slowdown. First, I think there is more low-hanging fruit available. Now that we've seen what large transformer models can do in the text domain, and in a text-to-image model like DALL-E, I think the obvious next step is to ingest large quantities of video data. We often talk about the sample inefficiency of modern methods compared with humans, but I think humans are exposed to a TON of sensory data in building their world model. This see...
I work at a large, not-quite-FAANG company, so I'll offer my perspective. It's getting there. Generally, the research results are good, but not as good as they sound in summary. Despite the very real and very concerning progress, most papers taken at face value are a bit hyped. The exceptions, to some extent, are the large language models. However, not everyone has access to these. The open source versions of them are good but not earth-shattering. I think they might be if the goal is general, fluent-sounding chatbots, but this is not the goal of most w...
I posted something I think could be relevant to this: https://www.lesswrong.com/posts/PfbE2nTvRJjtzysLM/instrumental-convergence-to-offer-hope
The takeaway is that a sufficiently advanced agent, wanting to hedge against the possibility of being destroyed by a greater power, may decide the only surviving plan is to allow the lesser life forms some room to optimize their own utility. It's sort of an asymmetrical, infinite game-theoretic chain: if every agent kills lower agents, only the maximum survives, and no one knows if they are the maximum -- if there even is a maximum.
War. Poverty. Inequality. Inhumanity. We have seen these for millennia, caused by nation states or large corporations. But what are these entities, if not greater-than-human-intelligence systems that happen to be misaligned with human well-being? Now, imagine that kind of optimization coming not from a group of humans acting separately, but from an entity with a singular purpose, with an ever-diminishing proportion of humans in the loop.
Audience: all, but maybe emphasizing policy makers
Thanks for pointing to ECL, this looks fascinating!
I think the state handling child-rearing is the long-term solution. The need for new people is a society-wide problem, not ultimately one of personal responsibility. Of course people should still be free to do it on their own if they want. It'll be weird that not everyone will have traditional parents, but I think we can figure it out. Maybe a mandatory or highly incentivized big brother/sister program would help make it more nurturing.