All of tangerine's Comments + Replies

I’m glad you asked. I completely agree that nothing in the current LLM architecture prevents that technically and I expect that it will happen eventually.

The issue in the near term is practicality: training models is currently very expensive and will remain so in the near future. Inference is less expensive, but still so expensive that profit is only possible by serving the model statically (i.e., without changing its weights) to many clients, which amortizes the cost of training and inference.
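To make the amortization arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python; every number in it (training cost, per-request cost, price, request volume) is a hypothetical placeholder, not a real provider figure.

```python
# Back-of-the-envelope amortization of a one-off training cost over many
# static-serving clients. All numbers below are hypothetical placeholders.

training_cost = 100_000_000         # one-off cost of training the model ($)
inference_cost_per_request = 0.002  # marginal compute cost per request ($)
price_per_request = 0.01            # price charged per request ($)
requests = 50_000_000_000           # total requests served by these weights

amortized_training = training_cost / requests
profit_per_request = price_per_request - inference_cost_per_request - amortized_training

print(f"amortized training cost per request: ${amortized_training:.6f}")
print(f"profit per request:                  ${profit_per_request:.6f}")
# Serving one static model to many clients spreads the training cost thin;
# continually retraining (or keeping per-client weights) would have to be
# paid for out of a much smaller request volume per set of weights.
```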

These clients often rely heavily on models being stati... (read more)

2Petropolitan
Training a LoRA has a negligible cost compared to pre-training a full model because it only involves changing 1.5% to 7% of the parameters (per https://ar5iv.labs.arxiv.org/html/2502.16894#A6.SS1) and only on thousands to millions of tokens instead of trillions. Inferencing different LoRAs for the same model in large batches with current technology is also very much possible (even if not without some challenges), and OpenAI offers their finetuned models for just 1.5-2x the cost of the original ones: https://docs.titanml.co/conceptual-guides/gpu_mem_mangement/batched_lora_inference

You probably don't need continual learning for a tech support use-case. I suspect you might need it for a task so long that all the reasoning chain doesn't fit into your model's effective context length (which is shorter than the advertised one). On these tasks the inference is going to be comparatively costly just because of the test-time scaling required, and users might be incentivized by discounts or limited free use if they agree that their dialogs will be used for improving the model.
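For readers who want to see what the small trainable fraction looks like in practice, here is a minimal sketch using the Hugging Face transformers and peft libraries; gpt2 and the specific rank/target-module choices are purely illustrative, and the exact percentage depends on those choices.

```python
# Minimal sketch: how small a LoRA adapter is relative to its base model.
# gpt2 is just an illustrative stand-in; rank and target modules are arbitrary.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # adapter rank; higher rank -> more trainable params
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # gpt2's fused attention projection
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# Prints the trainable-parameter count and percentage; with these settings it is
# well under 1% of gpt2's ~124M parameters. The adapter can be trained on
# thousands-to-millions of tokens and swapped in per client at serving time,
# which is what makes batched LoRA inference comparatively cheap.
```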

That's alright. Would you be able to articulate what you associate with AGI in general? For example, do you associate AGI with certain intellectual or physical capabilities, or do you associate it more with something like moral agency, personhood or consciousness?

Thank you for the clarification!

Of course, it is much more likely to be predictable a couple of days in advance than a year in advance, but even the former may conceivably be quite challenging depending on situational awareness of near-human-level models in training.

Do I understand correctly that you think that we are likely to only recognize AGI after it has been built? If so, how would we recognize AGI as you define it?

Do you also think that AGI will result in a fast take-off?

2Cole Wyeth
I don’t think I have anything better than a guess about any of those questions. 

What would you expect the world to look like if AGI < 2030? Or put another way, what evidence would convince you that AGI < 2030?

8Cole Wyeth
Unfortunately, those are two importantly different questions. While it is certainly possible that AGI will arrive before 2030, I am not sure that it would be predictable in advance without a "textbook of the (far) future" on deep learning. Of course, it is much more likely to be predictable a couple of days in advance than a year in advance, but even the former may conceivably be quite challenging depending on situational awareness of near-human-level models in training.

In many respects, similar to how it does look (e.g. AI passes the Turing test pretty easily / convincingly, is competent at some coding tasks, etc. etc.). It's more "interesting" to discuss the departures:

* Claude would probably be better at Pokemon.
* Maybe self-driving vehicles are more common / robust.
* I think the METR plot would look different. I don't think task length should be the bottleneck. I think that indicates that something isn't scalable. But I am not sure.
* There would perhaps be a few novel insights or discoveries from LLMs in a few domains where they have an advantage over us. I am not convinced this is true and in fact absence of evidence is evidence of absence: https://www.lesswrong.com/posts/GADJFwHzNZKg2Ndti/have-llms-generated-novel-insights
* I would weakly expect to find AI a little more useful, but more strongly, I would expect to be finding it increasingly useful over the last couple of years, and I don't.
* "Hallucinations" (I think @Abram Demski has a better term I can't remember) would be easier to get rid of.

Each point is capable of shifting my opinion to a greater or lesser degree. Mainly seeing novel insights would change my mind. If METR's results hold up that will also change my mind.

What do you make of feral children like Genie? While there are not many counterfactuals to cultural learning—probably mostly because depriving children of cultural learning is considered highly immoral—feral children do provide strong evidence that humans that are deprived of cultural learning do not come close to being functional adults. Additionally, it seems obvious that people who do not receive certain training, e.g., those who do not learn math or who do not learn carpentry, generally have low capability in that domain.

the genetic changes come first,

... (read more)
2Noosphere89
In a sense, evolution never stops, but yes the capacity to make tools was way later than the physical optimizations that used the tools. More generally, an even earlier force for bigger brains probably comes from both cooking food and social modeling in groups, but yes language and culture do help at the margin.

I actually think this is reasonably strong evidence for some form of cultural learning, thanks.

How do LLMs and the scaling laws make you update in this way? They make me update in the opposite direction. For example, I also believe that the human body is optimized for tool use and scaling, precisely because of the gene-culture coevolution that Henrich describes. Without culture, this optimization would not have occurred. Our bodies are cultural artifacts.

Cultural learning is an integral part of the scaling laws; the scaling laws show that indefinitely scaling the number of parameters in a model doesn't quite work; the training data also has to scale... (read more)

2Noosphere89
I think the crux is that the genetic changes come first, then the cultural changes come after, and the reason is the modern human body plan happened around 200,000 BC, and arguably could be drawn even earlier, whereas the important optimizations that culture introduced came over 100,000 years at the latest.

That said, fair enough on calling me out on the scaling laws point, I was indeed not able to justify my update.

As far as why I'm generally skeptical of cultural learning mattering nearly as much as Henrich, it's because I believe a lot of the evidence for this fundamentally doesn't exist for human cultures, and a lot of work in the 1940s-1960s around lots of subjects that were relevant to cultural learning was classified and laundered to produce simulated results: https://www.lesswrong.com/posts/m8ZLeiAFrSAGLEvX8/the-present-perfect-tense-is-ruining-your-life#MFyWDjh4FomyLnXDk

Current LLMs can only do sequential reasoning of any kind by adjusting their activations, not their weights, and this is probably not enough to derive and internalize new concepts à la C.

For me this is the key bit which makes me update towards your thesis.

What makes you (and the author) think ML practitioners won't start finetuning/RL'ing on partial reasoning traces during the reasoning itself if that becomes necessary? Nothing in the current LLM architecture prevents that technically, and IIRC Gwern has stated he expects that to happen eventually.
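As a purely hypothetical illustration of what "finetuning on partial reasoning traces during the reasoning itself" could look like, here is a sketch in Python; `generate_step`, `score_partial_trace`, and `finetune_on` are invented stand-ins for a generator, a verifier, and a small weight update, not an existing API.

```python
# Hypothetical sketch: interleaving weight updates with an ongoing reasoning
# episode. The three helpers below are invented placeholders, not a real API.

def generate_step(weights, trace):
    """Produce the next chunk of reasoning given current weights and the trace so far."""
    return f"step-{len(trace)}"  # stub

def score_partial_trace(trace):
    """Cheap verifier / reward model over the partial trace."""
    return len(trace) % 2 == 0  # stub: pretend every other step looks 'good'

def finetune_on(weights, trace, reward):
    """One small SGD / RL update on the partial trace (e.g. a LoRA update)."""
    return weights + 1  # stub: stands in for a slightly updated set of weights

weights = 0   # stands in for model parameters
trace = []    # the reasoning trace built up so far

for step in range(10):
    trace.append(generate_step(weights, trace))
    reward = score_partial_trace(trace)
    # The departure from today's static serving: weights change mid-episode,
    # so later reasoning steps are produced by a slightly different model.
    weights = finetune_on(weights, trace, reward)

print(weights, trace[-1])
```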

3p.b.
I think this inability of "learning while thinking" might be the key missing thing of LLMs and I am not sure "thought assessment" or "sequential reasoning" are not red herrings compared to this. What good is assessment of thoughts if you are fundamentally limited in changing them? Also, reasoning models seem to do sequential reasoning just fine as long as they already have learned all the necessary concepts. 

This is indeed an interesting sociological breakdown of the “movement”, for lack of a better word.

I think the injection of the author’s beliefs about whether or not short timelines are correct distracts from the central point. For example, the author states the following.

there is no good argument for when [AGI] might be built.

This is a bad argument against worrying about short timelines, bordering on intellectual dishonesty. Building anti-asteroid defenses is a good idea even if you don’t know that one is going to hit us within the next year.

The argument... (read more)

Ah, I see now. Thank you! I remember reading this discussion before and agree with your viewpoint that he is still directionally correct.

3Noosphere89
While he is probably directionally correct, I think it should be way weaker than people think. I now believe that something like the human body being optimized for tool use/the scaling hypothesis applied to evolution is my broad explanation of how humans have rocketed away from animals, and I now think cultural learning is way, way less powerful in humans than people thought. This is partially updating from LLMs, which have succeeded but so far have nowhere near the capabilities that humans have (yet), and there's an argument to be made that cultural learning was the majority of AI capabilities before o1/o3.

he apparently faked some of his evidence

Would be happy to hear more about this. Got any links? A quick Google search doesn’t turn up anything.

7Noosphere89
This was the faked evidence here:

For simplicity let’s concentrate on the seal hunting description. I don’t know enough about Inuit techniques to critique the details, but instead of aiming for a fair description, it’s clear that Henrich’s goal is to make the process sound as difficult to achieve as possible. But this is just sleight of hand: the goal of the stranded explorer isn’t to reproduce the exact technique of the Inuit, but to kill seals and eat them. The explorer isn’t going to use caribou antler probes or polar bear harpoon tips — they are going to use some modern wood or metal that they stripped from their ice-bound ship. Then we hit “Now you have a seal, but you have to cook it.” What? The Inuit didn’t cook their seal meat using a soapstone lamp fueled with whale oil, they ate it raw! At this point, Henrich is not just being misleading, he’s making it up as he goes along. At this point I start to wonder if the part about the antler probe and bone harpoon head is equally fictional. I might be wrong, but beyond this my instinct is to doubt everything that Henrich argues for, even if (especially if) it’s not an area where I have familiarity.
tangerine*10

You talk about personhood in a moral and technical sense, which is important, but I think it’s important to also take into account the legal and economic senses of personhood. Let me try to explain.

I work for a company where there’s a lot of white-collar busywork going on. I’ve come to realize that the value of this busywork is not so much the work itself (indeed a lot of it is done by fresh graduates and interns with little to no experience), but the fact that the company can bear responsibility for the work due to its somehow good reputation (something s... (read more)

Reminds me of mimetic desire:

Man is the creature who does not know what to desire, and he turns to others in order to make up his mind. We desire what others desire because we imitate their desires.

However, I only subscribe to this theory insofar as it is supported by Joseph Henrich's work, e.g., The Secret of Our Success, in which Henrich provides evidence that imitation (including imitation of desires) is the basis of human-level intelligence. (If you’re curious how that works, I highly recommend Scott Alexander’s book review.)

2Noosphere89
I don't go as far as Henrich, and he apparently faked some of his evidence, but I believe he is directionally correct on imitation mattering: https://slatestarcodex.com/2019/06/11/highlights-from-the-comments-on-cultural-evolution/

But we already knew that some people think AGI is near and others think it's farther away!

And what do you conclude based on that?

I would say that as those early benchmarks ("can beat anyone at chess", etc.) are achieved without producing what "feels like" AGI, people are forced to make their intuitions concrete, or anyway reckon with their old bad operationalizations of AGI.

The relation between the real world and our intuition is an interesting topic. When people’s intuitions are violated (e.g., the Turing test is passed but it doesn’t “feel like” AGI), th... (read more)

tangerine*32

They spend more time thinking about the concrete details of the trip, not because they know the trip is happening soon, but because some think the trip is happening soon. Disagreement on and attention to concrete details is driven by only some people saying that the current situation looks like, or is starting to look like, the event occurring according to their interpretation. If the disagreement had happened at the beginning, they would soon have started using different words.

In the New York example, it could be that when someone says “Guys, we should rea... (read more)

3Erich_Grunewald
I guess I don't understand what focusing on disagreements adds. Sure, in this situation, the disagreement stems from some people thinking the trip is near (and others thinking it's farther away). But we already knew that some people think AGI is near and others think it's farther away! What does observing that people disagree about that stuff add? Yeah, I would say that as those early benchmarks ("can beat anyone at chess", etc.) are achieved without producing what "feels like" AGI, people are forced to make their intuitions concrete, or anyway reckon with their old bad operationalizations of AGI. And that naturally leads to lots of discussion around what actually constitutes AGI. But again, all this is evidence of is that those early benchmarks have been achieved without producing what "feels like" AGI. But we already knew that.
tangerine100

The amount of contention says something about whether an event occurred according to the average interpretation. Whether it occurred according to your specific interpretation depends on how close that interpretation is to the average interpretation.

You can't increase the probability of getting a million dollars by personally choosing to define a contentious event as you getting a million dollars.

tangerine*32

I wouldn’t call either hypothesis invalid. People just use the same words to refer to different things. This is true for all words and hypotheses to some degree. When there is little to no contention that we’re not in New York, or that we don’t have AGI, or that the Second Coming hasn’t happened, then those differences are not apparent. But presumably there is some correlation between the different interpretations, such that when the Event does take place, contention rises to a degree that increases as that correlation decreases[1]. (Where by Event I mean ... (read more)

You’re kind of proving the point; the Second Coming is so vaguely defined that it might as well have happened. Some churches preach this.

If the Lord Himself did float down from Heaven and gave a speech on Capitol Hill, I bet lots of Christians would deride Him as an impostor.

7gwern
Specifically, as an antichrist, as the Gospels specifically warn that "false messiahs and false prophets will appear and produce great signs and omens", among other things. (And the position that the second coming has already happened - completely, not merely partially - is hyperpreterism.)
5leogao
suppose I believe the second coming involves the Lord giving a speech on capitol hill. one thing I might care about is how long until that happens. the fact that lots of people disagree about when the second coming is doesn't mean the Lord will give His speech soon. similarly, the thing that I define as AGI involves AIs building Dyson spheres. the fact that other people disagree about when AGI is doesn't mean I should expect Dyson spheres soon.
1Noosphere89
The actual Bayesian response, for both the AGI case and the Second Coming case, is that both hypotheses are invalid from the start due to underspecification, so any probability estimates/decision making for utility for these hypotheses are also invalid.

Thank you for the reply!

I’ve actually come to a remarkably similar conclusion as described in this post. We’re phrasing things differently (I called it the “myth of general intelligence”), but I think we’re getting at the same thing. The Secret of Our Success has been very influential on my thinking as well.

This is also my biggest point of contention with Yudkowsky’s views. He seems to suggest (for example, in this post) that capabilities are gained from being able to think well and a lot. In my opinion he vastly underestimates the amount of data/experienc... (read more)

3Noosphere89
I think The Secret of Our Success goes too far, and I'm less willing to rely on it than you, but I do think it got at least a significant share of how humans learn right (like 30-50% at minimum).

Entities that reproduce with mutation will evolve under selection. I'm not so sure about the "natural" part. If AI takes over and starts breeding humans for long floppy ears, is that selection natural?

In some sense, all selection is natural, since everything is part of nature, but an AI that breeds humans for some trait can reasonably be called artificial selection (and mesa-optimization). If such a breeding program happens to allow the system to survive, nature selects for it. If not, it tautologically doesn’t. In any case, natural selection still applie... (read more)

I don't know how selection pressures would take hold exactly, but it seems to me that in order to prevent selection pressures, there would have to be complete and indefinite control over the environment. This is not possible because the universe is largely computationally irreducible and chaotic. Eventually, something surprising will occur which an existing system will not survive. Diverse ecosystems are robust to this to some extent, but that requires competition, which in turn creates selection pressures.

tangerine1412

humans are general because of the data, not the algorithm

Interesting statement. Could you expand a bit on what you mean by this?

1a3orn237

So the story goes like this: there are two ways people think of "general intelligence." Fuzzy frame upcoming that I do not fully endorse.

  1. General Intelligence = (general learning algorithm) + (data)
  2. General Intelligence = (learning algorithm) + (general data)

It's hard to describe all the differences here, so I'm just going to enumerate some ways people approach the world differently, depending on the frame.

  1. Seminal text for the first: The Power of Intelligence, which attributes general problem solving entirely to the brain. Seminal text for the secon

... (read more)

You cannot in general pay a legislator $400 to kill a person who pays no taxes and doesn't vote.

Indeed not directly, but when the inferential distance increases it quickly becomes more palatable. For example, most people would rather buy a $5 T-shirt that was made by a child for starvation wages on the other side of the world, instead of a $100 T-shirt made locally by someone who can afford to buy a house with their salary. And many of those same T-shirt buyers would bury their head in the sand when made aware of such a fact.

If I can tell an AI to increase... (read more)

7lukedrago
I agree. To add an example: the US government's 2021 expanded child tax credit lifted 3.7 million children out of poverty, a near 50% reduction. Moreover, according to the NBER's initial assessment: "First, payments strongly reduced food insufficiency: the initial payments led to a 7.5 percentage point (25 percent) decline in food insufficiency among low-income households with children. Second, the effects on food insufficiency are concentrated among families with 2019 pre-tax incomes below $35,000".  Despite this, Congress failed to renew the program. Predictably, child poverty spiked the following year. I don't have an estimate for how many lives this cost, but it's greater than zero.

Unfortunately, democracy itself depends on the economic and military relevance of masses of people. If that goes away, the iceberg will flip and the equilibrium system of government won't be democracy.

Agreed. The rich and powerful could pick off more and more economically irrelevant classes while promising the remaining ones the same won't happen to them, until eventually they can get everything they need from AI and live in enclaves protected by vast drone armies. Pretty bleak, but seems like the default scenario given the current incentives.

It seems real

... (read more)
tangerine110

Excellent post. This puts into words really well some thoughts that I have had.

I would also like to make an additional point: it seems to me that a lot of people (perhaps less so on LessWrong) hold the view that humanity has somehow “escaped” the process of evolution by natural selection, since we can choose to do a variety of things that our genes do not “want”, such as having non-reproductive sex. This is wrong. Evolution by natural selection is inescapable. When resources are relatively abundant, which is currently true for many Western nations, it can ... (read more)

7jbash
Entities that reproduce with mutation will evolve under selection. I'm not so sure about the "natural" part. If AI takes over and starts breeding humans for long floppy ears, is that selection natural? Bear in mind that in that scenario the AIs may not choose to let the humans breed to anywhere near the limits of the available resources no matter how good their ears are. If there's resource competition, it may be among the AIs themselves (assuming there's more than one AI running to begin with). But there won't necessarily be more than one AI, at least not in the sense of multiple entities that may be pursuing different goals or reproducing independently. And even if there are, they won't necessarily reproduce by copying with mutation, or at least not with mutation that's not totally under their control with all the implications understood in advance. They may very well be able to prevent evolution from taking hold among themselves. Evolution is optional for them. So you can't be sure that they'll expand to the limits of the available resources.
1lukedrago
Glad you enjoyed it! Could you elaborate on your last paragraph? Presuming a state overrides its economic incentives (i.e., establishes a robust post-AGI welfare system), I'd like to see how you think the selection pressures would take hold. For what it's worth, I don't think "utopian communism" and/or a world without human agency are good. I concur with Rudolf entirely here -- those outcomes miss agency, which has so far been a core part of the human experience. I want dynamism to exist, though I'm still working on if/how I think we could achieve that. I'll save that for a future post.
tangerine*51

Very interesting write-up! When you say that orcas could be more intelligent than humans, do you mean something similar to them having a higher IQ or g factor? I think this is quite plausible.

My thinking has been very much influenced by Joseph Henrich's The Secret of Our Success, which you mentioned. For example, looking at the behavior of feral (human) children, it seems quite obvious to me now that all the things that humans can do better than other animals are all things that humans imitate from an existing cultural “reservoir” so to speak and that an i... (read more)

I agree with this view. Deep neural nets trained with SGD can learn anything. (“The models just want to learn.”) Human brains are also not really different from brains of other animals. I think the main struggles are 1. scaling up compute, which follows a fairly predictable pattern, and 2. figuring out what we actually want them to learn, which is what I think we’re most confused about.

My introduction to Dennett, half a lifetime ago, was this talk: 

That was the start of his profound influence on my thinking. I especially appreciated his continuous and unapologetic defense of the meme as a useful concept, despite the many detractors of memetics.

Sad to know that we won't be hearing from him anymore.

Yes. My bad, I shouldn’t have implied all hidden-variables interpretations.

Every non-deterministic interpretation has a virtually infinite Kolmogorov complexity because it has to hardcode the outcome of each random event.

Hidden-variables interpretations are uncomputable because they are incomplete.
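A rough way to write down that counting argument as a description-length comparison, as a sketch rather than a settled result: it assumes n recorded quantum events with Born probabilities p_i, and it treats the information needed to locate your branch as indexical information about the observer rather than as part of the theory, which is itself a contested move.

```latex
% Sketch of the description-length comparison, under the stated assumptions.
% A collapse-style theory must additionally encode which outcome occurred at each event:
K(\text{collapse-style theory}) \approx K(\text{SWE}) + \sum_{i=1}^{n} \log_2 \tfrac{1}{p_i}
% whereas the many-worlds account keeps only the dynamics:
K(\text{MWI}) \approx K(\text{SWE})
```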

1lemonhope
Are they complete if you include the hidden variables? Maybe I'm misunderstanding you.
tangerine2818

It’s the simplest explanation (in terms of Kolmogorov complexity).

It’s also the interpretation which by far has the most elegant explanation for the apparent randomness of reality. Most interpretations provide no mechanism for the selection of a specific outcome, which is absurd. Under the MWI, randomness emerges from determinism through indexical uncertainty, i.e., not knowing which branch you’re in. Some people, such as Sabine Hossenfelder, get confused by this and ask, “then why am I this version of me?”, which implicitly assumes dualism, as... (read more)

1martinkunev
I would add that questions such as “then why am I this version of me?” only show we're generally confused about anthropics. This is not something specific about many worlds and cannot be an argument against it.
-1TAG
"it" isn't a single theory. The argument that Everettian MW is favoured by Solomonoff induction, is flawed. If the program running the SWE outputs information about all worlds on a single output tape, they are going to have to be concatenated or interleaved somehow. Which means that to make use of the information, you gave to identify the subset of bits relating to your world. That's extra complexity which isn't accounted for because it's being done by hand, as it were..
4titotal
  Do you have proof of this? I see this stated a lot, but I don't see how you could know this when certain aspects of MWI theory (like how you actually get the Born probabilities) are unresolved. 
0lemonhope
Hmm I think I can implement pilot wave in fewer lines of C than I can many-worlds. Maybe this is a matter of taste... or I am missing something? I thought pilot wave's explanation was (very roughly) "of course you cannot say which way the particle will go because you cannot accurately measure it without moving it" plus roughly "that particle is bouncing around a whole lot on its wave, so its exact position when it hits the wall will look random". I find this quite elegant, but that's also a matter of taste perhaps. If this oversimplification is overtly wrong then please tell me.

It’s just a matter of definition. We say that “you” and “I” are the things that are entangled with a specific observed state. Different versions of you are entangled with different observations. Nothing is stopping you from defining a new kind of person which is a superposition of different entanglements. The reason it doesn’t “look” that way from your perspective is because of entanglement and the law of the excluded middle. What would you expect to see if you were a superposition?
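A minimal way to write down the picture described above, in standard notation (the observer labels are schematic, not a formal model of a person):

```latex
% After a measurement, observer states are entangled with the measured outcomes:
|\Psi\rangle \;=\; \alpha \, |{\uparrow}\rangle \otimes |\text{``you'' having seen up}\rangle
\;+\; \beta \, |{\downarrow}\rangle \otimes |\text{``you'' having seen down}\rangle
% Each relative observer state records exactly one definite outcome, which is why
% a superposed observer never subjectively "looks" superposed from the inside.
```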

0TAG
If I were in a coherent superposition, I would expect to see non-classical stuff. Entanglement alone is not enough to explain my sharp-valued, quasi-classical observations. It isn't just a matter of definition, because I don't perceive non-classical stuff, so I lack motivation to define "I" in a way that mispredicts that I do. You don't get to arbitrarily relabel things if you are in the truth-seeking business. The objection isn't to using "I" or "the observer" to label a superposed bundle of sub-persons, each of which individually is unaware of the others and has normal, classical-style experience, because that doesn't mispredict my experience. The problem is that "superposed bundle of persons, each of which is unaware of the others and has normal, classical-style experience" is what you get from a decoherent superposition, and I am specifically talking about coherent superposition. ("In the Everett theory, everything that starts in a coherent superposition, stays in one.") Decoherence was introduced precisely to solve the problem with Everett's RSI.

Have you read Joseph Henrich’s books The Secret of Our Success and its sequel The WEIRDest People in the World? If not, they provide a pretty comprehensive view of how humanity, and particularly the Western world, innovates, which is roughly in line with what you wrote here.

I kind of agree that most knowledge is useless, but the utility of knowledge and experience that people accrue is probably distributed like a bell curve, which means you can't just have more of the good knowledge without also accruing lots of useless knowledge. In addition, very often stuff that seems totally useless turns out to be very useful; you can't always tell which is which.

2TAG
I assume it isn't always like a bell curve, because smaller and poorer societies can't afford the deadweight of useless knowledge.

I completely agree. In Joseph Henrich’s book The Secret of Our Success, he shows that the amount of knowledge possessed by a society is proportional to the number of people in that society. Dwindling population leads to dwindling technology and dwindling quality of life.

Those who advocate for population decline are unwittingly advocating for the disappearance of the knowledge, experience and frankly wisdom that is required to keep the comfortable life that they take for granted going.

Keeping all that knowledge in books is not enough. Otherwise our long years in education would be unnecessary. Knowing how to apply knowledge is its own form of knowledge.

-4TAG
Most knowledge is useless. Many people have heads filled with sport results and entertainment trivia. 50 years ago, people used to fix their own cars and make their own clothes.

If causality is everywhere, it is nowhere; declaring “causality is involved” will have no meaning. It begs the question whether an ontology containing the concept of causality is the best one to wield for what you’re trying to achieve. Consider that causality is not axiomatic, since the laws of physics are time-reversible.

2silentbob
A basic operationalization of "causality is everywhere" is "if we ran an RCT on some effect with sufficiently many subjects, we'd always reach statistical significance" - which is an empirical claim that I think is true in "almost" all cases. Even for "if I clap today, will it change the temperature in Tokyo tomorrow?". I think I get what you mean by "if causality is everywhere, it is nowhere" (similar to "a theory that can explain everything has no predictive power"), but my "causality is everywhere" claim is an at least in theory verifiable/falsifiable factual claim about the world. Of course "two things are causally connected" is not at all the same as "the causal connection is relevant and we should measure it / utilize it / whatever". My basic point is that assuming that something has no causal connection is almost always wrong. Maybe this happens to yield appropriate results, because the effect is indeed so small that you can simply act as if there was no causal connection. But I also believe that the "I believe X and Y have no causal connection at all" world view leads to many errors in judgment, and makes us overlook many relevant effects as well.
7Viliam
Instead of "is there causality?" we should ask "how much causality is there?". The closest analogy to the old question would be "is there enough causality (for my purpose)?". If drinking water improves my mood by 0.0001%, then drinking water is not a cost-effective way to improve my mood. I am not denying that there is a connection, I am just saying it does not make sense for me to act on it.
tangerine*1-3

I respect Sutskever a lot, but if he believed that he could get an equivalent world model by spending an equivalent amount of compute learning from next-token prediction using any other set of real-world data samples, why would they go to such lengths to specifically obtain human-generated text for training? They might as well just do lots of random recordings (e.g., video, audio, radio signals) and pump it all into the model. In principle that could probably work, but it’s very inefficient.

Human language is a very high density encoding of world models, so... (read more)

In theory, yes, but that’s obviously a lot more costly than running just one instance. And you’ll need to keep these virtual researchers running in order to keep the new capabilities coming. At some point this will probably happen and totally eclipse human ability, but I think the soft cap will slow things down by a lot (i.e., no foom). That’s assuming that compute and the number of researchers even is the bottleneck to new discoveries; it could also be empirical data.

If you accept the premise of AI remaining within the human capability range in some broad sense, where it brings great productivity improvements and rewards those who use it well but remains foundationally a tool and everything seems basically normal, essentially the AI-Fizzle world, then we have disagreements

There is good reason to believe that AI will have a soft cap at roughly human ability (and by “soft cap” I mean that anything beyond the cap will be much harder to achieve) for the same reason that humans have a soft cap at human ability: copying exis... (read more)

1quetzal_rainbow
You can bypass this by just running 1000 instances of imitating-genius-ML-researchers AI.
4angmoh
Sutskever's response to Dwarkesh in their interview was a convincing refutation of this argument for me:

Dwarkesh Patel: So you could argue that next-token prediction can only help us match human performance and maybe not surpass it? What would it take to surpass human performance?

Ilya Sutskever: I challenge the claim that next-token prediction cannot surpass human performance. On the surface, it looks like it cannot. It looks like if you just learn to imitate, to predict what people do, it means that you can only copy people. But here is a counter argument for why it might not be quite so. If your base neural net is smart enough, you just ask it — What would a person with great insight, wisdom, and capability do? Maybe such a person doesn't exist, but there's a pretty good chance that the neural net will be able to extrapolate how such a person would behave. Do you see what I mean?

Dwarkesh Patel: Yes, although where would it get that sort of insight about what that person would do? If not from…

Ilya Sutskever: From the data of regular people. Because if you think about it, what does it mean to predict the next token well enough? It's actually a much deeper question than it seems. Predicting the next token well means that you understand the underlying reality that led to the creation of that token. It's not statistics. Like it is statistics but what is statistics? In order to understand those statistics to compress them, you need to understand what is it about the world that creates this set of statistics? And so then you say — Well, I have all those people. What is it about people that creates their behaviors? Well they have thoughts and their feelings, and they have ideas, and they do things in certain ways. All of those could be deduced from next-token prediction. And I'd argue that this should make it possible, not indefinitely but to a pretty decent degree to say — Well, can you guess what you'd do if you took a person with this characteristic and that charac

The European socket map is deceptive. My charger will work anywhere on mainland Europe. Looking at the sockets, can you tell why?

2[anonymous]
I take it the pin size, spacing, and obviously voltage are the same for all the mainland, so long as you don't need a ground?

Does this count as “rational, deliberate design”? I think a case could be made for both yes and no, but I lean towards no. Humans who have studied a certain subject often develop a good intuition for what will work and what won’t and I think deep learning captures that; you can get right answers at an acceptable rate without knowing why. This is not quite rational deliberation based on theory.

3the gears to ascension
But it shows that you don't necessarily need to rely strictly on experimentation! Certainly it still relies on it, but humans have been doing this sort of thing for a while. While I agree it's the case that people still have to do a lot of experiments historically, it's quite possible to have very detailed sketches of what can and can't be done.
tangerine3222

I think that “rational, deliberate design”, as you put it, is simply far less common (than random chance) than you think; that the vast majority of human knowledge is a result of induction instead of deduction; that theory is overrated and experimentalism is underrated.

This is also why I highly doubt that anything but prosaic AI alignment will happen.

-2the gears to ascension
https://deepmind.google/discover/blog/millions-of-new-materials-discovered-with-deep-learning/
4dkirmani
Yeah. Here's an excerpt from Antifragile by Taleb:

I don't think I disagree with what you're saying here, though we may be using different terms to say the same thing.

How does what you say here inform your thoughts about the Hard Problem?

tangerine*169

Regarding taking hints, the other gender typically does not see all the false positives one has to deal with. What seems obvious is usually not obvious at all. In fact, a socially skilled person will always try to use plausibly deniable (i.e., not-obvious) signals and will consider anything more a gauche faux pas. Acting on such signals is therefore inherently risky and is nowadays perhaps considered more risky than it used to be, especially at work and around close friends.

For example, a few years ago, a woman I had great rapport with called me her Valent... (read more)

All I’m asking for is a way for other people to determine whether a given explanation will satisfy you. You haven’t given enough information to do that. Until that changes we can’t know that we even agree on the meaning of the Hard Problem.

2TAG
The meaning of the Hard Problem doesn't depend on satisfying me, since I didn't invent it. If you want to find out what it is, you need to read Chalmers at some point.

Also, the existence of a problem does not depend on the existence of a solution.

Agreed, but even if no possible solution can ultimately satisfy objective properties, until those properties are defined the problem itself remains undefined. Can you define these objective properties?

2TAG
We've been through this. 1. You don't have a non-circular argument that everything is objective. 2. It can be an objective fact that subjectivity exists.

I know. Like I said, neither Chalmers nor you nor anyone else has shown it plausible that subjective experience is non-physical. Moreover, you repeatedly avoid giving an objective description of what you’re looking for.

Until either of the above change, there is no reason to think there is a Hard Problem.

2TAG
Like I said, I don't have to justify non-physicalism when that is not what the discussion is about. Also, the existence of a problem does not depend on the existence of a solution.

Chalmers takes hundreds of pages to set out his argument.

His argument does not bridge that gap. He, like you, does not provide objective criteria for a satisfying explanation, which means by definition you do not know what the thing is that requires explanation, no matter how many words are used trying to describe it.

2TAG
The discussion was about whether there is a Hard Problem , not whether Chalmers or I have solved it.

The core issue is that there’s an inference gap between having subjective experience and the claim that it is non-physical. One doesn’t follow from the other. You can define subjective experience as non-physical, as Chalmers’s definition of the Hard Problem does, but that’s not justified. I can just as legitimately define subjective experience as physical.

I can understand why Chalmers finds subjective experience mysterious, but it’s not more mysterious than the existence of something physical such as gravity or the universe in general. Why is General Relativity enough for you to explain gravity, even though the reason for the existence of gravity is mysterious?

2TAG
Of course there is. There is no reason there should not be. Who told you otherwise? Chalmers takes hundreds of pages to set out his argument. Physical reductionism is compatible with the idea that the stuff at the bottom of the stack is irreducible, but consciousness appears to be a high level phenomenon.

Let’s say the Hard Problem is real. That means solutions to the Easy Problem are insufficient, i.e., the usual physical explanations.

But when we speak about physics, we’re really talking about making predictions based on regularities in observations in general. Some observations we could explain by positing the force of gravity. Newton himself was not satisfied with this, because how does gravity “know” to pull on objects? Yet we were able to make very successful predictions about the motions of the planets and of objects on the surface of the Earth, so we... (read more)

You say you see colors and have other subjective experiences and you call those qualia and I can accept that, but when I ask why solutions to the Easy Problem wouldn’t be sufficient you say it’s because you have subjective experiences, but that’s circular reasoning. You haven’t said why exactly solutions to the Easy Problem don’t satisfy you, which is why I keep asking what kind of explanation would satisfy you. I genuinely do not know, based on what you have said. It doesn’t have to be scientific.

If we are talking about scientific explanation: a scienti

... (read more)
2TAG
No, it's because the Easy Problem is, by definition, everything except subjective experience. It's [consciousness-experience] explained [however], not [consciousness] explained [physically]. It happens to be the case that easy problems can be explained physically, but it's not built into the definition. Because I've read the passages where Chalmers defines the Easy/Hard distinction. "What makes the hard problem hard and almost unique is that it goes beyond problems about the performance of functions. To see this, note that even when we have explained the performance of all the cognitive and behavioral functions in the vicinity of experience—perceptual discrimination, categorization, internal access, verbal report—there may still remain a further unanswered question: Why is the performance of these functions accompanied by experience? (1995, 202, emphasis in original)." See? It's not defined in terms of physicality! Have you even read that passage before? ...an EP explanation isn't even trying to be an explanation of X for X=qualia. Only by lowering the bar. Of course not ...you can't even express them. I'm a colour-blind super scientist, what is this Red? Unfortunately, that's what "predict novel experiences" means. Cf. other areas of science: you don't get Nobels for saying "I predict some novel effect I can't describe or quantify". The problem isn't that you don't have infinite information, it's that you are not reaching the baseline of every other scientific theory, because "novel qualia, don't ask me what" isn't a meaningful prediction. Not in a good enough way, you can't. Then you will again object and say, “but that doesn’t explain subjective experience”. And so on. It looks to me like you’re holding out for something you don’t know how to recognize. True, maybe an explanation is impossible, but you don’t know that either. When some great genius finally does explain it all, how will you know he’s right? You wouldn’t want to miss out, right? They