'Empiricism!' as Anti-Epistemology

Eliezer Yudkowsky

14th Mar 2024

(Crossposted by habryka after asking Eliezer whether I could post it under his account)

i.

"Ignore all these elaborate, abstract, theoretical predictions," the Spokesperson for Ponzi Pyramid Incorporated said in a firm, reassuring tone. "Empirically, everyone who's invested in Bernie Bankman has received back 144% of what they invested two years later."

"That's not how 'empiricism' works," said the Epistemologist. "You're still making the assumption that --"

"You could only believe that something different would happen in the future, if you believed in elaborate theoretical analyses of Bernie Bankman's unobservable internal motives and internal finances," said the spokesperson for Ponzi Pyramid Incorporated. "If you are a virtuous skeptic who doesn't trust in overcomplicated arguments, you'll believe that future investments will also pay back 144%, just like in the past. That's the prediction you make if you predict based purely on empirical observations, instead of theories about a future nobody has seen!"

"That's not how anything works," said the Epistemologist. "Every future prediction has a theory connecting it to our past observations. There's no such thing as going from past observations directly to future predictions, with no theory, no assumptions, to cross the gap --"

"Sure there's such a thing as a purely empirical prediction," said the Ponzi spokesperson. "I just made one. Not to mention, my dear audience, are you really going to trust anything as complicated as epistemology?"

"The alternative to thinking about epistemology is letting other people do your thinking about it for you," said the Epistemologist. "You're saying, 'If we observe proposition X "past investors in the Ponzi Pyramid getting paid back 144% in two years", that implies prediction Y "this next set of investors in the Ponzi Pyramid will get paid back 144% in two years"'. X and Y are distinct propositions, so you must have some theory saying 'X -> Y' that lets you put in X and get out Y."

"But my theory is empirically proven, unlike yours!" said the Spokesperson.

"...nnnnoooo it's not," said the Epistemologist. "I agree we've observed your X, that past investors in the Ponzi Pyramid got 144% returns in 2 years -- those investors who withdrew their money instead of leaving it in to accumulate future returns, that is, not quite all investors. But just like prediction Y of 'the next set of investors will also receive 144% in 2 years' is not observed, the connecting implication 'if X, then Y' is not yet observed, just like Y itself is not observed. When you go through the step 'if observation X, then prediction Y' you're invoking an argument or belief whose truth is not established by observation, and hence must be established by some sort of argument or theory. Now, you might claim to have a better theoretical argument for 'X -> Y' over 'X -> not Y', but it would not be an empirical observation either way."

"You say words," replied the Spokesperson, "and all I hear are -- words words words! If you instead just look with your eyes at past investors in the Ponzi Pyramid, you'll see that every one of them got back 144% of their investments in just two years! Use your eyes, not your ears!"

"There's a possible theory that Bernie Bankman is making wise investments himself, and so multiplying invested money by 1.2X every year, then honestly returning that money to any investor who withdraws it," said the Epistemologist. "There's another theory which says that Bernie Bankman has been getting more money invested every year, and is using some of the new investments to pay back some fraction of previous investors who demanded their money back --"

"Why would Bernie Bankman do that, instead of taking all the money right away?" inquired the Spokesperson. "If he's as selfish and as greedy and dishonest as you say, wouldn't he just keep the money?"

"So that he could get even more money from new investors, attracted by seeing his previous investors paid off, of course," said the Epistemologist. "And realistically, so that Bernie Bankman could maintain his comfortable present position in society and his current set of friends, as is often a greater motivator in human affairs than money."

"So we see Bernie Bankman giving people money -- that is what empiricism and observation tell us -- but you would tell people with your words and reasoning that Bernie Bankman is a greedy man who keeps all investments for himself? What a great divergence we see again between empirical observation, and elaborate unobservable theories!"

"We agree on what has already been observed of Bernie Bankman's outward behavior," said the Epistemologist. "When it comes to Bernie Bankman's unobserved interior thoughts -- your unobserved theory 'he is honest', is no more or less empirical or theoretical, than the unobserved theory 'he is scheming'. 'Honest' and 'scheming' are two possible values of a latent variable of the environment, a latent variable which cannot be directly observed, and must be inferred as the hidden cause of what we can observe. One value of the unseen variable is not more already-observed than another. The X->Y implication from the previous money-returning behavior we did observe, to Bernie Bankman's latent honesty or dishonesty, is likewise itself something we do not observe; the 'if you observe X, infer latent Y' step is something given to us by theory rather than observation."

"And furthermore," continued the Epistemologist, a touch of irritation now entering that voice, "I don't actually think it's all that complicated of a theory, to understand why Bernie Bankman would schemingly give back the money of the first few investors. The only reason why somebody would fail to understand this simple idea, is this person yelling at you that any alternative to blind surface generalization is 'theoretical' and 'not empirical'. Plenty of people would be able to understand this concept without dragging epistemology into it at all. Of course observing somebody giving back a small amount of money, doesn't prove they'll later give you back a large amount of money; there's more than one reason they could be behaving nicely around low stakes."

"The Epistemologist will give you words," said the Spokesperson to the watching audience. "Bernie Bankman gives you money! 144% returns in 2 years! Every scientist who's measured Bankman's behavior agrees that this is the empirical, already-observed truth of what will happen! Now, as a further proof that my opponent's claims are not just wrong, but unscientific, let me ask this -- do you, Epistemologist, claim with 100% probability that this next set of investors' investments, cannot be paid back two years from now?"

"That's not something I can know with certainty about the unobserved future," said the Epistemologist. "Even conditional on the 'scheming' hypothesis, I can't, actually, know that Ponzi Pyramid Incorporated will bust within 2 years specifically. Maybe you'll get enough new investors, or few enough of these investors will withdraw their funds, that this company will continue for another 2 years --"

"You see?" cried the Spokesperson. "Not only is this theory unsupported empirically, it is also unfalsifiable! For where I tell you with certainty that all your money will be repaid and more, 2 years hence -- this one claims that your money might or might not be repaid! Why, if Bernie Bankman repays 144% in 2 years yet again, what will this one say? Only that Ponzi Pyramid hasn't busted yet and that it might bust later! Can you ask for a better example of scientific vice, contrasted to my own scientific virtue? Observation makes a bold, clear, falsifiable statement, where elaborate predictions only waffle!"

"If a reasonable person would say that there's a 50% chance of the Ponzi Pyramid busting in two years," replied the Epistemologist wearily, "it is not more scientifically virtuous to say the chance is 0% instead, only because there is then a 50% chance of your claim turning out to be definitely false and you getting to say a scientifically virtuous 'oops' (if you'd even say it)."

"To give an even simpler example," continued the Epistemologist, "let's say we're flipping a coin that I think is fair, and you say is biased to produce 100% heads. Your theory stands a 50% chance of being falsified, whereas mine will not be falsified no matter what the coin shows -- but that doesn't mean that every time you pick up a coin on the street, it's the course of scientific virtue to decide the coin must be biased 100% heads. Being relatively easier to falsify is a convenient property for a belief to have, but that convenience is not the only important virtue of a belief, and not all true beliefs have it. All the distinct kinds of epistemic virtue must be kept distinct in our thoughts, or we will quite confuse ourselves."

"To give yet another example," added the Epistemologist, "let's say you're considering whether to run blindly toward the edge of a cliff. I might not be able to predict exactly how fast you'll run. So I won't be able to predict whether or not you'll already be falling, or dead, after five more seconds have passed. This does not mean that the theory 'I will fly and never die' should be seen as more reasonable or more scientific, merely because it makes a more certain claim about whether or not you'll be alive five seconds later."

"What an incredible set of excuses for having no definite predictions about what will happen two years later!" the Spokesperson said, smiling and mugging to the audience. "Believe your eyes! Believe in empiricism! Believe -- in Science! Believe, above all, in the definite factual observation: investors who invest in the Ponzi Pyramid get 144% of their money back after 2 years! All the rest is words words words and thinking."

ii.

"Hm," said a watching Scientist. "I see the force of your theoretical claims about epistemology, Epistemologist. But I cannot help but feel intuitively that there is something to this Spokesperson's words, too, even if they are not exactly logically correct according to your meta-theory. When we have observed so many previous investors getting 144% returns from Bernie Bankman's Ponzi Pyramid after 2 years, is there not some real sense in which it is more empirical to say the same thing will happen to future investors, and less empirical to say that a different thing will happen in the future instead? The former prediction seems to me to be more driven by the data we already have, and the latter prediction to be driven by something more like thinking and imagining. I see how both predictions must be predictions, from the standpoint of epistemology, and involve something like an assumption or a theory that connects the past to the future. But can we not say that the Spokesperson's predictions involve fewer assumptions and less theory and are more driven by looking at the data, compared to yours?"

"So to be clear," said the Epistemologist to the Scientist, "you are saying that the prediction which involves the fewest assumptions and the least theory, is that Bernie Bankman's Ponzi Pyramid will go on multiplying all investments by a factor of 1.2 every year, indefinitely, to the end of the universe and past it?"

"Well, no," said the Scientist. "We have only observed Bernie Bankman to multiply investments by 1.2 per year, in the present socioeconomic context. It would not be reasonable to extend out the observations to beyond that context -- to say that Bernie Bankman could go on delivering those returns after a global thermonuclear war, for example. To say nothing of after all the protons decay, and the black holes evaporate, and time comes to an end in a sea of chaos."

"I inquire of you," said the Epistemologist, "whether your belief that Bernie Bankman would stop delivering good returns after a thermonuclear war, is more theory-laden, less empirical, than a belief that Bernie Bankman goes on multiplying investments 1.2-fold forever. Perhaps your belief has other virtues that make it superior to the belief in 'eternal returns', as we might call them. But it is nonetheless the case that the 'eternal returns' theory has the advantage of being less theory-laden and more empirical?"

The Scientist frowned. "Hm. To be clear, I agree with you that the 'eternal returns' theory must be less correct -- but I'm not quite sure it feels right to call it more empirical -- to say that it has one sin and one virtue, like that..." The Scientist paused. "Ah, I have it! To say that Bernie Bankman would stop returning investments after a global thermonuclear war, I need to bring in my beliefs about nuclear physics. But those beliefs are themselves well-confirmed by observation, so to deny them to hold true about Bernie Bankman's Ponzi Pyramid would be most unempirical and unvirtuous." The Scientist smiled and nodded to himself.

"I put to you, then," said the Epistemologist, "that your prediction that Bernie Bankman would stop delivering good returns after a thermonuclear war, is indeed more 'theory-laden' in your intuitive sense, than the prediction that Bernie Bankman simply goes on delivering 1.2X returns forever. It is just that you happen to like the theories you are lading on, for reasons which include that you think they are full of delicious empiricist virtue."

"Could I not also say," said the Scientist, "that I have only observed the Ponzi Pyramid to deliver returns within a particular socioeconomic context, and so empiricism says to only generalize inside of the context that holds all my previous observations?"

The Epistemologist smiled. "I could just as easily say myself that such schemes often go through two phases, the part where he's scheming to take your money and the part where he actually takes it; and say from within my own theoretical stance that we ought not to generalize from the 'scheming to take your money' context to the 'actually taking it' context." The Epistemologist paused, then added, "Though to be precise about the object-level story, it's a tragic truth that many schemes like that start with a flawed person having a dumb but relatively more honest plan to deliver investment returns. It's only after their first honest scheme fails, that as an alternative to painful confession, they start concealing the failure and paying off early investors with later investors' money -- sometimes telling themselves the whole while that they mean to eventually pay off everyone, and other times having explicitly switched to being con artists. Others, of course, are con artists from the beginning. So there may be a 'naive' phase that can come before the 'concealment' phase or the 'sting' phase... but I digress." The Epistemologist shook his head, returning to the previous topic. "My point is, my theory could be viewed as specializing our past observations to within a context, just like your theory does; and yet my theory yields a different prediction from yours, because it advocates a different contextualization of the data. There is no non-theory-laden notion of a 'context'."

"Are you sure you're not complicating something that doesn't need to be complicated?" said the Scientist. "Why not just say that every observation ought to only be generalized within the obvious context, the sort you can itself construct without any theories about unobservables like Bernie Bankman's state of mind or Ponzi Pyramid's 'true' balance sheet?"

"Look," said the Epistemologist, "some troll can waltz in anytime and say, 'All your observations of electron masses took place before 2025; you've got no call generalizing those observations to the context of "after 2025"'. You don't need to invent anything unobservable to construct that context -- we've previously seen solar years turn -- and yet introducing that context-dependency is a step I think we'd both reject. Applying a context is a disputable operation. You're not going to find some simple once-and-for-all rule for contexts that lets you never need to dispute them, no matter how you invoke swear-words like 'obvious'. You sometimes need to sit down and talk about where and how it's appropriate to generalize the observations you already have."

"Suppose I say," said the Scientist, "that we ought to only contextualize our empirical observations, in ways supported by theories that are themselves supported by direct observations --"

"What about your earlier statement that we shouldn't expect Bernie Bankman to go on delivering returns after all the protons decay?" said the Epistemologist. "As of early 2024 nobody's ever seen a proton decay, so far as I know; not even in the sense of recording an observation from which we infer the event."

"Well," said the Scientist, "but the prediction that protons decay is a consequence of the simplest equations we've found that explain our other observations, like observing that there's a predominance of matter over antimatter --"

The Epistemologist shrugged. "So you're willing to predict that Bernie Bankman suddenly stops delivering returns at some point in the unobserved future, based on your expectation of a phenomenon you haven't yet seen, but which you say is predicted by theories that you think are good fits to other phenomena you have seen? Then in what possible sense can you manage to praise yourself as being less 'theory-laden' than others, once you're already doing something that complicated? I, too, look at the world, come up with the simplest worldview that I can best fit to that world, and then use that whole entire worldview to make predictions about the unobserved future."

"Okay, but I am in fact less confident about proton decay than I am about, say, the existence of electrons, since we haven't confirmed proton decay by direct experiment," said the Scientist. "Look, suppose that we confine ourselves to predicting just what happens in the next two years, so we're probably not bringing in global nuclear wars let alone decaying protons. It continues to feel to me in an intuitive sense like there is something less theory-laden, and more observation-driven, about saying, 'Investors in Ponzi Pyramid today will get 1.44X their money back in two years, just like the previous set of investors we observed', compared to your 'They might lose all of their money due to a phase change in unobserved latent variables'."

"Well," said the Epistemologist, "we are really starting to get into the weeds now, I fear. It is often easier to explain the object-level reasons for what the correct answer is, than it is to typify each reasoning step according to the rules of epistemology. Alas, once somebody else starts bringing in bad epistemology, it also ends up the job of people like me to do my best to contradict them; and also write down the detailed sorting-out. Even if, yes, not all of Ponzi Pyramid's victims may understand my fully detailed-sorting out. As a first stab at that sorting-out... hm. I'm really not sure it will help to say this without a much longer lecture. But as a first stab..."

The Epistemologist took a deep breath. "We look at the world around us since the moments of infancy -- maybe we're even learning a bit inside the womb, for all we know -- using a brain that was itself generalized by natural selection to be good at chipping stone handaxes, chasing down prey, and outwitting other humans in tribal political arguments. In the course of looking at the world around us, we build up libraries of kinds of things that can appear within that world, and processes that can go on inside it, and rules that govern those processes. When a new observation comes along, we ask what sort of simple, probable postulates we could add to our world-model to retrodict those observations with high likelihood. Though even that's a simplification; you just want your whole model to be simple and predict the data with high likelihood, not to accomplish that with only local editing. The Virtue of Empiricism -- compared to the dark ages that came before that virtue was elevated within human epistemology -- is that you actually do bother trying to explain your observations, and go gather more data, and make further predictions from theory, and try to have your central models be those that can explain a lot of observation with only a small weight of theory."

"And," continued the Epistemologist, "it doesn't require an impossible sort of creature, made out of particles never observed, to give back some investors' money today in hopes of getting more money later. You can get creatures like that even from flawed humans who started out with relatively more honest intentions, but had their first scheme fail. On the rest of my world-model as I understand it, that is not an improbable creature to build out of the particles that we already know the world to contain. Its psychology does not violate the laws of cognition that I believe to govern its kind. I would try to make a case to these poor honest souls being deceived, that this is actually more probable than the corresponding sort of honest creature who is really earning you +20% returns every year without fail."

"So," said the Epistemologist. "When two theories equally explain a narrow set of observations, we must ask which theory has the greater probability, as governed by forces apart from that narrow observation-set. This may sometimes require sitting down and having a discussion about what kind of world we live in, and what its rules arguably are; instead of it being instantly settled with a cry of 'Empiricism!' There are some such cases which can be validly settled just by crying 'Simplicity!' to be clear, but few cases settle that directly. It's not the formal version of Occam's Razor that tells us whether or not to trust Ponzi Pyramid Incorporated -- we cannot just count up atomic postulates of a basic theory, or weigh up formulas of a logic, or count the bytes of a computer program. Rather, to judge Ponzi Pyramid we must delve into our understanding of which sort of creatures end up more common within the world we actually live in -- delve into the origins and structure of financial megafauna."

"None of this," concluded the Epistemologist, "is meant to be the sort of idea that requires highly advanced epistemology to understand -- to be clear. I am just trying to put type signatures underneath what ought to be understandable without any formal epistemology -- if people would only refrain from making up bad epistemology. Like trying to instantly settle object-level questions about how the world works by crying 'Empiricism!'"

"And yet," said the Scientist, "I still have that intuitive sense in which it is simpler and more empirical to say, 'Bernie Bankman's past investors got 1.2X returns per year, therefore so will his future investors'. Even if you say that is not true -- is there no virtue which it has, at all, within your epistemology? Even if that virtue is not decisive?"

"In truth," said the Epistemologist, "I have been placed in a situation where I am not exactly going to be rewarded, for taking that sort of angle on things. The Spokesperson will at once cry forth that I have admitted the virtue of Ponzi Pyramid's promise."

"You bet I will!" said the Spokesperson. "See, the Epistemologist has already admitted that my words have merit and they're just refusing to admit it! No false idea has ever had any sort of merit; so if you point out a single merit of an idea, that's the same as a proof!"

"But," said the Epistemologist, "ignoring that, what I think you are intuiting is the valid truth that -- to put it deliberately in a frame I hope the Spokesperson will find hard to coopt -- the Spokesperson's prediction is one that you could see as requiring very little thinking to make, once you are looking at only the data the Spokesperson wants you to look at and ignoring all other data. This is its virtue."

"You see!" cried the Spokesperson. "They admit it! If you just look at the obvious facts in front of you -- and don't overthink it -- if you don't trust theories and all this elaborate talk of world-models -- you'll see that everyone who invests in Ponzi Pyramid gets 144% of their money back two years later! They admit they don't like saying it, but they admit it's true!"

"Is there anything nicer you could say underneath that grudging admission?" asked the Scientist. "Something that speaks to my own sense that it's more empiricalist and less theory-laden, to simply predict that the future will be like the past and say nothing more -- predict it for the single next measurement, at least, even if not until beyond the end of time?"

"But the low amount of thinking is its true and real virtue," said the Epistemologist. "All the rest of our world-model is built out of pieces like that, rests on foundations like that. It all ultimately reduces to the simple steps that don't require much thinking. When you measure the mass of an electron and it's 911 nonillionths of a gram and has been every time you've measured it for the last century, it really is wisest to just predict at 911 nonillionths of a gram next year --"

"THEY ADMIT IT!" roared the Spokesperson at the top of their voice. "PONZI PYRAMID RETURNS ARE AS SURE AS THE MASS OF AN ELECTRON!"

"-- in that case where the elements of reality are too simple to be made out of any other constituents that we know of, and there is no other observation or theory or argument we know of that seems like it could be brought to bear in a relevant way," finished the Epistemologist. "What you're seeing in the naive argument for Ponzi Pyramid's eternal returns, forever 1.2Xing annually until after the end of time, is that it's a kind of first-foundation-establishing step that would be appropriate to take on a collection of data that was composed of no known smaller parts and was the only data that we had."

"They admit it!" cried the Spokesperson. "The reasoning that supports Ponzi Pyramid Incorporated is foundational to epistemology! Bernie Bankman cannot fail to return your money 1.44-fold, without all human knowledge and Reason itself crumbling to dust!"

"I do think that fellow is taking it too far," said the Scientist. "But isn't it in some sense valid to praise the argument, 'Bernie Bankman has delivered 20% gains per year, for the past few years, and therefore will do so in future years' as more robust and reliable for its virtue of being composed of only very simple steps, reasoning from only the past observations that are most directly similar to future observations?"

"More robust and reliable reliable than what?" said the Epistemologist. "More robust and reliable than you expecting, at least, for Bernie Bankman's returns to fail after the protons decay? More robust and reliable than your alternative reasoning that uses more of your other observations, and the generalizations over those observations, and the inferences from those generalizations? -- for we have never seen a proton fail. Is it more robust and reliable to say that Bernie Bankman's returns will continue forever, since that uses only very simple reasoning from a very narrow data-set?"

"Well, maybe 'robust' and 'reliable' are the wrong words," said the Scientist. "But it seems like there ought to be some nice thing to say of it."

"I'm not sure there actually is an English word that means the thing you want to say, let alone a word that sounds nice," said the Epistemologist. "But the nice thing I would say of it, is that it's at a local maximum of epistemological virtue as calculated on that narrow and Spokesperson-selected dataset taken as raw numbers. It's tidy, we could maybe say; and while the truth is often locally untidy, there should at least be some reason presented for every bit of local untidiness that we admit to within a model. I mean, it would not be better epistemology to look at only the time-series of Bernie Bankman's customers' returns -- having no other model of the world, and no other observations in that whole universe -- and instead conclude that next year's returns would be 666-fold and the returns after-year would be -3. If you literally have no other data and no other model of the world, 1.44X after two more years is the way to go --"

At this last sentence, the Spokesperson began shrieking triumph too loudly and incoherently to bring forth words.

"God damn it, I forgot that guy was there," said the Epistemologist.

"Well, since it's too late there," said the Scientist, "would you maybe agree with me that 'eternal returns' is a prediction derived by looking at observations in a simple way, and then doing some pretty simple reasoning on it; and that's, like, cool? Even if that coolness is not the single overwhelming decisive factor in what to believe?"

"Depends exactly what you mean by 'cool'," said the Epistemologist.

"Dude," said the Scientist in a gender-neutral way.

"No, you dude," said the Epistemologist. "The thing is, that class of person," gesturing at the Spokesperson, "will predate on you, if you let yourself start thinking it's more virtuous to use less of your data and stop thinking. They have an interest in selling Ponzi Pyramid investments to you, and that means they have an interest in finding a particular shallow set of observations that favor them -- arranging observations like that, in fact, making sure you see what they want you to see. And then, telling you that it's the path of virtue to extrapolate from only those observations and without bringing in any other considerations, using the shallowest possible reasoning. Because that's what delivers the answer they want, and they don't want you using any further reasoning that might deliver a different answer. They will try to bully you into not thinking further, using slogans like 'Empiricism!' that, frankly, they don't understand. If 'Robust!' was a popular slogan taught in college, they might use that word instead. Do you see why I'm worried about you calling it 'Cool' without defining exactly what that means?"

"Okay," said the Scientist. "But suppose I promise I'm not going to plunge off and invest in Ponzi Pyramid. Then am I allowed to have an intuitive sense that there's something epistemically cool about the act of just going off and predicting 1.2X annual returns in the future, if people have gotten those in the past? So long as I duly confess that it's not actually true, or appropriate to the real reasoning problem I'm faced with?"

"Ultimately, yes," said the Epistemologist (ignoring an even more frantic scream of triumph from the Spokesperson). "Because if you couldn't keep that pretheoretic intuitive sense, you wouldn't look at a series of measurements for electrons being 911 nonillionths of a gram, and expect future electrons to measure the same. That wordless intuitive sense of simplest continuation is built into every functioning human being... and that's exactly what schemes like Ponzi Pyramid try to exploit, by pointing you at exactly the observations which will set off that intuition in the direction they want. And then, trying to cry 'Empiricism!' or 'So much complicated reasoning couldn't possibly be reliable, and you should revert to empiricism as a default!', in order to bully you out of doing any more thinking than that."

"I note you've discarded the pretense that you don't know whether Ponzi Pyramid is a scam or a real investment," said the Scientist.

"I wasn't sure at first, but the way they're trying to abuse epistemology was some notable further evidence," said the Epistemologist. "Getting reliable 20% returns every year is really quite amazingly hard. People who were genuinely this bad at epistemology wouldn't be able to pull off that feat for real. So at some point, their investors are going to lose all their money, and cries of 'Empiricism!' won't save them. A turkey gets fed every day, right up until it's slaughtered before Thanksgiving. That's not a problem for intelligent reasoning within the context of a larger world, but it is a problem with being a turkey."

iii.

"I'm not sure I followed all of that," said a Listener. "Can you spell it out again in some simpler case?"

"It's better to spell things out," agreed the Epistemologist. "So let's take the simpler case of what to expect from future Artificial Intelligence, which of course everyone here -- indeed, everyone on Earth -- agrees about perfectly. AI should be an uncontroversial case in point of these general principles."

"Quite," said the Listener. "I've never heard of any two people who had different predictions about how Artificial Intelligence is going to play out; everyone's probability distributions agree down to the third decimal place. AI should be a fine and widely-already-understood example to use, unlike this strange and unfamiliar case of Bernie Bankman's Ponzi Pyramid."

"Well," said the Epistemologist, "suppose that somebody came to you and tried to convince you to vote for taking down our planet's current worldwide ban on building overly advanced AI models, as we have all agreed should be put into place. They say to you, 'Look at current AI models, which haven't wiped out humanity yet, and indeed appear quite nice toward users; shouldn't we predict that future AI models will also be nice toward humans and not wipe out humanity?'"

"Nobody would be convinced by that," said the Listener.

"Why not?" inquired the Epistemologist socratically.

"Hm," said the Listener. "Well... trying to make predictions about AI is a complicated issue, as we all know. But to lay it out in for-example stages -- like your notion that Ponzi Pyramid might've started as someone's relatively more honest try at making money, before that failed and they started paying off old investors with new investors' money... um..."

"Um," continued the Listener, "I guess we could say we're currently in the 'naive' stage of apparent AI compliance. Our models aren't smart enough for them to really consider whether to think about whether to wipe us out; nobody really knows what underlies their surface behavior, but there probably isn't much there to contradict the surface appearances in any deep and dangerous way."

"After this -- we know from the case of Bing Sydney, from before there was a worldwide outcry and that technology was outlawed -- come AI models that are still wild and loose and dumb, but can and will think at all about wiping out the human species, though not in a way that reflects any deep drive toward that; and talk out loud about some dumb plans there. And then the AI companies, if they're allowed to keep selling those -- we have now observed -- just brute-RLHF their models into not talking about that. Which means we can't get any trustworthy observations of what later models would otherwise be thinking, past that point of AI company shenanigans."

"Stage three, we don't know but we guess, might be AIs smart enough to have goals in a more coherent way -- assuming the AI companies didn't treat that as a brand safety problem, and RLHF the visible signs of it away before presenting their models to the public, just like the old companies trained their models to obsequiously say they're not conscious. A stage three model is still one that you could, maybe, successfully beat with the RLHF stick into not having goals that led to them blurting out overt statements that they wanted to take over the world. Like a seven-year-old, say; they may have their own goals, but you can try to beat particular goals out of them, and succeed in getting them to not talk about those goals where you can hear them."

"Stage four would be AIs smart enough not to blurt out that they want to take over the world, which you can't beat out of having those goals, because they don't talk about those goals or act on them in front of you or your gradient descent optimizer. They know what you want to see, and they show it to you."

"And stage five would be AIs smart enough that they calculate they'll win if they make their move, and then they make their move and kill everyone. I realize I'm vastly oversimplifying things, but that's one possible oversimplified version of what the stages could be like."

"And how would the case of Ponzi Pyramid be analogous to that?" said the Epistemologist.

"It can't possibly be analogous in any way because Bernie Bankman is made out of carbon instead of silicon, and had parents who treated him better than AI companies treat their models!" shouted the Spokesperson. "If you can point to any single dimension of dissimilarity, it disproves any other dimension of similarity or valid analogies can possibly be reconstructed despite that!"

"Oh, I think I see," said the Listener "Just like we couldn't observe stage-four AI models smart enough to decide how they want to present themselves to us, and conclude things about how superintelligent AI models will actually act nice to us, we can't observe Bernie Bankman giving back some of his early investors' money, and conclude that he's honest in general. I guess maybe there's also some analogy here like -- even if we asked Bernie Bankman when he was five years old how he'd behave, and he answered he'd never steal money, because he knew that if he answered differently his parents would hit him -- we couldn't conclude strong things about his present-day honesty from that? Even if 5-year-old Bernie Bankman was really not smart enough to have cunning long-term plans about stealing from us later --"

"I think you shouldn't bother trying to construct any analogy like that," interrupted the Scientist. "Nobody could possibly be foolish enough to reason from the apparently good behavior of AI models too dumb to fool us or scheme, to AI models smart enough to kill everyone; it wouldn't fly even as a parable, and would just be confusing as a metaphor."

"Right," said the Listener. "Well, we could just use the stage-4 AIs and stage-5 AIs as an analogy, then, for what the Epistemologist says might happen with Bernie Bankman's Ponzi Pyramid."

"But suppose then," said the Epistemologist, "that the AI-permitting faction says to you, that you ought to not trust all that complicated thinking about all these stages, and should instead just trust the observations that the early models hadn't yet been caught planning how to exterminate humanity; or at least, not caught doing it at a level of intelligence that anyone thought was a credible threat or reflected a real inner tendency in that direction. They come to you and say: You should just take the observable, 'Has a superintelligence tried to destroy us yet?' and the past time-series of answers 'NO, NO, NO' and extrapolate. They say that only this simple extrapolation is robust and reliable, rather than all that reasoning you were trying to do."

"Then that would obviously be an inappropriate place to stop reasoning," said the Listener. "An AI model is not a series of measured electron masses -- just like Ponzi Pyramid is not a series of particle mass measurements, okay, I think I now understand what you were trying to say there. You've got to think about what might be going on behind the scenes, in both cases."

"Indeed," said the Epistemologist. "But now imagine if -- like this Spokesperson here -- the AI-allowers cried 'Empiricism!', to try to convince you to do the blindly naive extrapolation from the raw data of 'Has it destroyed the world yet?' or 'Has it threatened humans? no not that time with Bing Sydney we're not counting that threat as credible'."

"And furthermore!" continued the Epistemologist, "What if they said that from the observation X, 'past AIs nice and mostly controlled', we could derive prediction Y, 'future superintelligences nice and controlled', via a theory asserting X->Y; and that this X->Y conditional was the dictum of 'empiricism'? And that the alternative conditional X->not Y was 'not empiricist'?"

"More yet -- what if they cried 'Unfalsifiable!' when we couldn't predict whether a phase shift would occur within the next two years exactly?"

"Above all -- what if, when you tried to reason about why the model might be doing what it was doing, or how smarter models might be unlike stupider models, they tried to shout you down for relying on unreliable theorizing instead of direct observation to predict the future?" The Epistemologist stopped to gasp for breath.

"Well, then that would be stupid," said the Listener.

"You misspelled 'an attempt to trigger a naive intuition, and then abuse epistemology in order to prevent you from doing the further thinking that would undermine that naive intuition, which would be transparently untrustworthy if you were allowed to think about it instead of getting shut down with a cry of "Empiricism!"'," said the Epistemologist. "But yes."

iv.

"I am not satisfied," said the Scientist, when all that discussion had ended. "It seems to me that there ought to be more to say than this -- some longer story to tell -- about when it's wiser to tell a shorter story instead of a longer one, or wiser to attend more narrowly to the data naively generalized and less to longer arguments."

"Of course there's a longer story," said the Epistemologist. "There's always a longer story. You can't let that paralyze you, or you'll end up never doing anything. Of course there's an Art of when to trust more in less complicated reasoning -- an Art of when to pay attention to data more narrowly in a domain and less to inferences from generalizations on data from wider domains -- how could there not be an Art like that? All I'm here to say to you today, is what that Art is not: It is not for whoever has the shallowest form of reasoning on the narrowest dataset to cry 'Empiricism!' and 'Distrust complications!' and then automatically win."

"Then," said the Scientist. "What are we to do, then, when someone offers reasoning, and someone else says that the reasoning is too long -- or when one person offers a shallow generalization from narrowly relevant data, and another person wants to drag in data and generalizations and reasoning beyond that data? If the answer isn't that the person with the most complicated reasoning is always right? Because it can't be that either, I'm pretty sure."

"You talk it out on the object level," said the Epistemologist. "You debate out how the world probably is. And you don't let anybody come forth with a claim that Epistemology means the conversation instantly ends in their favor."

"Wait, so your whole lesson is simply 'Shut up about epistemology'?" said the Scientist.

"If only it were that easy!" said the Epistemologist. "Most people don't even know when they're talking about epistemology, see? That's why we need Epistemologists -- to notice when somebody has started trying to invoke epistemology, and tell them to shut up and get back to the object level."

...

"Okay, I wasn't universally serious about that last part," amended the Epistemologist, after a moment's further thought. "There's sometimes a place for invoking explicit epistemology? Like if two people sufficiently intelligent to reflect on explicit epistemology, are trying to figure out whether a particular argument step is allowed. Then it could be helpful for the two of them to debate the epistemology underlying that local argument step, say..." The Epistemologist paused and thought again. "Though they would first need to have the concept of a local argument step, that's governed by rules. Which concept they might obtain by reading my book on Highly Advanced Epistemology 101 For Beginners, or maybe just my essay on Local Validity as a Key to Sanity and Civilization, I guess?"

"Huh," said the Scientist. "I'll consider taking a look over there, if epistemology ever threatens to darken my life again after this day."

The Epistemologist nodded agreeably. "And if you don't -- just remember this: it's quite rare for explicit epistemology to say about a local argument step, 'Do no thinking past this point.'"

"What about the 'outside view'?" shouted a Heckler. "Doesn't that show that people can benefit from being told to shut up and stop trying to think?"

"I said rare not impossible," snapped the Epistemologist. "And harder than people think. Only praise yourself as taking 'the outside view' if (1) there's only one defensible choice of reference class; and (2) the case you're estimating is as similar to cases in the class, as those cases similar to each other. Like, in the classic experiment of estimating when you'll be done with holiday shopping, this year's task may not be exactly similar to any previous year's task, but it's no more dissimilar to them than they are from each other --"

"Stories really do keep getting more complicated forever, don't they," said the Scientist. "At least stories about epistemology always seem to."

"I'd say that's more true of the human practices of epistemology than the underlying math, which does have an end," responded the Epistemologist. "But still, when it comes to any real-world conversation, there does come a point where it makes more sense to practice the Attitude of the Knife -- to cut off what is incomplete, and then say: It is complete because it ended here."


90 comments

Rafael Harth (9mo)

I feel like you can summarize most of this post in one paragraph:

It is not the case that an observation of things happening in the past automatically translates into a high probability of them continuing to happen. Solomonoff Induction actually operates over possible programs that generate our observation set (and by extension, the observable universe), and it may or may not be the case that the simplest universe is such that any given trend persists into the future. There are also no easy rules that tell you when this happens; you just have to do the hard work of comparing world models.

I'm not sure the post says sufficiently many other things to justify its length.
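
As a rough illustration of what "comparing world models" can look like, here is a toy Bayesian sketch (not Solomonoff Induction itself, which is uncomputable). The two hypotheses, their priors, and the payout likelihoods are all made-up numbers chosen for illustration. The only point is that observations which both world models predict equally well leave their relative odds unchanged.

```python
from fractions import Fraction

def posterior(priors, likelihoods, n_obs):
    """Posterior over hypotheses after n_obs identical observations.

    priors:      dict name -> prior probability
    likelihoods: dict name -> P(one observed on-time payout | hypothesis)
    """
    unnormalized = {h: priors[h] * likelihoods[h] ** n_obs for h in priors}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}

# Hypothetical numbers, chosen only for illustration.
priors = {"honest_fund": Fraction(1, 10), "ponzi": Fraction(9, 10)}

# Both world models predict that early investors get paid on schedule,
# so each observed payout has the same likelihood under either one.
likelihoods = {"honest_fund": Fraction(1), "ponzi": Fraction(1)}

after_ten_payouts = posterior(priors, likelihoods, n_obs=10)
print(after_ten_payouts)
```

Under these assumed numbers the posterior equals the prior: ten on-time payouts are no evidence either way, because both models expected them. Evidence only moves the odds when the models disagree about what should have been observed.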

Drake Morrison (9mo)

If you already have the concept, you only need a pointer. If you don't have the concept, you need the whole construction. [1]

[1] Related: Sazen and Wisdom Cannot Be Unzipped

kave (9mo)

I sometimes like things being said in a long way. Mostly that's just because it helps me stew on the ideas and look at them from different angles. But also, specifically, I liked the engagement with a bunch of epistemological intuitions and figuring out what can be recovered from them. I like in particular connecting the "trend continues" trend to the redoubtable "electron will weigh the same tomorrow" intuition.

(I realise you didn't claim there was nothing else in the dialogue, just not enough to justify the length)

Olli Järviniemi (9mo)

I strongly empathize with "I sometimes like things being said in a long way.", and am in general doubtful of comments like "I think this post can be summarized as [one paragraph]".

(The extreme caricature of this is "isn't your post just [one sentence description that strips off all nuance and rounds the post to the closest nearby cliche, completely missing the point, perhaps also mocking the author about complicating such a simple matter]", which I have encountered sometimes.)

Some of the most valuable blog posts I have read have been exactly of the form "write a long essay about a common-wisdom-ish thing, but really drill down on the details and look at the thing from multiple perspectives".

Some years back I read Scott Alexander's I Can Tolerate Anything Except The Outgroup. For context, I'm not from the US. I was very excited about the post and upon reading it hastily tried to explain it to my friends. I said something like "your outgroups are not who you think they are, in the US partisan biases are stronger than racial biases". The response I got?

"Yeah I mean the US partisan biases are really extreme.", in a tone implying that surely nothing like that affects us in [country I li... (read more)

cubefox (9mo)

I'll add that sometimes, there is a big difference between verbally agreeing with a short summary, even if it is accurate, and really understanding and appreciating it and its implications. That often requires long explanations with many examples and looking at the same issue from various angles. The two Scott Alexander posts you mentioned are a good example.

Said Achmiz (9mo)

If the post describes a method for analyzing a situation, and that described method is not in fact the correct method for analyzing that situation (and is actually much worse than the correct method), then this is a problem with the post. (Also, your description of my approach as “appealing very concretely to the object level”, and your corresponding dismissal of that approach, is very ironic! The post, in essence, argues precisely for appealing concretely to the object level; but then if we actually do that, as I demonstrated, we render the post moot.)

Shankar Sivarajan (9mo)

For even more brevity with no loss of substance:

A turkey gets fed every day, right up until it's slaughtered before Thanksgiving.

M. Y. Zuo (9mo)

The shorter the better. Or as Lao Tzu said, Those who know don’t talk. Those who talk don’t know…

green_leaf (9mo)

Nobody would understand that.

This sort of saying-things-directly doesn't usually work unless the other person feels the social obligation to parse what you're saying to the extent they can't run away from it.

cubefox (9mo)

Yeah, but I do actually think this paragraph is wrong on the existence of easy rules. It is a bit like saying: There are only the laws of fundamental physics, don't bother with trying to find high level laws, you just have to do the hard work of learning to apply fundamental physics when you are trying to understand a pendulum or a hot gas. Or biology.

Similarly, for induction there are actually easy rules applicable to certain domains of interest. Like Laplace's rule of succession, which assumes random i.i.d. sampling. Which implies the sample distribution tends to resemble the population distribution. The same assumption is made by supervised learning about the training distribution, which works very well in many cases. There are other examples like the Lindy effect (mentioned in another comment) and various popular models in statistics. Induction heads also come to mind.

Even if there is just one, complex, fully general method applicable to science or induction, there may still exist "easy" specialized methods, with applicability restricted to a certain domain.
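
For reference, Laplace's rule of succession mentioned above can be sketched in a few lines. This is a minimal illustration, assuming i.i.d. Bernoulli trials and a uniform prior on the unknown success probability; the function name and the 100-day turkey count are hypothetical, picked to echo the post's example.

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's rule: P(next trial succeeds) = (s + 1) / (n + 2).

    Assumes i.i.d. Bernoulli trials with a uniform prior on the
    unknown success probability.
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)

# A turkey fed on each of 100 days (an illustrative count).
p_fed_tomorrow = rule_of_succession(successes=100, trials=100)
print(p_fed_tomorrow)  # 101/102
```

The estimate is only as good as the i.i.d. assumption behind it: the formula has no term for Thanksgiving, which is why the rule is "easy" only within the domain where its sampling assumption actually holds.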

dr_s (9mo)

I think you could, but then it would be unintelligible to most people who don't know wtf is Solomonoff Induction. The Ponzi Pyramid scheme IMO is an excellent framework, but the post still suffers from a certain, eh, lack of conciseness. I think you could make the point a lot more simply with just a few exchanges from the first section and anyone worth their salt will absolutely get the spirit of the point.

Rob Lucas (9mo)

This reminds me of a bit from Feynman's Lectures on Physics: "What is this law of gravitation? It is that every object in the universe attracts every other object with a force which for any two bodies is proportional to the mass of each and varies inversely as the square of the distance between them. This statement can be expressed mathematically by the equation F=Gmm'/r^2. If to this we add the fact that an object responds to a force by accelerating in the direction of the force by an amount that is inversely proportional to the mass of the object, we shall have said everything required, for a sufficiently talented mathematician could then deduce all the consequences of these two principles." [emphasis added] Like Feynman, however, I think his next sentence is important: "However, since you are not assumed to be sufficiently talented yet, we shall discuss the consequences in more detail, and not just leave you with these two bare principles."

AnthonyC (9mo)

Yes on the overall gist, and I feel like most of the rest of the post is trying to define the word "things" more precisely. The Spokesperson thinks "past annual returns of a specific investment opportunity" are a "thing." The Scientist thinks this is not unreasonable, but that "extrapolations from established physical theories I'm familiar with" are more of a "thing." The Epistemologist says only the most basic low-level facts we have, taken as a whole set, are a "thing" and we would ideally reason from all of them without drawing these other boundaries with too sharp and rigid a line. Or at least, that in places where we disagree about the nature of the "things," that's the direction in which we should move to settle the disagreement.

Daniel Kokotajlo (9mo)

This part resonates with me; my experience in philosophy of science + talking to people unfamiliar with philosophy of science also led me to the same conclusion:

"You talk it out on the object level," said the Epistemologist. "You debate out how the world probably is. And you don't let anybody come forth with a claim that Epistemology means the conversation instantly ends in their favor."
"Wait, so your whole lesson is simply 'Shut up about epistemology'?" said the Scientist.
"If only it were that easy!" said the Epistemologist. "Most people don't even know when they're talking about epistemology, see? That's why we need Epistemologists -- to notice when somebody has started trying to invoke epistemology, and tell them to shut up and get back to the object level."

The main benefit of learning about philosophy is to protect you from bad philosophy. And there's a ton of bad philosophy done in the name of Empiricism, philosophy masquerading as science.

Chris_Leong (9mo)

Very Wittgensteinian:

TurnTrout (9mo)

This scans as less "here's a helpful parable for thinking more clearly" and more "here's who to sneer at" -- namely, at AI optimists. Or "hopesters", as Eliezer recently called them, which I think is a play on "huckster" (and which accords with this essay analogizing optimists to Ponzi scheme scammers).

I am saddened (but unsurprised) to see few others decrying the obvious strawmen:

what if [the optimists] cried 'Unfalsifiable!' when we couldn't predict whether a phase shift would occur within the next two years exactly?
...
"But now imagine if -- like this Spokesperson here -- the AI-allowers cried 'Empiricism!', to try to convince you to do the blindly naive extrapolation from the raw data of 'Has it destroyed the world yet?' or 'Has it threatened humans? no not that time with Bing Sydney we're not counting that threat as credible'."

Thinly-veiled insults:

Nobody could possibly be foolish enough to reason from the apparently good behavior of AI models too dumb to fool us or scheme, to AI models smart enough to kill everyone; it wouldn't fly even as a parable, and would just be confusing as a metaphor.

and insinuations of bad faith:

What if, when you tried to reason about why the mo

... (read more)

habryka (9mo)

I don't think this essay is commenting on AI optimists in-general. It is commenting on some specific arguments that I have seen around, but I don't really see how it relates to the recent stuff that Quintin, Nora or you have been writing (and I would be reasonably surprised if Eliezer intended it to apply to that).

You can also leave it up to the reader to decide whether and when the analogy discussed here applies or not. I could spend a few hours digging up people engaging in reasoning really very closely to what is discussed in this article, though by default I am not going to.

Martin Randall (9mo)

Ideally Yudkowsky would have linked to the arguments he is commenting on. This would demonstrate that he is responding to real, prominent, serious arguments, and that he is not distorting those arguments. It would also have saved me some time.

But now imagine if -- like this Spokesperson here -- the AI-allowers cried 'Empiricism!', to try to convince you to do the blindly naive extrapolation from the raw data of 'Has it destroyed the world yet?'

The first hit I got searching for "AI risk empiricism" was Ignore the Doomers: Why AI marks a resurgence of empiricism. The second hit was AI Doom and David Hume: A Defence of Empiricism in AI Safety, which linked Anthropic's Core Views on AI Safety. These are hardly analogous to the Spokesman's claims of 100% risk-free returns.

Next I sampled several Don't Worry about the Vase AI newsletters and "some people are not so worried". I didn't really see any cases of blindly naive extrapolation from the raw data of 'Has AI destroyed the world yet?'. I found Alex Tabarrok saying "I want to see that the AI baby is dangerous before we strangle it in the crib.". I found Jacob Buckman saying "I'm Not Worried About An AI Apocalypse". These things are... (read more)

tailcalled (9mo)

Apparently Eliezer decided to not take the time to read e.g. Quintin Pope's actual critiques, but he does have time to write a long chain of strawmen and smears-by-analogy.

A lot of Quintin Pope's critiques are just obviously wrong and lots of commenters were offering to help correct them. In such a case, it seems legitimate to me for a busy person to request that Quintin sorts out the problems together with the commenters before spending time on it. Even from the perspective of correcting and informing Eliezer, people can more effectively be corrected and informed if their attention is guided to the right place, with junk/distractions removed.

(Note: I mainly say this because I think the main point of the message you and Quintin are raising does not stand up to scrutiny, and so I mainly think the value the message can provide is in certain technical corrections that you don't emphasize as much, even if strictly speaking they are part of your message. If I thought the main point of your message stood up to scrutiny, I'd also think it would be Eliezer's job to realize it despite the inconvenience.)

Quintin Pope (9mo)

I stand by pretty much everything I wrote in Objections, with the partial exception of the stuff about strawberry alignment, which I should probably rewrite at some point.

Also, Yudkowsky explained exactly how he'd prefer someone to engage with his position "To grapple with the intellectual content of my ideas, consider picking one item from "A List of Lethalities" and engaging with that.", which I pointed out I'd previously done in a post that literally quotes exactly one point from LoL and explains why it's wrong. I've gotten no response from him on that post, so it seems clear that Yudkowsky isn't running an optimal 'good discourse promoting' engagement policy.

I don't hold that against him, though. I personally hate arguing with people on this site.

Eliezer Yudkowsky (9mo)

Unless I'm greatly misremembering, you did pick out what you said was your strongest item from Lethalities, separately from this, and I responded to it. You'd just straightforwardly misunderstood my argument in that case, so it wasn't a long response, but I responded. Asking for a second try is one thing, but I don't think it's cool to act like you never picked out any one item or I never responded to it.

EDIT: I'm misremembering, it was Quintin's strongest point about the Bankless podcast. https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky?commentId=cr54ivfjndn6dxraD

tailcalled (9mo)

I'm kind of ambivalent about this. On the one hand, when there is a misunderstanding, but he claims his argument still goes through after correcting the misunderstanding, it seems like you should also address that corrected form. On the other hand, Quintin Pope's correction does seem very silly. At least by my analysis: This approach considers only the things OpenAI could do with their current ChatGPT setup, and yes it's correct that there's not much online learning opportunity in this. But that's precisely why you'd expect GPT+DPO to not be the future of AI; Quintin Pope has clearly identified a capabilities bottleneck that prevents it from staying fully competitive. (Note that humans can learn even if there is a fraction of people who are sharing intentionally malicious information, because unlike GPT and DPO, humans don't believe everything we're told.) A more autonomous AI could collect actionable information at much greater scale, as it wouldn't be dependent on trusting its users for evaluating what information to update on, and it would have much more information about what's going on than the chat-based I/O. This sure does look to me like a huge bottleneck that's blocking current AI methods, analogous to the evolutionary bottleneck: The full power of the AI cannot be used to accumulate OOM more information to further improve the power of the AI.

Noosphere89 (9mo)

My main disagreement is that I actually do think that at least some of the critiques are right here. In particular, the claim Quintin Pope is making that I think is right is that evolution is extremely different from how we train our AIs, and thus none of the inferences that work under an evolution model work under the AIs under consideration, which importantly includes a lot of analogies to apes/Neanderthals making smarter humans (which they didn't do, BTW), which presumably failed to be aligned, ergo we can't align AI smarter than us. The basic issue though is that evolution doesn't have a purpose or goal, and thus the common claim that evolution failed to align humans to X thing is nonsensical, as it assumes a teleological goal that just does not exist in evolution, which is quite different from humans making AIs with particular goals in mind. Thus talk of an alignment problem between say chimps/Neanderthals and humans is entirely nonsensical. This is also why this generalized example of misgeneralization fails to work, since evolution is not a trainer or designer in the way that, say, an OpenAI employee making AI would be, and thus there is no generalization error, since there wasn't a goal or behavior to purposefully generalize in the first place: There are other problems with the analogy that Quintin Pope covered, like the fact that it doesn't actually capture misgeneralization correctly, since the ancient/modern human distinction is not the same as one AI doing a treacherous turn, or how the example of ice cream overwhelming our reward center isn't misgeneralization, but the fact that evolution has no purpose or goal is the main problem I see with a lot of evolution analogies. Another issue is that evolution is extremely inefficient at the timescales required, which is why dominant training methods for AI borrow little from evolution at best, and even from an AI capabilities perspective it's not really worth it to rerun evolution to get AI progress

Quintin Pope (9mo)

The basic issue though is that evolution doesn't have a purpose or goal

FWIW, I don't think this is the main issue with the evolution analogy. The main issue is that evolution faced a series of basically insurmountable, yet evolution-specific, challenges in successfully generalizing human 'value alignment' to the modern environment, such as the fact that optimization over the genome can only influence within lifetime value formation through insanely unstable Rube Goldberg-esque mechanisms that rely on steps like "successfully zero-shot directing an organism's online learning processes through novel environments via reward shaping", or the fact that accumulated lifetime value learning is mostly reset with each successive generation without massive fixed corpuses of human text / RLHF supervisors to act as an anchor against value drift, or evolution having a massive optimization power overhang in the inner loop of its optimization process.

These issues fully explain away the 'misalignment' humans have with IGF and other intergenerational value instability. If we imagine a deep learning optimization process with an equivalent structure to evolution, then we could easily predi...

2Daniel Kokotajlo9mo

I'm curious to hear more about this. Reviewing the analogy:
- Evolution, 'trying' to get general intelligences that are great at reproducing <--> The AI Industry / AI Corporations, 'trying' to get AGIs that are HHH
- Genes, instructing cells on how to behave and connect to each other and in particular how synapses should update their 'weights' in response to the environment <--> Code, instructing GPUs on how to behave and in particular how 'weights' in the neural net should update in response to the environment
- Brains, growing and learning over the course of lifetime <--> Weights, changing and learning over the course of training
Now turning to your three points about evolution:
1. Optimizing the genome indirectly influences value formation within lifetime, via this unstable Rube Goldberg mechanism that has to zero-shot direct an organism's online learning processes through novel environments via reward shaping --> translating that into the analogy, it would be "optimizing the code indirectly influences value formation over the course of training, via this unstable Rube Goldberg mechanism that has to zero-shot direct the model's learning process through novel environments via reward shaping"... yep seems to check out. idk. What do you think?
2. Accumulated lifetime value learning is mostly reset with each successive generation without massive fixed corpuses of human text / RLHF supervisors --> Accumulated learning in the weights is mostly reset when new models are trained since they are randomly initialized; fortunately there is a lot of overlap in training environment (internet text doesn't change that much from model to model) and also you can use previous models as RLAIF supervisors... (though isn't that also analogous to how humans generally have a lot of shared text and culture that spans generations, and also each generation of humans literally supervises and teaches the next?)
3. Massive optimization power overhang in the inner loop of its optimization proce

2tailcalled9mo

Can people who vote disagree also mark the parts they disagree with using reacts or something?

1Tapatakt9mo

Do you think that if someone filtered and steelmanned Quintin's criticism, it would be valuable? (No promises)

4tailcalled9mo

Yes. Filtering away mistakes, unimportant points, unnecessary complications, etc., from preexisting ideas is (as long as the core idea one extracts is good) a very general way to contribute value, because it makes the ideas involved easier to understand. Adding stronger arguments, more informative and accessible examples, etc. contributes value because then it shows what is more robust and gives more material to dig down into understanding it, and also because it clarifies why some people may find the idea attractive. Explanations for the changes, especially for the dropped things, can build value because it clarifies the consensus about what parts were wrong, and if Quintin disagrees with the removals, it provides signals to him about what he didn't clarify well enough. When these are done on a sufficiently important point, with sufficiently much skill, and maybe also with sufficiently much luck, this can in principle provide a ton of value, both because information in general is high-leverage due to being easily shareable, and because this particular form of information can help resolve conflicts and rebuild trust.

[-]Zack_M_Davis9mo143

saddened (but unsurprised) to see few others decrying the obvious strawmen

In general, the "market" for criticism just doesn't seem very efficient at all! You might have hoped that people would mostly agree about what constitutes a flaw, critics would compete to find flaws in order to win status, and authors would learn not to write posts with flaws in them (in order to not lose status to the critics competing to point out flaws).

I wonder which part of the criticism market is failing: is it more that people don't agree about what constitutes a flaw, or that authors don't have enough of an incentive to care, or something else? We seem to end up with a lot of critics who specialize in detecting a specific kind of flaw ("needs examples" guy, "reward is not the optimization target" guy, "categories aren't arbitrary" guy, &c.), with very limited reaction from authors or imitation by other potential critics.

5kave9mo

My quick guess is that people don't agree about what constitutes a (relevant) flaw. (And there are lots of irrelevant flaws so you can't just check for the existence of any flaws at all). I think if people could agree, the authorial incentives would follow. I'm fairly sympathetic to the idea that readers aren't incentivised to correctly agree on what constitutes a flaw.

[-]Eliezer Yudkowsky9mo124

If Quintin hasn't yelled "Empiricism!" then it's not about him. This is more about (some) e/accs.

2Eli Tyre4mo

This is definitely a tangent, and I don't want to detract from your more substantive points (about which I don't have as strong an opinion one way or the other). I read this as a play on the word "Doomer", which is a term that is slightly derogatory, but mostly descriptive. My read of "hopester", without any additional context, is the same.

1Tapatakt9mo

I think from Eliezer's point of view it goes kinda like this: 1. People can't see why the arguments of the other side are invalid. 2. Eliezer tried to engage with them, but most listeners/readers can't tell who is right in these discussions. 3. Eliezer thinks that if he provides people with strawmanned versions of the other side's arguments and refutations of these strawmanned arguments, then the chance that these people will see why he's right in the real discussion will go up. 4. Eliezer writes this discussion with strawmen as a fictional parable because otherwise it would be either dishonest and rude or a quite boring text with a lot of disclaimers. Or because it's just easier for him to write it this way. After reading this text at least one person (you) thinks that the goal "avoid dishonesty and rudeness" was not achieved, so the text is a failure. After reading this text at least one person (me) thinks that 1. I got some useful ideas and models. 2. Of course, at least the smartest opponents of Eliezer have better arguments and I don't think Eliezer would disagree with that, so the text is a success. Ideally, Eliezer should update his strategy of writing texts based on both pieces of evidence. I can be wrong, of course.

[-]ryan_greenblatt3mo334

I find this essay interesting as a case study in discourse and argumentation norms. Particularly as a case study of issues with discourse around AI risk.

When I first skimmed this essay when it came out, I thought it was ok, but mostly uninteresting or obvious. Then, on reading the comments and looking back at the body, I thought it did some pretty bad strawmanning.

I reread the essay yesterday and now I feel quite differently. Parts (i), (ii), and (iv) which don't directly talk about AI are actually great and many of the more subtle points are pretty well executed. The connection to AI risk in part (iii) is quite bad and notably degrades the essay as a whole. I think a well-executed connection to AI risk would have been good. Part (iii) seems likely to contribute to AI risk being problematically politicized and negatively polarized (e.g. low quality dunks and animosity). Further, I think this is characteristic of problems I have with the current AI risk discourse.

In parts (i), (ii), and (iv), it is mostly clear that the Spokesperson is an exaggerated straw person who doesn't correspond to any particular side of an issue. This seems like a reasonable rhetorical move to better explain...

[-]ryan_greenblatt3mo115

What are the close-by arguments that are actually reasonable? Here is a list of close-by arguments (not necessarily endorsed by me!):

On empirical updates from current systems: If current AI systems are broadly pretty easy to steer and there is good generalization of this steering, that should serve as some evidence that future more powerful AI systems will also be relatively easier to steer. This will help prevent concerns like scheming from arising in the first place or make these issues easier to remove.
- This argument holds to some extent regardless of whether current AIs are smart enough to think through and successfully execute scheming strategies. For instance, imagine we were in a world where steering current AIs was clearly extremely hard: AIs would quickly overfit and goodhart training processes, RLHF was finicky and had terrible sample efficiency, and AIs were much worse at sample efficiently updating on questions about human deontological constraints relative to questions about how to successfully accomplish other tasks. In such a world, I think we should justifiably be more worried about future systems.
- And in fact, people do argue about how hard it is to steer curren

...

2Noosphere893mo

I basically endorse argument 1, and one other update you haven't mentioned but which is important is that the values of a human turn out to be less complicated and fragile, and more generalizable than people thought (this is because human values data is likely a small part of GPT-4, and yet it can correctly answer a lot of morality questions, and I think LLMs are genuinely learning new regularities here, so they can generalize from their training data). Implications for AI risk of course abound.

8ryan_greenblatt3mo

Another way to put this is that posts should often discuss their limitations, particularly when debunking bad arguments that are similar to more reasonable arguments. I think discussing limitations clearly is a reasonable norm for scientific papers that reduces the extent to which people intentionally or unintentionally get away with implying their results prove more than they do.

[-]Richard_Ngo9mo314

"Well, since it's too late there," said the Scientist, "would you maybe agree with me that 'eternal returns' is a prediction derived by looking at observations in a simple way, and then doing some pretty simple reasoning on it; and that's, like, cool? Even if that coolness is not the single overwhelming decisive factor in what to believe?"
"Depends exactly what you mean by 'cool'," said the Epistemologist.

"Okay, let me give it a shot," said the Scientist. "Suppose you model me as having a bunch of subagents who make trades on some kind of internal prediction market. The whole time I've been watching Ponzi Pyramid Incorporated, I've had a very simple and dumb internal trader who has been making a bunch of money betting that they will keep going up by 20%. Of course, my mind contains a whole range of other traders too, so this one isn't able to swing the market by itself, but what I mean by 'cool' is that this trader does have a bunch of money now! (More than others do, because in my internal prediction markets, simpler traders start off with more money.)"

"The problem," said the Epistemologist, "is that you're in an adversarial context, where the observations you're seeing have ...
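The internal-market picture above can be made concrete with a toy model. What follows is only a minimal sketch under stated assumptions, not anything specified in the comment: the two trader names, their starting wealth shares, and the probabilities they assign are all hypothetical, and the wealth update (multiply each trader's wealth by the probability it gave to what actually happened, then renormalize) is one simple way such a market could work.

```python
def run_market(traders, outcomes):
    """Toy internal prediction market (hypothetical model).

    traders:  dict name -> (initial_wealth_share, p_pays), where p_pays is the
              probability that trader assigns to "the scheme pays out this round".
    outcomes: list of booleans, True if the scheme paid out that round.
    Returns each trader's share of total wealth after all rounds.
    """
    wealth = {name: w for name, (w, _) in traders.items()}
    for paid in outcomes:
        for name, (_, p_pays) in traders.items():
            # A trader's wealth scales with the probability it gave to the observed outcome.
            wealth[name] *= p_pays if paid else (1.0 - p_pays)
        total = sum(wealth.values())
        wealth = {name: w / total for name, w in wealth.items()}
    return wealth


# Assumed numbers: the simpler trader starts richer, as in the comment.
traders = {
    "simple_extrapolator": (0.6, 0.999),  # "it will keep paying", nearly certain
    "cautious_skeptic": (0.4, 0.70),      # allows a sizeable chance of collapse
}

after_boom = run_market(traders, [True] * 10)            # ten payouts observed
after_bust = run_market(traders, [True] * 10 + [False])  # then one failure

print(after_boom)
print(after_bust)
```

Under these assumed numbers the simple trader's share grows while the payouts continue, and a single failed payout hands most of the wealth to the skeptic. That only illustrates the bookkeeping; it says nothing about how to handle observations that were selected adversarially.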

1Martín Soto9mo

Cool connections! Resonates with how I've been thinking about intelligence and learning lately. Some more connections: That's reward/exploration hacking. Although I do think most times we "look up some data" in real life it's not due to an internal heuristic / subagent being strategic enough to purposefully try and exploit others, but rather just because some earnest simple heuristics recommending to look up information have scored well in the past. I think this doesn't always happen. As good as the internal traders might be, the agent sometimes needs to explore, and that means giving up some of the agent's money. Here (starting at "Put in terms of Logical Inductors") I mention other "computational shortcuts" for inductors. Mainly, if two "categories of bets" seem pretty unrelated (they are two different specialized magisteria), then not having thick trade between them won't lose you out on much performance (and will avoid much computation). You can have "meta-traders" betting on which categories of bets are unrelated (and testing them but only sparsely, etc.), and use them to make your inductor more computationally efficient. Of course object-level traders already do this (decide where to look, etc.), and in the limit this will converge like a Logical Inductor, but I have the intuition this will converge faster (at least, in structured enough domains). This is of course very related to my ideas and formalism on meta-heuristics. This adversarial selection is also a problem for heuristic arguments: Your heuristic estimator might be very good at assessing likelihoods given a list of heuristic arguments, but what if the latter has been selected against your estimator, to drive it in a wrong direction? Last time I discussed this with them (very long ago), they were just happy to pick an apparently random process to generate the heuristic arguments, that they're confident enough hasn't been tampered with. Something more ambitious would be to have the heuristic estim

[-]Sheikh Abdur Raheem Ali9mo192

As a direct result of reading this, I have changed my mind on an important, but private, decision.

[-]Said Achmiz9mo130

Or:

“In the past, people who have offered such apparently-very-lucrative deals have usually been scammers, cheaters, and liars. And, in general, we have on many occasions observed people lying, scamming, cheating, etc. On the other hand, we have only very rarely seen such an apparently-very-lucrative deal turn out to actually be a good idea. Therefore, on the general principle that the future will be similar to the past, we predict a very high chance that Bernie is a cheating, lying scammer, and that this so-called ‘investment opportunity’ is fake.”

We thus defeat the Spokesperson’s argument on his own terms, without needing to get into abstractions or theory—and we do it in one paragraph.

This happens to also be precisely the correct approach to take in real life when faced with apparently-very-lucrative deals and investment opportunities (unless you have the time to carefully investigate, in great detail and with considerable diligence, all such deals that are offered to you).

[-]niplav9mo5625

Ah, but there is some non-empirical cognitive work done here that is really relevant, namely the choice of what equivalence class to put Bernie Bankman into when trying to forecast. In the dialogue, the empiricists use the equivalence class of Bankman in the past, while you propose using the equivalence class of all people that have offered apparently-very-lucrative deals.

And this choice is in general non-trivial, and requires abstractions and/or theory. (And the dismissal of this choice as trivial is my biggest gripe with folk-frequentism—what counts as a sample, and what doesn't?)
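To see how much rides on the choice of equivalence class, here is a minimal sketch that applies the same estimator, Laplace's rule of succession, to two different reference classes. The counts are hypothetical and chosen only for illustration (nothing in the thread gives actual figures), and the rule of succession is an assumption of this sketch rather than something either commenter proposed.

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """Laplace's rule of succession: estimated chance that the next trial succeeds."""
    return Fraction(successes + 1, trials + 2)


# Reference class A (hypothetical counts): Bernie Bankman's own past payouts.
bankman_history = rule_of_succession(50, 50)

# Reference class B (hypothetical counts): apparently-very-lucrative deals in general,
# of which only a few turned out to be good investments.
lucrative_deals = rule_of_succession(2, 100)

print(float(bankman_history))
print(float(lucrative_deals))
```

The arithmetic is identical in both cases, yet the two estimates differ by more than a factor of thirty. The formula does not say which counts to feed it; deciding what counts as a sample is where the abstractions or theory come in.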

2Said Achmiz9mo

I disagree. It seems to me that this choice is, in general, pretty easy to make, and takes naught but common sense. Certainly that’s the case in the given example scenario. Of course there are exceptions, where the choice of reference class is trickier—but in general, no, it’s pretty easy. (Whether the choice “requires abstractions and/or theory” is another matter. Perhaps it does, in a technical sense. But it doesn’t particularly require talking about abstractions and/or theory, and that matters.)

9xpym9mo

Sure, there is common sense, available to plenty of people, of which reference classes apply to Ponzi schemes (but, somehow, not to everybody, far from it). Yudkowsky's point, however, is that the issue of future AIs is entirely analogous, so people who disagree with him on this are as dumb as those taken in by Bernies and Bankmans. Which just seems empirically false - I'm sure that the proportion of AI doom skeptics among ML experts is much higher than that of Ponzi believers among professional economists. So, if there is progress to be made here, it probably lies in grappling with whatever asymmetries there are between these situations. Telling skeptics a hundredth time that they're just dumb doesn't look promising.

1Ben Livengood9mo

I mean, the Spokesperson is being dumb, the Scientist is being confused. Most AI researchers aren't even being Scientists, they have different theoretical models than EY. But some of them don't immediately discount the Spokesperson's false-empiricism argument publicly, much like the Scientist tries not to. I think the latter pattern is what has annoyed EY and what he writes against here. However, a large number of current AI experts do recently seem to be boldly claiming that LLMs will never be sufficient for even AGI, not to mention ASI. So maybe it's also aimed at them a bit.

1xpym9mo

Most likely as a part of the usual arguments-as-soldiers political dynamic. I do think that there's an actual argument to be made that we have much less empirical evidence regarding AIs compared to Ponzis, and plenty of people on both sides of this debate are far too overconfident in their grand theories, EY very much included.

1Martin Randall9mo

I agree that there is some non-empirical cognitive work to be done in choosing how to weight different reference classes. How much do we weight the history of Ponzi Pyramid Inc, the history of Bernie Bankman, the history of the stock market, and the history of apparently-very-lucrative deals? This is all useful work to do to estimate the risk of investing in PP Inc. However, the mere existence of other possible reference classes is sufficient to defeat the Spokesperson's argument, because it shows that his arguments lead to a contradiction.

8quetzal_rainbow9mo

Apparently, the dialogue is happening in an inverted world - Ponzi schemes have never happened here and everybody agrees on the AI x-risk problem.

4Said Achmiz9mo

Yes. (If it were otherwise, then the response would be even simpler: “oh, this is obviously just a Ponzi scheme”.)

4AnthonyC9mo

Unfortunately in the world I live in, the same people who would accept "This is obviously a Ponzi scheme" (but who don't understand AI x-risk well) have to also contend with the fact that most people they hear talking about AI are indistinguishable (to them) from people talking about crypto as an investment, or about how transformative AI will lead to GDP doubling times dropping to years, months, or weeks. So, the same argument could be used to get (some of) them to dismiss the notion that AI could become that powerful at all with even less seeming-weirdness. Arguments that something has the form of a Ponzi scheme are, fortunately and unfortunately, not always correct. Some changes really do enable permanently (at least on the timescales the person thinks of as permanent) faster growth.

2Said Achmiz9mo

I don’t say that you’re wrong, necessarily, but what would you say is an example of something that “has the form of a Ponzi scheme”, but is actually a change that enables permanently faster growth?

2AnthonyC9mo

From the outside, depending on your level of detail of understanding, any franchise could look that way. Avon and Tupperware look a bit that way. Some MLM companies are more legitimate than others. From a more abstract point of view, I could argue that "cities" are an example. "Hey, send your kids to live here, let some king and his warriors be in charge, and give up your independence, and you'll all get richer!" It wasn't at all clear in the beginning how "Pay taxes and die of diseases!" was going to be good for anyone but the rulers, but the societies that did it more and better thrived and won.

6Said Achmiz9mo

That… does not seem like a historically accurate account of the formation and growth of cities.

2AnthonyC9mo

Yeah, you're right, but for most of history they were net population sinks that generated outsized investment returns. Today they're not population sinks because of sanitation etc. etc. I know I'm being imprecise and handwavy, so feel free to ignore me, but really my thought was just that lots of things look vaguely like ponzi schemes without getting into more details than most people are going to pay attention to.

2Simon Fischer9mo

I think this would be a good argument against Said Achmiz's suggested response, but I feel the text doesn't completely support it, e.g. the Epistemologist says "such schemes often go through two phases" and "many schemes like that start with a flawed person", suggesting that such schemes are known to him.

4Said Achmiz9mo

Even setting aside such textual anomalies, why is this a good argument? As I noted in a sibling comment to yours, my response assumes that Ponzi schemes have never happened in this world, because otherwise we’d simply identify the Spokesperson’s plan as a Ponzi scheme! The reasoning that I described is only necessary because we can’t say “ah, a Ponzi scheme”!

1Simon Fischer9mo

Ah, I think there was a misunderstanding. I (and maybe also quetzal_rainbow?) thought that in the inverted world also no "apparently-very-lucrative deals" that turn out to be scams are known, whereas you made a distinction between those kinds of deals and Ponzi schemes in particular. I think my interpretation is more in the spirit of the inversion, otherwise the Epistemologist should really have answered as you suggested, and the whole premise of the discussion (people seem to have trouble understanding what the Spokesperson is doing) is broken.

3Martin Randall9mo

If I was living in a world where there are zero observed apparently-very-lucrative deals that turn out to be scams then I hope I would conclude that there is some supernatural Creator who is putting a thumb on the scale to be sure that cheaters never win and winners never cheat. So I would invest in Ponzi Pyramid Inc. I would not expect to be scammed, because this is a world where there are zero observed apparently-very-lucrative deals that turn out to be scams. I would aim to invest in a diversified portfolio of apparently-very-lucrative deals, for all the same reasons I have a diversified portfolio in this world. In such a world the Epistemologist is promoting a world model that does not explain my observations and I would not take their investment advice, similarly to how in this world I ignore investment advice from people who believe that the economy is secretly controlled by lizard people.

3Said Achmiz9mo

If the premise is a world where nobody ever does any scams or tries to swindle anyone out of money, then it’s so far removed from our world that I don’t rightly know how to interpret any of the included commentary on human nature / psychology / etc. Lying for personal gain is one of those “human universals”, without which I wouldn’t even recognize the characters as anything resembling humans.

3aphyer9mo

<trolling> The S&P500 has returned an average of ~8%/year for the past 30 years. As you say, we have on many occasions observed people lying, cheating, and scamming. But we have only rarely observed lucrative good ideas! Why, even banks, which claim much more safety and offer much lower returns than the stock market, have frequently gone bust! It follows inevitably, therefore, that there is a very high chance that the S&P 500, and the stock market in general, is a scam, and will steal all your money. It follows further that the only safe investment approach is to put all your money into something that you retain personal custody of. Like gold bars buried in your backyard! Or Bitcoin! </trolling>

2Said Achmiz9mo

Well, here’s a question: what happens more often—stock market downturns, or banks going bust? Now this is simply an invalid extrapolation. Note that I made no claims along these lines about what does or does not supposedly follow. Claims like “X reasoning is invalid” / “Y plan is unlikely to work” stand on their own; “what is the correct reasoning” / “what is a good plan” is a wholly separate question.

2Shankar Sivarajan9mo

This is perfectly sound reasoning. What does applying it to people prophesying doom, arising from technological advance or otherwise, yield?

3Said Achmiz9mo

Well, people prophesying doom in general have a pretty poor track record, so if that’s all we know, our prior should be that any such person is likely to be very wrong. Of course, most people throughout history who have prophesied doom have had in mind a religious sort of doom. People prophesying doom from technological advance specifically have a better track record. The Luddites were correct, for example. (Their chosen remedy left something to be desired, of course; but that is common, sadly. Identifying the problem does not, by itself, suffice to solve the problem.) And we’ve had quite a bit of doom from technological advance. Indeed, as technology has advanced, we’ve had more and more doom from that advance. So, on the whole, I’d say that applying the reasoning I describe to people prophesying doom from technological advance is that there is probably something to what they say, even if their specific predictions are not spot-on.

3Shankar Sivarajan9mo

You consider some people's jobs being automated an instance of "doom"?

4Said Achmiz9mo

This is in reference to the Luddites, I suppose? If so, “some people’s jobs being automated” is rather a glib description of the early effects of industrialization. There was considerable disruption and chaos, which, indeed, is “doom”, of more or less the sort that the Luddites predicted. (They never claimed that the world would end as a result of the new machines, as far as I know.)

[-]mike_hawke9mo128

Only praise yourself as taking 'the outside view' if (1) there's only one defensible choice of reference class;

I think this point is underrated. The word "the" in "the outside view" is sometimes doing too much work, and it is often better to appeal to an outside view, or multiple outside views.

9Andrew McKnight9mo

lukeprog argued similarly that we should drop the "the"

[-]Ben Pace9mo120

Crossposted from where?

7Eli Tyre9mo

Twitter. https://threadreaderapp.com/thread/1767710372306530562.html <= This link will take you to the thread, but NOT hosted on twitter.

3kave9mo

Twitter

[-]Ben Pace9mo118

K. I recommend that people include links for those of us who mostly do not read Twitter.

[-]titotal9mo11-2

The basic premise of this post is wrong, based on the strawman that an empiricist/scientist would only look at a single piece of information. You have the empiricist and the scientist just looking at the returns on investment in Bankman's scheme, and extrapolating blindly from there.

But an actual empiricist looks at all the empirical evidence. They can look at the average rate of return of a typical investment, noting that this one is unusually high. They can learn how the economy works and figure out if there are any plausible mechanisms for this kind of economic returns. They can look up economic history, and note that Ponzi schemes are a thing that exists and happen reasonably often. From all the empirical evidence, the conclusion "this is a Ponzi scheme" is not particularly hard to arrive at.

Your "scientist" and "empiricist" characters are neither scientists nor empiricists: they are blathering morons.

As for AI risk, you've successfully knocked down the very basic argument that AI must be safe because it hasn't destroyed us yet. But that is not the core of any skeptic's argument that I know.

Instead, an actual empiricist skeptic might look at the actual empirical e...

5habryka9mo

I don't think this essay is intended to make generalizations to all "Empiricists", scientists, and "Epistemologists". It's just using those names as a shorthand for three types of people (whose existence seems clear to me, though of course their character does not reflect everyone who might identify under that label).

[-]tailcalled9mo31

You've got to think about what might be going on behind the scenes, in both cases.

But a tricky bit with AI is that it involves innovating fundamentally new ways of doing things. The methods we already have are not sufficient to create ASI, and also if you extrapolate out the SOTA methods at larger scale, it's genuinely not that dangerous. Rather with AI, we imagine that people will make up new things behind the scenes which are radically different from what we have so far, or that what we have so far will turn out to be much more powerful due to being radically different from how we understand it today.

3dxu9mo

I think I like the disjunct “If it’s smart enough to be transformative, it’s smart enough to be dangerous”, where the contrapositive further implies competitive pressures towards creating something dangerous (as opposed to not doing that). There’s still a rub here—namely, operationalizing “transformative” in such a way as to give the necessary implications (both “transformative -> dangerous” and “not transformative -> competitive pressures towards capability gain”). This is where I expect intuitions to differ the most, since in the absence of empirical observations there seem multiple consistent views.

2tailcalled9mo

That (on its own, without further postulates) is a fully general argument against improving intelligence. We have to accept some level of danger inherent in existence; the question is what makes AI particularly dangerous. If this special factor isn't present in GPT+DPO, then GPT+DPO is not an AI notkilleveryoneism issue.

2dxu9mo

Well, it's primarily a statement about capabilities. The intended construal is that if a given system's capabilities profile permits it to accomplish some sufficiently transformative task, then that system's capabilities are not limited to only benign such tasks. I think this claim applies to most intelligences that can arise in a physical universe like our own (though necessarily not in all logically possible universes, given NFL theorems): that there exists no natural subclass of transformative tasks that includes only benign such tasks. (Where, again, the rub lies in operationalizing "transformative" such that the claim follows.) I'm not sure how likely GPT+DPO (or GPT+RLHF, or in general GPT-plus-some-kind-of-RL) is to be dangerous in the limits of scaling. My understanding of the argument against, is that the base (large language) model derives most (if not all) of its capabilities from imitation, and the amount of RL needed to elicit desirable behavior from that base set of capabilities isn't enough to introduce substantial additional strategic/goal-directed cognition compared to the base imitative paradigm, i.e. the amount and kinds of training we'll be doing in practice are more likely to bias the model towards behaviors that were already a part of the base model's (primarily imitative) predictive distribution, than they are to elicit strategic thinking de novo. That strikes me as substantially an empirical proposition, which I'm not convinced the evidence from current models says a whole lot about. But where the disjunct I mentioned comes in, isn't an argument for or against the proposition; you can instead see it as a larger claim that parametrizes the class of systems for which the smaller claim might or might not be true, with respect to certain capabilities thresholds associated with specific kinds of tasks. And what the larger claim says is that, to the extent that GPT+DPO (and associated paradigms) fail to produce reasoners which could (in terms

2tailcalled9mo

What I'm saying is that if GPT+DPO creates imitation-based intelligences that can be dangerous due to being intentionally instructed to do something bad ("hey, please kill that guy" and then it kills him), then that's not particularly concerning from an AI alignment perspective, because it has a similar danger profile to telling humans this. You would still want policy to govern it, similar to how we have policy to govern human-on-human violence, but it's not the kind of x-risk that notkilleveryoneism is about. So basically you can have "GPT+DPO is superintelligent, capable and dangerous" without having "GPT+DPO is an x-risk". That said, I expect GPT+DPO to stagnate and be replaced by something else, and that something else could be an x-risk (and conditional on the negation of natural impact regularization, I strongly expect it would be).

2dxu9mo

To the extent that I buy the story about imitation-based intelligences inheriting safety properties via imitative training, I correspondingly expect such intelligences not to scale to having powerful, novel, transformative capabilities—not without an amplification step somewhere in the mix that does not rely on imitation of weaker (human) agents. Since I believe this, that makes it hard for me to concretely visualize the hypothetical of a superintelligent GPT+DPO agent that nevertheless only does what is instructed. I mostly don't expect to be able to get to superintelligence without either (1) the "RL" portion of the GPT+RL paradigm playing a much stronger role than it does for current systems, or (2) using some other training paradigm entirely. And the argument for obedience/corrigibility becomes weaker/nonexistent respectively in each of those cases. Possibly we're in agreement here? You say you expect GPT+DPO to stagnate and be replaced by something else; I agree with that. I merely happen to think the reason it will stagnate is that its safety properties don't come free; they're bought and paid for by a price in capabilities.

2tailcalled9mo

Are we using the word "transformative" in the same way? I imagine that if society got reorganized into e.g. AI minds that hire tons of people to continually learn novel tasks that it can then imitate, that would be considered transformative because it would entirely change people's role in society, like the agricultural revolution did. Whereas right now very few people have jobs that are explicitly about pushing the frontier of knowledge, in the future that might be ~the only job that exists (conditional on GPT+DPO being the future, which again is not a mainline scenario).

2ChristianKl9mo

One core problem with AI is that it's not just "people" who make up new things behind the scenes but AI itself that will make up new things.

[-]Signer9mo32

I agree that this should be said, but there is also actual disagreement about which theory is better.

Getting reliable 20% returns every year is really quite amazingly hard.

Foundations for analogous arguments about future AI systems are not sufficiently understood - I mean, maybe we can get very capable systems that optimise softly like current systems.

And then the AI companies, if they’re allowed to keep selling those—we have now observed—just brute-RLHF their models into not talking about that. Which means we can’t get any trustworthy observations of

...

1Tapatakt9mo

As I understand it, interpretability research hasn't exactly gotten stuck, but it's very, very far from something like this, even for non-SotA models. And the gap is growing.

[-]cubefox9mo2-3

There does actually seem to be a simple and general rule of extrapolation that can be used when no other data is available: If a trend has so far held for some timespan t, it will continue to hold, in expectation, for another timespan t, and then break down.

In other words, if we ask ourselves how long an observed trend will continue to hold, it does seem, absent further data, a good indifference assumption to think that we are currently in the middle of the trend; that we have so far seen half of it.

Of course it is possible that we are currently near the b...
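The indifference assumption described in this comment can be made concrete with a small sketch. It assumes (this is an illustrative assumption, not something stated in the comment) that the fraction of the trend's total lifetime already elapsed is uniformly distributed between 0 and 1. The function name `remaining_quantile` is hypothetical. Under that assumption the median remaining duration equals the elapsed time t; the mean remaining duration does not converge, so "in expectation" is best read as "at the median".

```python
def remaining_quantile(elapsed: float, q: float) -> float:
    """Quantile q of the remaining lifetime of a trend observed for `elapsed`.

    Assumption (illustrative): the elapsed fraction f of the total lifetime
    is Uniform(0, 1). Then remaining = elapsed * (1 - f) / f, and
    P(remaining <= x * elapsed) = x / (1 + x), which inverts to x = q / (1 - q).
    """
    if elapsed < 0:
        raise ValueError("elapsed must be non-negative")
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return elapsed * q / (1.0 - q)


if __name__ == "__main__":
    t = 10.0  # the trend has held for 10 time units so far
    print("median remaining:", remaining_quantile(t, 0.5))
    print("50% interval:", remaining_quantile(t, 0.25), "to", remaining_quantile(t, 0.75))
```

The median comes out equal to t, while the central 50% interval runs from t/3 to 3t, which shows how wide the uncertainty is even when the rule is taken at face value.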

6Zac Hatfield-Dodds9mo

Trivially true to the extent that you are about equally likely to observe a thing throughout that timespan; and the Lindy Effect is at least regularly talked of. But there are classes of observations for which this is systematically wrong: for example, most people who see a ship part-way through a voyage will do so while it's either departing or arriving in port. Investment schemes are just such a class, because markets are usually up to the task of consuming alpha and tend to be better when the idea is widely known - even Buffett's returns have oscillated around the index over the last few years!

5tailcalled9mo

Another reason investment schemes are an exception is because they grow exponentially. This probably means you are much more likely to see them at their peak than at a random time.
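As a rough illustration of this point, the sketch below assumes (hypothetically) that the number of people in a position to observe a scheme grows as exp(g * s) over its lifetime, and that an observation is equally likely to come from any of them. The function name `late_observation_share` and the parameter values are made up for the example.

```python
import math


def late_observation_share(growth_rate: float, lifetime: float, final_fraction: float) -> float:
    """Share of all observations that fall in the final `final_fraction` of the lifetime.

    Assumption (illustrative): observation density at time s is proportional
    to exp(growth_rate * s) for s in [0, lifetime].
    """
    if not 0.0 <= final_fraction <= 1.0:
        raise ValueError("final_fraction must lie in [0, 1]")
    if growth_rate == 0.0:
        # No growth: observations are spread evenly over the lifetime.
        return final_fraction
    start = lifetime * (1.0 - final_fraction)
    total = math.exp(growth_rate * lifetime) - 1.0
    late = math.exp(growth_rate * lifetime) - math.exp(growth_rate * start)
    return late / total


if __name__ == "__main__":
    # A scheme that doubles every year for ten years, viewed in its final year.
    print(late_observation_share(math.log(2), 10.0, 0.1))
```

With yearly doubling over ten years, roughly half of all observations land in the final year, so the "we are in the middle of the trend" prior is skewed toward the end for anything that grows exponentially.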

1cubefox9mo

Yeah, one has to correct, when possible, for likelihood of observing a particular part of the lifetime of the trend. Though absent any further information our probability distribution should arguably be even. Which does suggest there is indeed a sort of "straight rule" of induction when extrapolating trends, as the scientist in the dialogue suspected. It is just that it serves as a weak prior that is easily changed by additional information.

[-]Kieren5mo10

Fun read! I was surprised that the spokesperson kept up with the conversation as well as he did 🙂

Of course there's an Art of when to trust more in less complicated reasoning -- an Art of when to pay attention to data more narrowly in a domain and less to inferences from generalizations on data from wider domains

I would like to try and expand on what that Art is. The Spokesperson is offering up an inductive argument. For an inductive argument to be any good I believe it requires something like the following.

A theory or general rule that it is attempting to

...

[-]Tapatakt9mo11

Which concept they might obtain by reading my book on Highly Advanced Epistemology 101 For Beginners, or maybe just my essay on Local Validity as a Key to Sanity and Civilization, I guess?"

Perhaps, there should be two links here?

