All of gallabytes's Comments + Replies

the track record of people trying to broadly ensure that humanity continues to be in control of the future

What track record?

gallabytes

But do they also generalize out of training distribution more similarly? If so, why?

Neither of them is going to generalize very well out of distribution, and to the extent they do, it will be via looking for features that were present in-distribution. The old adage applies: "to imagine 10-dimensional space, first imagine 3-space, then say 10 really hard."

My guess is that basically every learning system which tractably approximates Bayesian updating on noisy high-dimensional data is going to end up with roughly Gaussian OOD behavior. There have been some experiments ... (read more)

adversarial examples definitely still exist but they'll look less weird to you because of the shape bias.

anyway this is a random visual model, raw perception without any kind of reflective error correction loop, I'm not sure what you expect it to do differently, or what conclusion you're trying to draw from how it does behave? the inductive bias doesn't precisely match human vision, so it has different mistakes, but as you scale both architectures they become more similar. that's exactly what you'd expect for any approximately Bayesian setup.

the shape bias... (read more)

Wei Dai
I can certainly understand that as you scale both architectures, they both make fewer mistakes on distribution. But do they also generalize out of training distribution more similarly? If so, why? Can you explain this more? (I'm not getting your point from just "approximately Bayesian setup".) This is also confusing/concerning for me. Why would it be necessary or helpful to have such a large dataset to align the shape/texture bias with humans?

Scale basically solves this too, with some other additions (not part of any released version of MJ yet) really putting a nail in the coffin, but I can't say too much here w/o divulging trade secrets. I can say that I'm surprised to hear that SD3 is still so much worse than DALL-E 3 and Ideogram on that front - I wonder if they just didn't train it long enough?

gallabytes

They put too much emphasis on high frequency features, suggesting a different inductive bias from humans.

This was found to not be true at scale! It doesn't even feel that true w/weaker vision transformers, seems specific to convnets. I bet smaller animal brains have similar problems.

Wei Dai
Do you know if it is happening naturally from increased scale, or only correlated with scale (people are intentionally trying to correct the "misalignment" between ML and humans of shape vs texture bias by changing aspects of the ML system like its training and architecture, and simultaneously increasing scale)? I somewhat suspect the latter due to the existence of a benchmark that the paper seems to target ("humans are at 96% shape / 4% texture bias and ViT-22B-384 achieves a previously unseen 87% shape bias / 13% texture bias"). In either case, it seems kind of bad that it has taken a decade or two to get to this point from when adversarial examples were first noticed, and it's unclear whether other adversarial examples or "misalignment" remain in the vision transformer. If the first transformative AIs don't quite learn the right values due to having a different inductive bias from humans, it may not matter much that 10 years later the problem would be solved.

Order matters more at smaller scales - if you're training a small model on a lot of data and you sample in a sufficiently nonrandom manner, you should expect catastrophic forgetting to kick in eventually, especially if you use weight decay.

I think I can just tell a lot of stuff wrt human values! How do you think children infer them? I think in order for human values to not be viable to point to extensionally (ie by looking at a bunch of examples) you have to make the case that they're much more built-in to the human brain than seems appropriate for a species that can produce both Jains and (Genghis Khan era) Mongols.

 

I'd also note that "incentivize" is probably giving a lot of the game away here - my guess is you can just pull them out much more directly by gathering a large dataset of human preferences and predicting judgements.

jessicata
If you define "human values" as "what humans would say about their values across situations", then yes, predicting "human values" is a reasonable training objective. Those just aren't really what we "want" as agents, and agentic humans would have motives not to let the future be controlled by an AI optimizing for human approval.

That's also not how I defined human values, which is based on the assumption that the human brain contains one or more expected utility maximizers. It's possible that the objectives of these maximizers are affected by socialization, but they'll be less affected by socialization than verbal statements about values, because they're harder to fake so less affected by preference falsification. Children learn some sense of what they're supposed to say about values, but have some pre-built sense of "what to do / aim for" that's affected by evopsych and so on.

It seems like there's a huge semantic problem with talking about "values" in a way that's ambiguous between "in-built evopsych-ish motives" and "things learned from culture about what to endorse", but Yudkowsky writing on complexity of value is clearly talking about stuff affected by evopsych. I think it was a semantic error for the discourse to use the term "values" rather than "preferences".

In the section on subversion I made the case that terminal values make much more difference in subversive behavior than compliant behavior. It seems like to get at the values of approximate utility maximizers located in the brain you would need something like Goal Inference as Inverse Planning rather than just predicting behavior.

Why do you expect it to be hard to specify given a model that knows the information you're looking for? In general the core lesson of unsupervised learning is that often the best way to get pointers to something you have a limited specification for is to learn some other task that necessarily includes it, then specialize to that subtask. Why should values be any different? Broadly, why should values be harder to get good pointers to than much more complicated real-world tasks?

jessicata
How would you design a task that incentivizes a system to output its true estimates of human values? We don't have ground truth for human values, because they're mind states not behaviors. Seems easier to create incentives for things like "wash dishes without breaking them", you can just tell.

yeah I basically think you need to construct the semantic space for this to work, and haven't seen much work on that front from language modeling researchers.

drives me kinda nuts because I don't think it would actually be that hard to do, and the benefits might be pretty substantial.

Can you give an example of a theoretical argument of the sort you'd find convincing? Can be about any X caring about any Y.

DanielFilan
Not sure how close you want it to be but how about this example: "animals will typically care about their offspring's survival and reproduction in worlds where their action space is rich enough for them to be helpful and too rich for them to memorize extremely simple heuristics, because if they didn't their genes wouldn't propagate as much". Not air-tight, and also I knew the stylized fact before I heard the argument so it's a bit unfair, but I think it's pretty good as it goes.
O O
Testing it on out-of-distribution examples seems helpful. If an AI still acts as if it follows human values out of distribution, it probably truly cares about human values. For AI with situational awareness, we can probably run simulations to an extent (and probably need to bootstrap this after a certain capabilities threshold)

On the impossible-to-you world: This doesn’t seem so weird or impossible to me? And I think I can tell a pretty easy cultural story slash write an alternative universe novel where we honor those who maximize genetic fitness and all that, and have for a long time—and that this could help explain why civilization and our intelligence developed so damn slowly and all that. Although to truly make the full evidential point that world then has to be weirder still where humans are much more reluctant to mode shift in various ways. It’s also possible this points

... (read more)

In case it is not clear: My expectation is that sufficiently large capabilities/intelligence/affordances advances inherently break our desired alignment properties under all known techniques.

Nearly every piece of empirical evidence I've seen contradicts this - more capable systems are generally easier to work with in almost every way, and the techniques that worked on less capable versions straightforwardly apply and in fact usually work better than on less intelligent systems.

ryan_greenblatt
Presumably you agree this would become false if the system was deceptively aligned or otherwise scheming against us? Perhaps the implicit claim is that we should generalize from current evidence toward thinking the deceptive alignment is very unlikely? I also think it's straightforward to construct cases where goodharting implies that applying the technique you used for a less capable model onto a more capable model would result in worse performance for the more capable model. I think it should be straightforward to construct such a case using scaling laws for reward model overoptimization. (That said, I think if you vary the point of early stopping as models get more capable then you likely get strict performance improvements on most tasks. But, regardless there is a pretty reasonable technique of "train for duration X" which clearly gets worse performance in realistic cases as you go toward more capable systems.)

When I explain my counterargument to pattern 1 to people in person, they will very often try to "rescue" evolution as a worthwhile analogy for thinking about AI development. E.g., they'll change the analogy so it's the programmers who are in a role comparable to evolution, rather than SGD.

In general one should not try to rescue intuitions, and the frequency of doing this is a sign of serious cognitive distortions. You should only try to rescue intuitions when they have a clear and validated predictive or pragmatic track record.

The reason for this is very... (read more)

The obvious question here is to what degree do you need new techniques vs merely to train new models with the same techniques as you scale current approaches.

 

One of the virtues of the deep learning paradigm is that you can usually test things at small scale (where the models are not and will never be especially smart) and there's a smooth range of scaling regimes in between where things tend to generalize.

 

If you need fundamentally different techniques at different scales, and the large scale techniques do not work at intermediate and small scal... (read more)

It's more like calling a human who's as smart as you are and directly plugged into your brain and in fact reusing your world model and train of thought directly to understand the implications of your decision. That's a huge step up from calling a real human over the phone!

The reason the real human proposal doesn't work is that

  1. the humans you call will lack context on your decision
  2. they won't even be able to receive all the context
  3. they're dumber and slower than you, so even if you really could write out your entire chain of thoughts and intuitions, consulting them for every decision would be impractical

Note that none of these considerations apply to integrated language models!

To pick a toy example, you can use text as a bottleneck to force systems to "think out loud" in a way which will be very directly interpretable by a human reader, and because language understanding is so rich this will actually be competitive with other approaches and often superior.
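A minimal sketch of what that text bottleneck could look like in practice (the query_model, execute, and log hooks here are hypothetical stand-ins, not any particular API):

```python
def act_with_text_bottleneck(task, query_model, execute, log):
    """Force the system to 'think out loud' in plain text before acting.

    query_model, execute, and log are hypothetical hooks: a text-in/text-out
    model call, an action executor, and a human-readable transcript sink.
    """
    reasoning = query_model(f"Explain, step by step, how you will accomplish: {task}")
    log(reasoning)  # every intermediate thought is ordinary language a human can audit
    action = query_model(f"Given this reasoning:\n{reasoning}\nState the single next action, in one sentence.")
    log(action)
    return execute(action)  # only the explicitly stated, logged action gets carried out
```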

I'm sure you can come up with more ways that the existence of software that understands language and does ~nothing else makes getting computers to do what you mean easier than if software did not understand language. Please think about the problem for 5 minutes. Use a clock.

I appreciate the example!

Are you claiming that this example solves "a major part of the problem" of alignment? Or that, e.g., this plus four other easy ideas solve a major part of the problem of alignment?

Examples like the Visible Thoughts Project show that MIRI has been interested in research directions that leverage recent NLP progress to try to make inroads on alignment. But Matthew's claim seems to be 'systems like GPT-4 are grounds for being a lot more optimistic about alignment', and your claim is that systems like these solve "a major part of the pr... (read more)

ML models in the current paradigm do not seem to behave coherently OOD, but I'd bet that for nearly any pair of metrics of "overall capability" and alignment, the capability metric decays faster than the alignment metric as we go further OOD.

 

See https://arxiv.org/abs/2310.00873 for an example of the kinds of things you'd expect to see when taking a neural network OOD. It's not that the model does some insane path-dependent thing, it collapses to entropy. You end up seeing a max-entropy distribution over outputs not goals. This is a good example of the kind of thing that's o... (read more)

rotatingpaguro
<snark> Your models of intelligent systems collapse to entropy on OOD intelligence levels. </snark>

Historically you very clearly thought that a major part of the problem is that AIs would not understand human concepts and preferences until after or possibly very slightly before achieving superintelligence. This is not how it seems to have gone.

 

Everyone agrees that you assumed superintelligence would understand everything humans understand and more. The dispute is entirely about the things that you encounter before superintelligence. In general it seems like the world turned out much more gradual than you expected and there's information to be found in what capabilities emerged sooner in the process.


AI happening through deep learning at all is a huge update against alignment success, because deep learning is incredibly opaque.  LLMs possibly ending up at the center is a small update in favor of alignment success, because it means we might (through some clever sleight, this part is not trivial) be able to have humanese sentences play an inextricable role at the center of thought (hence MIRI's early interest in the Visible Thoughts Project).

The part where LLMs are to predict English answers to some English questions about values, and show common-se... (read more)

hairyfigment
This would make more sense if LLMs were directly selected for predicting preferences, which they aren't. (RLHF tries to bridge the gap, but this apparently breaks GPT's ability to play chess - though I'll grant the surprise here is that it works at all.) LLMs are primarily selected to predict human text or speech.

Now, I'm happy to assume that if we gave humans a D&D-style boost to all mental abilities, each of us would create a coherent set of preferences from our inconsistent desires, which vary and may conflict at a given time even within an individual. Such augmented humans could choose to express their true preferences, though they still might not. If we gave that idealized solution to LLMs, it would just boost their ability to predict what humans or augmented humans would say. The augmented-LLM wouldn't automatically care about the augmented-human's true values.

While we can loosely imagine asking LLMs to give the commands that an augmented version of us would give, that seems to require actually knowing how to specify how a D&D ability-boost would work for humans - which will only resemble the same boost for AI at an abstract mathematical level, if at all. It seems to take us back to the CEV problem of explaining how extrapolation works. Without being able to do that, we'd just be hoping a better LLM would look at our inconsistent use of words like "smarter," and pick the out-of-distribution meaning we want, for cases which have mostly never existed.

This is a lot like what "Complexity of Wishes" was trying to get at, as well as the longstanding arguments against CEV. Vaniver's comment seems to point in this same direction. Now, I do think recent results are some evidence that alignment would be easier for a Manhattan Project to solve. It doesn't follow that we're on track to solve it.
Garrett Baker
I do not necessarily disagree or agree, but I do not know which source you derive "very clearly" from. So do you have any memory which could help me locate that text?

Historically you very clearly thought that a major part of the problem is that AIs would not understand human concepts and preferences until after or possibly very slightly before achieving superintelligence. This is not how it seems to have gone.

"You very clearly thought that was a major part of the problem" implies that if you could go to Eliezer-2008 and convince him "we're going to solve a lot of NLP a bunch of years before we get to ASI", he would respond with some version of "oh great, that solves a major part of the problem!". Which I'm pretty sure ... (read more)

We should clearly care if their arguments were wrong in the past, especially if they were systematically wrong in a particular direction, as it's evidence about how much attention we should pay to their arguments now. At some point if someone is wrong enough for long enough you should discard their entire paradigm and cease to privilege hypotheses they suggest, until they reacquire credibility through some other means e.g. a postmortem explaining what they got wrong and what they learned, or some unambiguous demonstration of insight into the domain they're talking about.

Sure, a stop button doesn't have the issues I described, as long as it's used rarely enough. If it's too commonplace then you should expect similar effects on safety to eg CEQA's effects on infrastructure innovation. Major projects can only take on so much risk, and the more non-technical risk you add the less technical novelty will fit into that budget.

This line from the proposed "Responsible AI Act" seems to go much further than a stop button though?

Require advanced AI developers to apply for a license & follow safety standards.

Where do these saf... (read more)

It depends on the form regulation takes. The proposal here requires approval of training runs over a certain scale, which means everything is banned at that scale, including safety techniques, with exceptions decided by the approval process.

What would your plan be to ensure that this kind of regulation actually net-improves safety? The null hypothesis for something like this is that you'll empower a bunch of bureaucrats to push rules that are at least 6 months out of date under conditions of total national emergency where everyone is watching, and years to decades out of date otherwise.

This could be catastrophic! If the only approved safety techniques are as out of date as the only approved medical techniques, AI regulation seems like it should vastly increase P(doom) at the point that TAI is developed.

Zach Stein-Perlman
It's hard for me to imagine regulators having direct authority to decline to license big training runs but instead deciding to ban safety techniques. In fact, I can't think of a safety technique that could plausibly be banned in ~any context. Some probably exist, but they're not a majority.

Which brings me to my main disagreement with bottom-up approaches: they assume we already have a physics theory in hand, and are trying to locate consciousness within that theory. Yet, we needed conscious observations, and at least some preliminary theory of consciousness, to even get to a low-level physics theory in the first place. Scientific observations are a subset of conscious experience, and the core task of science is to predict scientific observations; this requires pumping a type of conscious experience out of a physical theory, which requires at

... (read more)
jessicata
Not sure how satisfying this is, but here's a rough sketch:

Anthropically, the meat we're paying attention to is meat that implements an algorithm that has general cognition including the capacity of building physics theories from observations. Such meat may become more common either due to physics theories being generally useful or general cognition that does physics among other things being generally useful.

The algorithm on the meat selects theories of physics that explain their observations. To explain the observations, the physics theory has to bridge between the subject matter of physics and the observational inputs to the algorithm that are used to build and apply the theory. The thing that is bridged to isn't, according to the bridging law, identical to the subject matter of low level physics (atoms or whatever), and there also isn't a very simple translation, although there is a complex translation. The presence of a complex but not simple load-bearing translation motivates further investigation to find a more parsimonious theory.

Additionally, there are things the algorithm implemented on the meat does other than building physics theories that use similar algorithmic infrastructure to the infrastructure that builds physics theories from observations. It is therefore parsimonious to posit that there is a broader class of entity than "physical observation" that includes observations not directly used to build physical theories, due to natural category considerations. "Experience" seems a fitting name for such a class.

This seems especially unlikely to work given it only gives a probability. You know what you call someone whose superintelligent AI does what they want 95% of the time? Dead.

 

if you can get it to do what you want even 51% of the time and make that 51% independent across samples (it isn't, so in practice you'd like some margin, but 95% is actually a lot of margin!) you can get arbitrarily good compliance by creating AI committees and taking a majority vote.
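A quick sketch of the amplification math (a toy binomial calculation, assuming each committee member is independently right with probability p):

```python
from math import comb

def majority_vote_success(p, n):
    """Probability that a majority of n independent members, each right with probability p, is right."""
    assert n % 2 == 1, "use an odd committee size to avoid ties"
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

# even a bare, independent 51% per member compounds as the committee grows:
for n in (1, 101, 1001):
    print(n, round(majority_vote_success(0.51, n), 3))    # ~0.51, ~0.58, ~0.74
print(round(majority_vote_success(0.95, 5), 4))           # ~0.9988: 95% per member is a lot of margin
```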

Zvi
Yep, if you can make it a flat 51% that's a victory condition but I would be shocked if that is how any of this works.

that paper is one of many claiming some linear attention mechanism that's as good as full self attention. in practice they're all sufficiently much worse that nobody uses them except the original authors in the original paper, usually not even the original authors in subsequent papers.

the one exception is flash attention, which is basically just a very fancy fused kernel for the same computation (actually the same, up to numerical error, unlike all these "linear attention" papers).
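For reference, the computation in question is plain scaled dot-product attention; FlashAttention computes exactly this with a fused, tiled kernel, whereas the "linear attention" papers change the math. A minimal numpy sketch of the reference computation:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention over n tokens of dimension d."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (n, n) attention logits
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (n, d) outputs

# usage sketch: out = attention(*np.random.default_rng(0).random((3, 16, 64)))
```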

4, 5, and 6 are not separate steps - when you only have 1 example, the bits to find an input that generates your output are not distinct from the bits specifying the program that computes output from input.

Yeah my guess is that you almost certainly fail on step 4 - an example of a really compact ray tracer looks like it fits in 64 bytes. You will not do search over all 64 byte programs. Even if you could evaluate 1 of them per atom per nanosecond using every atom in the universe for 100 billion years, you'd only get 44.6 bytes of search.

Let's go with something more modest and say you get to use every atom in the milky way for 100 years, and it takes about 1 million atom-seconds to check a single program. This gets you about 30 bytes of search.
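A back-of-the-envelope check of those numbers (rough constants assumed, so treat the outputs as order-of-magnitude):

```python
import math

SECONDS_PER_YEAR = 3.15e7

def bytes_of_search(n_atoms, years, checks_per_atom_second):
    """Bytes of brute-force program search = log2(total program evaluations) / 8."""
    evaluations = n_atoms * years * SECONDS_PER_YEAR * checks_per_atom_second
    return math.log2(evaluations) / 8

# ~10^80 atoms in the observable universe, one check per atom per nanosecond, 100 billion years
print(bytes_of_search(1e80, 100e9, 1e9))   # ~44.6 bytes
# ~10^69 atoms in the Milky Way, 100 years, one check per 10^6 atom-seconds
print(bytes_of_search(1e69, 100, 1e-6))    # ~30 bytes
```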

Priors over pro... (read more)

faul_sname
I was not imagining you would, no. I was imagining something more along the lines of "come up with a million different hypotheses for what a 2-d grid could encode", which would be tedious for a human but would not really require extreme intelligence so much as extreme patience, and then for each of those million hypotheses try to build a model, and iteratively optimize for programs within that model for closeness to the output. I expect, though I cannot prove, that "a 2d projection of shapes in a 3d space" is a pretty significant chunk of the hypothesis space, and that all of the following hypotheses would make it into the top million:

1. The 2-d points represent a rectangular grid oriented on a plane within that 3-d space. The values at each point are determined by what the plane intersects.
2. The 2-d points represent a grid oriented on a plane within that 3-d space. The values at each point are determined by drawing lines orthogonal to the plane and seeing what they intersect and where. The values represent distance.
3. The 2-d points represent a grid oriented on a plane within that 3-d space. The values at each point are determined by drawing lines orthogonal to the plane and seeing what they intersect and where. The values represent something else about what the lines intersect.

4-6. Same as 1-3, but with a cylinder.
7-9. Same as 1-3, but with a sphere.
10-12. Same as 1-3, but with a torus.
13-21. Same as 4-12, but the rectangular grid is projected onto only part of the cylinder/sphere/torus instead of onto the whole thing.

I am not a superintelligence though, nor do I have any special insight into what the universal prior looks like, so I don't know if that's actually a reasonable assumption or whether it's an entity embedded in a space that detects other things within that space using signals privileging the hypothesis that "a space where it's possible to detect other stuff in that space using signals" is a simple construct. If the size-optimal scene suc
TekhneMakre
If you grant the image being reconstructed, then 2 dimensional space is already in the cards. It's not remotely 64 bits to make the leap to 3d space projected to 2d space. The search doesn't have to be "search all programs in some low-level encoding", it can be weighted on things that are mathematically interesting / elegant (which is a somewhat a priori feature).

This response is totally absurd. Your human priors are doing an insane amount of work here - you're generating an argument for the conclusion, not figuring out how you would privilege those hypotheses in the first place.

See that the format describes something like a grid of cells, where each cell has three [something] values.

This seems maybe possible for png (though it could be hard - the pixel data will likely be stored as a single contiguous array, not a bunch of individual rows, and it will be run-length encoded, which you might be able to figure ou... (read more)

localdeity
I think a decent candidate for what a sufficiently great mind would do, in the absence of priors other than its own existence and the data fed to it... is to enumerate universes with different numbers of dimensions and different fundamental forces and values of physical constants and initial conditions, and see which of them are likely to produce it and the data fed to it.  Which, at least in our case, means "a universe in which intelligent life spontaneously developed and made computers".

There was a book, Flatland, describing a fictional 2D world.  One of the issues is... you can't have things like digestive tracts that pass all the way through you, unless you consist of multiple non-connected pieces.  I'm not sure I can rule it out entirely—after all, 2D cellular automata can be Turing-complete, and can therefore simulate anything you like—but it seems possible that a sufficiently great mind could say that no 2D universe with laws of physics resembling our own could support life.

Is it actually the case that Occam's razor would prefer "A universe, such as a 3-space 1-time dimensional universe with the following physical constants within certain ranges and a Big Bang that looked like this, developed intelligent life and made me and this data" over "The universe is one big 2D cellular automaton that simulates me and this data, and contains nothing else"?  I dunno.  Kolmogorov complexity of a machine simulating the universe, I guess?  That seems like the right question even if I don't know the answer.

This seems maybe possible for png (though it could be hard - the pixel data will likely be stored as a single contiguous array, not a bunch of individual rows, and it will be run-length encoded, which you might be able to figure out but very well might not - and if it's jpg compressed this is even further from the truth).

I mean, once you've got your single continuous array it's pretty easy to notice "hey this pattern almost repeats every 1080 triplets". Getting from the raw data stream to your single continuous array might be very simple (if your video ... (read more)
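A toy illustration of the "almost repeats every 1080 triplets" observation: the row stride of a flattened image shows up as a sharp dip in self-difference at that shift (the synthetic data and helper name here are made up for the sketch):

```python
import numpy as np

def self_similarity(flat, strides):
    """Mean absolute difference between the stream and a shifted copy of itself,
    for each candidate shift; dips appear at multiples of the row stride."""
    flat = np.asarray(flat, dtype=float)
    return {s: float(np.mean(np.abs(flat[s:] - flat[:-s]))) for s in strides}

# synthetic stand-in for a raw 1080-pixel-wide RGB frame: 100 nearly identical rows
rng = np.random.default_rng(0)
width = 1080 * 3
image = rng.random(width) + 0.01 * rng.random((100, width))
errs = self_similarity(image.ravel(), range(width - 50, width + 50))
print(min(errs, key=errs.get) == width)  # True: the dip sits exactly at the row stride
```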

10 million times faster is really a lot - on modern hardware, running SOTA object segmentation models at even 60fps is quite hard, and those are usually much much smaller than the kinds of AIs we would think about in the context of AI risk.

But - 100x faster is totally plausible (especially w/100x the energy consumption!) - and I think the argument still mostly works at that much more conservative speedup.

the gears to ascension
it's completely implausible they'd run their entire processing system 10 million times faster, yeah. running a full brain costs heat, and that heat has to dissipate, there aren't speed shortcuts. our fastest neurons are on order 1000 hz, and our median neurons are on order 1hz. it's the fast paths through the network that affect fastest reasoning. the question, then, is how much a learning system can distill its most time-sensitive reasoning into the fast paths. eg, self-directed distillation of a skill into an accelerated network that calls out to the slower one. there's no need for a being to run their entire brain at speed. being able to generate a program to run at 1ghz that can outsmart a human's motor control is not difficult - flies are able to outsmart our motor control by running at a higher frequency despite being much smaller than us in every single way. this is the video I would link to show how much frequency matters: https://www.youtube.com/watch?v=Gvg242U2YfQ

for me it mostly felt like I and my group of closest friends were at the center of the world, with the last hope for the future depending on our ability to hold to principle. there was a lot of prophecy of varying quality, and a lot of importance placed suddenly on people we barely knew, then rapidly withdrawn when those people weren't up for being as crazy as we were.

AnnaSalamon
Thanks.  Are you up for saying more about what algorithm (you in hindsight notice/surmise) you were following internally during that time, and how it did/didn't differ from the algorithm you were following during your "hyper-analytical programmer" times?

This seems roughly on point, but is missing a crucial aspect - whether or not you're currently a hyper-analytical programmer is actually a state of mind which can change. Thinking you're on one side when actually you've flipped can lead to some bad times, for you and others.

jimrandomh
I'm genuinely uncertain whether this is true. The alternate hypothesis is that it's more of a skillset than a frame of mind, which means that it can atrophy, but only partially and only slowly.

I don't know how everyone else on LessWrong feels but I at least am getting really tired of you smugly dismissing others' attempts at moral reductionism wrt qualia by claiming deep philosophical insight you've given outside observers very little reason to believe you have. In particular, I suspect if you'd spent half the energy on writing up these insights that you've spent using the claim to them as a cudgel you would have at least published enough of a teaser for your claims to be credible.

But here Yudkowsky gave a specific model for how qualia, and other things in the reference class "stuff that's pointing at something but we're confused about what", are mistaken for convergently instrumental stuff. (Namely: pointers point both to what they're really trying to point to, but also somewhat point at simple things, and simple things tend to be convergently instrumental.) It's not a reduction of qualia, and a successful reduction of qualia would be much better evidence that an unsuccessful reduction of qualia is unsuccessful, but it's still a logically relevant argument and a useful model.

Rob Bensinger
I'd love to read an EY-writeup of his model of consciousness, but I don't see Eliezer invoking 'I have a secret model of intelligence' in this particular comment. I don't feel like I have a gears-level understanding of what consciousness is, but in response to 'qualia must be convergently instrumental because it probably involves one or more of (Jessica's list)', these strike me as perfectly good rejoinders even if I assume that neither I nor anyone else in the conversation has a model of consciousness:

* Positing that qualia involves those things doesn't get rid of the confusion re qualia.
* Positing that qualia involve only simple mechanisms that solve simple problems (hence more likely to be convergently instrumental) is a predictable bias of early wrong guesses about the nature of qualia, because the simple ideas are likely to come to mind first, and will seem more appealing when less of our map (with the attendant messiness and convolutedness of reality) is filled in.

E.g., maybe humans have qualia because of something specific about how we evolved to model other minds. In that case, I wouldn't start with a strong prior that qualia are convergently instrumental (even among mind designs developed under selection pressure to understand humans). Because there are lots of idiosyncratic things about how humans do other-mind-modeling and reflection (e.g., the tendency to feel sad yourself when you think about a sad person) that are unlikely to be mirrored in superintelligent AI.

I disagree that GPT’s job, the one that GPT-∞ is infinitely good at, is answering text-based questions correctly. It’s the job we may wish it had, but it’s not, because that’s not the job its boss is making it do. GPT’s job is to answer text-based questions in a way that would be judged as correct by humans or by previously-written human text. If no humans, individually or collectively, know how to align AI, neither would GPT-∞ that’s trained on human writing and scored on accuracy by human judges.

This is actually also an incorrect statement of GPT's jo... (read more)

I think this is a persistent difference between us but isn't especially relevant to the difference in outcomes here.

I'd more guess that the reason you had psychoses and I didn't had to do with you having anxieties about being irredeemably bad that I basically didn't at the time. Seems like this would be correlated with your feeling like you grew up in a Shin Sekai Yori world?

I clearly had more scrupulosity issues than you and that contributed a lot. Relevantly, the original Roko's Basilisk post is putting AI sci-fi detail on a fear I am pretty sure a lot of EAs feel/felt in their heart, that something nonspecifically bad will happen to them because they are able to help a lot of people (due to being pivotal on the future), and know this, and don't do nearly as much as they could. If you're already having these sorts of fears then the abstract math of extortion and so on can look really threatening.

hmm... this could have come down to spending time in different parts of MIRI? I mostly worked on the "world's last decent logic department" stuff - maybe the more "global strategic" aspects of MIRI work, at least the parts behind closed doors I wasn't allowed through, were more toxic? Still feels kinda unlikely but I'm missing info there so it's just a hunch.

jessicata
My guess is that it has more to do with willingness to compartmentalize than part of MIRI per se. Compartmentalization is negatively correlated with "taking on responsibility" for more of the problem. I'm sure you can see why it would be appealing to avoid giving into extortion in real life, not just on whiteboards, and attempting that with a skewed model of the situation can lead to outlandish behavior like Ziz resisting arrest as hard as possible.

By latent tendency I don't mean family history, though it's obviously correlated. I claim that there's this fact of the matter about Jess' personality, biology, etc, which is that it's easier for her to have a psychotic episode than for most people. This seems not plausibly controversial.

I'm not claiming a gears-level model here. When you see that someone has a pattern of <problem> that others in very similar situations did not have, you should assume some of the causality is located in the person, even if you don't know how.

Benquo
Listing "I don't know, some other reason we haven't identified yet" as an "obvious source" can make sense as a null option, but giving it a virtus dormitiva type name is silly. I think that Jessica has argued with some plausibility that her psychotic break was in part the result of taking aspects of the AI safety discourse more seriously and unironically than the people around her, combined with adversarial pressures and silencing. This seems like a gears-level model that might be more likely in people with a cognitive disposition correlated with psychosis.

Verbal coherence level seems like a weird place to locate the disagreement - Jessica maintained approximate verbal coherence (though with increasing difficulty) through most of her episode. I'd say even in October 2017, she was more verbally coherent than e.g. the average hippie or Catholic, because she was trying at all.

The most striking feature was actually her ability to take care of herself rapidly degrading, as evidenced by e.g. getting lost almost immediately after leaving her home, wandering for several miles, then calling me for help and having dif... (read more)

jefftk

I want to specifically highlight "A bunch of people we respected and worked with had decided the world was going to end, very soon, uncomfortably soon, and they were making it extremely difficult for us to check their work." I noticed this second-hand at the time, but didn't see any paths toward making things better. I think it had really harmful effects on the community, and is worth thinking a lot about before something similar happens again.

Benquo

When I got back into town and talked with Jessica, she was talking about how it might be wrong to take actions that might possibly harm others, i.e. pretty much any actions, since she might not learn fast enough for this to come out net positive. Seems likely to me that the content of Jessica's anxious perseveration was partly causally upstream of the anxious perseveration itself.

I agree that a decline in bodily organization was the main legitimate reason for concern. It seems obviously legitimate for Jessica (and me) to point out that Scott is proposing a... (read more)

jessicata

Thanks for giving your own model and description of the situation!

Regarding latent tendency, I don't have a family history of psychosis (but I do of bipolar), although that doesn't rule out a latent tendency. It's unclear what "latent tendency" means exactly; it's kind of pretending that the real world is a 3-node Bayesian network (self tendency towards X, environment tendency towards inducing X, whether X actually happens) rather than a giant web of causality, but maybe there's some way to specify it more precisely.

I think the 4 factors you listed are the ... (read more)

There's this general problem of Rationalists splitting into factions and subcults with minor doctrinal differences, each composed of relatively elite members of The Community, each with a narrative of how they’re the real rationalists and the rest are just posers and/or parasites. And, they're kinda right. Many of the rest are posers, we have a mop problem.

There’s just one problem. All of these groups are wrong. They are in fact only slightly more special than their rival groups think they are. In fact, the criticisms each group makes of the epistemics and... (read more)

Benquo
Same. I don't think I can exit a faction by declaration without joining another, but I want many of the consequences of this. I think I get to move towards this outcome by engaging nonfactional protocols more, not by creating political distance between me & some particular faction.

Even with that as the goal this model is useless - social distancing demonstrably does not lead to 0 new infections. Even Wuhan didn't manage that, and they were literally welding people's doors shut.

Anon User
But don't you see - those infections are a second wave, so do not have to be counted. The model is almost tautologically true that way. But terribly misleading, and very irresponsibly so.

...they're ants. That's just not how ants work. For a myriad of reasons. The whole point of the post is that there isn't necessarily local deliberative intent, just strategies filling ecological niches.

Dagon
They're not ants, they're hybrid ant-human metaphors. Ants don't talk and don't wonder if the grasshopper is right. Ants don't consider counterfactual cases of never having met the other colony. Metaphorical ants that _CAN_ do these things can also consider other strategies than war.

Of course, if you don’t like how an exponential curve fits the data, you can always change models—in this case, probably to a curve with 1 more free parameter (indicating a degree of slowdown of the exponential growth) or 2 more free parameters (to have 2 different exponentials stitched together at a specific point in time).

Oh that's actually a pretty good idea. Might redo some analysis we built on top of this model using that.

correct. edited to make this more obvious

This argument would make much more sense in a just world. Information that should damage someone is very different from information that will damage someone. With blackmail you're optimized to maximize damage to the target, and I expect tails to mostly come apart here. I don't see too many cases of blackmail replacing MeToo. When was the last time the National Enquirer was a valuable whistleblower?

EDIT: fixed some wording

Benquo
Right now people covertly getting away with unobjectionable stuff are making it easy to punish honest people who do the thing openly. Plausible that the former should in fact pay costs for their complicity. The addendum to this Overcoming Bias post seems relevant:
jessicata
What do you mean?
When trying to fit an exponential curve, don't weight all the points equally

We didn't. We fit a line in log space, but weighted the points by sqrt(y). The reason we did that is because it doesn't actually appear linear in log space.
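A minimal sketch of one reading of that procedure (assumed array names, not the original analysis code):

```python
import numpy as np

def fit_exponential(t, y):
    """Fit y ~ exp(a*t + b) by least squares on log(y), weighting each point by sqrt(y)."""
    a, b = np.polyfit(t, np.log(y), deg=1, w=np.sqrt(y))  # w weights each point's residual
    return a, b

# usage sketch: t = days since the first data point, y = case counts
# a, b = fit_exponential(t, y); doubling_time = np.log(2) / a
```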

This is what it looks like if we don't weight them. If you want to bite the bullet of this being a better fit, we can bet about it.

Charlie Steiner
Interesting, thanks. This "unweighted" (on a log scale) graph looks a lot more like what I'd expect to be a good fit for a single-exponential model. Of course, if you don't like how an exponential curve fits the data, you can always change models - in this case, probably to a curve with 1 more free parameter (indicating a degree of slowdown of the exponential growth) or 2 more free parameters (to have 2 different exponentials stitched together at a specific point in time).
I'd optimize more for not making enemies or alienating people than for making people realize how bad the situation is or joining your cause.

Why isn't this a fully general argument for never rocking the boat?

philh
You quoted the conclusion, not the argument. The argument is based on skepticism that rocking the boat will do much good.
Based on my models (such as this one), the chance of AGI "by default" in the next 50 years is less than 15%, since the current rate of progress is not higher than the average rate since 1945, and if anything is lower (the insights model linked has a bias towards listing recent insights).

Both this comment and my other comment are way understating our beliefs about AGI. After talking to Jessica about it offline to clarify our real beliefs rather than just playing games with plausible deniability, my actual probability is between 0.5 and 1% in the next 50 years. Jessica can confirm that hers is pretty similar, but probably weighted towards 1%.

I think I'm more skeptical than you are that it's possible to do much better (i.e., build functional information-processing institutions) before the world changes a lot for other reasons (e.g., superintelligent AIs are invented)

Where do you think the superintelligent AIs will come from? AFAICT it doesn't make sense to put more than 20% on AGI before massive international institutional collapse, even being fairly charitable to both AGI projects and prospective longevity of current institutions.

Huh, I notice I've not explicitly estimated my timeline distribution for massive international institutional collapse, and that I want to do that. Do you have any links to places where others/you have thought about it?

gallabytes

When considering an embedder , in universe , in response to which SADT picks policy , I would be tempted to apply the following coherence condition:

(all approximately of course)

I'm not sure if this would work though. This is definitely a necessary condition for reasonable counterfactuals, but not obviously sufficient.

I'm fairly interested but don't really want to be around children.

Alicorn

How around is around, and can you say more about what about a baugruppe would satisfy your desiderata that the existing group house network can't?

gallabytes

By censoring I mean a specific technique for forcing the consistency of a possibly inconsistent set of axioms.

Suppose you have a set of deduction rules over a language . You can construct a function that takes a set of sentences and outputs all the sentences that can be proved in one step using and the sentences in . You can also construct a censored by letting .
