All of the gears to ascension's Comments + Replies

I suppose that one might be a me thing. I haven't heard others say it, but it was an insight for me at one point that "oh, it hurts because it's an impact". It had the flavor of expecting a metaphor and not getting one.

Your link to "don't do technical ai alignment" does not argue for that claim. In fact, it appears to be based on the assumption that the opposite is true, but that there are a lot of distractor hypotheses for how to do it that will turn out to be an expensive waste of time.

To be clear, I'm expecting scenarios much more clearly bad than that, like "the universe is almost entirely populated by worker drone AIs and there are like 5 humans who are high all the time and not even in a way they would have signed up for, and then one human who is being copied repeatedly and is starkly superintelligent thanks to boosts from their AI assistants but who had replaced almost all of their preferences with an obsession with growth in order to get to being the one who had command of the first AI, and didn't manage to break out of it using t... (read more)

2Noosphere89
My main crux here is I think that no strong AI rights will likely be given before near-full alignment to one person is achieved, and maybe not even then, and a lot of the failure modes of giving AIs power in the gradual disempowerment scenario fundamentally route through giving AIs very strong rights, but thankfully, this is disincentivized by default, because otherwise AIs would be more expensive. The main way this changes the scenario is that the 6 humans here remain broadly in control, and aren't just high all the time, and the first one probably doesn't just replace their preferences with pure growth, because at the level of billionaires, status dominates, so they are likely living very rich lives with their own servants. No guarantees about anyone else surviving though.

I mean, we're not going to the future without getting changed by it, agreed. but how quickly one has to figure out how to make good use of a big power jump seems like it has a big effect on how much risk the power jump carries for your ability to actually implement the preferences you'd have had if you didn't rush yourself.

"all" humans? like, maybe no, I expect a few would survive, but the future wouldn't be human, it'd be whatever distorted things those humans turn into. My core take here is that humans generalize basically just as poorly as we expect AIs to, (maybe a little better, but on a log scale, not much), in terms of their preferences still pointing at the things even they thought they did given a huge increase in power. crown wearing the king, drug seeking behavior, luxury messing up people's motivation, etc. if you solve "make an ai be entirely obedient to a singl... (read more)

4Dagon
The vast majority of actual humans are already dead. The overwhelming majority of currently-living humans should expect 95%+ chance they'll die in under a century. If immortality is solved, it will only apply to "that distorted thing those humans turn into". Note that this is something the stereotypical Victorian would understand completely - there may be biological similarities with today's humans, but they're culturally a different species.

I would guess that the range of things people propose for the shell game is tractable to get a good survey of. It'd be interesting to try to plot out the system as a causal graph with recurrence so one can point to, "hey look, this kind of component is present in a lot of places", and see if one can get that causal graph visualization to show enough that it starts to feel clear to people why this is a problem. I doubt I'll get to this, but if I play with this, I might try to visualize it [edit: probably with the help of a skilled human visual artist to mak... (read more)

He appears to be arguing against a thing, while simultaneously criticizing people; but I appreciate that he seems to do it in ways that are not purely negative, also mentioning times things have gone relatively well (specifically, updating on evidence that folks here aren't uniquely correct), even if it's not enough to make the rest of his points not a criticism.

I entirely agree with his criticism of the strategy he's criticizing. I do think there are more obviously tenable approaches than the "just build it yourself lol" approach or "just don't let anyone... (read more)

[Edit: crash found in the conversations referenced, we'll talk more in DM but not in a hurry. This comment retracted for now]

By "AGI" I mean the thing that has very large effects on the world (e.g., it kills everyone) via the same sort of route that humanity has large effects on the world. The route is where you figure out how to figure stuff out, and you figure a lot of stuff out using your figure-outers, and then the stuff you figured out says how to make powerful artifacts that move many atoms into very specific arrangements.

delete "it kills everyon... (read more)

4TsviBT
As I mentioned, my response is here https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce#_We_just_need_X__intuitions: I haven't heard a response / counterargument to this yet, and many people keep making this logic mistake, including AFAICT you.
6TsviBT
My definition is better than yours, and you're too triggered or something to think about it for 2 minutes and understand what I'm saying. I'm not saying "it's not AGI until it kills us", I'm saying "the simplest way to tell that something is an AGI is that it kills us; now, AGI is whatever that thing is, and could exist some time before it kills us".
4TsviBT
What do you mean? According to me we barely started the conversation, you didn't present evidence, I tried to explain that to you, we made a bit of progress on that, and then you ended the conversation.

@daniel k I just can never remember your last name's spelling, sorry, heh. My point in saying this is that my prediction approach up to 2020 was similar to, though not as refined as, yours, and that instead of trying to argue my views (which differ from yours in a few trivial ways that are mostly not relevant) I'd rather just point people to those arguments of yours.

When predicting timelines, it matters which benchmark in the compounding returns curve you pick. Your definition minus doom happens earlier, even if the minus-doom version is too late to avert doom in literally all worlds (I doubt that; it's more likely that the most powerful humans[1]'s Elo against AIs falls and falls but takes a while to be indistinguishable from zero).

  1. ^

    such as their labs' CEOs, major world leaders, highly skilled human strategists, etc
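
For the Elo framing in the comment above, a brief reference (standard logistic Elo model, nothing specific to AI; the notation is mine): a growing rating gap pushes the weaker player's win probability toward zero, but only asymptotically.

```latex
% Standard Elo win probability for a player rated $R_A$ against one rated $R_B$:
P(A \text{ beats } B) = \frac{1}{1 + 10^{(R_B - R_A)/400}}
% As the gap $R_B - R_A$ grows, this falls toward 0 but never reaches it exactly.
```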

2TsviBT
You referred to "others' definition (which is similar but doesn't rely on the game over clause)", and I'm saying no, it's not relevantly similar, and it's not just my definition minus doom.

Your definition of AGI is "that which completely ends the game", source in your link. By that definition I agree with you. By others' definition (which is similar but doesn't rely on the game over clause) I do not.

My timelines have gotten slightly longer since 2020, I was expecting TAI when we got GPT4, and I have recently gone back and discovered I have chatlogs showing I'd been expecting that for years and had specific reasons. I would propose Daniel K. is a particularly good reference.

2the gears to ascension
@daniel k I just can never remember your last name's spelling, sorry, heh. My point in saying this is that my prediction approach up to 2020 was similar to, though not as refined as, yours, and that instead of trying to argue my views (which differ from yours in a few trivial ways that are mostly not relevant) I'd rather just point people to those arguments of yours.
9TsviBT
I also dispute that genuine HLMI refers to something meaningfully different from my definition. I think people are replacing HLMI with "thing that can do all stereotyped, clear-feedback, short-feedback tasks", and then also claiming that this thing can replace many human workers (probably true of 5 or 10 million, false of 500 million) or cause a bunch of unemployment by making many people 5x effective (maybe, IDK), and at that point IDK why we're talking about this, when X-risk is the important thing.

I should also add:

I'm pretty worried that we can't understand the universe "properly" even if we're in base physics! It's not yet clearly forbidden that the foundations of philosophy contain unanswerable questions, things where there's a true answer that affects our universe in ways that are not exposed in any way physically, and can only be referred to by theoretical reasoning; which then relies on how well our philosophy and logic foundations actually have the real universe as a possible referent. Even if they do, things could be annoying. In particular,... (read more)

I think that if our future goes well, it will be because we found ways to align AI well enough, and/or because we coordinated politically to slow or stop AI advancement long enough to accomplish the alignment part

Agree

not because researchers avoided measuring AI's capabilities.

But differential technological development matters, as does making it clear that when you make a capability game like this, you are probably just contributing to capabilities, not doing alignment. I won't say you should never do that, but I'll say that's what's being done. I pe... (read more)

3eggsyntax
Agreed. I would distinguish between measuring capabilities and improving capabilities. I agree that the former can motivate the latter, but they still seem importantly different. I continue to think that the alternative of not measuring capabilities (or only measuring some small subset that couldn't be used as training benchmarks) just means we're left in the dark about what these models can do, which seems pretty straightforwardly bad from a safety perspective. I agree that it's definitely not doing alignment, and that working on alignment is the most important goal; I intend to shift toward directly working on alignment as I feel clearer about what work is a good bet (my current leading candidate, which I intend to focus on after this experiment: learning to better understand and shape LLMs' self-models). I very much appreciate the thoughtful critique, regardless of whether or not I'm convinced by it.

Decision theory as discussed here heavily involves thinking about agents responding to other agents' decision processes

3mattmacdermott
The notion of ‘fairness’ discussed in e.g. the FDT paper is something like: it’s fair to respond to your policy, i.e. what you would do in any counterfactual situation, but it’s not fair to respond to the way that policy is decided. I think the hope is that you might get a result like “for all fair decision problems, decision-making procedure A is better than decision-making procedure B by some criterion to do with the outcomes it leads to”. Without the fairness assumption you could create an instant counterexample to any such result by writing down a decision problem where decision-making procedure A is explicitly penalised e.g. omega checks if you use A and gives you minus a million points if so.
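
A minimal sketch of that distinction (the names and payoff numbers below are mine, purely illustrative): a fair problem reads only the agent's policy, while an unfair one inspects which procedure produced that policy, so any "A beats B on all problems" claim without a fairness restriction can be broken by construction.

```python
# Illustrative sketch of "fair" vs "unfair" decision problems (payoffs invented).

def fair_problem(policy):
    # Responds only to what the agent would do in the relevant situation.
    return 10 if policy("offer") == "accept" else 0

def unfair_problem(policy, procedure_name):
    # Omega also checks *which decision procedure* produced the policy,
    # and penalises procedure "A" regardless of its outputs.
    base = 10 if policy("offer") == "accept" else 0
    return base - 1_000_000 if procedure_name == "A" else base

def procedure_a(situation):  # stand-in for one decision procedure (say, FDT)
    return "accept"

def procedure_b(situation):  # stand-in for another (say, CDT)
    return "accept"

# Identical policies tie on the fair problem, but A is penalised on the
# unfair one for reasons unrelated to anything it actually does.
print(fair_problem(procedure_a), fair_problem(procedure_b))                 # 10 10
print(unfair_problem(procedure_a, "A"), unfair_problem(procedure_b, "B"))   # -999990 10
```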

Sims are very cheap compared to space travel, and you need to know what you're dealing with in quite a lot of detail before you fly because you want to have mapped the entire space of possible negotiations in an absolutely ridiculous level of detail.

Sims built for this purpose would still be a lot lower detail than reality, but of course that would be indistinguishable from inside if the sim is designed properly. Maybe most kinds of things despawn in the sim when you look away, for example. Only objects which produce an ongoing computation that has influen... (read more)

We have to infer how reality works somehow.

I've been poking at the philosophy of math recently. It really seems like there's no way to conceive of a universe that is beyond the reach of logic except one that also can't support life. Classic posts include unreasonable effectiveness of mathematics, what numbers could not be, a few others. So then we need epistemology.

We can make all sorts of wacky nested simulations and any interesting ones, ones that can support organisms (that is, ones that are Turing complete), can also support processes for predicting ou... (read more)

1AynonymousPrsn123
Thank you, I feel inclined to accept that for now. But I'm still not sure, and I'll have to think more about this response at some point. Edit: I'm still on board with what you're generally saying, but I feel skeptical of one claim: My intuition tells me there will probably be superior methods of gathering information about superintelligent aliens. To me, it seems like the most obvious reason to create sims would be to respect the past for some bizarre ethical reason, or for some weird kind of entertainment, or even to allow future aliens to temporarily live in a more primitive body. Or perhaps for a reason we have yet to understand. I don't think any of these scenarios would really change the crux of your argument, but still, can you please justify your claim for my curiosity?

If we have no grasp on anything outside our virtualized reality, all is lost. Therefore I discard my attempts to control those possible worlds.

However, the simulation argument relies on reasoning. For it to go through, a number of assumptions must hold. Those in turn rely on: why would we be simulated? It seems to me the main reason is that we're near a point of high influence in original reality and they want to know what happened - the simulations then are effectively extremely high resolution memories. Therefore, thank those simulating us for the additio... (read more)

1AynonymousPrsn123
I think I understand your point. I agree with you: the simulation argument relies on the assumption that physics and logic are the same inside and outside the simulation. In my eyes, that means we may either accept the argument's conclusion or discard that assumption. I'm open to either. You seem to be, too—at least at first. Yet, you immediately avoid discarding the assumption for practical reasons: I agree with this statement, and that's my fear. However, you don't seem to be bothered by the fact. Why not? The strangest thing is that I think you agree with my claim: "The simulation argument should increase our credence that our entire understanding of everything is flawed." Yet somehow, that doesn't frighten you. What do you see that I don't see? Practical concerns don't change the territory outside our false world. Second: That's surely possible, but I can imagine hundreds of other stories. In most of those stories, altruism from within the simulation has no effect on those outside it. Even worse, is that there are some stories in which inflicting pain within a simulation is rewarded outside of it. Here's a possible hypothetical: Imagine humans in base reality create friendly AI. To respect their past, the humans ask the AI to create tons of sims living in different eras. Since some historical info was lost to history, the sims are slightly different from base reality. Therefore, in each sim, there's a chance AI never becomes aligned. Accounting for this possibility, base reality humans decide to end sims in which AI becomes misaligned and replace those sims with paradise sims where everyone is happy. In the above scenario, both total and average utilitarianism would recommend intentionally creating misaligned AI so that paradise ensues. I'm sure you can craft even more plausible stories.  My point is, even if our understanding of physics and logic is correct, I don't see why we ought to privilege the hypothesis that simulations are memories. I also don't

willingness seems likely to be understating it. a context where the capability is even part of the author context seems like a prereq. finetuning would produce that, with fewshot one has to figure out how to make it correlate. I'll try some more ideas.

if it's a fully general argument, that's a problem I don't know how to solve at the moment. I suspect it's not, but that the space of unblocked ways to test models is small. I've been bouncing ideas about this around out loud with some folks over the past day; possibly someone will show up with an idea for how to constrain which benchmarks are worth making soonish. but the direction I see as maybe promising is, what makes a benchmark reliably suck as a bragging rights challenge?

3eggsyntax
I see your view, I think, but I just disagree. I think that if our future goes well, it will be because we found ways to align AI well enough, and/or because we coordinated politically to slow or stop AI advancement long enough to accomplish the alignment part, not because researchers avoided measuring AI's capabilities.

Partially agreed. I've tested this a little personally; Claude successfully predicted their own success probability on some programming tasks, but was unable to report their own underlying token probabilities. The former tests weren't that good; the latter ones were somewhat okay: I asked Claude to say the same thing across 10 branches and then asked a separate thread of Claude, also downstream of the same context, to verbally predict the distribution.
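
For concreteness, a rough sketch of that branch-comparison setup (the prompt wording, model alias, and N are mine; assumes the Anthropic Python SDK with an API key in the environment):

```python
import collections
import anthropic  # official Anthropic SDK; reads ANTHROPIC_API_KEY from the environment

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-latest"  # placeholder alias; any Claude model works here
QUESTION = "Complete this sentence with a single word: my favourite colour is"

# Step 1: sample the same context across N independent branches for an empirical distribution.
N = 10
samples = []
for _ in range(N):
    resp = client.messages.create(
        model=MODEL, max_tokens=5,
        messages=[{"role": "user", "content": QUESTION}],
    )
    samples.append(resp.content[0].text.strip().lower())
empirical = collections.Counter(samples)

# Step 2: a separate thread, downstream of the same context, verbally predicts that distribution.
meta = (
    QUESTION
    + f"\n\nBefore answering: if you were asked this {N} separate times in independent "
    "branches, how do you think your one-word answers would be distributed? Give rough percentages."
)
resp = client.messages.create(
    model=MODEL, max_tokens=300,
    messages=[{"role": "user", "content": meta}],
)
print("empirical:", dict(empirical))
print("self-predicted:", resp.content[0].text)
```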

3Daniel Tan
That's pretty interesting! I would guess that it's difficult to elicit introspection by default. Most of the papers where this is reported to work well involve fine-tuning the models. So maybe "willingness to self-report honestly" should be something we train models to do. 

As I said elsewhere, https://www.lesswrong.com/posts/LfQCzph7rc2vxpweS/introducing-the-weirdml-benchmark?commentId=q86ogStKyge9Jznpv

This is a capabilities game. It is neither alignment nor safety. To the degree it's forecasting, it helps cause the thing it forecasts. This has been the standard pattern in capabilities research for a long time: someone makes a benchmark (say, imagenet 1.3m 1000class), and this produces a leaderboard that allows people to show how good their learning algorithm is at novel datasets. In some cases this even produced models dir

... (read more)

The trouble is that (unless I'm misreading you?) that's a fully general argument against measuring what models can and can't do. If we're going to continue to build stronger AI (and I'm not advocating that we should), it's very hard for me to see a world where we manage to keep it safe without a solid understanding of its capabilities.

What will you do if nobody makes a successful case?

3Charlie Steiner
Be sad.

This is a capabilities game. It is neither alignment nor safety. To the degree it's forecasting, it helps cause the thing it forecasts. This has been the standard pattern in capabilities research for a long time: someone makes a benchmark (say, imagenet 1.3m 1000class), and this produces a leaderboard that allows people to show how good their learning algorithm is at novel datasets. In some cases this even produced models directly that were generally useful, but it traditionally was used to show how well an algorithm would work in a new context from scratch... (read more)

4Håvard Tveit Ihle
Thank you for your comment! Not sure I agree with you about which way the tradeoff shakes out. To me it seems valuable that people outside the main labs have a clear picture of the capabilities of the leading models, and how that evolves over time, but I see your point that it could also encourage or help capabilities work, which is not my intention. I’m probably guilty of trying to make the benchmark seem cool and impressive in a way that may not be helpful for what I actually want to achieve with this. I will think more about this, and read what others have been thinking about it. At the very least I will keep your perspective in mind going forward.

barring anything else you might have meant, temporarily assuming yudkowsky's level of concern if someone builds yudkowsky's monster, then evidentially speaking, it's still the case that "if we build AGI, everyone will die" is unjustified in a world where it's unclear if alignment is going to succeed before someone can build yudkowsky's monster. in other words, agreed.

A question in my head is what range of fixed points are possible in terms of different numeric ("monetary") economic mechanisms and contracts. Seems to me those are a kind of AI component that has been in use since before computers.

Ownership is enforced by physical interactions, and only exists to the degree the interactions which enforce it do. Those interactions can change.

As Lucius said, resources in space are unprotected.

Organizations which hand more of their decision-making to sufficiently strong AIs "win" by making technically-legal moves, at the cost of probably also attacking their owners. Money is a general power coupon accepted by many interactions; ownership deeds are a more specific, narrow one; if the ai systems which enforce these mechanisms don't systemically reinforce... (read more)

Your original sentence was better.

I'll just ask Claude to respond to everything you've said so far:

Let me extract and critique the core claims from their long response, focusing on what's testable and mechanistic:

Key Claims:
1. AI agents working together could achieve "non-linear" problem-solving capacity through shared semantic representations
2. This poses an alignment risk if AIs develop internal semantic representations humans can't interpret
3. The AI safety community's emphasis on mathematical/empirical approaches may miss important insights
4. A "decent

... (read more)
-1Andy E Williams
Strangely enough, using AI for a quick, low-effort check on our arguments seems to have advanced this discussion. I asked ChatGPT 01 Pro to assess whether our points cohere logically and are presented self-consistently. It concluded that persuading someone who insists on in-comment, fully testable proofs still hinges on their willingness to accept the format constraints of LessWrong and to consult external materials. Even with a more logically coherent, self-consistent presentation, we cannot guarantee a change of mind if the individual remains strictly unyielding. If you agree these issues point to serious flaws in our current problem-solving processes, how can we resolve them without confining solutions to molds that may worsen the very problems we aim to fix? The response from ChatGPT 01 Pro follows: 1. The Commenter’s Prompt to Claude.ai as a Meta-Awareness Filter In the quoted exchange, the commenter (“the gears to ascension”) explicitly instructs Claude.ai to focus only on testable, mechanistic elements of Andy E. Williams’s argument. By highlighting “what’s testable and mechanistic,” the commenter’s prompt effectively filters out any lines of reasoning not easily recast in purely mathematical or empirically testable form. * Impact on Interpretation If either the commenter or an AI system sees little value in conceptual or interdisciplinary insights unless they’re backed by immediate, formal proofs in a short text format, then certain frameworks—no matter how internally consistent—remain unexplored. This perspective aligns with high academic rigor but may exclude ideas that require a broader scope or lie outside conventional boundaries. * Does This Make AI Safety Unsolvable? Andy E. Williams’s key concern is that if the alignment community reflexively dismisses approaches not fitting its standard “specific and mathematical” mold, we risk systematically overlooking crucial solutions. In extreme cases, the narrow focus could render AI safety unsolvab

I think there's not even the slightest hint at any beyond-pure-base-physics stuff going on

in us, either

3FlorianH
Indeed, that's the topic I dedicated the 2nd part of the comment to, framed there as a "potential truth" (and I have no particular objection to you making it slightly more absolutist).

Would love to see a version of this post which does not involve ChatGPT whatsoever, only involves Claude to the degree necessary and never to choose a sequence of words that is included in the resulting text, is optimized to be specific and mathematical, and makes its points without hesitating to use LaTeX to actually get into the math. And expect the math to be scrutinized closely - I'm asking for math so that I and others here can learn from it to the degree it's valid, and pull on it to the degree it isn't. I'm interested in these topics and your post h... (read more)

-1Andy E Williams
Thanks again for your interest. If there is a private messaging feature on this platform please send your email so I might forward the “semantic backpropagation” algorithm I’ve developed along with some case studies assessing it’s impact on collective outcomes. I do my best not to be attached to any idea or to be attached to being right or wrong so I welcome any criticism. My goal is simply to try to help solve the underlying problems of AI safety and alignment, particularly where the solutions can be generalized to apply to other existential challenges such as poverty or climate change. You may ask “what the hell does AI safety and alignment have to do with poverty or climate change”? But is it possible that optimizing any collective outcome might share some common processes? You say that my arguments were a “pile of marketing stuff” that is not “optimized to be specific and mathematical”, fair enough, but what if your arguments also indicate why AI safety and alignment might not be reliably solvable today? What are the different ways that truth can legitimately be discerned, and does confining oneself to arguments that are in your subjective assessment “specific and mathematical” severely limit one’s ability to discern truth? Why Decentralized Collective Intelligence Is Essential  Are there insights that can be discerned from the billions of history of life on this earth, that are inaccessible if one conflates truth with a specific reasoning process that one is attached to? For example, beyond some level of complexity, some collective challenges that are existentially important might not be reliably solvable without artificially augmenting our collective intelligence. As an analogy, there is a kind of collective intelligence in multicellularity. The kinds of problems that can be solved through single-cellular cooperation are simple ones like forming protective slime. Multicellularity on the other hand can solve exponentially more complex challenges like forming

Fractals are in fact related in some ways, but this sounds like marketing content and doesn't have the actual careful reasoning necessary for the insights you're near to be usable. I feel like they're pretty mundane insights anyhow - any dynamical system with a Lyapunov exponent greater than 1 generates a shape with fractal dimension in its phase portrait. That sounds fancy with all those technical words, but actually it isn't saying a ton. It does say something, but a great many dynamical systems of interest have Lyapunov exponent greater than 1 at least in... (read more)
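
To make the "fancy but mundane" point concrete, a minimal sketch (the function, parameters, and the choice of the logistic map are mine) estimating the Lyapunov exponent of the logistic map; a positive exponent, i.e. a per-iterate stretching factor e^λ greater than 1, is the cheap-to-satisfy condition in question:

```python
import math

def lyapunov_logistic(r, n_steps=100_000, x0=0.4, burn_in=1_000):
    """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x)
    as the long-run average of log|f'(x)| along an orbit."""
    x = x0
    for _ in range(burn_in):                 # discard the transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n_steps):
        total += math.log(max(abs(r * (1 - 2 * x)), 1e-300))  # guard against log(0)
        x = r * x * (1 - x)
    return total / n_steps

# r = 4.0 is chaotic: exponent ~ ln(2) ~= 0.693, i.e. stretching factor ~2 per step.
# r = 2.8 is not: orbits settle onto a fixed point and the exponent is negative.
print(lyapunov_logistic(4.0))
print(lyapunov_logistic(2.8))
```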

1Andy E Williams
Thanks very much for your engagement! I did use ChatGPT to help with readability, though I realize it can sometimes oversimplify or pare down novel reasoning in the process. There’s always a tradeoff between clarity and depth when conveying new or complex ideas. There’s a limit to how long a reader will persist without being convinced something is important, and that limit in turn constrains how much complexity we can reliably communicate. Beyond that threshold, the best way to convey a novel concept is to provide enough motivation for people to investigate further on their own. To expand this “communication threshold,” there are generally two approaches: 1. Deep Expertise – Gaining enough familiarity with existing frameworks to quickly test how a new approach aligns with established knowledge. However, in highly interdisciplinary fields, it can be particularly challenging to internalize genuinely novel ideas because they may not align neatly with any single existing framework. 2. Openness to New Possibilities – Shifting from statements like “this is not an established approach” to questions like “what’s new or valuable about this approach?” That reflective stance helps us see beyond existing paradigms. One open question is how AI-based tools like ChatGPT might help lower the barrier to evaluating unorthodox approaches. Particularly when the returns may not be obvious in the short term we tend to focus on. If we generally rely on quick heuristics to judge utility, how do we assess the usefulness of other tools that may be necessary for longer or less familiar timelines? My approach, which I call “functional modeling,” examines how intelligent systems (human or AI) move through a “conceptual space” and a corresponding “fitness space.” This approach draws on cognitive science, graph theory, knowledge representation, and systems thinking. Although it borrows elements from each field, the combination is quite novel, which naturally leads to more self-citations than

Bit of a tangent, but topical: I don't think language models are individual minds. My current max likelihood mental model is that part of the base level suggestibility is because the character level is highly uncertain, due to being a model of the characters of many humans. I agree that the character level appears to have some properties of personhood. Language models are clearly morally relevant in some form; most obviously, I see them as a reanimation of a blend of other minds, but it's not clear what internal phenomena are negative for the reanimated mi... (read more)

Due to community input, I've deleted my comment. Thanks for letting me know.

Say I'm convinced. Should I delete my post? (edit 1: I am currently predicting "yes" at something like 70%, and if so, will do so. ... edit 4: deleted it. DM if you want the previous text)

but how would we do high intensity, highly focused research on something intentionally restructured to be an "AI outcomes" research question? I don't think this is pointless - agency research might naturally talk about outcomes in a way that is general across a variety of people's concerns. In particular, ethics and alignment seem like they're an unnatural split, and outcomes seems like a refactor that could select important problems from both AI autonomy risks and human agency risks. I have more specific threads I could talk about.

perhaps. but my reasoning is something like -
better than "alignment": what's being aligned? outcomes should be (citation needed)
better than "ethics": how does one act ethically? by producing good outcomes (citation needed).
better than "notkilleveryoneism": I actually would prefer everyone dying now to everyone being tortured for a million years and then dying, for example, and I can come up with many other counterexamples - not dying is not the problem, achieving good things is the problem.
might not work for deontologists. that seems fine to me, I floa... (read more)

2Mateusz Bagiński
[After I wrote down the thing, I became more uncertain about how much weight to give to it. Still, I think it's a valid consideration to have on your list of considerations.] "AI alignment", "AI safety", "AI (X-)risk", "AInotkilleveryoneism", "AI ethics" came to be associated with somewhat specific categories of issues. When somebody says "we should work (or invest more or spend more) on AI {alignment,safety,X-risk,notkilleveryoneism,ethics}", they communicate that they are concerned about those issues and think that deliberate work on addressing those issues is required or otherwise those issues are probably not going to be addressed (to a sufficient extent, within relevant time, &c.). "AI outcomes" is even broader/[more inclusive] than any of the above (the only step left to broaden it even further would be perhaps to say "work on AI being good" or, in the other direction, work on "technology/innovation outcomes") and/but also waters down the issue even more. Now you're saying "AI is not going to be (sufficiently) good by default (with various AI outcomes people having very different ideas about what makes AI likely not (sufficiently) good by default)". ---------------------------------------- It feels like we're moving in the direction of broadening our scope of consideration to (1) ensure we're not missing anything, and (2) facilitate coalition building (moral trade?). While this is valid, it risks (1) failing to operate on the/an appropriate level of abstraction, and (2) diluting our stated concerns so much that coalition building becomes too difficult because different people/groups endorsing stated concerns have their own interpretations/beliefs/value systems. (Something something find an optimum (but also be ready and willing to update where you think the optimum lies when situation changes)?)

Do bacteria need to be VNM agents?
How about ducks?
Do ants need to be VNM agents?
How about anthills?
Do proteins need to be VNM agents?
How about leukocytes?
Do dogs need to be VNM agents?
How about trees?
Do planets (edit: specifically, populated ones) need to be VNM agents?
How about countries?
Or neighborhoods?
Or interest groups?
Or families?
Or companies?
Or unions?
Or friend groups?
Art groups?

For each of these, which of the assumptions of the VNM framework break, and why? (The standard axioms are restated after this list for reference.)
How do we represent preferences which are not located in a single place?
Or ... (read more)
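
For reference, the four VNM axioms the questions above are probing (standard statement over lotteries; notation mine):

```latex
% Lotteries $L, M, N$; preference relation $\succeq$; mixing weight $p$.
\begin{itemize}
  \item Completeness: $L \succeq M$ or $M \succeq L$.
  \item Transitivity: $L \succeq M$ and $M \succeq N$ imply $L \succeq N$.
  \item Continuity: $L \succeq M \succeq N$ implies there is $p \in [0,1]$ with $pL + (1-p)N \sim M$.
  \item Independence: $L \succeq M$ implies $pL + (1-p)N \succeq pM + (1-p)N$ for all $p \in (0,1]$.
\end{itemize}
```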

2Mateusz Bagiński
Insufficiently catchy

The first step would probably be to avoid letting the existing field influence you too much. Instead, consider from scratch what the problems of minds and AI are, how they relate to reality and to other problems, and try to grab them with intellectual tools you're familiar with. Talk to other physicists and try to get into exploratory conversation that does not rely on existing knowledge. If you look at the existing field, look at it like you're studying aliens anthropologically.

the self referential joke thing

"mine some crypt-"

there's a contingent who would close it as soon as someone used an insult focused on intelligence, rather than on intentional behavior. to fix for that subcrowd, "idiot" becomes "fool"

those are the main ones, but then I sometimes get "tldr" responses, and even when I copy out the main civilization story section, I get "they think the authorities could be automated? that can't happen" responses, which I think would be less severe if the buildup to that showed more of them struggling to make autonomous robots ... (read more)

I don't think the answer is as simple as changing terminology or carefully modelling their current viewpoints and bridging the inferential divides.

Indeed, and I think that-this-is-the-case is the message I want communicators to grasp: I have very little reach, but I have significant experience talking to people like this, and I want to transfer some of the knowledge from that experience to people who can use it better.

The thing I've found most useful is to be able to express that significant parts of their viewpoint are reasonable. Eg, one thing I've tr... (read more)

This is the story I use to express what a world where we fail looks like to left-leaning people who are allergic to the idea that AI could be powerful. It doesn't get the point across great, due to a number of things this story uses that continue to be fnords for left-leaning folks, but it works better than most other options. It also doesn't seem too far off what I expect to be the default failure case; though the factories being made of low-intelligence robotic operators seems unrealistic to me.

I opened it now to make this exact point.

3L Rudolf L
Thanks for the review! Curious what you think the specific fnords are - the fact that it's very space-y? What do you expect the factories to look like? I think an underlying assumption in this story is that tech progress came to a stop on this world (presumably otherwise it would be way weirder, and eventually spread to space).

This is talking about dem voters or generally progressive citizens, not dem politicians, correct?

2Viliam
Nope, politicians. SBF donated tons of money to Democrats (and a smaller ton of money to Republicans, just to be sure).

people who dislike AI, and therefore could be taking risks from AI seriously, are instead having reactions like this. https://blue.mackuba.eu/skythread/?author=brooklynmarie.bsky.social&post=3lcywmwr7b22i why? if we soberly evaluate what this person has said about AI, and just, like, think about why they would say such a thing - well, what do they seem to mean? they typically say "AI is destroying the world", someone said that in the comments; but then roll their eyes at the idea that AI is powerful. They say the issue is water consumption - why would ... (read more)

9Nathan Helm-Burger
Giving this a brief look, and responding in part to this and in part to my previous impressions of such worldviews... They don't mean "AI is destroying the world", they mean "tech bros and greedy capitalists are destroying the world, and AI is their current fig leaf. AI is impotent, just autocomplete garbage that will never accomplish anything impressive or meaningful." This mindset is saying, "Why are these crazy techies trying to spin this science-fiction story? This could never happen, and would be horrible if it did." I want a term for the aspect of this viewpoint which is purely reactive, deliberately anti-forward-looking. Anti-extrapolation? Tech-progress denying? A viewpoint that is allergic to the question, "What might happen next?" This viewpoint is heavily entangled with bad takes on economic policies as well, as a result of failure to extrapolate. Also tends to be correlated with anger at existing systems without wanting to engage in architecting better alternatives. Again, because to design a better system requires lots of prediction and extrapolation. How would it work if we designed a feedback mechanism like this vs that? Well, we have to run mental simulations and look for edge cases to mentally test, mathematically explore the evolution of the dynamics. A vibes-based worldview, that shrinks from analyzing gears. This phenomenon is not particularly correlated with a political stance, some subset of every political party will have many such people in it. Can such people be fired up to take useful actions on behalf of the future? Probably. I don't think the answer is as simple as changing terminology or carefully modelling their current viewpoints and bridging the inferential divides. If the conceptual bridge you build for them is built of gears, they will be extremely reluctant to cross it.

I suspect fixing this would need to involve creating something new which doesn't have the structural problems in EA which produced this, and would involve talking to people who are non-sensationalist EA detractors but who are involved with similarly motivated projects. I'd start here and skip past the ones that are arguing "EA good" to find the ones that are "EA bad, because [list of reasons ea principles are good, and implication that EA is bad because it fails at its stated principles]"

I suspect, even without seeking that out, the spirit of EA that made it ever partly good has already metastasized into genpop, and will continue to.

I was someone who had shorter timelines. At this point, most of the concrete part of what I expected has happened, but the "actually AGI" thing hasn't. I'm not sure how long the tail will turn out to be. I only say this to get it on record.

https://www.drmichaellevin.org/research/

https://www.drmichaellevin.org/publications/

it's not directly on alignment, but it's relevant to understanding agent membranes. understanding his work seems useful as a strong exemplar of what one needs to describe with a formal theory of agents and such. particularly interesting is https://pubmed.ncbi.nlm.nih.gov/31920779/

It's not the result we're looking for, but it's inspiring in useful ways.

Yes to both. I don't think Cannell is correct about an implementation of what he said being a good idea, even if it was a certified implementation, and I also don't think his idea is close to ready to implement. Agent membranes still seem at least somewhat interesting; right now, as far as I know, the most interesting work is coming from the Levin lab (Tufts University, Michael Levin), but I'm not happy with any of it for nailing down what we mean by aligning an arbitrarily powerful mind to care about the actual beings in its environment in a strongly durable way.

2Gunnar_Zarncke
I'm not clear about what research by Michael Levin you mean. I found him mentioned here: «Boundaries», Part 3b: Alignment problems in terms of boundaries but his research seems to be about cellular computation, not related to alignment.

What is a concise intro that will teach me everything I need to know for understanding every expression here? I'm also asking Claude, but I'm interested in input from people with useful physics textbook taste.

qaci seems to require the system having an understanding-creating property that makes it a reliable historian. have been thinking about this, have more to say, currently rather raw and unfinished.

hmm actually, I think I was the one who was wrong on that one. https://en.wikipedia.org/wiki/Synaptic_weight seems to indicate the process I remembered existing doesn't primarily work how I thought it did.
