All of the gears to ascension's Comments + Replies

I'm pretty sure this isn't a policy change but rather a policy distillation, and you were operating under the policy described above already. eg, I often have conversations with AIs that I don't want to bother to translate into a whole post, but where I think folks here would benefit from seeing the thread. what I'll likely do is make the AI portions collapsible and the human portions default uncollapsed; often the human side is sufficient to make a point (when the conversation is basically just a human thinking out loud with some helpful feedback), but so... (read more)

Hypothetical conversation:

"You gotta prepare for the second coming, man, it's this year!"

"It is not. Stop telling me that."

"It is! The signs are all there! Do you even portents, bro?"

"I told you it's not happening."

"We work at a prediction market. I'll buy yes. It's happening."

"No you won't. You know it isn't."

"I so will, make the market."

"This is stupid, I'm eating lunch."

"Let's make the market, dude."

"Ugh, fine, it's going to zero so fast."

"More money for me!"

Utility is potentially a good thing to critique, but the case for it seems sticky, and maybe we're just holding it wrong or something. An issue is that I don't "have" a utility function; the vnm axioms hold in the limit of unexploitability, but it seems like the process of getting my mistakes gently corrected by interaction with the universe is itself something I prefer to have some of. In active inference terms, I don't only want to write to the universe.

But this post seems kinda vague too. I upvote hesitantly.
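For reference, the VNM axioms being gestured at above (standard statement, for lotteries $A, B, C$ and a preference relation $\succeq$):

$$
\begin{aligned}
&\textbf{Completeness:} && A \succeq B \ \text{ or } \ B \succeq A\\
&\textbf{Transitivity:} && A \succeq B \ \text{and}\ B \succeq C \implies A \succeq C\\
&\textbf{Continuity:} && A \succeq B \succeq C \implies \exists\, p \in [0,1]:\ pA + (1-p)C \sim B\\
&\textbf{Independence:} && A \succeq B \implies pA + (1-p)C \succeq pB + (1-p)C \quad \text{for all } p \in (0,1] \text{ and lotteries } C
\end{aligned}
$$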

Who made this and why are they paying for the model responses? Do we know what happens to the data?

3Davey Morse
I made it! One day when I was bored on the train. No data is saved rn other than leaderboard scores.

Fair enough. Neither dill nor ziz would have been able to pull off their crazy stuff without some people letting themselves get hypnotized, so I think the added warnings are correct.

High quality archives of the selves along the way. Compressed but not too much. In the live self, some updated descendant that has significant familial lineage, projected vaguely as the growing patterns those earlier selves would call a locally valid continuation according to the aesthetics and structures they consider essential at the time. In other words, this question is dynamically reanswered to the best of my ability in an ongoing way, and snapshots allow reverting and self-interviews to error check.

Any questions? :)

Ooh, man, I don't know if pigs have more or less autonomy than AIs right now, but I'm inclined to think quite a lot more. current AIs seem like they'd crash pretty quick if just plopped in a robot body with little to no scaffolding, whereas mammals are built around autonomy. Not sure how it shakes out, though.

2Daniel Kokotajlo
Pigs are more coherent long-horizon agents than AIs right now, indeed. (See: Claude Plays Pokemon). I didn't mean to imply otherwise. I was talking about the ethical concept of autonomy as in, they-have-a-right-to-make-decisions-for-themselves-instead-of-having-others-make-decisions-for-them. But idk if this is conventional usage of the term and also I am uncertain about the object-level question (recall I said "maybe.")

There's a version of this that I would agree with. But when-anthropomorphizing-for-good-reason seems to me to be a time when there's more danger from anthropomorphizing-more-than-justified. I've been talking to Claude instances about this stuff, based on a similar intuition to yours. But I haven't figured out what I believe well enough to argue it to others in a coherent way. I could paste conversations I've had, but they're kinda long for this forum. I'll paste a recent one on pastebin for now (if the pastebin link goes bad, this was conversation id ...5e6... (read more)

1testingthewaters
Yeah, I'm not gonna do anything silly (I'm not in a position to do anything silly with regards to the multitrillion param frontier models anyways). Just sort of "laying the groundwork" for when AIs will cross that line, which I don't think is too far off now. The movie "Her" is giving a good vibe-alignment for when the line will be crossed.

can you expand on what you mean by that? are there any actions you'd suggest, on my part or others, based on this claim? (also, which of the urban dictionary definitions of "lh" do you mean? they have opposite valences.)

edit: added a bunch of warnings to my original comment. sorry for missing them in the first place.

3green_leaf
I meant "light-hearted" and sorry, it was just a joke.

I don't think Lucius is claiming we'd be happy about it. Maybe the no-anticipated-impact point carries that implicit claim, I guess.

9Lucius Bushnaq
There may be a sense in which amplitude is a finite resource. Decay your branch enough, and your future anticipated experience might come to be dominated by some alien with higher amplitude simulating you, or even just by your inner product with quantum noise in a more mainline branch of the wave function. At that point, you lose pretty much all ability to control your future anticipated experience. Which seems very bad. This is a barrier I ran into when thinking about ways to use quantum immortality to cheat heat death. 

Re convo with Raemon yesterday, this might change my view.

edit: uh, well, short answer: there totally is! idk if they're the psychedelic states you wanted, but they should do for a lot of relevant purposes, seems pretty hard to match meds though. original longer version:

there's a huge space of psychedelic states, I think the subspace reachable with adding chemicals is a large volume that's hard to get to by walking state space with only external pushes - I doubt the kind of scraping a hole in the wall from a distance you can do with external input can achieve, eg, globally reversing the function of SERT (I think ... (read more)

your argument is basically "there's just this huge mess of neurons, surely somewhere in there is a way",

I suppose that is what I said interpreted as a deductive claim. I have more abductive/bayesian/hunch information than that, I've expressed some of it, but I've been realizing lately a lot of my intuitions are not via deductive reasoning, which can make them hard to verify or communicate. (and I'd guess that that's a common problem, seems like the sort of thing science exists to solve.) I'm likely not well equipped to present justifiedly-convincing-to-... (read more)

I doubt the level of inhuman behavior we see in this story is remotely close to easy to achieve, and it's probably not tractable given only hand motions as shown - given human output bandwidth, sounds seem needed, especially surprisingly loud ones. for the sky, I think it would start out beautiful, end up superstimulating, and then seep in via longer exposure. I think there's probably a combination of properties of hypnosis, cult brainwashing, inducing psychedelic states, etc, which could get a human's thinking to end up in crashed attractors, even if it's only one-way transmission. then from a crashed attractor it seems a lot more possible to get a foothold of coherence for the attacker.

3Grayson Chao
Man, I really hope there's a way to induce psychedelic states through sensory inputs. That could be hugely beneficial if harnessed for pro-human goals (for example, scaling therapeutic interventions like MDMA or ketamine therapy.)

both - I'd bet they're between 5 and 12% of the population, and that they're natural relays of the ideas you'd want to broadcast, if only they weren't relaying such mode-collapsed versions of the points. A claim presented without deductive justification: in trying to make media that is very high impact, making something opinionated in the ways you need to is good, and making that same something unopinionated in ways you don't need to is also good. Also, the video you linked has a lot of additional opinionated features that I think are targeting a much more specific group than even "people who aren't put off by AI" - it would never show up on my youtube.

2Raemon
For frame of reference, do regular movie trailers normally show up in your youtube? This video seemed relatively "mainstream"-vibing to me, although somewhat limited by the medium.

Perhaps multiple versions, then. I maintain my claim that you're missing a significant segment of people who are avoiding AI manipulation moderately well but as a result not getting enough evidence about what the problem is.

5Raemon
I would bet they are <1% of the population. Do you disagree, or think they disproportionately matter?

you'll lose an important audience segment the moment they recognize any AI-generated anything. The people who wouldn't be put off by AI-generated stuff probably won't be put off by the lack of it. you might be able to get away with it by using AI really unusually well, such that it's just objectively hard to even get a hunch that AI was involved other than by the topic.

3Raemon
I'm skeptical that there are actually enough people so ideologically opposed to this, that it outweighs the upside of driving home that capabilities are advancing, through the medium itself. (similar to how even though tons of people hate FB, few people actually leave) I'd be wanting to target a quality level similar to this:

for AIs, more robust adversarial examples - especially ones that work on AIs trained on different datasets - do seem to look more "reasonable" to humans. The really obvious adversarial example of this kind in human is like, cults, or so - I don't really have another, though I do have examples that are like, on the edge of the cult pattern. It's not completely magic, it doesn't work on everyone, and it does seem like a core component of why people fall to it is something like a relaxed "control plane" that doesn't really try hard to avoid being crashed by i... (read more)

4green_leaf
Ah, you're a soft-glitcher. /lh Edit: This is a joke.
Bunthut110

for AIs, more robust adversarial examples - especially ones that work on AIs trained on different datasets - do seem to look more "reasonable" to humans.

Then I would expect they are also more objectively similar. In any case that finding is strong evidence against manipulative adversarial examples for humans - your argument is basically "there's just this huge mess of neurons, surely somewhere in there is a way", but if the same adversarial examples work on minds with very different architectures, then that's clearly not why they exist. Instead, they have ... (read more)

I think most things that hit your brain have some percentage of leaking out of the data plane, some on the lower end, some fairly high. For current levels of manipulative optimization towards higher-data-plane-leaking media, looking for the leaks and deciding how to handle them seems to me like it can maybe help if you have to encounter the thing. it's just that, normally, the bitrate of control back towards the space of behavior that the organism prefers is high enough that the incoming manipulation can't strongly persist. but we do see ... (read more)

4Grayson Chao
I'm not following how the cult example relates to something like achieving remote code execution in the human brain via the visual cortex. While cult manipulation techniques do elicit specific behavior via psychological manipulation, it seems like the brain of a cult member is still operating 'in human mode', which is why people influenced by a cult act like human beings with unusual priorities and incentives instead of like zombies.

there are species of microbe that float pretty well, though. as far as we know right now, they just don't stay floating indefinitely or fuel themselves in the air.

edit: putting the thing I was originally going to say back:

I meant that I think there's enough bandwidth available from vision into configuration of matter in the brain that a sufficiently powerful mind could adversarial-example the human brain hard enough to implement the adversarial process in the brain, get it to persist in that brain, take control, and spread. We see weaker versions of this in advertising and memetics already, and it seems to be getting worse with social media - there are a few different strains, which generally aren't hig... (read more)

1Bunthut
Ok, that's mostly what I've heard before. I'm skeptical because:

1. If something like classical adversarial examples existed for humans, it likely wouldn't have the same effects on different people, or even just viewed from different angles, or maybe even in a different mood.
2. There are no known adversarial examples of the kind you describe for humans. We could tell if we had found them, because we have metrics of "looking similar" which are not based on our intuitive sense of similarity, like pixelwise differences and convolutions (see the sketch below). All examples of "easily confused" images I've seen were objectively similar to what they're confused for.
3. Somewhat similar to what Grayson Chao said, it seems that the influence of vision on behaviour goes through a layer of "it looks like X", which is much lower bandwidth than vision in total. Ads have qualitatively similar effects to what seeing their content actually happen in person would.
4. If adversarial examples exist, that doesn't mean they exist for making you do anything of the manipulator's choosing. Humans are, in principle, at least as programmable as a computer, but that also means there are vastly more courses of action than possible vision inputs. In practice, probably not a lot of high-cognitive-function processing could be commandeered by adversarial inputs, and behaviours complex enough to glitch others couldn't be implemented.
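To make the pixelwise-metric point in 2 concrete, here's a minimal numpy sketch (toy linear model, all numbers made up, not any real classifier): an FGSM-style perturbation shifts the model's score a lot while staying tiny under pixelwise metrics, which is the sense of "objectively similar" above.

```python
# Toy sketch: adversarial perturbations are small by pixelwise metrics even
# though they move the model's output a lot. The linear "classifier" and the
# image are random placeholders, not a real model.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=784)           # toy linear scorer: score = w @ x
x = rng.uniform(0, 1, size=784)    # toy image, pixels in [0, 1]
eps = 0.03                         # per-pixel perturbation budget

score = w @ x
x_adv = np.clip(x - eps * np.sign(score) * np.sign(w), 0, 1)  # push against current sign

print("original score:   ", score)
print("perturbed score:  ", w @ x_adv)                 # shifted by roughly eps * sum(|w|)
print("max pixel change: ", np.abs(x_adv - x).max())   # <= eps
print("pixelwise L2 dist:", np.linalg.norm(x_adv - x))
```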
6Richard_Ngo
in general I think people should explain stuff like this. "I might as well not help" is a very weak argument compared with the benefits of people understanding the world better.
3Grayson Chao
Intuitively, I see a qualitative difference between adversarial inputs like the ones in the story and merely pathological ones, such as manipulative advertising or dopamine-scrolling-inducing content. The intuition comes from cybersecurity, where it's generally accepted that the control plane (roughly, the stream of inputs deciding what the system does and how it does it) should be isolated from the data plane (roughly, the stream of inputs defining what the system operates on). In the examples of advertising and memetics, the input is still processed in the 'data plane', where the brain integrates sensory information on its own terms, in the pursuit of its own goals. "Screensnakes"/etc seem to have the ability to break the isolation and interact directly with the control plane (e.g. a snake's coloration is no longer processed as 'a snake's coloration' at all).

That said, there are natural examples which are less clear-cut, such as the documented phenomenon where infrasound around 19Hz produces a feeling of dread. It's not clear to me that this is 'control plane hacking' per se (for example, perhaps this is an evolved response to sounds that would have been associated with caves or big predators in the past), but it does blur the intuitive boundary between the control plane and data plane.

Are you aware of any phenomena that are very 'control plane-y' in this sense? If they existed, it would seem to me to be a positive confirmation that I'm wrong and your idea of the adversarial search resulting in a 'Glitcher protocol' would have some legs.

Ask your AI what's wrong with your ideas, not what's right, and then only trust the criticism to be valid if there are actual defeaters you can't show you've beaten in the general case. Don't trust an AI to be thorough, important defeaters will be missing. Natural language ideas can be good glosses of necessary components without telling us enough about how to pin down the necessary math.

It took me several edits to get spoilers to work right, I had to switch from markdown to the rich text editor. Your second spoiler is empty, which is how mine were breaking.

to wentworthpilled folks: - Arxiv: "Dynamic Markov Blanket Detection for Macroscopic Physics Discovery" (via author's bsky thread, via week top arxiv)

Could turn out not to be useful, I'm posting before I start reading carefully and have only skimmed the paper.

Copying the first few posts of that bsky thread here, to reduce trivial inconveniences:

This paper resolves a key outstanding issue in the literature on the free energy principle (FEP): Namely, to develop a principled approach to the detection of dynamic Markov blankets 2/16

The FEP is a generalized m

... (read more)
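For anyone who hasn't met the term: the paper is about detecting *dynamic* Markov blankets in physical processes, which is the hard part; the static, graph-theoretic notion is just "parents, children, and co-parents of a node". A minimal sketch of that static definition, with purely illustrative node names:

```python
# Static Markov blanket of a node in a DAG: its parents, its children, and its
# children's other parents. (The paper's dynamic version, where the blanket
# itself changes over time, is the hard part; this is only the textbook case.)
def markov_blanket(node, parents):
    """`parents` maps each node to the set of its parent nodes."""
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]} - {node}
    return parents[node] | children | co_parents

dag = {
    "A": set(), "B": set(), "F": set(),
    "C": {"A", "B"},   # C has parents A and B
    "D": {"C"},        # D is a child of C
    "E": {"C", "F"},   # E is a child of C; F is a co-parent
}
print(markov_blanket("C", dag))  # {'A', 'B', 'D', 'E', 'F'}
```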
4Alexander Gietelink Oldenziel
@Fernando Rosas 

sounds interesting if it works as math. have you already written it out in latex or code or similar? I suspect that this is going to turn out to not be incentive compatible. Incentive-compatible "friendly"/"aligned" economic system design does seem like the kind of thing that would fall out of a strong solution to the AI short-through-long-term-notkilleveryone-outcomes problem, though my expectation is basically that when we write this out we'll find severe problems not fully visible beneath the loudness of natural language. If I didn't need to get away from the computer right now I'd even give it a try myself, might get around to that later, p ~= 20%

3[anonymous]
I've been dragging my feet on the sim. Help definitely needed, especially on formalization.

phew, I have some feelings after reading that, which might indicate useful actions. I wonder if they're feelings in the distribution that the author intended.

 I suddenly am wondering if this is what LLMs are. But... maybe not? but I'm not sure. they might be metaphorically somewhat in this direction. clearly not all the way, though.

spoilers, trying to untangle the worldbuilding:

seems like perhaps the stars are actually projecting light like that towards this planet - properly designed satellites could be visible during the day with the help of careful

... (read more)
gwern3418

I read the 'stars' as simply very dense low-orbiting satellites monitoring the ground 24/7 for baseline humans to beam low-latency optical propaganda at. The implied King's Pact presumably is something like, "the terrestrial Earth will be left unmodified and no AI are allowed to directly communicate or interact with or attempt to manipulate baseline humans", and so satellites, being one-way broadcasts outside the Earth, don't violate it. This then allows the bootstrap of all the other attacks: someone looks up at night long enough, they get captured, start... (read more)

2Bunthut
Elaborate.
8Richard_Ngo
I appreciated this comment! Especially:

Sounds interesting, I talk to LLMs quite a bit as well, I'm interested in any tricks you've picked up. I put quite a lot of effort into pushing them to be concise and grounded.

eg, I think an LLM bot designed by me would only get banned for being an LLM, despite consistently having useful things to say when writing comments - which, relatedly, would probably not happen super often, despite the AI reading a lot of posts and comments - it would be mostly showing up in threads where someone said something that seemed to need a specific kind of asking them for ... (read more)

Encouraging users to explicitly label words as having come from an AI would be appreciated. So would be instructing users on when you personally find it acceptable to share words or ideas that came from an AI. I doubt the answer is "never as part of a main point", though I could imagine that some constraints include "must be tagged to be socially acceptable", and "must be much more dense than is typical for an LLM", and "avoid those annoying keywords LLMs typically use to make their replies shiny". I suspect a lot of what you don't like is that most people... (read more)

One need not go off into the woods indefinitely, though.

4Mateusz Bagiński
I don't think I implied that John's post implied that and I don't think going into the woods non-indefinitely mitigates the thing I pointed out.

I buy that training slower is a sufficiently large drawback to break scaling. I still think bees are why the paper got popular. But if intelligence depends on clean representation, interpretability due to clean representation is natively and unavoidably bees. We might need some interpretable-bees insights in order to succeed, it does seem like we could get better regret bound proofs (or heuristic arguments) that go through a particular trained model with better (reliable, clean) interp. But the whole deal is the ai gets to exceed us in ways that make human... (read more)

This is just capabilities stuff. I expect that people will use this to train larger networks, as much larger as they can. If your method shrinks the model, it likely induces demand proportionately. In this case it's not new capabilities stuff by you, so it's less concerning, but still. This paper is popular because of bees.

1CBiddulph
I'd be pretty surprised if DLGNs became the mainstream way to train NNs, because although they make inference faster they apparently make training slower. Efficient training is arguably more dangerous than efficient inference anyway, because it lets you get novel capabilities sooner. To me, DLGN seems like a different method of training models but not necessarily a better one (for capabilities).

Anyway, I think it can be legitimate to try to steer the AI field towards techniques that are better for alignment/interpretability even if they grant non-zero benefits to capabilities. If you research a technique that could reduce x-risk but can't point to any particular way it could be beneficial in the near term, it can be hard to convince labs to actually implement it. Of course, you want to be careful about this.

What do you mean?
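For anyone who hasn't looked at the DLGN paper under discussion: a rough numpy sketch of the core trick as I understand it (my own toy code, not the paper's; names and shapes are made up). Each neuron keeps a learnable softmax over the 16 two-input boolean functions, evaluated in relaxed form on values in [0, 1]; training has to push gradients through the full soft mixture (one reason it's slower), while inference hardens every neuron to its argmax gate, leaving only cheap boolean ops.

```python
# One relaxed logic-gate "neuron": a learnable softmax over the 16 boolean
# functions of two inputs, evaluated probabilistically on values in [0, 1].
# Training uses the soft mixture; inference picks the argmax gate.
import numpy as np

def soft_gates(a, b):
    # All 16 boolean functions of (a, b), relaxed via probabilistic logic.
    return np.stack([
        np.zeros_like(a),          # FALSE
        a * b,                     # AND
        a - a * b,                 # a AND NOT b
        a,                         # a
        b - a * b,                 # NOT a AND b
        b,                         # b
        a + b - 2 * a * b,         # XOR
        a + b - a * b,             # OR
        1 - (a + b - a * b),       # NOR
        1 - (a + b - 2 * a * b),   # XNOR
        1 - b,                     # NOT b
        1 - b + a * b,             # a OR NOT b
        1 - a,                     # NOT a
        1 - a + a * b,             # NOT a OR b
        1 - a * b,                 # NAND
        np.ones_like(a),           # TRUE
    ])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.zeros(16)                 # learnable parameters of this one gate
a, b = np.array(0.9), np.array(0.2)   # relaxed binary inputs from a previous layer

soft_out = softmax(logits) @ soft_gates(a, b)    # training-time output (differentiable)
hard_out = soft_gates(a, b)[np.argmax(logits)]   # inference-time output (one fixed gate)
print(float(soft_out), float(hard_out))
```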

My estimate is 97% not sociopaths, but only about 60% inclined to avoid teaming up with sociopaths.

Germline engineering likely destroys most of what we're trying to save, via group conflict effects. There's a reason it's taboo.

4TsviBT
Does the size of this effect, according to you, depend on parameters of the technology? E.g. if it clearly has a ceiling, such that it's just not feasible to make humans who are in a meaningful sense 10x more capable than the most capable non-germline-engineered human? E.g. if the technology is widespread, so that any person / group / state has access if they want it?

I suppose that one might be a me thing. I haven't heard others say it, but it was an insight for me at one point that "oh, it hurts because it's an impact". It had the flavor of expecting a metaphor and not getting one.

Your link to "don't do technical ai alignment" does not argue for that claim. In fact, it appears to be based on the assumption that the opposite is true, but that there are a lot of distractor hypotheses for how to do it that will turn out to be an expensive waste of time.

To be clear, I'm expecting scenarios much more clearly bad than that, like "the universe is almost entirely populated by worker drone AIs and there are like 5 humans who are high all the time and not even in a way they would have signed up for, and then one human who is being copied repeatedly and is starkly superintelligent thanks to boosts from their AI assistants but who had replaced almost all of their preferences with an obsession with growth in order to get to being the one who had command of the first AI, and didn't manage to break out of it using t... (read more)

2Noosphere89
My main crux here is that I think no strong AI rights will likely be given before near-full alignment to one person is achieved, and maybe not even then, and a lot of the failure modes of giving AIs power in the gradual disempowerment scenario fundamentally route through giving AIs very strong rights, but thankfully, this is disincentivized by default, because otherwise AIs would be more expensive. The main way this changes the scenario is that the 6 humans here remain broadly in control, and aren't just high all the time, and the first one probably doesn't just replace their preferences with pure growth, because at the level of billionaires, status dominates, so they are likely living very rich lives with their own servants. No guarantees about anyone else surviving though:

I mean, we're not going to the future without getting changed by it, agreed. but how quickly one has to figure out how to make good use of a big power jump seems like it has a big effect on how much risk the power jump carries for your ability to actually implement the preferences you'd have had if you didn't rush yourself.

"all" humans? like, maybe no, I expect a few would survive, but the future wouldn't be human, it'd be whatever distorted things those humans turn into. My core take here is that humans generalize basically just as poorly as we expect AIs to, (maybe a little better, but on a log scale, not much), in terms of their preferences still pointing at the things even they thought they did given a huge increase in power. crown wearing the king, drug seeking behavior, luxury messing up people's motivation, etc. if you solve "make an ai be entirely obedient to a singl... (read more)

4Dagon
The vast majority of actual humans are already dead. The overwhelming majority of currently-living humans should expect a 95%+ chance they'll die in under a century. If immortality is solved, it will only apply to "that distorted thing those humans turn into". Note that this is something the stereotypical Victorian would understand completely - there may be biological similarities with today's humans, but they're culturally a different species.

I would guess that the range of things people propose for the shell game is tractable to get a good survey of. It'd be interesting to try to plot out the system as a causal graph with recurrence so one can point to, "hey look, this kind of component is present in a lot of places", and see if one can get that causal graph visualization to show enough that it starts to feel clear to people why this is a problem. I doubt I'll get to this, but if I play with this, I might try to visualize it [edit: probably with the help of a skilled human visual artist to mak... (read more)

He appears to be arguing against a thing, while simultaneously criticizing people; but I appreciate that he seems to do it in ways that are not purely negative, also mentioning times things have gone relatively well (specifically, updating on evidence that folks here aren't uniquely correct), even if it's not enough to make the rest of his points not a criticism.

I entirely agree with his criticism of the strategy he's criticizing. I do think there are more obviously tenable approaches than the "just build it yourself lol" approach or "just don't let anyone... (read more)

[Edit: crash found in the conversations referenced, we'll talk more in DM but not in a hurry. This comment retracted for now]

By "AGI" I mean the thing that has very large effects on the world (e.g., it kills everyone) via the same sort of route that humanity has large effects on the world. The route is where you figure out how to figure stuff out, and you figure a lot of stuff out using your figure-outers, and then the stuff you figured out says how to make powerful artifacts that move many atoms into very specific arrangements.

delete "it kills everyon... (read more)

4TsviBT
As I mentioned, my response is here https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce#_We_just_need_X__intuitions: I haven't heard a response / counterargument to this yet, and many people keep making this logic mistake, including AFAICT you.
6TsviBT
My definition is better than yours, and you're too triggered or something to think about it for 2 minutes and understand what I'm saying. I'm not saying "it's not AGI until it kills us", I'm saying "the simplest way to tell that something is an AGI is that it kills us; now, AGI is whatever that thing is, and could exist some time before it kills us".
4TsviBT
What do you mean? According to me we barely started the conversation, you didn't present evidence, I tried to explain that to you, we made a bit of progress on that, and then you ended the conversation.

@daniel k I just can never remember your last name's spelling, sorry, heh. My point in saying this is that my prediction approach up to 2020 was similar to, though not as refined as, yours, and that instead of trying to argue my views (which differ from yours in a few trivial ways that are mostly not relevant) I'd rather just point people to those arguments of yours.

When predicting timelines, it matters which benchmark on the compounding-returns curve you pick. Your definition minus doom happens earlier, even if the minus-doom version arrives too late to avert doom in literally all worlds (I doubt that; it's more likely that the most powerful humans'[1] Elo against AIs falls and falls but takes a while to be indistinguishable from zero - see the Elo formula below).

  1. ^

    such as their labs' CEOs, major world leaders, highly skilled human strategists, etc
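For concreteness, the Elo picture here is just the standard expected-score curve (nothing exotic):

$$E_{\text{human}} = \frac{1}{1 + 10^{\left(R_{\text{AI}} - R_{\text{human}}\right)/400}}$$

As the rating gap grows, the human's expected score decays toward zero only asymptotically - the "falls and falls but takes a while to be indistinguishable from zero" shape.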

2TsviBT
You refered to " others' definition (which is similar but doesn't rely on the game over clause) ", and I'm saying no, it's not relevantly similar, and it's not just my definition minus doom.

Your definition of AGI is "that which completely ends the game", source in your link. By that definition I agree with you. By others' definition (which is similar but doesn't rely on the game over clause) I do not.

My timelines have gotten slightly longer since 2020; I was expecting TAI when we got GPT-4, and I have recently gone back and discovered I have chatlogs showing I'd been expecting that for years and had specific reasons. I would propose Daniel K. is a particularly good reference.

2the gears to ascension
@daniel k I just can never remember your last name's spelling, sorry, heh. My point in saying this is that my prediction approach up to 2020 was similar to, though not as refined as, yours, and that instead of trying to argue my views (which differ from yours in a few trivial ways that are mostly not relevant) I'd rather just point people to those arguments of yours.
TsviBT103

I also dispute that genuine HLMI refers to something meaningfully different from my definition. I think people are replacing HLMI with "thing that can do all stereotyped, clear-feedback, short-feedback tasks", and then also claiming that this thing can replace many human workers (probably true of 5 or 10 million, false of 500 million) or cause a bunch of unemployment by making many people 5x effective (maybe, IDK), and at that point IDK why we're talking about this, when X-risk is the important thing.

I should also add:

I'm pretty worried that we can't understand the universe "properly" even if we're in base physics! It's not yet clearly forbidden that the foundations of philosophy contain unanswerable questions, things where there's a true answer that affects our universe in ways that are not exposed in any way physically, and can only be referred to by theoretical reasoning; which then relies on how well our philosophy and logic foundations actually have the real universe as a possible referent. Even if they do, things could be annoying. In particular,... (read more)

I think that if our future goes well, it will be because we found ways to align AI well enough, and/or because we coordinated politically to slow or stop AI advancement long enough to accomplish the alignment part

Agree

not because researchers avoided measuring AI's capabilities.

But differential technological development matters, as does making it clear that when you make a capability game like this, you are probably just contributing to capabilities, not doing alignment. I won't say you should never do that, but I'll say that's what's being done. I pe... (read more)

5eggsyntax
Agreed. I would distinguish between measuring capabilities and improving capabilities. I agree that the former can motivate the latter, but they still seem importantly different. I continue to think that the alternative of not measuring capabilities (or only measuring some small subset that couldn't be used as training benchmarks) just means we're left in the dark about what these models can do, which seems pretty straightforwardly bad from a safety perspective. I agree that it's definitely not doing alignment, and that working on alignment is the most important goal; I intend to shift toward directly working on alignment as I feel clearer about what work is a good bet (my current leading candidate, which I intend to focus on after this experiment: learning to better understand and shape LLMs' self-models). I very much appreciate the thoughtful critique, regardless of whether or not I'm convinced by it.

Decision theory as discussed here heavily involves thinking about agents responding to other agents' decision processes

3mattmacdermott
The notion of ‘fairness’ discussed in e.g. the FDT paper is something like: it’s fair to respond to your policy, i.e. what you would do in any counterfactual situation, but it’s not fair to respond to the way that policy is decided. I think the hope is that you might get a result like “for all fair decision problems, decision-making procedure A is better than decision-making procedure B by some criterion to do with the outcomes it leads to”. Without the fairness assumption you could create an instant counterexample to any such result by writing down a decision problem where decision-making procedure A is explicitly penalised e.g. omega checks if you use A and gives you minus a million points if so.
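A toy sketch of that counterexample move (everything below is made up), just to make the "responds to the procedure rather than the policy" distinction concrete: two procedures output the identical policy, but the environment inspects which procedure produced it and penalizes one by fiat, so no procedure can be uniformly best over such problems.

```python
# "Unfair" decision problem sketch: the environment conditions on *which*
# decision procedure you run, not on the policy that procedure outputs.
def procedure_A(observation):
    return "one-box"

def procedure_B(observation):
    return "one-box"          # identical policy to A...

def omega_unfair(agent_procedure):
    action = agent_procedure("you face omega")
    payoff = 100 if action == "one-box" else 10
    if agent_procedure is procedure_A:   # ...but A is penalized just for being A
        payoff -= 1_000_000
    return payoff

print(omega_unfair(procedure_A))  # -999900
print(omega_unfair(procedure_B))  # 100
```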

Sims are very cheap compared to space travel, and you need to know what you're dealing with in quite a lot of detail before you fly because you want to have mapped the entire space of possible negotiations in an absolutely ridiculous level of detail.

Sims built for this purpose would still be a lot lower detail than reality, but of course that would be indistinguishable from inside if the sim is designed properly. Maybe most kinds of things despawn in the sim when you look away, for example. Only objects which produce an ongoing computation that has influen... (read more)

We have to infer how reality works somehow.

I've been poking at the philosophy of math recently. It really seems like there's no way to conceive of a universe that is beyond the reach of logic except one that also can't support life. Classic posts include unreasonable effectiveness of mathematics, what numbers could not be, a few others. So then we need epistemology.

We can make all sorts of wacky nested simulations and any interesting ones, ones that can support organisms (that is, ones that are Turing complete), can also support processes for predicting ou... (read more)

1AynonymousPrsn123
Thank you, I feel inclined to accept that for now. But I'm still not sure, and I'll have to think more about this response at some point. Edit: I'm still on board with what you're generally saying, but I feel skeptical of one claim: My intuition tells me there will probably be superior methods of gathering information about superintelligent aliens. To me, it seems like the most obvious reason to create sims would be to respect the past for some bizarre ethical reason, or for some weird kind of entertainment, or even to allow future aliens to temporarily live in a more primitive body. Or perhaps for a reason we have yet to understand. I don't think any of these scenarios would really change the crux of your argument, but still, can you please justify your claim for my curiosity?

If we have no grasp on anything outside our virtualized reality, all is lost. Therefore I discard my attempts to control those possible worlds.

However, the simulation argument relies on reasoning. For it to go through, a number of assumptions have to hold. Those in turn rely on: why would we be simulated? It seems to me the main reason is that we're near a point of high influence in original reality and they want to know what happened - the simulations then are effectively extremely high-resolution memories. Therefore, thank those simulating us for the additio... (read more)

1AynonymousPrsn123
I think I understand your point. I agree with you: the simulation argument relies on the assumption that physics and logic are the same inside and outside the simulation. In my eyes, that means we may either accept the argument's conclusion or discard that assumption. I'm open to either. You seem to be, too—at least at first. Yet, you immediately avoid discarding the assumption for practical reasons: I agree with this statement, and that's my fear. However, you don't seem to be bothered by the fact. Why not?

The strangest thing is that I think you agree with my claim: "The simulation argument should increase our credence that our entire understanding of everything is flawed." Yet somehow, that doesn't frighten you. What do you see that I don't see? Practical concerns don't change the territory outside our false world.

Second: That's surely possible, but I can imagine hundreds of other stories. In most of those stories, altruism from within the simulation has no effect on those outside it. Even worse, there are some stories in which inflicting pain within a simulation is rewarded outside of it.

Here's a possible hypothetical: Imagine humans in base reality create friendly AI. To respect their past, the humans ask the AI to create tons of sims living in different eras. Since some historical info was lost to history, the sims are slightly different from base reality. Therefore, in each sim, there's a chance AI never becomes aligned. Accounting for this possibility, base reality humans decide to end sims in which AI becomes misaligned and replace those sims with paradise sims where everyone is happy.

In the above scenario, both total and average utilitarianism would recommend intentionally creating misaligned AI so that paradise ensues. I'm sure you can craft even more plausible stories. My point is, even if our understanding of physics and logic is correct, I don't see why we ought to privilege the hypothesis that simulations are memories. I also don't

willingness seems likely to be understating it. a context where the capability is even part of the author context seems like a prereq. finetuning would produce that, with fewshot one has to figure out how to make it correlate. I'll try some more ideas.
