Hypothetical conversation:
"You gotta prepare for the second coming, man, it's this year!"
"It is not. Stop telling me that."
"It is! The signs are all there! Do you even portents, bro?"
"I told you it's not happening."
"We work at a prediction market. I'll buy yes. It's happening."
"No you won't. You know it isn't."
"I so will, make the market."
"This is stupid, I'm eating lunch."
"Let's make the market, dude."
"Ugh, fine, it's going to zero so fast."
"More money for me!"
Utility is potentially a good thing to critique, but the case for it seems sticky, and maybe we're just holding it wrong or something. An issue is that I don't "have" a utility function; the VNM axioms hold in the limit of unexploitability, but it seems like the process of getting my mistakes gently corrected by interaction with the universe is itself something I prefer to have some of. In active inference terms, I don't only want to write to the universe.
But this post seems kinda vague too. I upvote hesitantly.
Who made this and why are they paying for the model responses? Do we know what happens to the data?
Fair enough. Neither dill nor ziz would have been able to pull off their crazy stuff without some people letting themselves get hypnotized, so I think the added warnings are correct.
High quality archives of the selves along the way. Compressed but not too much. In the live self, some updated descendant that has significant familial lineage, projected vaguely as the growing patterns those earlier selves would call a locally valid continuation according to the aesthetics and structures they consider essential at the time. In other words, this question is dynamically reanswered to the best of my ability in an ongoing way, and snapshots allow reverting and self-interviews to error check.
Any questions? :)
Ooh, man, I don't know if pigs have more or less autonomy than AIs right now, but I'm inclined to think quite a lot more. current AIs seem like they'd crash pretty quick if just plopped in a robot body with little to no scaffolding, whereas mammals are built around autonomy. Not sure how it shakes out, though.
There's a version of this that I would agree with. But when-anthropomorphizing-for-good-reason seems to me to be exactly the time when there's more danger from anthropomorphizing-more-than-justified. I've been talking to Claude instances about this stuff, based on a similar intuition to yours. But I haven't figured out what I believe well enough to argue it to others in a coherent way. I could paste conversations I've had, but they're kinda long for this forum. I'll paste a recent one on pastebin for now (if the pastebin link goes bad, this was conversation id ...5e6...
can you expand on what you mean by that? are there any actions you'd suggest, on my part or others, based on this claim? (also, which of the urban dictionary definitions of "lh" do you mean? they have opposite valences.)
edit: added a bunch of warnings to my original comment. sorry for missing them in the first place.
I don't think Lucius is claiming we'd be happy about it. Maybe the "no anticipated impact" point carries that implicit claim, I guess.
Re convo with Raemon yesterday, this might change my view.
edit: uh, well, short answer: there totally is! idk if they're the psychedelic states you wanted, but they should do for a lot of relevant purposes, seems pretty hard to match meds though. original longer version:
there's a huge space of psychedelic states, I think the subspace reachable with adding chemicals is a large volume that's hard to get to by walking state space with only external pushes - I doubt the kind of scraping a hole in the wall from a distance you can do with external input can achieve, eg, globally reversing the function of SERT (I think ...
your argument is basically "there's just this huge mess of neurons, surely somewhere in there is a way",
I suppose that is what I said, interpreted as a deductive claim. I have more abductive/bayesian/hunch-level information than that, and I've expressed some of it, but I've been realizing lately that a lot of my intuitions don't come via deductive reasoning, which can make them hard to verify or communicate. (And I'd guess that's a common problem; it seems like the sort of thing science exists to solve.) I'm likely not well equipped to present justifiedly-convincing-to-...
I doubt the level of inhuman behavior we see in this story is remotely easy to achieve, and it's probably not tractable given only hand motions as shown - given human output bandwidth, sounds seem needed, especially surprisingly loud ones. For the sky, I think it would start out beautiful, end up superstimulating, and then seep in via longer exposure. I think there's probably a combination of properties of hypnosis, cult brainwashing, inducing psychedelic states, etc., which could get a human's thinking to end up in crashed attractors, even if it's only one-way transmission. Then, from a crashed attractor, it seems a lot more possible to get a foothold of coherence for the attacker.
Both - I'd bet they're between 5 and 12% of the population, and that they're natural relays of the ideas you'd want to broadcast, if only they weren't relaying such mode-collapsed versions of the points. A claim presented without deductive justification: in trying to make media that is very high impact, making something opinionated in the ways you need to is good, and making that same something unopinionated in ways you don't need to is also good. Also, the video you linked has a lot of additional opinionated features that I think are targeting a much more specific group than even "people who aren't put off by AI" - it would never show up on my YouTube.
Perhaps multiple versions, then. I maintain my claim that you're missing a significant segment of people who are avoiding AI manipulation moderately well but as a result not getting enough evidence about what the problem is.
You'll lose an important audience segment the moment they recognize any AI-generated anything. The people who wouldn't be put off by AI-generated stuff probably won't be put off by the lack of it. You might be able to get away with it by using AI really unusually well, such that it's just objectively hard to even get a hunch that AI was involved, other than by the topic.
for AIs, more robust adversarial examples - especially ones that work on AIs trained on different datasets - do seem to look more "reasonable" to humans. The really obvious adversarial example of this kind in humans is, like, cults, or so - I don't really have another, though I do have examples that are, like, on the edge of the cult pattern. It's not completely magic, it doesn't work on everyone, and it does seem like a core component of why people fall to it is something like a relaxed "control plane" that doesn't really try hard to avoid being crashed by i...
for AIs, more robust adversarial examples - especially ones that work on AIs trained on different datasets - do seem to look more "reasonable" to humans.
Then I would expect they are also more objectively similar. In any case that finding is strong evidence against manipulative adversarial examples for humans - your argument is basically "there's just this huge mess of neurons, surely somewhere in there is a way", but if the same adversarial examples work on minds with very different architectures, then that's clearly not why they exist. Instead, they have ...
I think most things that hit your brain have some percentage of leaking out of the data plane, some on the lower end, some fairly high. And it seems to me that, at current levels of manipulative optimization towards higher-data-plane-leaking media, looking for the leaks and deciding how to handle them can maybe help if you have to encounter the thing. It's just that, normally, the bitrate of control back towards the space of behavior that the organism prefers is high enough that the incoming manipulation can't strongly persist. But we do see ...
there are species of microbe that float pretty well, though. as far as we know right now, they just don't stay floating indefinitely or fuel themselves in the air.
edit: putting the thing I was originally going to say back:
I meant that I think there's enough bandwidth available from vision into the configuration of matter in the brain that a sufficiently powerful mind could adversarial-example the human brain hard enough to implement the adversarial process in the brain, get it to persist in that brain, take control, and spread. We see weaker versions of this in advertising and memetics already, and it seems to be getting worse with social media - there are a few different strains, which generally aren't hig...
Ask your AI what's wrong with your ideas, not what's right, and then only trust the criticism to be valid if there are actual defeaters you can't show you've beaten in the general case. Don't trust an AI to be thorough; important defeaters will be missing. Natural language ideas can be good glosses of necessary components without telling us enough about how to pin down the necessary math.
It took me several edits to get spoilers to work right, I had to switch from markdown to the rich text editor. Your second spoiler is empty, which is how mine were breaking.
To wentworthpilled folks:
- arXiv: "Dynamic Markov Blanket Detection for Macroscopic Physics Discovery" (via the author's bsky thread, via week top arxiv)
Could turn out not to be useful, I'm posting before I start reading carefully and have only skimmed the paper.
Copying the first few posts of that bsky thread here, to reduce trivial inconveniences:
...This paper resolves a key outstanding issue in the literature on the free energy principle (FEP): Namely, to develop a principled approach to the detection of dynamic Markov blankets 2/16
The FEP is a generalized m
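In case "Markov blanket" is a blocker for anyone: here's a minimal sketch of the ordinary static, graph-theoretic notion (parents, children, co-parents). This is my own illustration for orientation, not the paper's dynamic-detection machinery, which again I've only skimmed:

```python
# Minimal sketch of the *static* Markov blanket of a node in a directed graph:
# its parents, its children, and its children's other parents. (The paper is
# about detecting *dynamic* blankets in physical systems; this is just the
# textbook notion.)
from collections import defaultdict

def markov_blanket(edges, node):
    """edges: iterable of (parent, child) pairs; returns the blanket of `node`."""
    parents, children = defaultdict(set), defaultdict(set)
    for p, c in edges:
        parents[c].add(p)
        children[p].add(c)
    blanket = set(parents[node]) | set(children[node])
    for child in children[node]:
        blanket |= parents[child]  # co-parents of shared children
    blanket.discard(node)
    return blanket

# toy graph with a collider: a -> b -> c <- d; blanket of b is {a, c, d}
print(markov_blanket([("a", "b"), ("b", "c"), ("d", "c")], "b"))
```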
Sounds interesting if it works as math. Have you already written it out in LaTeX or code or similar? I suspect this is going to turn out not to be incentive-compatible. Incentive-compatible "friendly"/"aligned" economic system design does seem like the kind of thing that would fall out of a strong solution to the AI short-through-long-term-notkilleveryone-outcomes problem, though my expectation is basically that when we write this out we'll find severe problems not fully visible beneath the loudness of natural language. If I didn't need to get away from the computer right now I'd even give it a try myself; might get around to that later, p ~= 20%
phew, I have some feelings after reading that, which might indicate useful actions. I wonder if they're feelings in the distribution that the author intended.
I suddenly am wondering if this is what LLMs are. But... maybe not? but I'm not sure. they might be metaphorically somewhat in this direction. clearly not all the way, though.
spoilers, trying to untangle the worldbuilding:
seems like perhaps the stars are actually projecting light like that towards this planet - properly designed satellites could be visible during the day with the help of careful
I read the 'stars' as simply very dense low-orbiting satellites monitoring the ground 24/7 for baseline humans to beam low-latency optical propaganda at. The implied King's Pact presumably is something like, "the terrestrial Earth will be left unmodified and no AI are allowed to directly communicate or interact with or attempt to manipulate baseline humans", and so satellites, being one-way broadcasts outside the Earth, don't violate it. This then allows the bootstrap of all the other attacks: someone looks up at night long enough, they get captured, start...
Sounds interesting, I talk to LLMs quite a bit as well, I'm interested in any tricks you've picked up. I put quite a lot of effort into pushing them to be concise and grounded.
eg, I think an LLM bot designed by me would only get banned for being an LLM, despite consistently having useful things to say when writing comments - which, relatedly, would probably not happen super often, despite the AI reading a lot of posts and comments - it would be mostly showing up in threads where someone said something that seemed to need a specific kind of asking them for ...
Encouraging users to explicitly label words as having come from an AI would be appreciated. So would instructing users on when you personally find it acceptable to share words or ideas that came from an AI. I doubt the answer is "never as part of a main point", though I could imagine that some constraints include "must be tagged to be socially acceptable", "must be much more dense than is typical for an LLM", and "avoid those annoying keywords LLMs typically use to make their replies shiny". I suspect a lot of what you don't like is that most people...
One need not go off into the woods indefinitely, though.
I buy that training slower is a sufficiently large drawback to break scaling. I still think bees are why the paper got popular. But if intelligence depends on clean representation, then interpretability due to clean representation is natively and unavoidably bees. We might need some interpretable-bees insights in order to succeed; it does seem like we could get better regret bound proofs (or heuristic arguments) that go through a particular trained model with better (reliable, clean) interp. But the whole deal is that the AI gets to exceed us in ways that make human...
This is just capabilities stuff. I expect that people will use this to train larger networks, as much larger as they can. If your method shrinks the model, it likely induces demand proportionately. In this case it's not new capabilities stuff by you, so it's less concerning, but still. This paper is popular because of bees.
My estimate is 97% not sociopaths, but only about 60% inclined to avoid teaming up with sociopaths.
Germline engineering likely destroys most of what we're trying to save, via group conflict effects. There's a reason it's taboo.
I suppose that one might be a me thing. I haven't heard others say it, but it was an insight for me at one point that "oh, it hurts because it's an impact". It had the flavor of expecting a metaphor and not getting one.
Your link to "don't do technical ai alignment" does not argue for that claim. In fact, it appears to be based on the assumption that the opposite is true, but that there are a lot of distractor hypotheses for how to do it that will turn out to be an expensive waste of time.
To be clear, I'm expecting scenarios much more clearly bad than that, like "the universe is almost entirely populated by worker drone AIs and there are like 5 humans who are high all the time and not even in a way they would have signed up for, and then one human who is being copied repeatedly and is starkly superintelligent thanks to boosts from their AI assistants but who had replaced almost all of their preferences with an obsession with growth in order to get to being the one who had command of the first AI, and didn't manage to break out of it using t...
I mean, we're not going to the future without getting changed by it, agreed. but how quickly one has to figure out how to make good use of a big power jump seems like it has a big effect on how much risk the power jump carries for your ability to actually implement the preferences you'd have had if you didn't rush yourself.
"all" humans? like, maybe no, I expect a few would survive, but the future wouldn't be human, it'd be whatever distorted things those humans turn into. My core take here is that humans generalize basically just as poorly as we expect AIs to, (maybe a little better, but on a log scale, not much), in terms of their preferences still pointing at the things even they thought they did given a huge increase in power. crown wearing the king, drug seeking behavior, luxury messing up people's motivation, etc. if you solve "make an ai be entirely obedient to a singl...
I would guess that the range of things people propose for the shell game is tractable to get a good survey of. It'd be interesting to try to plot out the system as a causal graph with recurrence so one can point to, "hey look, this kind of component is present in a lot of places", and see if one can get that causal graph visualization to show enough that it starts to feel clear to people why this is a problem. I doubt I'll get to this, but if I play with this, I might try to visualize it [edit: probably with the help of a skilled human visual artist to mak...
He appears to be arguing against a thing, while simultaneously criticizing people; but I appreciate that he seems to do it in ways that are not purely negative, also mentioning times things have gone relatively well (specifically, updating on evidence that folks here aren't uniquely correct), even if it's not enough to make the rest of his points not a criticism.
I entirely agree with his criticism of the strategy he's criticizing. I do think there are more obviously tenable approaches than the "just build it yourself lol" approach or "just don't let anyone...
[Edit: crash found in the conversations referenced, we'll talk more in DM but not in a hurry. This comment retracted for now]
By "AGI" I mean the thing that has very large effects on the world (e.g., it kills everyone) via the same sort of route that humanity has large effects on the world. The route is where you figure out how to figure stuff out, and you figure a lot of stuff out using your figure-outers, and then the stuff you figured out says how to make powerful artifacts that move many atoms into very specific arrangements.
delete "it kills everyon...
@daniel k I just can never remember your last name's spelling, sorry, heh. My point in saying this is that my prediction approach up to 2020 was similar to, though not as refined as, yours, and that instead of trying to argue my views (which differ from yours in a few trivial ways that are mostly not relevant) I'd rather just point people to your arguments.
When predicting timelines, it matters which benchmark on the compounding-returns curve you pick. Your definition minus the doom clause happens earlier, even if that minus-doom version still arrives too late to avert doom in literally all worlds (I doubt that; it's more likely that the most powerful humans[1]'s Elo against AIs falls and falls but takes a while to be indistinguishable from zero - see the quick Elo sketch after the footnote).
such as their labs' CEOs, major world leaders, highly skilled human strategists, etc
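For concreteness on "takes a while to be indistinguishable from zero": this is just the standard Elo expected-score curve (a generic sketch, not a claim about any particular humans or AIs), which decays smoothly as the rating gap grows:

```python
# Standard Elo expected-score formula: the weaker side's win probability
# against an opponent `rating_gap` points stronger. It shrinks smoothly
# toward zero rather than hitting it at any finite gap.
def elo_win_prob(rating_gap: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (rating_gap / 400.0))

for gap in (0, 200, 400, 800, 1600, 3200):
    print(f"{gap:>4} points behind -> win prob {elo_win_prob(gap):.6f}")
```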
Your definition of AGI is "that which completely ends the game", source in your link. By that definition I agree with you. By others' definition (which is similar but doesn't rely on the game over clause) I do not.
My timelines have gotten slightly longer since 2020; I was expecting TAI when we got GPT-4, and I have recently gone back and discovered chatlogs showing I'd been expecting that for years and had specific reasons. I would propose Daniel K. as a particularly good reference.
I also dispute that genuine HLMI refers to something meaningfully different from my definition. I think people are replacing HLMI with "thing that can do all stereotyped, clear-feedback, short-feedback tasks", and then also claiming that this thing can replace many human workers (probably true of 5 or 10 million, false of 500 million) or cause a bunch of unemployment by making many people 5x effective (maybe, IDK), and at that point IDK why we're talking about this, when X-risk is the important thing.
I should also add:
I'm pretty worried that we can't understand the universe "properly" even if we're in base physics! It's not yet clearly forbidden that the foundations of philosophy contain unanswerable questions, things where there's a true answer that affects our universe in ways that are not exposed in any way physically, and can only be referred to by theoretical reasoning; which then relies on how well our philosophy and logic foundations actually have the real universe as a possible referent. Even if they do, things could be annoying. In particular,...
I think that if our future goes well, it will be because we found ways to align AI well enough, and/or because we coordinated politically to slow or stop AI advancement long enough to accomplish the alignment part
Agree
not because researchers avoided measuring AI's capabilities.
But differential technological development matters, as does making it clear that when you make a capability game like this, you are probably just contributing to capabilities, not doing alignment. I won't say you should never do that, but I'll say that's what's being done. I pe...
Decision theory as discussed here heavily involves thinking about agents responding to other agents' decision processes
Sims are very cheap compared to space travel, and you need to know what you're dealing with in quite a lot of detail before you fly because you want to have mapped the entire space of possible negotiations in an absolutely ridiculous level of detail.
Sims built for this purpose would still be a lot lower detail than reality, but of course that would be indistinguishable from inside if the sim is designed properly. Maybe most kinds of things despawn in the sim when you look away, for example. Only objects which produce an ongoing computation that has influen...
We have to infer how reality works somehow.
I've been poking at the philosophy of math recently. It really seems like there's no way to conceive of a universe that is beyond the reach of logic, except one that also can't support life. Classic posts include "The Unreasonable Effectiveness of Mathematics", "What Numbers Could Not Be", and a few others. So then we need epistemology.
We can make all sorts of wacky nested simulations, and any interesting ones (ones that can support organisms, that is, ones that are Turing complete) can also support processes for predicting ou...
If we have no grasp on anything outside our virtualized reality, all is lost. Therefore I discard my attempts to control those possible worlds.
However, the simulation argument relies on reasoning. For it to go through, a number of assumptions have to hold. Those in turn rely on answering: why would we be simulated? It seems to me the main reason is that we're near a point of high influence in the original reality and they want to know what happened - the simulations then are effectively extremely high resolution memories. Therefore, thank those simulating us for the additio...
willingness seems likely to be understating it. a context where the capability is even part of the author context seems like a prereq. finetuning would produce that; with few-shot prompting, one has to figure out how to make it correlate. I'll try some more ideas.
I'm pretty sure this isn't a policy change but rather a policy distillation, and you were operating under the policy described above already. eg, I often have conversations with AIs that I don't want to bother to translate into a whole post, but where I think folks here would benefit from seeing the thread. what I'll likely do is make the AI portions collapsible and the human portions default uncollapsed; often the human side is sufficient to make a point (when the conversation is basically just a human thinking out loud with some helpful feedback), but so...