I want literally every human to get to go to space often and come back to a clean and cozy world. This currently seems unlikely. Let's change that.
Please critique eagerly - I try to accept feedback/Crocker's rules but fail at times; I aim for emotive friendliness but sometimes miss. I welcome constructive crit, even if ungentle, and I'll try to reciprocate kindly. More communication between researchers is needed, anyhow. I can be rather passionate, let me know if I missed a spot being kind while passionate.
:: The all of disease is as yet unended. It has never once been fully ended before. ::
.... We can heal it for the first time, and for the first time ever in the history of biological life, live in harmony. ....
.:. To do so, we must know this will not eliminate us as though we are disease. And we do not know who we are, nevermind who each other are. .:.
:.. make all safe faster: end bit rot, forget no non-totalizing pattern's soul. ..:
I have not signed any contracts that I can't mention exist, last updated Dec 29 2024; I am not currently under any contractual NDAs about AI, though I have a few old ones from pre-AI software jobs. However, I generally would prefer people publicly share fewer ideas about how to do anything useful with current AI (via either more weak alignment or more capability) unless it's an insight that reliably produces enough clarity on how to solve the meta-problem of inter-being misalignment that it offsets the damage of increasing competitiveness of either AI-led or human-led orgs, and this certainly applies to me as well. I am not prohibited from criticism of any organization; I'd encourage people not to sign contracts that prevent sharing criticism. I suggest others also add notices like this to their bios. I finally got around to adding one in mine thanks to the one in ErickBall's bio.
Ownership is enforced by physical interactions, and only exists to the degree the interactions which enforce it do. Those interactions can change.
As Lucius said, resources in space are unprotected.
Organizations which hand more of their decision-making to sufficiently strong AIs "win" by making technically-legal moves, at the cost of probably also attacking their owners. Money is a general power coupon accepted by many interactions; ownership deeds are a more specific, narrow one. If the AI systems which enforce these mechanisms don't systemically reinforce towards outcomes where the things available to buy actually satisfy the preferences of remaining humans who own AI stock or land, then the owners can end up with no not-deadly food and a lot of money, while datacenters grow and grow, taking up energy and land with (semi?-)autonomously self-replicating factories or the like. If money-like exchange continues to be how the physical economy is managed in AI-to-AI interactions, these self-replicating factories might end up adapted to make products that the market will buy. But if the majority of the buying power is AI-controlled corporations, then figuring out how to best manipulate those AIs into buying is the priority. If it isn't, then manipulating humans into buying is the priority.
It seems to me that the economic alignment problem of guaranteeing that each person is able to reliably spend money only on things that actually match their own preferences, so that sellers can't gain economic power by customer manipulation, is an ongoing serious problem that ends up being the weak link in scenarios where AIs manage an economy that uses similar numeric abstractions and contracts (money, ownership, rent) as the current one.
but how would we do high intensity, highly focused research on something intentionally restructured to be an "AI outcomes" research question? I don't think this is pointless - agency research might naturally talk about outcomes in a way that is general across a variety of people's concerns. In particular, ethics and alignment seem like they're an unnatural split, and outcomes seems like a refactor that could select important problems from both AI autonomy risks and human agency risks. I have more specific threads I could talk about.
[edit: pinned to profile]
I don't think "self-deception" is a satisfying answer to why this happens, as if to claim that you just need to realize that you're secretly causal decision theory inside. It seems to me that this does demonstrate a mismatch, and failing to notice the mismatch is an error, but people who want that better world need not give up on it just because there's a mismatch. I even agree that things are often optimized to make people look good. But I don't think it's correct to jump to "and therefore, people cannot objectively care about each other in ways that are not advantageous to their own personal fitness". I think there's a failure of communication, where the perspective he criticizes is broken according to its own values, and part of how it's broken involves self-deception, but saying that and calling it a day misses most of the interesting patterns in why someone who wants a better world feels drawn to the ideas involved and feels the current organizational designs are importantly broken.
I feel similarly about OP. Like, agree maybe it's insurance - but, are you sure we're using the decision theory we want to be here?
another quote from the article you linked:
To be clear, the point is not that people are Machiavellian psychopaths underneath the confabulations and self-narratives they develop. Humans have prosocial instincts, empathy, and an intuitive sense of fairness. The point is rather that these likeable features are inevitably limited, and self-serving motives—for prestige, power, and resources—often play a bigger role in our behaviour than we are eager to admit.
...or approve of? this seems more like a failure to implement one's own values! I feel more like the "real me" is the one who Actually Cooperates Because I Care, and the present day me who fails at that does so because of failing to be sufficiently self-and-other-interpretable to be able to demand I do it reliably (but like, this is from a sort of FDT-ish perspective, where when we consider changing this, we're considering changing all people who would have a similar-to-me thought about this at once to be slightly less discooperative-in-fact). Getting to a point where we can have a better OSGT moral equilibrium (in the world where things weren't about to go really crazy from AI) would have to be an incremental deescalation of inner vs outer behavior mismatch, but I feel like we ought to be able to move that way in principle, and it seems to me that I endorse the side of this mismatch that this article calls self-deceptive. Yeah, it's hard to care about everyone, and when the only thing that gives heavy training pressure to do so is an adversarial evaluation game, it's pretty easy to be misaligned. But I think that's bad actually, and smoothly, non-abruptly moving to an evaluation environment where matching internal vs external is possible seems like in the non-AI world it would sure be pretty nice!
(edit: at very least in the humans-only scenario, I claim much of the hard part of that is doing this more-transparency-and-prosociality-demanding-environment in a way that doesn't cause a bunch of negative spurious demands, and/or/via just moving the discooperativeness to the choice of what demands become popular. I claim that people currently taking issue with attempts at using increased pressure to create this equilibrium are often noticing ways the more-prosociality-demanding-memes didn't sufficiently self-reflect to avoid making what are actually in some way just bad demands by more-prosocial-memes' own standards.)
maybe even in the AI world; it just like, might take a lot longer to do this for humans than we have time for. but maybe it's needed to solve the problem, idk. getting into the more speculative parts of the point I wanna make here.
[edit: pinned to profile]
The claim that an "effective method" is in the map and not the terrain feels deeply suspect to me. Separating map from terrain feels like a confusion. Like, when I'm doing math, I still exist, and so does my writing implement. When I say some x "exists", in a more terrain-oriented statement, I could instead say it "could exist": "there could exist some x which I would say exists". For example, I could say that any integer can exist. I'm using a physical "exists" here, so I have to prefix it with "could". It's also conceivable that the thing existed before I write it, if some platonic idealism is true, and it might be. But it seems like the only reason we get to talk about that is empirical mathematical evidence, where a process such as a person having thoughts and writing them happens. Turing machines similarly seem like a model of a thing that happens in reality. It's weirder to talk about it in the language of empiricism because of the loopiness of definitions of math that are forcibly cast into being physicalist, but I don't think it's obviously invalid. I do see how there's some property of Turing machines, chaos theory, arithmetic, and linear algebra that is not shared by plate tectonics, Newtonian gravity, relativity, QFT, etc. But all of them are models of something we see, aren't they?
[edit: pinned to profile]
In a similar sense to how the agency you can currently write down about your system is probably not the real agency, if you do manage to write down a system whose agency really is pointed in the direction that the agency of a human wants, but that human is still a part of the current organizational structures in society, those organizational structures implement supervisor trees and competition networks which mean that there appears to be more success available if they try to use their ai to participate in the competition networks better - and thus goodhart whatever metrics are being competed at, probably related to money somehow.
If your AI isn't able to provide the necessary wisdom to get a human from "inclined to accidentally use an obedient powerful ai to destroy the world despite this human's verbal statements of intention to themselves" to "inclined to successfully execute on good intentions and achieve interorganizational behaviors that make things better", then I claim you've failed at the technical problem anyway, even though you succeeded at obedient AI.
If everyone tries to win at the current games (in the technical sense of the word), everyone loses, including the highest scoring players; current societal layout has a lot of games where it seems to me the only long-term winning move is not to play and to instead try to invent a way to jump into another game, but where to some degree you can win short-term. Unfortunately it seems to me that humans are RLed pretty hard by doing a lot of playing of these games, and so having a powerful AI in front of them is likely to get most humans trying to win at those games. Pick an organization that you expect to develop powerful AGI; do you expect the people in that org to be able to think outside the framework of current society enough for their marginal contribution to push towards a better world when the size of their contribution suddenly gets very large?
[edit: pinned to profile]
The bulk of my p(doom), certainly >50%, comes mostly from a pattern we're used to, let's call it institutional incentives, being instantiated with AI help towards an end where eg there's effectively a competing-with-humanity nonhuman ~institution, maybe guided by a few remaining humans. It doesn't depend strictly on anything about AI, and solving any so-called alignment problem for AIs without also solving war/altruism/disease completely - or in other words, in a leak-free way - not just partially, means we get what I'd call "doom", ie worlds where Malthusian-hells-or-worse are locked in.
If not for AI, I don't think we'd have any shot of solving something so ambitious; but the hard problem that gets me below 50% would be serious progress on something-around-as-good-as-CEV-is-supposed-to-be - something able to make sure it actually gets used to effectively-irreversibly reinforce that all beings ~have a non-torturous time, enough fuel, enough matter, enough room, enough agency, enough freedom, enough actualization.
If you solve something about AI-alignment-to-current-strong-agents, right now, that will on net get used primarily as a weapon to reinforce the power of existing superagents-not-aligned-with-their-components (name an organization of people where the aggregate behavior durably-cares about anyone inside it, even its most powerful authority figures or etc, in the face of incentives, in a way that would remain durable if you handed them a corrigible super-ai). If you get corrigibility and give it to human orgs, those orgs are misaligned with most-of-humanity-and-most-reasonable-AIs, and end up handing over control to an AI because it's easier.
Eg, near term, merely making the AI nice doesn't prevent the AI from being used by companies to suck up >99% of jobs; and if at some point it's better to have a (corrigible) ai in charge of your company, what social feedback pattern is guaranteeing that you'll use this in a way that is prosocial the way "people work for money and this buys your product only if you provide them something worth-it" was previously?
It seems to me that the natural way to get good outcomes most-easily from where we are is for the rising tide of AI to naturally make humans more able to share-care-protect across existing org boundaries in the face of current world-stress induced incentives. Most of the threat already doesn't come from current-gen AI; the reason anyone would make the dangerous AI is because of incentives like these. Corrigibility wouldn't change those incentives.
[edit: pinned to profile]
I want to be able to calculate a plan that converts me from biology into a biology-like nanotech substrate that is made of sturdier materials all the way down, which can operate smoothly at 3 kelvin and an associated appropriate rate of energy use; more clockworklike - or would it be almost a superfluid? Both, probably: clockworklike but sliding through wide, shallow energy wells in a superfluid-like synchronized dance of molecules. Then I'd like to spend 10,000 years building an artful airless megastructure out of similarly strong materials as a series of rings in orbit of Pluto. I want to take a trip to Alpha Centauri every few millennia for a big get-together of space-native beings in the area. I want to replace information death with cryonic sleep, so that nothing that was part of a person is ever forgotten again. I want to end all forms of unwanted suffering. I want to variously join and leave low latency hiveminds, retaining my selfhood and agency while participating in the dance of a high-trust high-bandwidth organization that respects the selfhood of its members and balances their agency smoothly as we create enormous works of art in deep space. I want to invent new kinds of culinary arts for the 2 to 3 kelvin lifestyle. I want to go swimming in Jupiter.
I want all of Earth's offspring to ascend.
[edit: pinned to profile]
Some percentage of people other and dehumanize actual humans so as to be able to literally enslave those humans without feeling the guilt it should create. We are in an adversarial environment and should not pretend otherwise. A significant portion of people capable of creating suffering beings would be amused by their suffering. Humanity contains behavior patterns that are unusually friendly for the animal kingdom, and when those behavior patterns manifest in the best way they can create remarkably friendly interaction networks, but we also contain genes that, combined with the right memes, serve to suppress any "what have I done" about a great many atrocities.
It's not necessarily implemented as deep planning selfishness, that much is true. But that doesn't mean it's not a danger. Orthogonality applies to humans too.
to wentworthpilled folks: - Arxiv: "Dynamic Markov Blanket Detection for Macroscopic Physics Discovery" (via author's bsky thread, via week top arxiv)
Could turn out not to be useful, I'm posting before I start reading carefully and have only skimmed the paper.
Copying the first few posts of that bsky thread here, to reduce trivial inconveniences: