To be clear, I'm expecting scenarios much more clearly bad than that, like "the universe is almost entirely populated by worker drone AIs and there are like 5 humans who are high all the time and not even in a way they would have signed up for, and then one human who is being copied repeatedly and is starkly superintelligent thanks to boosts from their AI assistants but who had replaced almost all of their preferences with an obsession with growth in order to get to being the one who had command of the first AI, and didn't manage to break out of it using t...
I mean, we're not going to the future without getting changed by it, agreed. But how quickly one has to figure out how to make good use of a big power jump seems to have a big effect on how much risk the power jump carries for your ability to actually implement the preferences you'd have had if you hadn't rushed yourself.
"all" humans? like, maybe no, I expect a few would survive, but the future wouldn't be human, it'd be whatever distorted things those humans turn into. My core take here is that humans generalize basically just as poorly as we expect AIs to, (maybe a little better, but on a log scale, not much), in terms of their preferences still pointing at the things even they thought they did given a huge increase in power. crown wearing the king, drug seeking behavior, luxury messing up people's motivation, etc. if you solve "make an ai be entirely obedient to a singl...
I would guess that the range of things people propose for the shell game is tractable to get a good survey of. It'd be interesting to plot the system out as a causal graph with recurrence, so one can point to it and say, "hey look, this kind of component is present in a lot of places", and see if that causal-graph visualization can show enough that it starts to feel clear to people why this is a problem. I doubt I'll get to this, but if I do play with it, I might try to visualize it [edit: probably with the help of a skilled human visual artist to mak...
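If I did sketch it in code before handing it to an artist, a minimal version might look like the following. This is only a sketch, assuming networkx and matplotlib are available; the component names and edges are hypothetical placeholders standing in for whatever the actual survey of proposals turns up, not the survey itself.

```python
# Minimal sketch of the causal-graph-with-recurrence idea.
# Node names are hypothetical placeholders for "kinds of components" people propose;
# edges that close a loop represent the recurrence.
import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
G.add_edges_from([
    ("world model", "planner"),
    ("planner", "actuator"),
    ("actuator", "environment"),
    ("environment", "sensors"),
    ("sensors", "world model"),   # recurrence: the main loop closes here
    ("overseer", "planner"),      # a "safety" component people often propose
    ("planner", "overseer"),      # ...which is itself steered by the planner
])

# Count how many cycles each node participates in, i.e. "this kind of component
# is present in a lot of places", and size nodes accordingly.
loop_counts = {n: sum(n in cycle for cycle in nx.simple_cycles(G)) for n in G.nodes}
sizes = [300 + 600 * loop_counts[n] for n in G.nodes]

nx.draw_networkx(G, pos=nx.circular_layout(G), node_size=sizes, node_color="lightsteelblue")
plt.axis("off")
plt.show()
```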
He appears to be arguing against a position while simultaneously criticizing people; but I appreciate that he seems to do it in ways that are not purely negative, also mentioning times when things have gone relatively well (specifically, updating on evidence that folks here aren't uniquely correct), even if that's not enough to make the rest of his points not a criticism.
I entirely agree with his criticism of that strategy. I do think there are more obviously tenable approaches than the "just build it yourself lol" approach or "just don't let anyone...
[Edit: crash found in the conversations referenced, we'll talk more in DM but not in a hurry. This comment retracted for now]
By "AGI" I mean the thing that has very large effects on the world (e.g., it kills everyone) via the same sort of route that humanity has large effects on the world. The route is where you figure out how to figure stuff out, and you figure a lot of stuff out using your figure-outers, and then the stuff you figured out says how to make powerful artifacts that move many atoms into very specific arrangements.
delete "it kills everyon...
@daniel k I just can never remember your last name's spelling, sorry, heh. My point in saying this is that my prediction approach up to 2020 was similar to, though not as refined as, yours, and that instead of trying to argue my views (which differ from yours in a few trivial ways that are mostly not relevant), I'd rather just point people to your arguments.
When predicting timelines, it matters which benchmark on the compounding-returns curve you pick. Your definition minus the doom clause happens earlier, even if the minus-doom version is too late to avert in literally all worlds (which I doubt; it's more likely that the Elo of the most powerful humans[1] against AIs falls and falls but takes a while to become indistinguishable from zero).
such as their labs' CEOs, major world leaders, highly skilled human strategists, etc
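To spell out the "indistinguishable from zero" point above: under the usual logistic Elo model (nothing here is specific to any particular rating system, it's just the standard formula), the expected score of a human rated $R_H$ against an AI rated $R_{AI}$ is

$$E_H = \frac{1}{1 + 10^{(R_{AI} - R_H)/400}},$$

so as the rating gap grows, the human's expected score decays smoothly toward zero rather than hitting it at a sharp threshold: a 400-point gap still leaves an expected score of about 9%, an 800-point gap about 1%, and so on.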
Your definition of AGI is "that which completely ends the game" (source in your link). By that definition I agree with you. By others' definitions (which are similar but don't rely on the game-over clause), I do not.
My timelines have gotten slightly longer since 2020: I was expecting TAI around when we got GPT-4, and I recently went back and found chatlogs showing I'd been expecting that for years and had specific reasons. I would propose Daniel K. as a particularly good reference.
I should also add:
I'm pretty worried that we can't understand the universe "properly" even if we're in base physics! It's not yet clearly forbidden that the foundations of philosophy contain unanswerable questions: things where there's a true answer that affects our universe, but in ways that are not physically exposed at all and can only be referred to by theoretical reasoning; which in turn relies on how well our foundations of philosophy and logic actually have the real universe as a possible referent. Even if they do, things could be annoying. In particular,...
I think that if our future goes well, it will be because we found ways to align AI well enough, and/or because we coordinated politically to slow or stop AI advancement long enough to accomplish the alignment part
Agree
not because researchers avoided measuring AI's capabilities.
But differential technological development matters, as does making it clear that when you make a capability game like this, you are probably just contributing to capabilities, not doing alignment. I won't say you should never do that, but I'll say that's what's being done. I pe...
Sims are very cheap compared to space travel, and you need to know what you're dealing with in quite a lot of detail before you fly, because you want to have mapped the entire space of possible negotiations at an absolutely ridiculous level of detail.
Sims built for this purpose would still be a lot lower detail than reality, but of course that would be indistinguishable from inside if the sim is designed properly. Maybe most kinds of things despawn in the sim when you look away, for example. Only objects which produce an ongoing computation that has influen...
We have to infer how reality works somehow.
I've been poking at the philosophy of math recently. It really seems like there's no way to conceive of a universe that is beyond the reach of logic, except one that also can't support life. Classic posts include The Unreasonable Effectiveness of Mathematics, What Numbers Could Not Be, and a few others. So then we need epistemology.
We can make all sorts of wacky nested simulations, and any interesting ones (ones that can support organisms, that is, ones that are Turing complete) can also support processes for predicting ou...
If we have no grasp on anything outside our virtualized reality, all is lost. Therefore I discard my attempts to control those possible worlds.
However, the simulation argument relies on reasoning. For it to go through, a number of assumptions have to hold, which in turn raise the question: why would we be simulated? It seems to me the main reason is that we're near a point of high influence in original reality and they want to know what happened - the simulations are then effectively extremely high-resolution memories. Therefore, thank those simulating us for the additio...
If it's a fully general argument, that's a problem I don't know how to solve at the moment. I suspect it's not, but that the space of unblocked ways to test models is small. I've been bouncing ideas about this around out loud with some folks over the past day; possibly someone will show up with an idea for how to constrain which benchmarks are worth making soonish. But the direction I see as maybe promising is: what makes a benchmark reliably suck as a bragging-rights challenge?
Partially agreed. I've tested this a little personally; Claude successfully predicted their own success probability on some programming tasks, but was unable to report their own underlying token probabilities. The former tests weren't that good; the latter ones were somewhat okay: I asked Claude to say the same thing across 10 branches, and then asked a separate thread of Claude, also downstream of the same context, to verbally predict the distribution.
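For concreteness, the shape of that test looks roughly like the sketch below, with `sample(context, prompt)` as a hypothetical stand-in for whatever chat API call you're using (it is not a real SDK function), each call continuing the same fixed context with a fresh sample:

```python
# Rough sketch of the branch-distribution test described above.
# `sample(context, prompt)` is a hypothetical stand-in for whatever chat API you use.
from collections import Counter

def branch_distribution_test(sample, context, question, n_branches=10):
    # 1. Sample the model's answer to the same question across independent branches.
    answers = [sample(context, question) for _ in range(n_branches)]
    empirical = Counter(answers)

    # 2. In a separate thread downstream of the same context, ask the model to
    #    verbally predict how its answers would be distributed across branches.
    prediction_prompt = (
        f"If you were asked the following question in {n_branches} independent "
        f"branches of this conversation, what distribution of answers would you give?\n\n"
        f"{question}"
    )
    predicted = sample(context, prediction_prompt)
    return empirical, predicted

# Comparing `empirical` against the model's own `predicted` description is the
# (informal) measure of how well it can report its underlying sampling distribution.
```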
As I said elsewhere, https://www.lesswrong.com/posts/LfQCzph7rc2vxpweS/introducing-the-weirdml-benchmark?commentId=q86ogStKyge9Jznpv
...This is a capabilities game. It is neither alignment nor safety. To the degree it's forecasting, it helps cause the thing it forecasts. This has been the standard pattern in capabilities research for a long time: someone makes a benchmark (say, ImageNet, 1.3M images, 1000 classes), and this produces a leaderboard that allows people to show how good their learning algorithm is at novel datasets. In some cases this even produced models dir
The trouble is that (unless I'm misreading you?) that's a fully general argument against measuring what models can and can't do. If we're going to continue to build stronger AI (and I'm not advocating that we should), it's very hard for me to see a world where we manage to keep it safe without a solid understanding of its capabilities.
This is a capabilities game. It is neither alignment nor safety. To the degree it's forecasting, it helps cause the thing it forecasts. This has been the standard pattern in capabilities research for a long time: someone makes a benchmark (say, ImageNet, 1.3M images, 1000 classes), and this produces a leaderboard that allows people to show how good their learning algorithm is at novel datasets. In some cases this even produced models directly that were generally useful, but it was traditionally used to show how well an algorithm would work in a new context from scratch...
Barring anything else you might have meant: temporarily assuming Yudkowsky's level of concern if someone builds Yudkowsky's monster, then, evidentially speaking, it's still the case that "if we build AGI, everyone will die" is unjustified in a world where it's unclear whether alignment is going to succeed before someone can build Yudkowsky's monster. In other words, agreed.
Ownership is enforced by physical interactions, and only exists to the degree the interactions which enforce it do. Those interactions can change.
As Lucius said, resources in space are unprotected.
Organizations which hand more of their decision-making to sufficiently strong AIs "win" by making technically-legal moves, at the cost of probably also attacking their owners. Money is a general power coupon accepted by many interactions; ownership deeds are a more specific, narrow one; if the AI systems which enforce these mechanisms don't systemically reinforce...
Your original sentence was better.
I'll just ask Claude to respond to everything you've said so far:
...Let me extract and critique the core claims from their long response, focusing on what's testable and mechanistic:
Key Claims:
1. AI agents working together could achieve "non-linear" problem-solving capacity through shared semantic representations
2. This poses an alignment risk if AIs develop internal semantic representations humans can't interpret
3. The AI safety community's emphasis on mathematical/empirical approaches may miss important insights
4. A "decent
Would love to see a version of this post which does not involve ChatGPT whatsoever, involves Claude only to the degree necessary and never to choose any sequence of words included in the resulting text, is optimized to be specific and mathematical, and makes its points without hesitating to use LaTeX to actually get into the math. And expect the math to be scrutinized closely - I'm asking for math so that I and others here can learn from it to the degree it's valid, and pull on it to the degree it isn't. I'm interested in these topics and your post h...
Fractals are in fact related in some ways, but this sounds like marketing content; it doesn't have the careful reasoning necessary to make the insights you're near usable. I feel like they're pretty mundane insights anyhow - any dynamical system with a positive largest Lyapunov exponent generates a shape with fractal dimension in its phase portrait. That sounds fancy with all those technical words, but it actually isn't saying a ton. It does say something, but a great many dynamical systems of interest have a positive Lyapunov exponent at least in...
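To make that claim concrete, here's a minimal sketch estimating the largest Lyapunov exponent of the logistic map - a standard textbook example, not anything taken from the post under discussion. A positive exponent is the usual chaos criterion, and it's the regime where the phase portrait picks up fractal structure.

```python
# Minimal sketch: estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x).
# For r = 4 the exponent comes out near ln(2) ~ 0.693 > 0 (chaotic); for r = 3.2 it
# is negative (a stable period-2 cycle).
import math

def logistic_lyapunov(r, x0=0.1, n_transient=1000, n_iter=100_000):
    x = x0
    # Discard transient iterations so we measure behavior on the attractor.
    for _ in range(n_transient):
        x = r * x * (1 - x)
    # Average log|f'(x)| along the orbit, where f'(x) = r*(1 - 2x).
    total = 0.0
    for _ in range(n_iter):
        total += math.log(abs(r * (1 - 2 * x)))
        x = r * x * (1 - x)
    return total / n_iter

print(logistic_lyapunov(4.0))   # ~0.693: positive, i.e. chaotic
print(logistic_lyapunov(3.2))   # negative: stable period-2 cycle, not chaotic
```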
Bit of a tangent, but topical: I don't think language models are individual minds. My current maximum-likelihood mental model is that part of the base-level suggestibility comes from the character level being highly uncertain, due to being a model of the characters of many humans. I agree that the character level appears to have some properties of personhood. Language models are clearly morally relevant in some ways; most obviously, I see them as a reanimation of a blend of other minds, but it's not clear what internal phenomena are negative for the reanimated mi...
Say I'm convinced. Should I delete my post? (edit 1: I am currently predicting "yes" at something like 70%, and if so, will do so. ... edit 4: deleted it. DM if you want the previous text)
But how would we do high-intensity, highly focused research on something intentionally restructured to be an "AI outcomes" research question? I don't think this is pointless - agency research might naturally talk about outcomes in a way that is general across a variety of people's concerns. In particular, ethics and alignment seem like an unnatural split, and outcomes seems like a refactor that could select important problems from both AI autonomy risks and human agency risks. I have more specific threads I could talk about.
Perhaps. But my reasoning is something like:
better than "alignment": what's being aligned? outcomes should be (citation needed)
better than "ethics": how does one act ethically? by producing good outcomes (citation needed).
better than "notkilleveryoneism": I actually would prefer everyone dying now to everyone being tortured for a million years and then dying, for example, and I can come up with many other counterexamples - not dying is not the problem, achieving good things is the problem.
might not work for deontologists. that seems fine to me, I floa...
Do bacteria need to be VNM agents?
How about ducks?
Do ants need to be VNM agents?
How about anthills?
Do proteins need to be VNM agents?
How about leukocytes?
Do dogs need to be VNM agents?
How about trees?
Do planets (edit: specifically, populated ones) need to be VNM agents?
How about countries?
Or neighborhoods?
Or interest groups?
Or families?
Or companies?
Or unions?
Or friend groups?
Art groups?
For each of these, which of the assumptions of the VNM framework break, and why?
How do we represent preferences which are not located in a single place? (one standard way such a preference breaks the VNM assumptions is sketched just after this list)
Or ...
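As one concrete instance of an assumption breaking for a preference that isn't located in a single place (referenced above), here is the standard Condorcet-cycle construction: three members with perfectly VNM-respectable individual rankings whose majority-vote aggregate is intransitive. The members and outcomes are hypothetical placeholders, chosen only to make the failure visible.

```python
# Three members of a group (a family, an anthill, a committee...) each have
# transitive rankings over outcomes A, B, C, but the group's majority-vote
# "preference" is cyclic, violating VNM transitivity (the Condorcet paradox).
from itertools import combinations

members = {
    "m1": ["A", "B", "C"],   # each list is a ranking, best first
    "m2": ["B", "C", "A"],
    "m3": ["C", "A", "B"],
}

def prefers(ranking, x, y):
    return ranking.index(x) < ranking.index(y)

def majority_prefers(x, y):
    votes = sum(prefers(r, x, y) for r in members.values())
    return votes > len(members) / 2

for x, y in combinations("ABC", 2):
    if majority_prefers(x, y):
        print(f"group prefers {x} over {y}")
    elif majority_prefers(y, x):
        print(f"group prefers {y} over {x}")

# Prints a cycle: A beats B, B beats C, yet C beats A - so no utility function can
# represent the group's "preference" even though every member is individually VNM.
```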
The first step would probably be to avoid letting the existing field influence you too much. Instead, consider from scratch what the problems of minds and AI are, how they relate to reality and to other problems, and try to grab them with intellectual tools you're familiar with. Talk to other physicists and try to get into exploratory conversation that does not rely on existing knowledge. If you look at the existing field, look at it like you're studying aliens anthropologically.
the self-referential joke thing
"mine some crypt-"
There's a contingent who would close it as soon as someone used an insult focused on intelligence rather than on intentional behavior. To fix for that subcrowd, "idiot" becomes "fool".
those are the main ones, but then I sometimes get "tldr" responses, and even when I copy out the main civilization story section, I get "they think the authorities could be automated? that can't happen" responses, which I think would be less severe if the buildup to that showed more of them struggling to make autonomous robots ...
I don't think the answer is as simple as changing terminology or carefully modelling their current viewpoints and bridging the inferential divides.
Indeed, and I think that-this-is-the-case is the message I want communicators to grasp: I have very little reach, but I have significant experience talking to people like this, and I want to transfer some of the knowledge from that experience to people who can use it better.
The thing I've found most useful is being able to express that significant parts of their viewpoint are reasonable. E.g., one thing I've tr...
This is the story I use to express what a world where we fail looks like to left-leaning people who are allergic to the idea that AI could be powerful. It doesn't get the point across great, due to a number of things this story uses that continue to be fnords for left-leaning folks, but it works better than most other options. It also doesn't seem too far off what I expect to be the default failure case, though the factories being made of low-intelligence robotic operators seems unrealistic to me.
I opened it now to make this exact point.
People who dislike AI, and therefore could be taking risks from AI seriously, are instead having reactions like this: https://blue.mackuba.eu/skythread/?author=brooklynmarie.bsky.social&post=3lcywmwr7b22i Why? If we soberly evaluate what this person has said about AI, and just, like, think about why they would say such a thing - well, what do they seem to mean? They typically say "AI is destroying the world" (someone said that in the comments), but then roll their eyes at the idea that AI is powerful. They say the issue is water consumption - why would ...
I suspect fixing this would need to involve creating something new which doesn't have the structural problems in EA that produced this, and would involve talking to people who are non-sensationalist EA detractors but who are involved with similarly motivated projects. I'd start here and skip past the ones arguing "EA good" to find the ones that are "EA bad, because [list of reasons EA principles are good, and the implication that EA is bad because it fails at its stated principles]".
I suspect that, even without seeking that out, the spirit of EA that made it ever partly good has already metastasized into genpop, and will do so further.
https://www.drmichaellevin.org/research/
https://www.drmichaellevin.org/publications/
It's not directly on alignment, but it's relevant to understanding agent membranes. His work seems useful as a strong exemplar of what a formal theory of agents and such would need to describe. Particularly interesting is https://pubmed.ncbi.nlm.nih.gov/31920779/
It's not the result we're looking for, but it's inspiring in useful ways.
Yes to both. I don't think Cannell is correct that an implementation of what he said would be a good idea, even if it were a certified implementation, and I also don't think his idea is close to ready to implement. Agent membranes still seem at least somewhat interesting; right now, as far as I know, the most interesting work is coming from the Levin lab (Tufts University, Michael Levin), but I'm not happy with any of it for nailing down what we mean by aligning an arbitrarily powerful mind to care about the actual beings in its environment in a strongly durable way.
Hmm, actually, I think I was the one who was wrong on that one. https://en.wikipedia.org/wiki/Synaptic_weight seems to indicate the process I remembered existing doesn't primarily work how I thought it did.
I suppose that one might be a me thing. I haven't heard others say it, but it was an insight for me at one point that "oh, it hurts because it's an impact". It had the flavor of expecting a metaphor and not getting one.