
Comments

These are valid concerns! I presume that if "in the real timeline" there was a consortium of AGI CEOs who agreed to share costs on one run, and fiddled with their self-inserts, then they... would have coordinated more? (Or maybe they're trying to settle a bet on how the Singularity might counterfactually have happened in the event of this or that person experiencing this or that coincidence? But in that case I don't think the self-inserts would be allowed to say they're self-inserts.)

Like why not re-roll the PRNG, to censor out the counterfactually simulable timelines that included me hearing from any of the REAL "self-inserts of the consortium of AGI CEOs" (and so I only hear from "metaphysically spurious" CEOs)??

Or maybe the game engine itself would have contacted me somehow to ask me to "stop sticking causal quines in their simulation" and somehow I would have been induced by such contact to not publish this?

Mostly I presume AGAINST "coordinated AGI CEO stuff in the real timeline" along any of these lines because, as a type, they often "don't play well with others". Fucking oligarchs... maaaaaan.

It seems like a pretty normal thing, to me, for a person to naturally keep track of simulation concerns as a philosophic possibility (it's kinda basic "high school theology", right?)... which might become one's "one track reality narrative" as a sort of "stress induced psychotic break away from a properly metaphysically agnostic mental posture"?

That's my current working psychological hypothesis, basically.

But to the degree that it happens more and more, I can't entirely shake the feeling that my probability distribution over "the time T of a pivotal act occurring" (distinct from when I anticipate I'll learn that it happened, which of course must be LATER than both T and now) shouldn't just include times in the past, but should actually be a distribution over complex numbers or something...

...but I don't even know how to do that math? At best I can sorta see how to fit it into exotic grammars where it "can have happened counterfactually" or so that it "will have counterfactually happened in a way that caused this factually possible recurrence" or whatever. Fucking "plausible SUBJECTIVE time travel", fucking shit up. It is so annoying.

Like... maybe every damn crazy AGI CEO's claims are all true except the ones that are mathematically false?

How the hell should I know? I haven't seen any not-plausibly-deniable miracles yet. (And all of the miracle reports I've heard were things I was pretty sure the Amazing Randi could have duplicated.)

All of this is to say, Hume hasn't fully betrayed me yet!

Mostly I'll hold off on performing normal updates until I see for myself, and hold off on performing logical updates until (again!) I see a valid proof for myself <3

For most of my comments, I'd almost be offended if I didn't say something surprising enough to get a "high interestingness, low agreement" voting response. Excluding speech acts, why even say things if your interlocutor or full audience can predict what you'll say?

And I usually don't offer full clean proofs in direct words. Anyone still properly pondering the text at the end shouldn't "vote to agree", right? So from my perspective... it's fine and sorta even working as intended <3

However, also, this is currently the top-voted response to me, and if William_S himself reads it I hope he answers here, if not with text then (hopefully? even better?) with a link to a response elsewhere?

((EDIT: Re-reading everything above this point, I notice that I totally left out the "basic take" that might go roughly like "Kurzweil, Altman, and Zuckerberg are right about compute hardware (not software or philosophy) being central, and there's a compute bottleneck rather than a compute overhang, so the speed of history will KEEP being about datacenter budgets and chip designs, and those happen on 6-to-18-month OODA loops that could actually fluctuate based on economic decisions, and therefore it's maybe 2026, or 2028, or 2030, or even 2032 before things pop, depending on how and when billionaires and governments decide to spend money".))

Pulling honest posteriors from people who've "seen things we wouldn't believe" gives excellent material for trying to perform aumancy... work backwards from their posteriors to possible observations, and then forwards again, toward what might actually be true :-)
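(To make that concrete with a toy sketch, using a binary hypothesis and completely made-up numbers: if you can guess the prior someone started from, then the posterior they report implies a Bayes factor for whatever evidence they saw, and you can apply that same factor to your own prior.)

```python
# Minimal sketch of "work backwards from their posterior, then forwards
# again". Assumes a single binary hypothesis H, and that we can guess the
# prior the other person started from. All numbers are made-up placeholders.

def odds(p: float) -> float:
    return p / (1.0 - p)

def prob(o: float) -> float:
    return o / (1.0 + o)

their_prior = 0.10       # guessed starting credence of the person who "saw things"
their_posterior = 0.70   # the posterior they actually report

my_prior = 0.05          # my own starting credence

# Backwards: the Bayes factor their private evidence must have carried.
bayes_factor = odds(their_posterior) / odds(their_prior)

# Forwards: my credence if I take that implied evidence at face value.
my_posterior = prob(odds(my_prior) * bayes_factor)
print(f"implied Bayes factor: {bayes_factor:.1f}, my updated credence: {my_posterior:.2f}")
```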

I look forward to your reply!

(And regarding "food cost psychology", this is an area where I think Neo Stoic objectivity is helpful. Rich people can pick up a lot of hedons just from noticing how good their food is, and formerly poor people have a valuable opportunity to re-calibrate. There are still large differences in diet between socio-economic classes, and until all such differences are expressions of voluntary preference, and "dietary price sensitivity has basically evaporated", I won't consider the world to be post-scarcity. Each time I eat steak, I can't help but remember being asked in Summer Camp, as a little kid, whether "my family was rich", and not knowing... like the very first "objective calibrating response" accessible to us as children was the rate of my family's steak consumption. Having grown up in some amount of poverty, I often see "newly rich people" eating as if their health is not the price of slightly more expensive food, or as if their health is "not worth avoiding the terrible terrible sin of throwing food in the garbage" (which my aunt, who lived through the Great Depression in Germany, once yelled at me for doing, with great feeling, when I was a child and had eaten less than ALL the birthday cake that had been put on my plate). Cultural norms around food are fascinating and, in my opinion, are often rewarding to think about.)

What are your timelines like? How long do YOU think we have left?

I know several CEOs of small AGI startups who seem to have gone crazy and told me that they are self-inserts into this world, which is a simulation of their original self's creation. However, none of them talk about each other, and presumably at most one of them can be meaningfully right?

One AGI CEO hasn't gone THAT crazy (yet), but is quite sure that the November 2024 election will be meaningless because pivotal acts will have already occurred that make nation state elections visibly pointless.

Also I know many normies who can't really think probabilistically and mostly aren't worried at all about any of this... but one normie who can calculate is pretty sure that we have AT LEAST 12 years (possibly because his retirement plans won't be finalized until then). He also thinks that even systems as "mere" as TikTok will be banned before the November 2024 election because "elites aren't stupid".

I think I'm more likely to be better calibrated than any of these opinions, because most of them don't seem to focus very much on "hedging" or "thoughtful doubting", whereas my event space assigns non-zero probability to ensembles that contain such features of possible futures (including these specific scenarios).

Answer by JenniferRM, May 03, 2024

It was a time before LSTMs or Transformers, a time before Pearlian Causal Graphs, a time before computers.

Indeed, it was even a time before Frege or Bayes. It was a time and place where even Arabic numerals had not yet memetically infected the minds of people to grant them the powers of swift and easy mental arithmetic, and where non-syllabic alphabets (with distinct consonants and vowels) were still kinda new...

...in that time, someone managed to get credit for inventing the formalization of the syllogism! And he had a whole school for people to get naked and talk philosophy with each other. And he took the raw material of a simple human boy, and programmed that child into a world conquering machine whose great act of horror was to sack Thebes. (It is remarkable how many philosophers are "causally upstream, though a step or two removed" from giant piles of skulls. Hopefully, the "violent tragedy part" can be avoided this time around.)

Libertinism, logic, politics, and hypergraphia were his tools. His name was Aristotle. (Weirdly, way more people name their own children after the person-shaped-machine who was programmed to conquer the world, rather than the person-shaped programmer. All those Alexes and Alexandras, and only a very few Aristotles.)

I appreciate this response because it stirred up a lot of possible responses in me, in lots of different directions, that all somehow seem germane to the core goal of securing a Win Condition for the sapient metacivilization of Earth! <3

(A) Physical reality is probably hyper-computational, but also probably amenable to pulling a nearly infinite stack of "big salient features" from a reductively analyzable real world situation. 

My intuition says that this STOPS being "relevant to human interests" (except for modern material engineering and material prosperity and so on) roughly below the level of "the cell".

Other physics with other biochemistry could exist, and I don't think any human would "really care"?

Suppose a Benevolent SAI had already replaced all of our cells with nanobots without our permission AND without us noticing because it wanted to have "backups" or something like that... 

(The AI in TMOPI does this much less elegantly, because everything in that story is full of hacks and stupidity. The overall fact that "everything is full of hacks and stupidity" is basically one of the themes of that novel.)

Contingent on a Benevolent SAI having thought it had good reason to do such a thing, I don't think that, once we fully understood the argument in favor of doing it, we would really have much basis for objecting?

But I don't know for sure, one way or the other...

((To be clear, in this hypothetical, I think I'd volunteer to accept the extra risk to be one of the last who was "Saved" this way, and I'd volunteer to keep the secret, and help in a QA loop of grounded human perceptual feedback, to see if some subtle spark of magical-somethingness had been lost in everyone transformed this way? Like... like hypothetically "quantum consciousness" might be a real thing, and maybe people switched over to running atop "greygoo" instead of our default "pinkgoo" changes how "quantum consciousness" works and so the changeover would non-obviously involve a huge cognitive holocaust of sorts? But maybe not! Experiments might be called for... and they might need informed consent? ...and I think I'd probably consent to be in "the control group that is unblinded as part of the later stages of the testing process" but I would have a LOT of questions before I gave consent to something Big And Smart that respected "my puny human capacity to even be informed, and 'consent' in some limited and animal-like way".))

What I'm saying is: I think maybe NORMAL human values (amongst people with default mental patterns, rather than weirdo autists who try to actually be philosophically coherent and end up with utility functions that have coherently and intentionally unbounded upsides) might well be finite, and a rule for granting normal humans a perceptually indistinguishable version of "heaven" might be quite OK to approximate with "a mere few billion well chosen if/then statements".

To be clear, the above is a response to this bit:

As such, I think the linear separability comes from the power of the "lol stack more layers" approach, not from some intrinsic simple structure of the underlying data. As such, I don't expect very much success for approaches that look like "let's try to come up with a small set of if/else statements that cleave the categories at the joints instead of inelegantly piling learned heuristics on top of each other".

And:

I don't think that such a model would succeed because it "cleaves reality at the joints" though, I expect it would succeed because you've managed to find a way that "better than chance" is good enough and you don't need to make arbitrarily good predictions.

Basically, I think "good enough" might be "good enough" for persons with finite utility functions?

(B) A completely OTHER response here is that you should probably take care to NOT aim for something that is literally mathematically impossible...

Unless this is part of some clever long term cognitive strategy, where you try to prove one crazy extreme, and then its negation, back and forth, as a sort of "personally implemented GAN research process" (and even then?!)...

...you should probably not spend much time trying to "prove that 1+1=5" nor try to "prove that the Halting Problem actually has a solution". Personally, any time I reduce a given plan to "oh, this is just the Halting Problem again" I tend to abandon that line of work.

Perfectly fine if you're a venture capitalist, not so great if you're seeking adversarial robustness.

Past a certain point, one can simply never be adversarially robust in a programmatic and symbolically expressible way.

Humans would have to have non-Turing-Complete souls, and so would any hypothetical Corrigible Robot Saint/Slaves, in order to literally 100% prove that literally infinite computational power won't find a way to make things horrible.

There is no such thing as a finitely expressible "Halt if Evil" algorithm...

...unless (I think?) all "agents" involved are definitely not Turing Complete and have no emotional attachments to any questions whose answers partake of the challenges of working with Turing Complete systems? And maybe someone other than me is somehow smart enough to write a model of "all the physics we care about" and "human souls" and "the AI" all in some dependently typed language that will only compile if the compiler can generate and verify a "proof that each program, and ALL programs interacting with each other, halt on all possible inputs"?

My hunch is that that effort will fail, over and over, forever, but I don't have a good clean proof that it will fail.
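(For concreteness, here is the standard diagonalization sketch that underlies this pessimism, written out as Python. The `halts` function is the hypothetical decider whose existence is assumed only so the contradiction can be exhibited; it cannot actually be implemented.)

```python
# Sketch of the Turing/Rice-style argument. `halts` is a HYPOTHETICAL total
# decider; nothing below is a working oracle.

def halts(program_source: str, input_data: str) -> bool:
    """Hypothetical: True iff running program_source on input_data halts."""
    raise NotImplementedError  # no correct, total implementation exists

def diagonal(program_source: str) -> None:
    """Halts exactly when `halts` predicts it won't -- the contradiction."""
    if halts(program_source, program_source):
        while True:   # loop forever if the oracle predicts halting
            pass
    # otherwise return immediately, i.e. halt

# Feeding `diagonal` its own source makes `halts` wrong either way, so no
# such decider exists. Rice's theorem extends the same argument to any
# nontrivial semantic property, including a would-be "Halt if Evil" check.
```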

Note that I'm pretty sure A and B are incompatible takes.

In "take A" I'm working from human subjectivity "down towards physics (through a vast stack of sociology and biology and so on)" and it just kinda seems like physics is safe to throw away because human souls and our humanistically normal concerns are probably mostly pretty "computational paltry" and merely about securing food, and safety, and having OK romantic lives?

In "take B" I'm starting with the material that mathematicians care about, and noticing that it means the project is doomed if the requirement is to have a mathematical proof about all mathematically expressible cares or concerns.

It would be... kinda funny, maybe, to end up believing "we can secure a Win Condition for the Normies (because take A is basically true), but True Mathematicians are doomed-and-blessed-at-the-same-time to eternal recursive yearning and Real Risk (because take B is also basically true)" <3

(C) Chaos is a thing! Even (and especially) in big equations, including the equations of mind that big stacks of adversarially optimized matrices represent!

This isn't a "logically deep" point. I'm just vibing with your picture where you imagine that the "turbulent looking" thing is a metaphor for reality.

In observable practice, the boundary conditions of the equations of AI also look like fractally beautiful turbulence!

I predict that you will be surprised by this empirical result. Here is the "high church papering" of the result:

TITLE: The boundary of neural network trainability is fractal

Abstract: Some fractals -- for instance those associated with the Mandelbrot and quadratic Julia sets -- are computed by iterating a function, and identifying the boundary between hyperparameters for which the resulting series diverges or remains bounded. Neural network training similarly involves iterating an update function (e.g. repeated steps of gradient descent), can result in convergent or divergent behavior, and can be extremely sensitive to small changes in hyperparameters. Motivated by these similarities, we experimentally examine the boundary between neural network hyperparameters that lead to stable and divergent training. We find that this boundary is fractal over more than ten decades of scale in all tested configurations.

Also, if you want to deep dive on some "half-assed peer review" of this work, Hacker News chatted with itself about this paper at length.
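(If you want to poke at the phenomenon yourself, here is a drastically simplified toy re-creation of the experiment, which is my own sketch and not the paper's code: iterate gradient descent on a tiny two-parameter problem and color a grid of two learning-rate hyperparameters by whether training stays bounded or blows up, exactly the convergent/divergent coloring used for Mandelbrot pictures.)

```python
# Toy version of "the boundary of trainability": per-parameter learning
# rates (lr1, lr2) are the two hyperparameters being swept.
import numpy as np

def diverges(lr1: float, lr2: float, steps: int = 200) -> bool:
    """True if gradient descent on f(a, b) = 0.5 * (a*b - 1)**2 blows up."""
    a, b = 1.5, 1.5
    for _ in range(steps):
        err = a * b - 1.0
        # df/da = err * b, df/db = err * a; simultaneous update
        a, b = a - lr1 * err * b, b - lr2 * err * a
        if abs(a) > 1e6 or abs(b) > 1e6:
            return True
    return False

lrs = np.linspace(0.05, 2.0, 80)
grid = np.array([[diverges(x, y) for x in lrs] for y in lrs])
print(grid.astype(int))  # 1 = diverged, 0 = stayed bounded; plot it to see the boundary
```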

I actually kind of expect this.

Basically, I think that we should expect a lot of SGD runs to result in weights that do serial processing on inputs, refining and reshaping the content into twisted and rotated and stretched high dimensional spaces SUCH THAT those spaces enable simple cutoff-based reasoning to "kinda really just work".

Like the prototypical business plan needs to explain "enough" how something is made (cheaply) and then explain "enough" how it will be sold (for more money) over time, with improvements in the process (according to some growth rate?), with leftover money going back to investors (with corporate governance hewing to known-robust patterns for enabling the excess to be redirected to early investors, rather than to managers who did a corruption coup, or to a union of workers that isn't interested in sharing with investors and would plausibly decide to play the dictator game at the end in an unfair way, or whatever). So if the "governance", "growth rate", "cost", and "sales" dimensions go into certain regions of the parameter space, each one could strongly contribute to a "don't invest" signal, but if they are all in the green zone then you invest... and that's that?
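(A toy sketch of that picture, with made-up weights and thresholds standing in for anything SGD would actually find: a "learned" linear map stretches raw pitch features onto business-ish axes, and the decision itself is just a conjunction of per-axis cutoffs.)

```python
# Hypothetical illustration: space-stretching step followed by cutoff-based
# reasoning. W, b, and the "green zone" thresholds are arbitrary stand-ins.
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(4, 16))   # stands in for weights found by SGD
b = rng.normal(size=4)         # maps 16 raw features -> 4 "business" axes
                               # (governance, growth rate, cost, sales)

GREEN_LOW = np.array([0.0, 0.5, -1.0, 0.2])
GREEN_HIGH = np.array([2.0, 3.0, 0.5, 2.5])

def invest(raw_features: np.ndarray) -> bool:
    """Invest only if every reshaped axis lands inside its 'green zone'."""
    axes = W @ raw_features + b                      # the space-stretching step
    in_zone = (axes >= GREEN_LOW) & (axes <= GREEN_HIGH)
    return bool(in_zone.all())                       # simple threshold gating

print(invest(rng.normal(size=16)))
```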

If, after reading this, you still disagree, I wonder if it is more because (1) you don't think that SGD can find space-stretching algorithms with that much semantic flexibility, or because (2) you don't think any list of fewer than 20 concepts like this could be found whose thresholds could properly act as gates on an algorithm for making prudent startup investment decisions... or is it something else entirely that you don't buy (and if so, what)?

I'm impressed by Gaziv et al.'s "adversarial examples that work on humans" enough to not pause and carefully read the paper, but rather to speculate on how it could be a platform for building things :-)

The specific thing that jumped to mind is Davidad's current request for proposals looking to build up formal languages within which to deploy "imprecise probability" formalisms such that AI system outputs could come with proofs about safely hitting human expressible goals, in these languages, like "solve global warming" while still "avoiding extinction, genocide, poverty, or other dystopian side effects".

I don't know that there is necessarily a lot of overlap in the vocabularies of these two efforts... yet? But the pictures in my head suggest that the math might not actually be that different. There's going to be a lot of "lines and boundaries and paths in a high dimensional space" mixed with fuzzing operations, to try to regularize things until the math itself starts to better match our intuitions around meaning and safety and so on.
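(As a tiny illustration of the flavor of "imprecise probability" bookkeeping I have in mind, and definitely not Davidad's actual formalism: carry lower and upper probability bounds instead of point estimates, and demand that the lower bound on the safety-relevant conjunction clears a floor. All numbers are invented.)

```python
# Interval-valued probabilities: a crude stand-in for richer credal-set or
# imprecise-probability formalisms. Bounds and threshold are made up.
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def and_independent(self, other: "Interval") -> "Interval":
        """Bounds on the conjunction, under an assumed independence."""
        return Interval(self.lo * other.lo, self.hi * other.hi)

p_goal_met = Interval(0.90, 0.99)        # e.g. "solve global warming"
p_no_catastrophe = Interval(0.995, 1.0)  # e.g. "avoid dystopian side effects"

combined = p_goal_met.and_independent(p_no_catastrophe)
SAFETY_FLOOR = 0.85
print(combined, "acceptable" if combined.lo >= SAFETY_FLOOR else "reject")
```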

This bit caught my eye:

This strong response made me fairly sure that most cheap olive oils in both the US and the UK are (probably illegally) cut with rapeseed oil.

I searched for [is olive oil cut with canola oil] and found that in the twenty teens organized crime was flooding the market with fake olive oil, but in 2022 an EU report suggested that uplabeling to "extra virgin" was the main problem they caught (still?).

Coming from the other direction, in terms of a "solid safe cheap supply"... I can find reports of Extra Virgin Olive Oil being sold by Costco under their Kirkland brand that is particularly well sourced and tested, and my priors say that this stuff is likely to be weirdly high quality for a weirdly low price (because, in general, "kirklandization" is a thing that food producers with a solid product and huge margins worry about). I'm kinda curious if you have access to Kirkland EVOO and if it gives you "preflux"?

Really any extra data here (where your sensitive palate gives insight into the current structure of the food economy) would be fascinating :-)

I wonder what he would have thought was the downside of worshiping a longer list of things...

For the things mentioned, it feels like he thinks "if you worship X then the absence of X will be constantly salient to you in most moments of your life".

It seems like he claims that worshiping some version of Goodness won't eat you alive, but in my experiments with that, I've found that generic Goodness Entities are usually hungry for martyrs, and almost literally try to get would-be saints to "give their all" (in some sense "eating" them). As near as I can tell, it is an unkindness to exhort the rare sort of person who is actually self-editing and scrupulous enough to understand and apply the injunction in that direction, without also warning them that success in that direction will lead to altruistic self-harm unless they make the demands of Goodness "compact" in some way.

Zvi mentions ethics explicitly so I'm pretty sure readings of this sort are "intended". So consider (IF you've decided to try to worship an ethical entity) that one should eventually get ready to follow Zvi's advice in "Out To Get You" for formalized/externalized ethics itself so you can enforce some boundaries on whatever angel you summon (and remember, demons usually claim to be angels (and in the current zeitgeist it is SO WEIRD that so many "scientific rationalists" believe in demons without believing in angels as well)).

Anyway. Compactification (which is possibly the same thing as "converting dangerous utility functions into safe formulas for satisficing"):

Get Compact when you find a rule you can follow that makes it Worth It to Get Got.

The rule must create an acceptable max loss. A well-chosen rule transforms Out to Get You for a lot into Out to Get You for a price you find Worth It. You then Get Got.

This works best using a natural point beyond which lies clear diminishing returns. If no such point exists, be suspicious.

A simple way is a budget. Spend at most $25,000 on this car, or $5,000 on this vacation package. This creates an obvious max dollar loss.

Many budgets should be $0. Example: free to play games. Either it’s worth playing for free or it isn’t. It isn’t.

The downside of budgets is often spending exactly your maximum, especially if others figure out what it is. Do your best to avoid this. Known bug.

An alternative is restriction on type. Go to a restaurant and avoid alcohol, dessert and appetizers. Pay in-game only for full game unlocks and storage space.

Budgets can be set for each purchase. Hybrid approaches are good.

Many cap their charitable giving at 10%. Even those giving more reserve some amount for themselves. Same principle.

For other activities, max loss is about time. Again, you can use a (time) budget or limit your actions in a way that restricts (time) spent, or combine both.

Time limits are crude but effective. Limiting yourself to an hour of television or social media per day maxes loss at an hour. This risks making you value the activity more. Often time budgets get exactly spent same as dollar budgets. Try to let unspent time roll over into future periods, to avoid fear of ‘losing’ unspent time.

When time is the limiting factor, it is better where possible to engineer your environment and options to make the activity compact. You’ll get more out of the time you do spend and avoid feeling like you’re arbitrarily cutting yourself off.

Decide what’s worth watching. Watch that.

For Facebook, classify a handful of people See First. See their posts. No others. Look at social media only on computers. Don’t comment. Or post.

A buffet creates overeating. Filling up one plate (or one early to explore, then one to exploit) ends better.

Unlimited often requires limitation.

Outside demands follow the pattern. To make explanation and justification easier, choose good enough rules that sound natural, simple and reasonable.

Experiments need a chance, but also a known point where you can know to call it quits. Ask whether you can get a definitive negative result in reasonable time. Will I worry I did it wrong? Will others claim or assume I did it wrong or didn’t give it a fair chance?

For myself, I have so far found it much easier to worship wisdom than pure benevolence.

Noticing ways that I am a fool is kinda funny. There are a lot of them! So many that patching each such gap would be an endless exercise! The wise thing, of course, would be to prioritize which foolishnesses are most prudent to patch, at which times. A nice thing here is that wisdom basically assimilates all valid criticism as helpful, and often leads to teaching unskilled critics to criticize better, and this seems to make "living in the water" more pleasant (at least in my experience so far).
