"Thread of subjective experience" was an aside (just one of the mechanisms that explains why we "find ourselves" in a world that behaves according to the Born rule), don't focus too much on it.
The core question is which physical mechanism (everything should be physical, right?) ensures that you will almost never see a string of a billion tails after a billion quantum coin flips, while the universe contains a quantum branch with you looking in astonishment at a string of a billion tails. Why should you expect that it will almost certainly not happen, when...
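To put a number on how extreme that branch is, here is a small sketch (the billion-flip figure is from the comment above; everything else is just arithmetic):

```python
# Born weight of the all-tails branch after n fair quantum flips is 2**-n.
# For n = one billion, even the log10 of that weight is astronomically negative.
import math

n = 1_000_000_000          # a billion flips
log2_weight = -n           # log2 of the branch's Born weight
log10_weight = log2_weight * math.log10(2)
print(f"log10 of branch weight: {log10_weight:.4e}")  # about -3.01e8
```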
I haven't fully understood your stance towards the many minds interpretation. Do you find it unnecessary?
I don’t think either of these Harrys is “preferred”.
And simultaneously you think that the existence of future Harries who observe events with probabilities approaching zero is not a problem, because the current Harry will almost never find himself to be one of those future Harries. I don't understand what that means exactly.
Harries who observe those rare events exist, and they wonder how they found themselves in those unlikely situations. Harries who hadn't found an...
For example: “as quantum amplitude of a piece of the wavefunction goes to zero, the probability that I will ‘find myself’ in that piece also goes to zero”
What I really don't like about this formulation is the extreme vagueness of "I will find myself", which implies that there's some preferred future "I" out of many, defined not only by the observations he receives, but also by being a preferred continuation of subjective experience, as determined by an unknown mechanism.
It can be formalized as the many minds interpretation, incurring additional complexity penalty a...
First, a factual statement that is true to the best of my knowledge: the LLM state that is used to produce the probability distribution for the next token is completely determined by the state of its input buffer (plus a bit of nondeterminism due to parallel processing and the non-associativity of floating-point arithmetic).
That is, an LLM can pass only a single token (around 2 bytes) to its future self. This follows from the above.
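A toy sketch of that factual claim: in autoregressive generation the next-token distribution is a pure function of the input buffer, so the only thing a model "passes to its future self" is the token it just emitted. `next_token_distribution` here is a hypothetical stand-in for a real forward pass:

```python
import random

VOCAB = ["a", "b", "c"]

def next_token_distribution(buffer):
    # Deterministic in the buffer: no hidden state survives between calls.
    h = sum(ord(ch) for tok in buffer for ch in tok)
    weights = [(h + i) % 5 + 1 for i in range(len(VOCAB))]
    total = sum(weights)
    return [w / total for w in weights]

def generate(prompt, n_steps, seed=0):
    rng = random.Random(seed)
    buffer = list(prompt)
    for _ in range(n_steps):
        probs = next_token_distribution(buffer)
        buffer.append(rng.choices(VOCAB, probs)[0])  # ~2 bytes passed forward
    return buffer

print(generate(["a"], 5))
```

Rerunning `generate` with the same prompt and seed reproduces the sequence exactly, because nothing outside the buffer carries over.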
What comes next is a plausible (to me) speculation.
For humans, what's passed to our future self is most likely much more than a single token. ...
Expanding a bit on the topic.
Exhibit A: flip a fair coin and move a suspended robot into a green or red room using a second coin with probabilities (99%, 1%) for heads, and (1%, 99%) for tails.
Exhibit B: flip a fair coin and create 99 copies of the robot in green rooms and 1 copy in a red room for heads, and reverse colors otherwise.
What causes the robot to see red instead of green in exhibit A? Physical processes that brought about a world where the robot sees red.
What causes a robot to see red instead of green in exhibit B? The fact that it sees red, not...
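A quick sanity check on the two exhibits: the marginal frequency of "sees red" comes out the same (50%) in both setups, even though the causal stories differ. This is just a simulation under the numbers given above:

```python
import random

rng = random.Random(1)

def exhibit_a():
    # Fair first coin, then a (99%, 1%) / (1%, 99%) second coin for the room.
    heads = rng.random() < 0.5
    p_green = 0.99 if heads else 0.01
    return "green" if rng.random() < p_green else "red"

def exhibit_b():
    # Fair coin, then 99 copies in green rooms and 1 in red (or reversed).
    heads = rng.random() < 0.5
    n_green = 99 if heads else 1
    return ["green"] * n_green + ["red"] * (100 - n_green)

trials = 20_000
red_a = sum(exhibit_a() == "red" for _ in range(trials)) / trials
rooms = [room for _ in range(trials) for room in exhibit_b()]
red_b = rooms.count("red") / len(rooms)
print(red_a, red_b)  # both close to 0.5
```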
I have a solution that is completely underwhelming, but I can see no flaws in it, besides the complete lack of a definition of which part of the mental state must be preserved to still count as you, and its rejection of MWI (also, I can see no useful insights into why we have what looks like continuous subjective experience).
Do you think the exploited flaw is universal or, at least, common?
Excellent story. But what about the "pull the plug" option? Did ALICE find a way to run itself efficiently on traditional datacenters that aren't packed with backprop and inference accelerators? And would shutting them down have required stronger political will than the government could muster at the time?
Citing https://arxiv.org/abs/cond-mat/9403051: "Furthermore if a quantum system does possess this property (whatever it may be), then we might hope that the inherent uncertainties in quantum mechanics lead to a thermal distribution for the momentum of a single atom, even if we always start with exactly the same initial state, and make the measurement at exactly the same time."
Then the author proceeds to demonstrate that this is indeed the case. I guess it partially answers the question: the quantum state thermalizes and you'll get a classical thermal distribution o...
we are assuming that without random perturbation, you would get 100% accuracy
That is, the question is not about real argon gas, but about a billiard-ball model? That should be stated in the question.
There are creatures in the possible mind space[3] whose intuition works in the opposite way. They are surprised specifically by the sequence of and do not mind the sequence of
That is, creatures who aren't surprised by outcomes of lower Kolmogorov complexity, or who aren't surprised by the fact that the language they use for estimating Kolmogorov complexity has a special compact case for producing "HHTHTTHTTH".
Looks possible, but not probable.
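A rough illustration of the intuition about complexity, using compressed size as a crude proxy for Kolmogorov complexity (with strings much longer than ten flips, since zlib's header overhead swamps the signal at that length):

```python
import random
import zlib

random.seed(0)
uniform = "H" * 1000                                          # all heads
typical = "".join(random.choice("HT") for _ in range(1000))   # fair flips

print(len(zlib.compress(uniform.encode())))  # tiny: highly regular
print(len(zlib.compress(typical.encode())))  # much larger: nearly incompressible
```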
For returns below $2000, I'd use the 50/50 quantum-random strategy just for the fun of dropping Omega's stats.
what happens if we automatically evaluate plans generated by superhuman AIs using current LLMs and then launch plans that our current LLMs look at and say, "this looks good".
The obvious failure mode is that the LLM is not powerful enough to predict the consequences of the plan. The obvious fix is to include a human-relevant description of the consequences. The obvious failure modes: a manipulated description of the consequences, optimizing for LLM jail-breaking. The obvious fix: ...
I won't continue: shallow rebuttals are not that convincing, while deep ones come close to capability research, so I don't expect to find interesting answers.
What if all I can assign is a probability distribution over probabilities? As in the extraterrestrial-life question: all that can be said is that extraterrestrial life is sufficiently rare that we haven't found evidence of it yet. Our observation of our own existence is conditioned on our existence, so it doesn't provide much evidence one way or the other.
Should I sample the distribution to give an answer, or take its mode, or mean, or median? I've chosen a value that is far from both extremes, but I might have done something else, with no clear justification for any of the choices.
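The ambiguity can be made concrete. Given a distribution over probabilities (a Beta(2, 5) here, purely as a hypothetical choice), each summary gives a different "answer":

```python
import random

random.seed(0)
a, b = 2, 5
mean = a / (a + b)                 # 2/7 ~ 0.286
mode = (a - 1) / (a + b - 2)       # 0.2
sample = random.betavariate(a, b)  # a random draw, different each seed
# Median via Monte Carlo, since Beta medians have no simple closed form.
draws = sorted(random.betavariate(a, b) for _ in range(100_001))
median = draws[len(draws) // 2]    # ~ 0.264

print(f"mean={mean:.3f} mode={mode:.3f} median={median:.3f} sample={sample:.3f}")
```

All four are defensible, and none is obviously the right thing to report.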
This means that LLMs can inadvertently learn to replicate these biases in their outputs.
Or the network learns to put more trust in tokens that were already "thought about" during generation.
Suppose when you are about to die [...] Omega shows up
Suppose something pertaining more to the real world: if you think that you are here and now because there will not be significantly more people in the future, then you are more likely to become depressed.
Also, why does Omega use 95% and not 50%, 10%, or 0.000001%?
ETA: Ah, Omega in this case is an embodiment of the litany of Tarski. Still, if there is no catastrophe, we are among the 5% who violate the litany. And that's not even mentioning that the litany comes as close to useless as it can get when we are talking about a belief in an inevitable catastrophe you can do nothing about.
After all, in the AI situation for which the exercise is a metaphor, we don’t know exactly when something might foom; we want elbow room.
Or you can pretend that you are impersonating an AI that is preparing to go foom.
conduct a hostage exchange by meeting in a neutral country, and bring lots of guns and other hostages they intend not to exchange that day
That is, they alter the payoff matrix instead of trying to achieve CC in the prisoner's dilemma. And that may be more efficient than spending time and energy on proofs, source-code verification protocols, and the yet-unknown downsides of being an agent that others can robustly CC with, while being the same kind of agent.
the simpler the utility function the easier time it has guaranteeing the alignment of the improved version
If we are talking about a theoretical AI, where E[U|a] (the expectation of utility given the action a) somehow points to the external world, then sure. If we are talking about a real AI aspiring to become the physical embodiment of the aforementioned theoretical concept (with the said aspiration somehow encoded outside of U, because U is simple), then things get more hairy.
You said it yourself, GPT ""wants"" to predict the correct probability distribution of the next token
No, I said that GPT does predict the next token, while probably not containing anything that can be interpreted as "I want to predict the next token". Just as a bacterium does divide (with possible adaptive mutations), while not containing "be fruitful and multiply" written anywhere inside.
If you instead meant that GPT is "just an algorithm"
No, I certainly didn't mean that. If the extended Church-Turing thesis holds for the macroscopic behavior of our bodies, we c...
I really don't expect "goals" to be explicitly written down in the network. There will very likely not be a thing that says "I want to predict the next token" or "I want to make paperclips" or even a utility function of that. My mental image of goals is that they are put "on top" of the model/mind/agent/person. Whatever they seem to pursue, independently of their explicit reasoning.
I'm sure that I don't understand you. GPT most likely doesn't have "I want to predict the next token" written anywhere, because it doesn't want to predict the next token. There's nothi...
Solving interpretability with an AGI (even with humans-in-the-loop) might not lead to particularly great insights on a general alignment theory or even on how to specifically align a particular AGI
Wouldn't it at least solve corrigibility, by making it possible to detect the formation of undesirable end-goals? I think even GPT-4 can classify a textual interpretation of an end-goal on the basis of its general desirability for humans.
It seems to need another assumption, namely that the AGI has sufficient control of its internal state, and knowledge of the detection netw...
I have low confidence in this, but I guess it (OOD generalization by "liquid" networks) works well in differentiable continuous domains (like low-level motion planning) by exploiting the natural smoothness of the system. So I wouldn't get my hopes up about its universal applicability.
If you have a next-frame video predictor, you can't ask it how a human would feel. You can't ask it anything at all - except "what might be the next frame of thus-and-such video?". Right?
Not exactly. You can extract embeddings from a video predictor (activations of the next-to-last layer may do, or you can use techniques that enhance the semantic information captured in the embeddings), and then use supervised learning to train a simple classifier from an embedding to human feelings on a modest number of video/feelings pairs.
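A minimal sketch of that probe idea. Everything here is hypothetical: `embed` stands in for the penultimate-layer activations of a real video predictor, and the labels are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 64

def embed(video):
    # Stand-in for "activations of the next-to-last layer" of the predictor.
    return rng.normal(size=EMB_DIM)

# A modest labeled set of (video, feeling) pairs.
X = np.stack([embed(v) for v in range(200)])
y = rng.integers(0, 3, size=200)   # e.g. 0=calm, 1=happy, 2=afraid

# Simple linear probe trained by least squares on one-hot targets.
Y = np.eye(3)[y]
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
preds = (X @ W).argmax(axis=1)
print("train accuracy:", (preds == y).mean())
```

With real embeddings the point is that the probe is tiny and cheap to train, so the "you can't ask it anything" objection reduces to a small supervised add-on.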
the issue I still see is - how do you recognize an ai executive that is trying to disguise itself?
It can't disguise itself without researching disguising methods first. The question is whether interpretability tools will be up to the task of catching it.
This will not work for catching an AI executive originating outside the controlled environment (unless it queries the AI scientist). But given that such attempts will come from uncoordinated, relatively computationally underpowered sources, it may be possible to preemptively enumerate the disguising techniques such an AI executive could come up with. If there are undetectable varieties..., well, it's mostly game over.
Thanks. Could we be sure that a bare utility maximizer won't modify itself into a mugging-proof version? I think we can: such a modification drastically decreases expected utility.
It's a bit of a relief that a sizeable portion of possible intelligences can be stopped by playing god to them.
Are there ways to make a utility maximizer impervious to Pascal's mugging?
Humans were created by evolution, but [...]
We know that evolution has no preferences (evolution is not an agent), so we generally don't frame our preferences as an approximation of evolution's. People who believe that they were created with some goal in the mind of their creator do engage in reasoning about what they were truly meant to do.
See also, in the OP: "Problem of Fully Updated Deference: Normative uncertainty doesn't address the core obstacles to corrigibility."
The provided link assumes that any preference can be expressed as a utility function over world-states. If you don't assume that (and you shouldn't, as human preferences can't be expressed that way), you cannot maximize a weighted average of potential utility functions. Some actions are preference-wise irreversible. Take virtue ethics, for example: wiping out your memory doesn't restore your status as a virtuous person, even if the...
As a strong default, STEM-level AGIs will have "goals"—or will at least look from the outside like they do. By this I mean that they'll select outputs that competently steer the world toward particular states.
Clarification: when talking about world-states I mean the world-state minus the state of the agent (we are interested in the external actions of the agent).
For starters, you can have goal-directed behavior without steering the world toward particular states. Novelty seeking, for example, doesn't imply any particular world-state to achieve.
And I think that...
In a realistic setting, agents will be highly incentivized to seek other forms of punishment besides turning the dial. But it's a nice toy hell.
Thanks for clearing my confusion. I've grown rusty on the topic of AIXI.
So going forwards from simple theories and seeing how they bridge to your effective model would probably do the trick
Assuming there's not much fine-tuning to do. Locating our world in the string theory landscape could take quite a few bits, if it's computationally feasible at all.
And remember, we're talking about an ASI here
It hinges on the assumption that an ASI of this type is physically realizable. I can't find it now, but I remember that the preprocessing step, where heuristic generation ...
it seems plausible that you could have GR + QFT and a megabyte of bridging laws plus some other data to specify local conditions and so on.
How can a computationally bound variant of AIXI arrive at QFT? You most likely can't faithfully simulate a non-trivial quantum system on a classical computer within reasonable time limits. Such an AIXI is bound to find some computationally feasible approximation of QFT first (Maxwell's equations with a cutoff at some arbitrary energy to prevent the ultraviolet catastrophe, maybe). And with no access to experiments, it cannot test simpler systems.
I mean, are there reasons to assume that computable AIXI (or its variants) can be realized as a physically feasible device? I can't find papers indicating significant progress toward feasible AIXI approximations.
Assume it has disgusting amounts of compute
Isn't that the same as "assume it can do argmax as fast as needed for this scenario"?
Of all the peoples' lives that exist and have existed, what are the chances I'm living [...here and now]
Is there a more charitable interpretation of this line of thinking than "My soul selected this particular body out of all the available ones"?
You being you as you are is a product of your body developing in circumstances it happened to develop in.
"Hard problem of corrigibility" refers to Problem of fully updated deference - Arbital, which uses a simplification (that human preferences can be described as a utility function) that can be inappropriate for the problem. Human preferences are obviously path-dependent (you don't want to be painfully disassembled and then reconstituted as a perfectly happy person with no memory of the disassembly). Was the appropriateness of this simplification discussed somewhere?
Let's flip a very unfair quantum coin with 1:2^1000000 heads-to-tails odds (preparing such a quantum state would require quite an engineering feat, but it's theoretically possible). You shouldn't expect to see heads if the quantum state is prepared correctly, yet the post-flip universe (in MWI) contains a branch where you see heads. So, by your logic, you should expect to see both heads and tails even if the state is prepared correctly.
What I do not kno...