AI alignment researcher supported by MIRI and LTFF. Working on the learning-theoretic agenda. Based in Israel. See also LinkedIn.
E-mail: vanessa DOT kosoy AT {the thing reverse stupidity is not} DOT org
Can you say more? What are "anabolic effects"? What does "cycling" mean in this context?
Sort of obvious but good to keep in mind: Metacognitive regret bounds are not easily reducible to "plain" IBRL regret bounds when we consider the core and the envelope as the "inside" of the agent.
Assume that the action and observation sets factor as $\mathcal{A} = \mathcal{A}_0 \times \mathcal{A}_1$ and $\mathcal{O} = \mathcal{O}_0 \times \mathcal{O}_1$, where $(\mathcal{A}_0, \mathcal{O}_0)$ is the interface with the external environment and $(\mathcal{A}_1, \mathcal{O}_1)$ is the interface with the envelope.
Let $\Lambda$ be a metalaw. Then, there are two natural ways to reduce it to an ordinary law: regard the envelope as part of the agent (yielding a law over the external interface only), or regard the envelope as part of the environment.
However, requiring low regret w.r.t. either of these is not equivalent to requiring low regret w.r.t. the original metalaw.
Therefore, metacognitive regret bounds hit a "sweet spot" of strength vs. feasibility which produces genuinely more powerful agents than IBRL[1].
More precisely, more powerful than IBRL with the usual sort of hypothesis classes (e.g. nicely structured crisp infra-RDPs). In principle, we can reduce metacognitive regret bounds to IBRL regret bounds using non-crisp laws, since there's a very general theorem for representing desiderata as laws. But these laws would have a very peculiar form that seems impossible to guess without starting with metacognitive agents.
The topic of this thread is: In naive MWI, it is postulated that all Everett branches coexist. (For example, if I toss a quantum fair coin $n$ times, there will be $2^n$ branches realizing all possible outcomes.) Under this assumption, it's not clear in what sense the Born rule is true. (What is the meaning of the probability measure over the branches if all branches coexist?)
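To make the puzzle concrete, here is a toy calculation (my own illustration, not from the thread): for a *biased* quantum coin, naive branch-counting and the Born measure disagree wildly, so the Born probabilities cannot simply mean "fraction of branches".

```python
from itertools import product

# Toy illustration (mine, not from the thread): for a biased quantum
# coin, naive branch-counting and the Born measure disagree wildly.
p = 0.9  # Born probability of heads
n = 10   # number of tosses

branches = list(product("HT", repeat=n))  # all 2**n Everett branches
count_majority_heads = 0
born_mass_majority_heads = 0.0
for b in branches:
    k = b.count("H")
    if k > n / 2:
        count_majority_heads += 1
        born_mass_majority_heads += p**k * (1 - p)**(n - k)

# Branch-counting: only ~38% of branches show majority heads...
print(round(count_majority_heads / len(branches), 3))  # → 0.377
# ...but the Born measure assigns them almost all of the weight.
print(born_mass_majority_heads > 0.99)  # → True
```

Under branch-counting, a p = 0.9 coin "lands heads most of the time" in only a minority of branches, while the Born measure says this happens with near-certainty; this is the sense in which the meaning of the measure over branches needs explaining.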
Your reasoning is invalid, because in order to talk about updating your beliefs in this context, you need a metaphysical framework which knows how to deal with anthropic probabilities (e.g. it should be able to answer puzzles in the vein of the anthropic trilemma according to some coherent, well-defined mathematical rules). IBP is such a framework, but you haven't proposed any alternative, not to mention an argument for why that alternative is superior.
The problem is that this requires introducing a special decision-theory postulate that you're supposed to care about the Born measure for some reason, even though the Born measure doesn't correspond to ordinary probability.
Not sure what you mean by "this would require a pretty small universe".
If we live in naive MWI, an IBP agent would not care for good reasons, because naive MWI is a "library of babel" where essentially every conceivable thing happens no matter what you do.
Also not sure what you mean by "some sort of sampling". AFAICT, quantum IBP is the closest thing to a coherent answer that we have, by a significant margin.
The solution is here. In a nutshell: naive MWI is wrong, in that not all Everett branches coexist. However, a lot of Everett branches do coexist, s.t. with high probability all of them display the expected frequencies.
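The flavor of "with high probability all of them display the expected frequencies" can be illustrated with a simple concentration computation (my own sketch, not the construction from the linked solution):

```python
from math import comb

# Concentration sketch (my illustration, not the linked construction):
# the fraction of length-n toss sequences whose heads-frequency is more
# than eps away from 1/2 vanishes as n grows, so "almost all" branches
# display the expected frequencies.
def atypical_fraction(n: int, eps: float) -> float:
    bad = sum(comb(n, k) for k in range(n + 1) if abs(k / n - 0.5) > eps)
    return bad / 2**n

for n in (10, 100, 1000):
    print(n, atypical_fraction(n, eps=0.1))
```

Already at n = 1000 the atypical fraction is astronomically small, which is what makes it possible for a large-but-not-exhaustive collection of branches to all look statistically normal.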
My model is that the concept of "morality" is a fiction which has 4 generators that are real:
Is it possible to replace the maximin decision rule in infra-Bayesianism with a different decision rule? One surprisingly strong desideratum for such decision rules is the learnability of some natural hypothesis classes.
In the following, all infradistributions are crisp.
Fix a finite action set $\mathcal{A}$ and a finite observation set $\mathcal{O}$. For any $n \in \mathbb{N}$ and $\gamma \in (0,1)$, let

$$K_{n\gamma} : (\mathcal{A}\times\mathcal{O})^\omega \to \Delta\big((\mathcal{A}\times\mathcal{O})^n\big)$$

be defined by

$$K_{n\gamma}(h)(x) := (1-\gamma)\sum_{m=0}^{\infty} \gamma^m \,[[h_{m:m+n} = x]]$$

In other words, this kernel samples a time step $m$ out of the geometric distribution with parameter $\gamma$, and then produces the sequence of length $n$ that appears in the destiny $h$ starting at $m$.
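The sampling kernel can be sketched as follows (a toy illustration; the names `sample_window` and `destiny`, and the representation of a destiny as a function of the time step, are my own assumptions):

```python
import random

# Sketch of the sampling kernel described above: sample a start time m
# from the geometric distribution with parameter gamma, then return the
# length-n window of the destiny (an infinite action-observation
# sequence, represented here as a function of the time step) at m.
def sample_window(destiny, n: int, gamma: float, rng=random):
    m = 0
    while rng.random() < gamma:  # P(m) = (1 - gamma) * gamma**m
        m += 1
    return [destiny(t) for t in range(m, m + n)]

# Example destiny: alternate two action-observation pairs forever.
def destiny(t):
    return ("a0", "o0") if t % 2 == 0 else ("a1", "o1")

print(sample_window(destiny, n=3, gamma=0.9))
```

As γ → 1 the sampled start time is pushed further into the future, which is the mechanism by which the discount parameter enters the decision rules below.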
For any continuous[1] function $F : \square\big((\mathcal{A}\times\mathcal{O})^n\big) \to \mathbb{R}$ (where $\square(X)$ denotes the space of crisp infradistributions on $X$), we get a decision rule. Namely, this rule says that, given an infra-Bayesian law $\Lambda$ and discount parameter $\gamma$, the optimal policy is

$$\pi^*_{F\Lambda\gamma} := \arg\max_{\pi} F\big((K_{n\gamma})_* \Lambda(\pi)\big)$$

The usual maximin is recovered when we have some reward function $r : (\mathcal{A}\times\mathcal{O})^n \to \mathbb{R}$ and the $F$ corresponding to it is

$$F(\Theta) := \min_{\theta \in \Theta} \mathbb{E}_{\theta}[r]$$

Given a set $\mathcal{H}$ of laws, it is said to be learnable w.r.t. $F$ when there is a family of policies $\{\pi_\gamma\}_{\gamma \in (0,1)}$ such that for any $\Lambda \in \mathcal{H}$

$$\lim_{\gamma \to 1}\Big(\max_{\pi} F\big((K_{n\gamma})_* \Lambda(\pi)\big) - F\big((K_{n\gamma})_* \Lambda(\pi_\gamma)\big)\Big) = 0$$

For $F$ the maximin above, we know that e.g. the set of all communicating[2] finite infra-RDPs is learnable. More generally, for any $q \in [0,1]$ we have the learnable decision rule

$$F_q(\Theta) := q \max_{\theta \in \Theta} \mathbb{E}_{\theta}[r] + (1-q) \min_{\theta \in \Theta} \mathbb{E}_{\theta}[r]$$
This is the "mesomism" I talked about before.
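Reading "mesomism" as a convex combination of the optimistic and pessimistic expected rewards (my reconstruction of the dropped formula), it can be computed directly for a finite credal set; the names `F_q` and `expected_reward` and the dictionary representation are mine:

```python
# Toy computation: F_q(Theta) = q * max_theta E_theta[r]
#                             + (1 - q) * min_theta E_theta[r],
# where the credal set Theta is a finite list of distributions over a
# finite outcome set; q = 0 recovers the usual maximin.
def expected_reward(theta, r):
    return sum(p * r[x] for x, p in theta.items())

def F_q(credal_set, r, q: float) -> float:
    values = [expected_reward(theta, r) for theta in credal_set]
    return q * max(values) + (1 - q) * min(values)

r = {"x0": 0.0, "x1": 1.0}
credal_set = [{"x0": 0.5, "x1": 0.5}, {"x0": 0.9, "x1": 0.1}]
print(round(F_q(credal_set, r, q=0.0), 10))  # maximin → 0.1
print(round(F_q(credal_set, r, q=0.5), 10))  # mesomism → 0.3
```

At q = 0 "nature" is fully adversarial, at q = 1 fully collaborative, and intermediate q interpolates between the two.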
Also, any monotonically increasing $F$ seems to be learnable, i.e. any $F$ s.t. for $\Theta_1 \subseteq \Theta_2$ we have $F(\Theta_1) \leq F(\Theta_2)$. For such decision rules, you can essentially assume that "nature" (i.e. whatever resolves the ambiguity of the infradistributions) is collaborative with the agent. These rules are not very interesting.
On the other hand, there are natural decision rules that are not learnable in general; moreover, when a rule $F$ is not learnable, neither is $g \circ F$ for monotonically increasing $g$.
Open Problem: Are there any learnable decision rules that are not mesomism or monotonically increasing?
A positive answer to the above would provide interesting generalizations of infra-Bayesianism. A negative answer to the above would provide an interesting novel justification of the maximin. Indeed, learnability is not a criterion that was ever used in axiomatic constructions of decision theory[3], AFAIK.
We can try considering discontinuous functions as well, but it seems natural to start with continuous ones. If we want the optimal policy to exist, we usually need $F$ to be at least upper semicontinuous.
There are weaker conditions than "communicating" that are sufficient, e.g. "resettable" (meaning that the agent can always force returning to the initial state), and some even weaker conditions that I will not spell out here.
I mean theorems like VNM, Savage etc.
Thanks for this!
Does it really make sense to see a dermatologist for this? I don't have any particular problem I am trying to fix other than "being a woman in her 40s (and contemplating the prospect of her 50s, 60s etc with dread)". Also, do you expect the dermatologist to give better advice than people in this thread or the resources they linked? (Although, the dermatologist might be better familiar with specific products available in my country.)