You are of course perfectly right. What I meant was: so that their convex hull is full-dimensional and contains the origin. I fixed it. Thanks for spotting this!
Exactly! Thanks for providing this concise summary in your words.
In the next post we generalize the target from a single point to an interval, which gives us even more freedom that we can use to increase safety further.
In our current, ongoing work we generalize this further to the case of multiple evaluation metrics, in order to get closer to plausible real-world goals; see our teaser post.
The Alex Turner post you referenced first convinces me that his arguments about "orbit-level power-seeking" apply to maximizers and quantilizers/satisficers alike. Let me reiterate that we are not suggesting quantilizers/satisficers are a good idea, but rather that I firmly believe explicit safety criteria, not plain randomization, should be used to select plans.
He also claims in that post that the "orbit-level power-seeking" issue affects all schemes that are based on expected utility: "There is no clever EU-based scheme which doesn't have orbit-level power-seekin...
Thank you for the warm encouragement.
We tried to be careful not to claim that merely making the decision algorithm aspiration-based is already sufficient to solve the AI safety problem, but maybe we need to add an even more explicit disclaimer in that direction. We explore this approach as a potentially necessary ingredient for safety, not as a complete plan for safety.
In particular, I perfectly agree that conflicting goals are also a severe problem for safety that needs to be addressed (while I don't believe there is a unique problem for safety that...
"Hence the information what I will do cannot have been available to the predictor." If the latter statement is correct, then how could the predictor have "often correctly predicted the choices of other people, many of whom are similar to you, in the particular situation"?
There are many possible explanations for this data. Let's say I start my analysis with the model that the predictor is guessing, and my model attaches some prior probability to their guessing right in a single case. I might also have a prior about the likelihood of being lied to about the predictor's suc...
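For concreteness, here is a minimal sketch of how such a Bayesian comparison could look. All numbers and model names are hypothetical and only loosely match the models described above: a "pure guesser" hypothesis is compared against a "reliable predictor" hypothesis after observing n correct predictions in a row.

```python
def posterior_guesser(prior_guesser, p_guess_right, p_reliable_right, n):
    """Posterior probability of the 'pure guesser' model after observing
    n correct predictions in a row (hypothetical two-model comparison)."""
    like_guesser = p_guess_right ** n       # likelihood under "guesser"
    like_reliable = p_reliable_right ** n   # likelihood under "reliable"
    num = prior_guesser * like_guesser
    return num / (num + (1 - prior_guesser) * like_reliable)
```

With, e.g., a 90% prior on the guesser model, 50% per-case guessing accuracy, and ten observed correct predictions, the posterior on the guesser model drops far below the prior, and it keeps shrinking with further observations.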
Take a possible world in which the predictor is perfect (meaning: they were able to make a prediction, and there was no possible extension of that world's trajectory in which what I will actually do deviates from what they have predicted). In that world, by definition, I no longer have a choice. By definition, I will do what the predictor has predicted. Whatever has caused what I will do lies in the past of the prediction, hence in the past of the current time point. There is no point in asking myself now what I should do, as I no longer have causal influenc...
Can you please explain the "zero-probability possible world"?
Hi Nathan,
I'm not sure. I guess it depends on what your definition of "agent" is. In my personal definition, following Yann LeCun's recent whitepaper, the "agent" is a system with a number of different modules: one of them a world model (in our case, an MDP that it can use to simulate the consequences of possible policies), one of them a policy (in our case, an ANN that takes states as inputs and gives action logits as outputs), and one of them a learning algorithm (in our case, a variant of Q-learning that uses the world model to learn a policy t...
Excellent! I have three questions:
1. How would we get to a certain upper bound on ?
2. As collisions with the boundary happen exactly when one action's probability hits zero, it seems the resulting policies have quite large support, hence are quite probabilistic, which might be a problem in itself by making the agent unpredictable. What is your thinking about this?
3. Related to 2., it seems that while your algorithm ensures that the expected true return cannot decrease, it might still lead to quite low true returns in individual runs. So do you agree that this type of
I'm sorry, but I fail to see the analogy to momentum or Adam: in neither of them does the vector or distance from the current point to the initial point play any role, as far as I can see. It is also different from regularizations that modify the objective function, say by penalizing movement away from the initial point, which would change the locations of all minima. The method I propose preserves all minima and merely tries to move toward the one closest to the initial point. I have discussed it with some mathematical-optimization experts, and they think it is new.
I like the clarity of this post very much! Still, we should be aware that all this hinges on what exactly we mean by "the model".
If "the model" only refers to one or more functions, like a policy function pi(s) and/or a state-value function V(s) and/or a state-action -value function Q(s,a) etc., but does not refer to the training algorithm, then all you write is fine. This is how RL theory uses the word "model".
But some people here also use the term "the model" in a broader sense, potentially including the learning algorithm that adjusts said functions, an...
replacing the SGD with something that takes the shortest and not the steepest path
Maybe we can design a local search strategy similar to gradient descent which does try to stay close to the initial point x0? E.g., if at x, go a small step into a direction that has the minimal scalar product with x – x0 among those that have at most an angle of alpha with the current gradient, where alpha>0 is a hyperparameter. One might call this "stochastic cone descent" if it does not yet have a name.
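A minimal numeric sketch of one such "cone descent" step, assuming Euclidean geometry; the function name and the random tie-break in the degenerate case are my own choices for illustration, not an established method:

```python
import numpy as np

def cone_descent_step(x, x0, grad, alpha, lr):
    """Among unit directions within angle alpha of the steepest-descent
    direction -grad, step along the one with minimal scalar product with
    x - x0, i.e. the one pulling back toward x0 as much as the cone allows."""
    u = -grad / np.linalg.norm(grad)        # steepest-descent direction
    v = x - x0                              # displacement from the start
    if np.linalg.norm(v) == 0:
        return x + lr * u                   # at x0: plain gradient step
    d_free = -v / np.linalg.norm(v)         # unconstrained minimizer of <d, v>
    if np.dot(d_free, u) >= np.cos(alpha):
        d = d_free                          # already inside the cone
    else:
        # minimizer lies on the cone boundary: tilt u by alpha, away
        # from v, within the plane spanned by u and v
        v_perp = v - np.dot(v, u) * u
        if np.linalg.norm(v_perp) < 1e-12:
            # degenerate case (v parallel to u): random orthogonal tilt
            w = np.random.standard_normal(u.shape)
            w -= np.dot(w, u) * u
        else:
            w = -v_perp
        w /= np.linalg.norm(w)
        d = np.cos(alpha) * u + np.sin(alpha) * w
    return x + lr * d
```

The closed form works because minimizing a linear function over a spherical cap puts the optimum either at the unconstrained minimizer (if it lies inside the cone) or on the cone's boundary.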
roughly speaking, we gradient-descend our way to whatever point on the perfect-prediction surface is closest to our initial values.
I believe this is not correct as long as "gradient-descend" means some standard version of gradient descent, because those are all local, can follow highly nonlinear paths, and do not memorize the initial value in order to stay close to it.
But maybe we can design a local search strategy similar to gradient descent which does try to stay close to the initial point x0? E.g., if at x, go a small step into a direction that has the minimal...
Does the one-shot AI necessarily aim to maximize some function (like the probability of saving the world, or the expected "savedness" of the world or whatever), or can we also imagine a satisficing version of the one-shot AI which "just tries to save the world" with a decent probability, and doesn't aim to do any more, i.e., does not try to maximize that probability or the quality of that saved world etc.?
I'm asking this because
Definition 4: Expectation w.r.t. a Set of Sa-Measures
This definition is obviously motivated by the plan to later apply some version of maximin rule, so that only the inf matters.
I suggest that we also study versions that employ other decision-under-ambiguity rules, such as Hurwicz's rule or Savage's minimax-regret rule.
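For concreteness, here are the three rules on a tiny hypothetical payoff table (rows = actions, columns = ambiguous scenarios; all numbers made up):

```python
import numpy as np

# hypothetical payoffs: 3 actions, 2 ambiguous scenarios
payoffs = np.array([[3.0, 1.0],
                    [2.0, 2.0],
                    [4.0, 0.0]])

def maximin(p):
    # choose the action whose worst case is best
    return int(np.argmax(p.min(axis=1)))

def hurwicz(p, optimism):
    # Hurwicz's rule: blend best and worst case with an optimism weight
    return int(np.argmax(optimism * p.max(axis=1)
                         + (1 - optimism) * p.min(axis=1)))

def minimax_regret(p):
    # Savage's rule: minimize the worst-case shortfall relative to the
    # best action in each scenario
    regret = p.max(axis=0) - p
    return int(np.argmin(regret.max(axis=1)))
```

On this table the three rules pick three different actions (maximin the safe middle row, fully optimistic Hurwicz the high-upside last row, minimax regret the first row), which is exactly why the choice of rule matters.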
From my reading of quantilizers, they might still choose "near-optimal" actions, just with small probability. By contrast, a system based on decision transformers (possibly combined with an LLM) could be designed so that we could simply tell it to "make me a tea of this quantity and quality, within this time, and with this probability", and it would attempt to do just that, without trying to make more or better tea, or faster, or with higher probability.
even when the agents are unable to explicitly bargain or guarantee their fulfilment of their end by external precommitments
I believe there is a misconception here. The actual game you describe is the game between the programmers, and the fact that each knows in advance that the others' programs will indeed be run with the very code their own program has access to makes each program submission a binding commitment to behave in a certain way.
Game theory has long known that if binding commitments are possible, most dilemmas can be solved easily. In...
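A toy sketch of this point for the Prisoner's Dilemma, using the classic "mirror" program; payoffs and program encodings are illustrative, not taken from the post:

```python
# Each programmer submits a program that can read the opponent's source;
# submission itself is the binding commitment. The "mirror" program
# cooperates exactly when the opponent's source is identical to its own.
MIRROR = "cooperate iff opponent_source == my_source"
DEFECT = "always defect"

def run(my_source, opponent_source):
    """Execute a (toy-encoded) program against the opponent's source."""
    if my_source == MIRROR:
        return "C" if opponent_source == my_source else "D"
    return "D"

# standard Prisoner's Dilemma payoffs (row player, column player)
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(src_a, src_b):
    return PAYOFF[(run(src_a, src_b), run(src_b, src_a))]
```

Submitting MIRROR against MIRROR yields mutual cooperation, while unilateral deviation to DEFECT only yields mutual defection, so the commitment sustains cooperation as an equilibrium of the programmers' game.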
I just stumbled upon this and noticed that a real-world mechanism for international climate policy cooperation that I recently suggested in this paper can be interpreted as a special case of your (G,X,Y) framework.
Assume a fixed game G where
(Many public goods games, such as the Prisoners' Dilemma, have such a structure)
Let's call an object a Conditional Commitment Function (CCF) iff it i...
Dear Robert, I just found out about your work and absolutely love it.
Has the following idea been explored yet?
Having just read Scott's Geometric Expectation stuff, I want to add that of course another variant of all of this is to replace every occurrence of a mean or expectation by a geometric mean or geometric expectation to make the whole thing more risk-averse.
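A two-outcome toy example (numbers made up) of why the geometric expectation is the more risk-averse drop-in replacement:

```python
import numpy as np

# two hypothetical lotteries with equally likely positive outcomes
safe = np.array([10.0, 10.0])
risky = np.array([1.0, 19.0])

def arithmetic_expectation(x):
    return x.mean()

def geometric_expectation(x):
    # exp of the mean log-outcome; penalizes spread, hence risk-averse
    return np.exp(np.log(x).mean())
```

Both lotteries have arithmetic expectation 10, but the geometric expectation of the risky one is only sqrt(19) ≈ 4.36, so the geometric version strongly prefers the safe lottery.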
In its suggested form, Maximal Lottery-Lotteries is still a majoritarian system in the sense that a mere majority of 51% of the voters can make sure that candidate A wins regardless of how the other 49% vote. For this, they only need to give A a rating of 1 and all other candidates a rating of 0.
One can also turn the system into a non-majoritarian system in which power is distributed proportionally in the sense that any group of x% of the voters can make sure that candidate A gets at least x% winning probability, similar to what is true of the MaxParC voting s...
I think you are right about the representation claim since any quasi-ordering (reflexive and transitive relation) can be represented as the intersection of complete quasi-orderings.
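This can be checked exhaustively on a tiny example (a 3-element set; the particular base relation is an arbitrary illustration):

```python
from itertools import combinations
from functools import reduce

# Claim checked here: a quasi-ordering (reflexive, transitive relation)
# equals the intersection of the complete quasi-orderings extending it.
elems = "abc"
pairs = [(x, y) for x in elems for y in elems]
base = frozenset((x, x) for x in elems) | {("a", "b")}  # a <= b plus reflexivity

def is_preorder(r):
    refl = all((x, x) in r for x in elems)
    trans = all((x, z) in r for (x, y) in r for (y2, z) in r if y == y2)
    return refl and trans

def is_complete(r):
    return all((x, y) in r or (y, x) in r for x in elems for y in elems)

# enumerate all 2^9 binary relations; keep the complete preorders >= base
extensions = [frozenset(c)
              for n in range(len(pairs) + 1)
              for c in combinations(pairs, n)
              if base <= frozenset(c)
              and is_preorder(frozenset(c)) and is_complete(frozenset(c))]

intersection = reduce(lambda a, b: a & b, extensions)
```

Here `intersection` comes out equal to `base`, as the representation claim predicts for this instance (of course, a finite check is no substitute for the general proof via the order-extension principle).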