(Sorry for the late response, I hadn't checked my LW inbox much since my previous comments.)
If it were the case that such a function exists but cannot possibly be implemented (since any implementation would have to be implemented as a state), and no other function satisfying the same constraints could possibly be implemented, then that seems like it would be a case where having the aligned ASI is impossible. (Again, not that I think this is the case, just considering the validity of the argument.)
The function that is being demonstrated to exist is the lookup table that produces the appropriate actions, yes? The one that is supposed to be implementable by a finite depth circuit?
It seems to make sense that if hiring an additional employee provides marginal shareholder value, then the company will hire additional employees. So, when the company stops hiring employees, it seems reasonable that this is because the marginal benefit of hiring an additional employee is not positive. However, I don't see why this should suggest that the company is likely to hire an employee whose marginal value is zero or negative.
"Number of employees" is not a continuous variable. When hiring an additional employee, how this changes what the marg...
Not if the point of the argument is to establish that a superintelligence is compatible with achieving the best possible outcome.
Here is a parody of the issue, which is somewhat unfair and leaves out almost all of your argument, but which I hope makes clear the issue I have in mind:
"Proof that a superintelligence can lead to the best possible outcome: Suppose by some method we achieved the best possible outcome. Then, there's no properties we would want a superintelligence to have beyond that, so let's call however we achieved the best possible outcome, 'a...
Yes, I knew the cardinalities in question were finite. The point applies regardless though. For any set X, there is no injection from 2^X to X. In the finite case, this is 2^n > n for all natural numbers n.
If there are N possible states, then the number of functions from possible states to {0,1} is 2^N , which is more than N, so there is some function from the set of possible states to {0,1} which is not implemented by any state.
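To make the counting concrete, here's a tiny sketch in Python (the particular state labels are just placeholders I made up) enumerating the functions from a small state set to {0,1}:

```python
from itertools import product

# A toy set of N possible "states" (placeholder labels; any finite set works).
states = ["s0", "s1", "s2"]
N = len(states)

# Every function from states to {0,1} can be written as a tuple of N output bits.
all_functions = list(product([0, 1], repeat=N))

print(f"{N} states, {len(all_functions)} functions to {{0,1}}")  # 3 states, 8 functions

# Since 2**N > N, any assignment of one function per state must miss some function.
assert len(all_functions) == 2 ** N > N
```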
If your argument is, "if it is possible for humans to produce some (verbal or mechanical) output, then it is possible for a program/machine to produce that output", then, that's true I suppose?
I don't see why you specified "finite depth boolean circuit".
While it does seem like the number of states for a given region of space is bounded, I'm not sure how relevant this is. Not all possible functions from states to {0,1} (or to some larger discrete set) are implementable as some possible state, for cardinality reasons.
I guess maybe that's why you mentioned th...
Yes. I believe that is consistent with what I said.
"not((necessarily, for each thing) : has [x] -> those [x] are such that P_1([x]))"
is equivalent to, " (it is possible that something) has [x], but those [x] are not such that P_1([x])"
"not((necessarily, for each thing) : has [x] such that P_2([x]) -> those [x] are such that P_1([x]))"
is equivalent to "(it is possible that something) has [x], such that P_2([x]), but those [x] are not such that P_1([x])".
The latter implies the former, as (A and B and C) implies (A and C), and so the latter is stronger, not weaker, than the former.
Right?
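Spelling that out in symbols, the way I'm parsing it (my own notation, so take it as a sketch rather than exactly what either of us wrote):

```latex
% Write H(y,x) for "y has [x]", and read \Box / \Diamond as "necessarily" / "possibly".
\[
  \neg\Box\,\forall y\,\forall x\,\bigl(H(y,x)\rightarrow P_1(x)\bigr)
  \;\equiv\;
  \Diamond\,\exists y\,\exists x\,\bigl(H(y,x)\wedge\neg P_1(x)\bigr)
\]
\[
  \neg\Box\,\forall y\,\forall x\,\bigl(H(y,x)\wedge P_2(x)\rightarrow P_1(x)\bigr)
  \;\equiv\;
  \Diamond\,\exists y\,\exists x\,\bigl(H(y,x)\wedge P_2(x)\wedge\neg P_1(x)\bigr)
\]
% The second right-hand side implies the first, since
% (A \wedge B \wedge C) \Rightarrow (A \wedge C).
```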
Doesn't "(has preferences, and those preferences are transitive) does not imply (completeness)" imply "(has preferences) does not imply (completeness)"? Surely if "having preferences" implied completeness, then "having transitive preferences" would also imply completeness?
"Political category" seems, a bit strong? Like, sure, the literal meaning of "processed" is not what people are trying to get at. But, clearly, "those processing steps that are done today in the food production process which were not done N years ago" is a thing we can talk about. (by "processing step" I do not include things like "cleaning the equipment", just steps which are intended to modify the ingredients in some particular way. So, things like, hydrogenation. This also shall not be construed as indicating that I think all steps that were done N years ago were better than steps done today.)
For example, it is not clear to me if once I consider a program that outputs 0101 I will simply ignore other programs that output that same thing plus one bit (e.g. 01010).
No, the thing about prefixes is about what strings encode a program, not about their outputs.
The purpose of this is mostly just to define a prior over possible programs, in a way that conveniently ensures that the total probability assigned over all programs is at most 1. Seeing as it still works for different choices of language, it probably doesn't need to exactly use this kind of defi...
Thanks! The specific thing I was thinking about most recently was indeed specifically about context length, and I appreciate the answer tailored to that, as it basically fully addresses my concerns in this specific case.
However, I also did mean to ask the question more generally. I kinda hoped that the answers might also be helpful to others who had similar questions (as well as if I had another idea meeting the same criteria in the future), but maybe thinking other people with the same question would find the question+answers here, was not super realistic, idk.
Here is my understanding:
we assume a programming language where a program is a finite sequence of bits, and such that no program is a prefix of another program. So, for example, if 01010010 is a program, then 0101 is not a program.
Then, the (not-normalized) prior probability for a program p is 2^(-length(p)).
Why that probability?
If you take any infinite sequence of bits, then, because no program is a prefix of any other program, at most one program will be a prefix of that sequence of bits.
If you randomly (with uniform distribution) select an infi...
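Here's a small sanity-check sketch of that construction, with a made-up toy "language" (the particular bit strings are placeholders, not anything canonical):

```python
import random

# A toy prefix-free "language": no program is a prefix of another.
programs = ["00", "010", "011", "10", "110"]

# Kraft-style check: the un-normalized prior 2^(-length) sums to at most 1.
prior = {p: 2 ** -len(p) for p in programs}
assert sum(prior.values()) <= 1
print(prior)

# Empirically: draw random bit streams; at most one program is a prefix of each,
# and each program p is the prefix with probability about 2^(-len(p)).
counts = {p: 0 for p in programs}
trials = 100_000
for _ in range(trials):
    stream = "".join(random.choice("01") for _ in range(8))  # 8 bits is enough here
    matches = [p for p in programs if stream.startswith(p)]
    assert len(matches) <= 1
    for p in matches:
        counts[p] += 1
print({p: counts[p] / trials for p in programs})
```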
Well, I was kinda thinking of as being, say, a distribution of human behaviors in a certain context (as filtered through a particular user interface), though, I guess that way of doing it would only make sense within limited contexts, not general contexts where whether the agent is physically a human or something else, would matter. And in this sort of situation, well, the action of "modify yourself to no longer be a quantilizer" would not be in the human distribution, because the actions to do that are not applicable to humans (as humans are, ...
For the "Crappy Optimizer Theorem", I don't understand why condition 4, that if , then , isn't just a tautology[1]. Surely if , then no matter what is being used,
as , then letting , then , and so .
I guess if the 4 conditions are seen as conditions on a function (where they are written for ), then it no longer is automatic, and it is just when specifying...
I thought CDT was considered not reflectively-consistent because it fails Newcomb's problem?
(Well, not if you define reflective stability as meaning preservation of anti-Goodhart features, but, CDT doesn't have an anti-Goodhart feature (compared to some base thing) to preserve, so I assume you meant something a little broader?)
Like, isn't it true that a CDT agent who anticipates being in Newcomb-like scenarios would, given the opportunity to do so, modify itself to be not a CDT agent? (Well, assuming that the Newcomb-like scenarios are of the form "a...
Whoops, yes, that should have said , thanks for the catch! I'll edit to make that fix.
Also, yes, what things between and should be sent to, is a difficulty.
A thought I had which, on inspection doesn't work, is that (things between and ) could be sent to , but that doesn't work, because might be terminal, but (thing between and ) isn't terminal. It seems like the only thing that would always work would be for them to be sent to somethin...
A thought on the "but what if multiple steps in the actual-algorithm correspond to a single step in an abstracted form of the algorithm?" thing :
This reminds me a bit of, in the topic of "Abstract Rewriting Systems", the thing that the → vs →* distinction handles. (The asterisk just indicating taking the transitive reflexive closure.)
Suppose we have two abstract rewriting systems and .
(To make it match more closely what you are describing, we can suppose that every node has at most one outgoing arrow, to make...
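As a toy illustration of the →* point (my own example, not the systems from your post): one "abstract" step can correspond to several "concrete" steps, and the transitive reflexive closure is what absorbs that difference:

```python
# Toy example: a "concrete" system that decrements by 1, and an "abstract"
# system that jumps down by 2. One abstract step corresponds to two concrete
# steps, which is exactly what the ->* relation absorbs.

def concrete_step(n):
    return n - 1 if n > 0 else None  # at most one outgoing arrow per node

def abstract_step(n):
    return n - 2 if n > 1 else None

def reaches(step, start, target):
    """Whether start ->* target under the given single-step relation."""
    current = start
    while current is not None:
        if current == target:
            return True
        current = step(current)
    return False

# 6 ->* 2 holds in both systems, even though the single-step relations differ.
assert reaches(concrete_step, 6, 2)
assert reaches(abstract_step, 6, 2)
```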
In the line that ends with "even if God would not allow complete extinction.", my impulse is to include " (or other forms of permanent doom)" before the period, but I suspect that this is due to my tendency to include excessive details/notes/etc. and probably best not to actually include in that sentence.
(Like, for example, if there were no more adult humans, only billions of babies grown in artificial wombs (in a way staggered in time) and then kept in a state of chemically induced euphoria until the age of 1, and then killed, that technically wouldn't be...
I want to personally confirm a lot of what you've said here. As a Christian, I'm not entirely freaked out about AI risk because I don't believe that God will allow it to be completely the end of the world (unless it is part of the planned end before the world is remade? But that seems unlikely to me.), but that's no reason that it can't still go very very badly (seeing as, well, the Holocaust happened).
In addition, the thing that seems to me most likely to be the way that God doesn't allow AI doom, is for people working on AI safety to succeed. One shouldn...
I don't understand why this comment has negative "agreement karma". What do people mean by disagreeing with it? Do they mean to answer the question with "no"?
First, I want to summarize what I understand to be what your example is an example of:
"A triple consisting of
1) A predicate P
2) the task of generating any single input x for which P(x) is true
3) the task of, given any x (and given only x, not given any extra witness information), evaluating whether P(x) is true
"
For such triples, it is clear, as your example shows, that the second task (the 3rd entry) can be much harder than the first task (the 2nd entry).
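One illustrative instance of such a triple (my own, not the example from your comment): let P(x) be "the Boolean formula x is unsatisfiable". Producing some x with P(x) true is trivial, while evaluating P on an arbitrary x is coNP-hard in general, which a brute-force sketch makes visible:

```python
from itertools import product

# Represent a Boolean formula over n variables as a function of an assignment tuple,
# and take P(x) = "the formula x is unsatisfiable".

def generate_unsat_formula():
    # Task 2: producing *some* formula with P true is trivial, e.g. (v0 and not v0).
    return (lambda v: v[0] and not v[0]), 1  # (formula, number of variables)

def is_unsatisfiable(formula, n_vars):
    # Task 3: evaluating P on an *arbitrary* formula is coNP-hard in general;
    # brute force has to look at all 2**n_vars assignments.
    return not any(formula(v) for v in product([False, True], repeat=n_vars))

formula, n = generate_unsat_formula()
print(is_unsatisfiable(formula, n))                  # True
print(is_unsatisfiable(lambda v: v[0] or v[1], 2))   # False: satisfiable
```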
_______
On the other hand, if instead one had the task of producing an exhaustive list of all x such tha...
As you know, there's a straightforward way to, given any boolean circuit, turn it into a version which is a tree, by just taking all the parts which have two wires coming out from a gate, and making duplicates of everything that leads into that gate.
I imagine that it would also be feasible to compute the size of this expanded-out version without having to actually expand out the whole thing?
Searching through normal boolean circuits, but using a cost which is based on the size if it were split into trees, sounds to me like it would give you the memoizati...
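For the "compute the size without expanding" part, a memoized recursion seems sufficient, since the tree-expanded size of a node is just 1 plus the sum of the tree-expanded sizes of its inputs. A sketch, with a made-up dict encoding of a circuit (the gate names are placeholders):

```python
from functools import lru_cache

# Hypothetical circuit encoding: each wire maps to the list of wires feeding it;
# circuit inputs map to an empty list. Sharing shows up as a wire being used by
# more than one gate.
circuit = {
    "x": [], "y": [],
    "g1": ["x", "y"],
    "g2": ["g1", "g1"],   # g1 is shared, so the tree version duplicates it
    "out": ["g2", "g1"],
}

@lru_cache(maxsize=None)
def tree_size(wire):
    """Number of nodes in the fully expanded tree rooted at this wire."""
    return 1 + sum(tree_size(child) for child in circuit[wire])

print(tree_size("out"))  # counts the duplicated copies of g1 without building the tree
```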
It seems like the 5th sentence has its ending cut off? "it tries to parcel credit and blame for a decision up to the input neurons, even when credit and blame" , seems like it should continue [do/are x] for some x.
When you say "which yields a solution of the form ", are you saying that yields that, or are you saying that yields that? Because, for the former, that seems wrong? Specifically, the former should yield only things of the form .
But, if the latter, then I would think that there would be more solutions than that?
Like, what about ? (where, say, and
...
As another "why not just" which I'm sure there's a reason for:
in the original circuits thread, they made a number of parameterized families of synthetic images which certain nodes in the network responded strongly to in a way that varied smoothly with the orientation parameter, and where these nodes detected e.g. boundaries between high-frequency and low-frequency regions at different orientations.
If given another such network of generally the same kind of architecture, if you gave that network the same images, if it also had analogous nodes, I'd expect th...
I was surprised by how the fine-tuning was done for the verbalized confidence.
My initial expectation was that it would make the loss be based on like, some scoring rule based on the probability expressed and the right answer.
Though, come to think of it, I guess seeing as it would be assigning logit values to different expressions of probabilities, it would have to... what, take the weighted average of the scores it would get if it gave the different probabilities? And, I suppose that if many training steps were done on the same question/answer pairs, then...
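For concreteness, the "weighted average of the scores" idea I had in mind would look something like this rough sketch (made-up numbers and a made-up set of confidence tokens, not what the paper actually did):

```python
import math

# Hypothetical set of verbalized confidence tokens the model can emit.
verbalized_probs = [0.1, 0.3, 0.5, 0.7, 0.9]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_brier_loss(logits, answer_is_correct):
    """Weighted average of Brier scores over the confidence tokens the model might emit."""
    y = 1.0 if answer_is_correct else 0.0
    weights = softmax(logits)
    return sum(w * (p - y) ** 2 for w, p in zip(weights, verbalized_probs))

# Made-up logits over the five confidence tokens, for a question answered correctly.
print(expected_brier_loss([0.2, 0.1, 0.5, 1.5, 2.0], answer_is_correct=True))
```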
For such that is a mesa-optimizer, let be the space it optimizes over, and be its utility function.
I know you said "which we need not notate", but I am going to say that for and , that , and is the space of actions (or possibly, and is the space of actions available in the situation )
(Though maybe you just meant that we need not notate separately from s, the map from X to A which s defines. In which ...
Is this something that the infra-bayesianism idea could address? So, would an infra-bayesian version of AIXI be able to handle worlds that include halting oracles, even though they aren't exactly in its hypothesis class?
Do I understand correctly that in general the elements of A, B, C, are achievable probability distributions over the set of n possible outcomes? (But that in the examples given with the deterministic environments, these are all standard basis vectors / one-hot vectors / deterministic distributions ?)
And, in the case where these outcomes are deterministic, and A and B are disjoint, and A is much larger than B, then given a utility function on the possible outcomes in A or B, a random permutation of this utility function will, with high probability, ha...
My understanding:
One could create a program which hard-codes the point about which it oscillates (as well as some amount which it always eventually goes that far in either direction), and have it buy once when below, and then wait until the price is above to sell, and then wait until price is below to buy, etc.
The programs receive as input the prices which the market maker is offering.
It doesn't need to predict ahead of time how long until the next peak or trough, it only needs to correctly assume that it does oscillate sufficiently, and respond when it does.
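A minimal sketch of that strategy, under my toy framing where prices arrive one at a time and the center and swing amount are hard-coded guesses:

```python
# Hard-coded assumptions: the price oscillates around CENTER, eventually moving
# at least SWING above and below it, over and over.
CENTER = 0.5
SWING = 0.1

def make_trader():
    holding = False

    def on_price(price):
        """React to the market maker's current price; returns 'buy', 'sell', or None."""
        nonlocal holding
        if not holding and price <= CENTER - SWING:
            holding = True
            return "buy"
        if holding and price >= CENTER + SWING:
            holding = False
            return "sell"
        return None  # just wait; no prediction of *when* the next swing happens

    return on_price

trader = make_trader()
prices = [0.55, 0.45, 0.38, 0.52, 0.61, 0.44, 0.39, 0.63]
print([trader(p) for p in prices])
# [None, None, 'buy', None, 'sell', None, 'buy', 'sell']
```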
The part about Chimera functions was surprising, and I look forward to seeing where that will go, and to more of this in general.
In section 2.1 , Proposition 2 should presumably say that is a partial order on rather than on .
In the section about Non-Dogmatism, I believe something was switched around. It says that if the logical inductor assigns prices converging to $1 to a proposition that cannot be proven, then the trader can buy shares in that proposition at prices of $ and thereby gain infinite potential upside. I believe this should say that if the logical inductor assigns prices converging to $0 to a proposition that can't be disproven, instead of prices converging to $1 for a proposition that can't be proven.
(I think that if the price was converging to $1 for ...
You said that you thought that this could be done in a categorical way. I attempted something which appears to describe the same thing when applied to the category FinSet , but I'm not sure it's the sort of thing you meant by when you suggested that the combinatorial part could potentially be done in a categorical way instead, and I'm not sure that it is fully categorical.
Let S be an object.
For i from 1 to k, let be an object, (which is not anything isomorphic to the product of itself with itself, or at least is not the terminal object) .
Let ...
I've now computed the volumes within the [-a,a]^3 cube for and, or, and the constant 1 function. I was surprised by the results.
(I hadn't considered that the ratios between the volumes will not depend on the size of the cube)
If we select x,y,z uniformly at random within this cube, the probability of getting the and gate is 1/48, the probability of getting the or gate is 2/48, and the probability of getting the constant 1 function is 13/48 (more than 1/4).
This I found quite surprising, because of the constant 1 function requiring 4 half planes to express th...
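Here's a quick Monte Carlo check of those fractions, under the parameterization I was assuming (weights w1, w2 and bias b for a threshold unit on binary inputs; if the intended parameterization differs, the numbers would too):

```python
import random
from collections import Counter

# Sample (w1, w2, b) uniformly from [-a, a]^3 and record which 2-input Boolean
# function the threshold unit  f(i1, i2) = [w1*i1 + w2*i2 + b > 0]  computes.
a = 1.0  # the ratios shouldn't depend on a
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
counts = Counter()
trials = 1_000_000
for _ in range(trials):
    w1, w2, b = (random.uniform(-a, a) for _ in range(3))
    table = tuple(int(w1 * i1 + w2 * i2 + b > 0) for i1, i2 in inputs)
    counts[table] += 1

AND, OR, ONE = (0, 0, 0, 1), (0, 1, 1, 1), (1, 1, 1, 1)
for name, table in [("and", AND), ("or", OR), ("const 1", ONE)]:
    print(name, counts[table] / trials)  # expect roughly 1/48, 2/48, 13/48
```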
For the volumes, I suppose that because scaling all of these parameters by the same positive constant doesn't change the function computed, it would make sense to compute the volumes of the corresponding regions of the sphere, and this would handle the issues with these regions having unbounded size.
(this would still work with more parameters, it would just be a higher dimensional sphere)
Er, would that give the same thing as the limit if we took the parameters within a cube?
Anyway, at least in this case, if we use the "projected onto the sphere" case, we cou...
nitpick: the appendix says possible configurations of the whole grid, while it should say possible configurations. (Similarly for what it says about the number of possible configurations in the region that can be specified.)
This comment I'm writing is mostly because this prompted me to attempt to see how feasible it would be to computationally enumerate the conditions for the weights of small networks like the 2 input 2 hidden layer 1 output in order to implement each of the possible functions. So, I looked at the second smallest case by hand, and enumerated conditions on the weights for a 2 input 1 output no hidden layer perceptron to implement each of the 2 input gates, and wanted to talk about it. This did not result in any insights, so if that doesn't sound interesting, m...
The link in the rss feed entry for this at https://agentfoundations.org/rss goes to https://www.alignmentforum.org/events/vvPYYTscRXFBvdkXe/ai-safety-beginners-meetup which is a broken link (though, easily fixed by replacing "events" with "posts" in the url) .
[edit: it appears that it is no longer in the rss feed? It showed up in my rss feed reader.]
I think this has also happened with other "event" type posts in the rss feed before, but I may be remembering wrong.
I suspect this is some bug in how the rss feed is generated, but possibly it is a known bug wh...
The agent/thinker are limited in the time or computational resources available to them, while the predictor is unlimited.
My understanding is that this is generally the situation which is meant. Well, not necessarily unlimited, just with enough resources to predict the behavior of the agent.
I don't see why you call this situation uninteresting.
That something can be modeled using some Turing machine, doesn't imply that it can be any Turing machine.
If I have some simple physical system, such that I can predict how it will behave, well, it can be modeled by a Turing machine, but me being able to predict it doesn't imply that I've solved the halting problem.
A realistic conception of agents in an environment doesn't involve all agents having unlimited compute at every time-step. An agent cannot prevent the universe from continuing simply by getting stuck in a loop and never producing its output for its next action.
Ah, thank you, I see where I misunderstood now. And upon re-reading, I see that it was because I was much too careless in reading the post, to the point that I should apologize. Sorry.
I was thinking that the agents were no longer being trained, already being optimal players, and so I didn't think the judge would need to take into account how their choice would influence future answers. This reading clearly doesn't match what you wrote, at least past the very first part.
If the debaters are still being trained, or the judge can be convinced that the debaters...
I am unsure as to what the judge's incentive is to select the result that was more useful, given that they still have access to both answers? Is it just because the judge will want to be such that the debaters would expect them to select the useful answer so that the debaters will provide useful answers, and therefore will choose the useful answers?
If that's the reason, I don't think you would need a committed deontologist to get them to choose a correct answer over a useful answer, you could instead just pick someone who doesn't think very hard about cert...
This reminds me of the "Converse Lawvere Problem" at https://www.alignmentforum.org/posts/5bd75cc58225bf06703753b9/the-ubiquitous-converse-lawvere-problem a little bit, except that the different functions in the codomain have domain which also has other parts to it aside from the main space .
As in, it looks like here, we have a space of values , which includes things such as "likes to eat meat" or "values industriousness" or whatever, where this part can just be handled as some generic nice space , as one part of ...
Thanks! (The way you phrased the conclusion is also much clearer/cleaner than how I phrased it)
I am trying to check that I am understanding this correctly by applying it, though probably not in a very meaningful way:
Am I right in reasoning that, for , that iff ( (C can ensure S), and (every element of S is a result of a combination of a possible configuration of the environment of C with a possible configuration of the agent for C, such that the agent configuration is one that ensures S regardless of the environment configuration)) ?
So, if S = {a,b,c,d} , then
would have , but, say
...
There are a few places where I believe you mean to write a but instead have instead. For example, in the line above the "Applicability" heading.
I like this.
As an example, I think the game "both players win if they choose the same option, and lose if they pick different options" has "the two players pick different options, and lose" as one of the feasible outcomes, and it is not on the Pareto frontier, because if they picked the same thing, they would both win, and that would be a Pareto improvement.
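A tiny sketch of that check, with made-up payoffs (win = 1, lose = 0):

```python
from itertools import product

# Coordination game: both players win iff they pick the same option.
options = ["A", "B"]
def payoffs(choice_1, choice_2):
    win = 1 if choice_1 == choice_2 else 0
    return (win, win)

feasible = {payoffs(c1, c2) for c1, c2 in product(options, options)}  # {(1, 1), (0, 0)}

def pareto_frontier(outcomes):
    return {
        o for o in outcomes
        if not any(all(q[i] >= o[i] for i in range(len(o))) and q != o for q in outcomes)
    }

print(pareto_frontier(feasible))  # {(1, 1)}: "pick different options and lose" is dominated
```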
What came to mind for me before reading the spoiler-ed options, was a variation on #2, with the difference being that, instead of trying to extract P's hypothesis about B, we instead modify T to get a T' which has P replaced with a P' which is a paperclip minimizer instead of maximizer, and then run both, and only use the output when the two agree, or if they give probabilities, use the average, or whatever.
Perhaps this could have an advantage over #2 if it is easier to negate what P is optimizing for than to extract P's model of B. (ed...
If you are interested in convincing people who so far think "It is impossible for the existence of an artificial superintelligence to produce desirable outcomes" otherwise, you should have a meaning of "an artificial superintelligence" in mind that is like what they mean by it.
If one suspects that it is impossible for an artificial superintelligence to produce desirable outcomes, then when one considers "among possible futures, the one(s) that have as good or better outcomes than any other possible future", one would suppose that these perhaps are not ones...