Yeah, sorry that was unclear; there's no need for any form of hypercomputation to get an enumeration of the axioms of U. But you need a halting oracle to distinguish between the axioms and non-axioms. If you don't care about distinguishing axioms from non-axioms, but you do want to get an assignment of truth values to the atomic formulas Q(i,j) that's consistent with the axioms of U, then that is applying a consistent guessing oracle to U.
I see that when I commented yesterday, I was confused about how you had defined U. You're right that you don't need a consistent guessing oracle to get from U to a completion of U, since the axioms are all atomic propositions, and you can just set the remaining atomic propositions however you want. However, this introduces the problem that getting the axioms of U requires a halting oracle, not just a consistent guessing oracle, since to tell whether something is an axiom, you need to know whether there actually is a proof of a given thing in T.
I think what you proved essentially boils down to the fact that a consistent guessing oracle can be used to compute a completion of any consistent recursively axiomatizable theory. (In fact, it turns out that a consistent guessing oracle can be used to compute a model (in the sense of functions and relations on a set) of any consistent recursively axiomatizable theory; this follows from what you showed and the fact that an oracle for a complete theory can be used to compute a model of that theory.)
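To make the reduction concrete, here is a minimal sketch (the oracle and the proof-search helper are abstractions I'm naming myself; the oracle is of course an uncomputable black box passed in as a parameter):

```python
# Sketch: computing a completion of a consistent r.e. theory T with a
# consistent guessing oracle. `cg_oracle(program)` is the assumed black box:
# given a zero-argument program that returns 0 or 1 if it halts, it returns
# that bit, and returns an arbitrary bit if the program never halts.
# `refutation_search(axioms_a, axioms_b)` is a hypothetical helper that
# interleaves proof searches for a contradiction from T plus each axiom set,
# returning 1 if axioms_a is refuted first, 0 if axioms_b is refuted first,
# and running forever if both are consistent with T.

def completion(sentences, refutation_search, cg_oracle):
    """Decide each sentence in the enumeration, keeping the set consistent."""
    decided = []
    for phi in sentences:
        with_phi = decided + [phi]
        with_neg = decided + [("not", phi)]
        # If exactly one side is inconsistent, the search halts and the
        # oracle must report it correctly; if both sides are consistent,
        # either answer keeps `decided` consistent. Both sides can't be
        # inconsistent when T + decided is consistent.
        bit = cg_oracle(lambda: refutation_search(with_phi, with_neg))
        decided.append(("not", phi) if bit == 1 else phi)
        yield decided[-1]
```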
I disagree with
...Philosophically, what I take from this is th
a consistent guessing oracle rather than a halting oracle (which I theorize to be more powerful than a consistent guessing oracle).
This is correct. Or at least, the claim I'm interpreting this as is that there exist consistent guessing oracles that are strictly weaker than a halting oracle, and that claim is correct. Specifically, it follows from the low basis theorem that there are consistent guessing oracles that are low, meaning that access to a halting oracle makes it possible to tell whether any Turing machine with access to the consistent guessing or...
I don't understand what relevance the first paragraph is supposed to have to the rest of the post.
Something that I think is unsatisfying about this is that the rationals aren't privileged as a countable dense subset of the reals; it just happens to be a convenient one. The completions of the dyadic rationals, the rationals, and the algebraic real numbers are all the same. But if you require that an element of the completion, if equal to an element of the countable set being completed, must eventually certify this equality, then the completions of the dyadic rationals, rationals, and algebraic reals are all constructively inequivalent.
This means that, in particular, if your real happens to be rational, you can produce the fact that it is equal to some particular rational number. Neither Cauchy reals nor Dedekind reals have this property.
perhaps these are equivalent.
They are. To get enumerations of rationals above and below out of an effective Cauchy sequence, once the Cauchy sequence outputs a rational $q$ such that everything afterwards can only differ from $q$ by at most $\varepsilon$, you start enumerating rationals below $q-\varepsilon$ as below the real and rationals above $q+\varepsilon$ as above the real. If the Cauchy sequence converges to $x$, and you have a rational $p<x$, then once the Cauchy sequence gets to the point where everything after is guaranteed to differ by at most ...
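A minimal runnable sketch of this direction (conventions mine: the $n$-th term of the Cauchy sequence is assumed accurate to within $2^{-n}$):

```python
from fractions import Fraction
from itertools import count, islice

def rationals():
    """Enumerate all rationals (repetitions are harmless here)."""
    yield Fraction(0)
    for n in count(2):
        for p in range(1, n):
            yield Fraction(p, n - p)
            yield Fraction(-p, n - p)

def below(cauchy):
    """Enumerate the rationals strictly below x, given cauchy(n) with
    |cauchy(n) - x| <= 2**-n. Flipping the comparison enumerates the
    rationals strictly above x."""
    emitted, pool, gen = set(), [], rationals()
    for n in count():
        pool.append(next(gen))            # grow the candidate pool
        q_n, eps = cauchy(n), Fraction(1, 2 ** n)
        for q in pool:
            if q < q_n - eps and q not in emitted:
                emitted.add(q)            # q < cauchy(n) - 2**-n <= x
                yield q

# Example with x = 1/3, using binary truncations as the Cauchy sequence:
third = lambda n: Fraction(2 ** n // 3, 2 ** n)
print(list(islice(below(third), 5)))      # first few rationals certified < 1/3
```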
My take-away from this:
An effective Cauchy sequence converging to a real $x$ induces recursive enumerators for $\{q\in\mathbb{Q}: q<x\}$ and $\{q\in\mathbb{Q}: q>x\}$, because if $q<x$, then $q<q_n-2^{-n}$ for some term $q_n$ of the sequence (taking the convention $|q_n-x|\le 2^{-n}$), so you eventually learn this.
The constructive meaning of a set is that membership should be decidable, not just semi-decidable.
If $x$ is irrational, then $\{q\in\mathbb{Q}: q<x\}$ and $\{q\in\mathbb{Q}: q>x\}$ are complements, and each semi-decidable, so they are decidable. If $x$ is r...
If board members have an obligation not to criticize their organization in an academic paper, then they should also have an obligation not to discuss anything related to their organization in an academic paper. The ability to be honest is important, and if a researcher can't say anything critical about an organization, then non-critical things they say about it lose credibility.
Yeah, I wasn't trying to claim that the Kelly bet size optimizes a nonlogarithmic utility function exactly, just that, when the number of rounds of betting left is very large, the Kelly bet size sacrifices a very small amount of utility relative to optimal betting under some reasonable assumptions about the utility function. I don't know of any precise mathematical statement that we seem to disagree on.
...Well, we've established the utility-maximizing bet gives different expected utility from the Kelly bet, right? So it must give higher expected utility or it
Yeah, I was still being sloppy about what I meant by near-optimal, sorry. I mean the optimal bet size will converge to the Kelly bet size, not that the expected utility from Kelly betting and the expected utility from optimal betting converge to each other. You could argue that the latter is more important, since getting high expected utility in the end is the whole point. But on the other hand, when trying to decide on a bet size in practice, there's a limit to the precision with which it is possible to measure your edge, so the difference between optimal...
I do want to note though that this is different from "actually optimal"
By "near-optimal", I meant converges to optimal as the number of rounds of betting approaches infinity, provided initial conditions are adjusted in the limit such that whatever conditions I mentioned remain true in the limit. (e.g. if you want Kelly betting to get you a typical outcome of in the end, then when taking the limit as the number of bets goes to infinity, you better have starting money , where is the geometric growth rate you ...
The reason I brought this up, which may have seemed nitpicky, is that I think this undercuts your argument for sub-Kelly betting. When people say that variance is bad, they mean that because of diminishing marginal returns, lower variance is better when the mean stays the same. Geometric mean is already the expectation of a function that gets diminishing marginal returns, and when it's geometric mean that stays fixed, lower variance is better if your marginal returns diminish even more than that. Do they? Perhaps, but it's not obvious. And if your marginal...
Correct. This utility function grows fast enough that it is possible for the expected utility after many bets to be dominated by negligible-probability favorable tail events, so you'd want to bet super-Kelly.
If you expect to end up with lots of money at the end, then you're right; marginal utility of money becomes negligible, so expected utility is greatly affected by negligible-probability unfavorable tail events, and you'd want to bet sub-Kelly. But if you start out with very little money, so that at the end of whatever large number of ...
If you bet more than Kelly, you'll experience lower average returns and higher variance.
No. As they discovered in the dialog, average returns are maximized by going all-in on every bet with positive EV. It is typical returns that will be lower if you don't bet Kelly.
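A quick simulation of the distinction (parameters made up: even-money bets won with probability 0.6, so the Kelly fraction is 0.2):

```python
import random

def simulate(fraction, rounds=10, trials=100_000, p=0.6):
    """Final wealth from repeatedly betting `fraction` of the bankroll on
    an even-money bet won with probability p. Returns (mean, median)."""
    finals = []
    for _ in range(trials):
        w = 1.0
        for _ in range(rounds):
            w *= 1 + fraction if random.random() < p else 1 - fraction
        finals.append(w)
    finals.sort()
    return sum(finals) / trials, finals[trials // 2]

print(simulate(1.0))  # all-in: highest mean (~6), but median 0 (ruin is typical)
print(simulate(0.2))  # Kelly: lower mean (~1.5), far higher median (~1.2)
```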
The Kelly criterion can be thought of in terms of maximizing a utility function that depends on your wealth after many rounds of betting (under some mild assumptions about that utility function that rule out linear utility). See https://www.lesswrong.com/posts/NPzGfDi3zMJfM2SYe/why-bet-kelly
For two, your specific claims about the likely confusion that Eliezer's presentation could induce in "laymen" is empirically falsified to some degree by the comments on the original post: in at least one case, a reader noticed the issue and managed to correct for it when they made up their own toy example, and the first comment to explicitly mention the missing unitarity constraint was left over 10 years ago.
Some readers figuring out what's going on is consistent with many of them being unnecessarily confused.
I don't think this one works. In order for the channel capacity to be finite, there must be some maximum number of bits N you can send. Even if you don't observe the type of the channel, you can communicate a number n from 0 to N by sending n 1s and N-n 0s. But then even if you do observe the type of the channel (say, it strips the 0s), the receiver will still just see some number of 1s that is from 0 to N, so you have actually gained zero channel capacity. There's no bonus for not making full use of the channel; in johnswentworth's formulation of the problem, there's no such thing as some messages being cheaper to transmit through the channel than others.
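A toy version of the argument, with N = 3 (my own illustration, not from the original formulation):

```python
def encode(n, N):
    """Encode a number 0..N as n ones followed by N - n zeros."""
    return "1" * n + "0" * (N - n)

def strip_zeros(msg):
    """One possible channel type: deletes every 0."""
    return msg.replace("0", "")

N = 3
codewords = [encode(n, N) for n in range(N + 1)]
print(codewords)                            # ['000', '100', '110', '111']
print([strip_zeros(c) for c in codewords])  # ['', '1', '11', '111']
# The receiver can distinguish N + 1 messages either way, so observing
# the channel type gains no capacity.
```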
We "just" need to update the three geometric averages on this background knowledge. Plausibly how this should be done in this case is to normalize them such that they add to one.
My problem with a forecast aggregation method that relies on renormalizing to meet some coherence constraints is that then the probabilities you get depend on what other questions get asked. It doesn't make sense for a forecast aggregation method to give probability 32.5% to A if the experts are only asked about A, but have that probability predictably increase if the experts are a...
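A toy illustration, with made-up forecasts from two experts over three mutually exclusive outcomes:

```python
from math import prod

def geo_mean(ps):
    return prod(ps) ** (1 / len(ps))

experts = {          # each expert's distribution sums to 1
    "A": [0.5, 0.2],
    "B": [0.4, 0.1],
    "C": [0.1, 0.7],
}

# Asked only about A: pool (p, 1-p) geometrically and normalize the pair.
g_a = geo_mean(experts["A"])
g_not_a = geo_mean([1 - p for p in experts["A"]])
print(g_a / (g_a + g_not_a))        # ~0.333

# Asked about A, B, and C: pool each and renormalize the triple.
raw = {k: geo_mean(v) for k, v in experts.items()}
print(raw["A"] / sum(raw.values())) # ~0.405 -- A's probability moved
```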
Oh, derp. You're right.
I think the way I would rule out my counterexample is by strengthening A3 to: if ... and ..., then there is ...
Q2: No. Counterexample: Suppose there's one outcome $x$ such that all lotteries are equally good, except for the lottery that puts probability 1 on $x$, which is worse than the others.
I'm not sure why you don't like calling this "redundancy". A meaning of redundant is "able to be omitted without loss of meaning or function" (Lexico). So ablation redundancy is the normal kind of redundancy, where you can remove something without losing the meaning. Here it's not redundant: you can remove a single direction and lose all the (linear) "meaning".
Suppose your datapoints are $(x,y)$ (where the coordinates $x$ and $y$ are drawn independently from the standard normal distribution), and the feature you're trying to measure is ...
What you're calling ablation redundancy is a measure of nonlinearity of the feature being measured, not any form of redundancy, and the view you quote doesn't make sense as stated, as nonlinearity, rather than redundancy, would be necessary for its conclusion. If you're trying to recover some feature $f$, and there's any vector $v$ and scalar $c$ such that $f(z)=\langle v,z\rangle+c$ for all datapoints $z$ (regardless of whether there are multiple such $v$, which would happen if the data is contained in a proper affine subsp...
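To illustrate why failure under single-direction ablation indicates linearity rather than redundancy, here's a small demo (my own construction, not from the post): a perfectly linear feature is entirely destroyed by ablating one direction, however many dimensions the data has.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 8))
v = np.ones(8) / np.sqrt(8)         # the single "truth" direction
f = X @ v                           # a perfectly linear feature

X_ablated = X - np.outer(X @ v, v)  # project out the direction v

for data in (X, X_ablated):
    w, *_ = np.linalg.lstsq(data, f, rcond=None)   # best linear probe
    print(np.corrcoef(data @ w, f)[0, 1])          # ~1.0, then ~0.0
```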
...Ablating along the difference of the means makes both CCS & Supervised learning fail, i.e. reduces their accuracy to random guessing. Therefore:
- The fact that Recursive CCS finds many good directions is not due to some “intrinsic redundancy” of the data. There exists a single direction which contains all linearly available information.
- The fact that Recursive CCS finds strictly more than one good direction means that CCS is not efficient at locating all information related to truth: it is not able to find a direction which contains as much information
My point wasn't that the equation didn't hold perfectly, but that the discrepancies are very suspicious. Two of the three discrepancies were off by exactly 1 order of magnitude, making me fairly confident that they are the result of a typo. (Not sure what's going on with the other discrepancy.)
In the table of parameters, compute, and tokens, compute/(parameters*tokens) is always 6, except in one case where it's 0.6, one case where it's 60, and one case where it's 2.75. Are you sure this is right?
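For reference, the sanity check being run here, with made-up numbers (the 6·N·D rule of thumb for dense-transformer training FLOPs):

```python
def implied_constant(compute, params, tokens):
    """Should be ~6 if compute ~= 6 * params * tokens."""
    return compute / (params * tokens)

# e.g. a 1e9-parameter model trained on 2e10 tokens should cost about
# 6 * 1e9 * 2e10 = 1.2e20 FLOPs; a table entry of 1.2e19 or 1.2e21
# would be off by exactly one order of magnitude, suggesting a typo.
print(implied_constant(1.2e20, 1e9, 2e10))  # -> 6.0
```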
Done, thanks.
It would kind of use assumption 3 inside step 1, but inside the syntax, rather than in the metalanguage. That is, step 1 involves checking that the number encoding "this proof" does in fact encode a proof of C. This can't be done if you never end up proving C.
One thing that might help make clear what's going on is that you can follow the same proof strategy, but replace "this proof" with "the usual proof of Lob's theorem", and get another valid proof of Lob's theorem, that goes like this: Suppose you can prove that []C->C, and let n be the number encodi...
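For reference, a compressed sketch of that usual proof (standard textbook material, nothing specific to this thread): assume $\vdash \Box C \to C$ and take a fixed point $\psi$ with $\vdash \psi \leftrightarrow (\Box\psi \to C)$. Then:

```latex
\begin{align*}
1.&\ \vdash \Box\psi \to \Box(\Box\psi \to C) && \text{necessitation + distribution on the fixed point} \\
2.&\ \vdash \Box\psi \to (\Box\Box\psi \to \Box C) && \text{distribute } \Box \text{ over } \to \\
3.&\ \vdash \Box\psi \to \Box\Box\psi && \text{internal necessitation} \\
4.&\ \vdash \Box\psi \to \Box C && \text{from 2 and 3} \\
5.&\ \vdash \Box\psi \to C && \text{from 4 and the hypothesis } \vdash \Box C \to C \\
6.&\ \vdash \psi && \text{from 5 and the fixed point} \\
7.&\ \vdash \Box\psi && \text{necessitation} \\
8.&\ \vdash C && \text{from 5 and 7}
\end{align*}
```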
If that's how it works, it doesn't lead to a simplified cartoon guide for readers who'll notice missing steps or circular premises; they'd have to first walk through Lob's Theorem in order to follow this "simplified" proof of Lob's Theorem.
The revelation that he spent maybe 10x as much on villas for his girlfriends as EA cause areas
Source?
The idea that he was trying to distance himself from EA to protect EA doesn't hold together because he didn't actually distance himself from EA at all in that interview. He said ethics is fake, but it was clear from context that he meant ordinary ethics, not utilitarianism.
"Having been handed this enormous prize, how do I maximize the probability that I max out on utility?" Hm, but that actually doesn't give back any specific criterion, since basically any strategy that never bets your whole stack will win.
That's not quite true. If you bet more than double Kelly, your wealth decreases. But yes, Kelly betting isn't unique in growing your wealth to infinity in the limit as number of bets increases.
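To put a number on "more than double Kelly" (standard Kelly arithmetic, with an example value of $p$ chosen by me): for an even-money bet won with probability $p$, betting a fraction $f$ of your wealth each round gives expected log-growth

```latex
g(f) = p\ln(1+f) + (1-p)\ln(1-f), \qquad f^* = 2p - 1 .
```

With $p=0.6$, $f^*=0.2$, and at double Kelly $g(0.4) = 0.6\ln(1.4) + 0.4\ln(0.6) \approx 0.2019 - 0.2043 < 0$, so the typical growth rate is already slightly negative there, and it only gets worse beyond.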
If the number of bets is very large, but due to some combination of low starting wealth relative to the utility bound and slow growth rate, it is not possible to get close to maximum utility, then Kelly betting should be optimal.
I basically endorse what kh said. I do think it's wrong to think you can fit enormous amounts of expected value or disvalue into arbitrarily tiny probabilities.
It is true that in practice, there's a finite amount of credit you can get, and credit has a cost, limiting the practical applicability of a model with unlimited access to free credit, if the optimal strategy according to the model would end up likely making use of credit which you couldn't realistically get cheaply. None of this seems important to me. The easiest way to understand the optimal strategy when maximum bet sizes are much smaller than your wealth is that it maximizes expected wealth on each step, rather than that it maximizes expected log wealt...
Access to credit. In the logarithmic model, you never make bets that could make your net worth zero or negative.
Again, the max being a small portion of your net worth isn't the assumption behind the model; the assumption is just that you don't get constrained by lack of funds, so it is a different model. It's true that if the reason you don't get constrained by lack of funds is that the maximum bets are small relative to your net worth, then this is also consistent with maximizing log wealth on each step. But this isn't relevant to what I brought it up for, which was to use it as a step in explaining the reason for the Kelly criterion in the section after it.
No. The point of the model where acting like your utility is linear is optimal wasn't that this is a more realistic model than the assumptions behind the Kelly criterion; it's just another simplified model, which is slightly easier to analyze, so I was using it as a step in showing why you should follow the Kelly criterion when it is your wealth that constrains the bet sizes you can make. It's also not true that the linear-utility model I described is still just maximizing log wealth; for instance, if the reason that you're never constrained by available funds is that you have access to credit, then your wealth could go negative, and then its log wouldn't even be defined.
Most of the arguments for Kelly betting that you address here seem like strawmen, except for (4), which can be rescued from your objection, and an interpretation of johnswentworth's version of (2), which you actually mention in footnote 3, but seem unfairly dismissive of.
The assumption according to which your derived utility function is logarithmic is that expected utility doesn't get dominated by negligible-probability tail events. For instance, if you have a linear utility function and you act like it, you almost surely get 0 payout, but your expected p...
But in fact, I expect the honest policy to get significantly less reward than the training-game-playing policy, because humans have large blind spots and biases affecting how they deliver rewards.
The difference in reward between truthfulness and the optimal policy depends on how humans allocate rewards, and perhaps it could be possible to find a clever strategy for allocating rewards such that truthfulness gets close to optimal reward.
For instance, in the (unrealistic) scenario in which a human has a well-specified and well-calibrated probability...
It sounds to me like, in the claim "deep learning is uninterpretable", the key word in "deep learning" that makes this claim true is "learning", and you're substituting the similar-sounding but less true claim "deep neural networks are uninterpretable" as something to argue against. You're right that deep neural networks can be interpretable if you hand-pick the semantic meanings of each neuron in advance and carefully design the weights of the network such that these intended semantic meanings are correct, but that's not what deep learning is. The other t...
This seems related in spirit to the fact that time is only partially ordered in physics as well. You could even use special relativity to make a model for concurrency ambiguity in parallel computing: each processor is a parallel worldline, detecting and sending signals at points in spacetime that are spacelike-separated from when the other processors are doing these things. The database follows some unknown worldline, continuously broadcasts its contents, and updates its contents when it receives instructions to do so. The set of possible ways that the pro...
Wikipedia claims that every sequence is Turing reducible to a random one, giving a positive answer to the non-resource-bounded version of any question of this form. There might be a resource-bounded version of this result as well, but I'm not sure.
Fisherian runaway doesn't make any sense to me.
Suppose that each individual in a species of a given sex has some real-valued variable $x$, which is observable by the other sex. Suppose that, absent considerations about sexual selection by potential mates for the next generation, the evolutionarily optimal value for $x$ is 0. How could we end up with a positive feedback loop involving sexual selection for positive values of $x$, creating a new evolutionary equilibrium with an optimal value $x>0$ when taking into account sexual selectio...
I know this was tagged as humor, but taking it seriously anyway,
I'm skeptical that breeding octopuses for intelligence would yield much in the way of valuable insights for AI safety, since octopuses and humans have so much in common that AGI wouldn't. That said, it's hard to rule out that uplifting another species could reveal some valuable unknown unknowns about general intelligence, so I unironically think this is a good reason to try it.
Another benefit of doing this, more likely to pay off, would be as a testbed for genetically engineering humans for hi...
One example of a class of algorithms that can solve its own halting problem is the class of primitive recursive functions. There's a primitive recursive function $H$ that takes as input a description $\ulcorner f\urcorner$ of a primitive recursive function $f$ and an input $x$ and outputs $1$ if $f(x)$ halts, and $0$ otherwise: this program is given by $H(\ulcorner f\urcorner, x)=1$, because all primitive recursive functions halt on all inputs. In this case, it is the universal evaluator $U$ (with $U(\ulcorner f\urcorner, x)=f(x)$) that does not exist.
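To spell out the diagonalization (notation mine): if the class had a primitive recursive universal evaluator, the usual argument would go through:

```latex
\text{If } U(\ulcorner f \urcorner, x) = f(x) \text{ were primitive recursive, then so would be }
g(x) = U(x, x) + 1, \text{ yet } g(\ulcorner g \urcorner) = g(\ulcorner g \urcorner) + 1 .
```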
I think $U$ should exist, at least for...
If a group decides something unanimously, and has the power to do it, they can do it. That would take them outside the formal channels of the EU (or, in another context, of NATO), but I do not see any barrier to an agreement to stop importing Russian gas followed by everyone who agreed to it no longer importing Russian gas. Hungary would keep importing, but that does not seem like that big a problem.
If politicians can blame Hungary for their inaction, then this partially protects them from being blamed by voters for not doing anything. But it doesn't protect ...
If you have a 10-adic integer, and you want to reduce it to a 5-adic integer, then to know its last $n$ digits in base 5, you just need to know what it is modulo $5^n$. If you know what it is modulo $10^n$, then you can reduce it modulo $5^n$, so you only need to look at the last $n$ digits in base 10 to find its last $n$ digits in base 5. So a base-10 integer ending in ...93 becomes a base-5 integer ending in ...33, because 93 mod 25 is 18, which, expressed in base 5, is 33.
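A small sketch of this computation (digit conventions mine: least-significant digit first):

```python
def base5_tail(base10_digits):
    """Last n base-5 digits of a 10-adic integer, from its last n base-10
    digits (least-significant first)."""
    n = len(base10_digits)
    value = sum(d * 10 ** i for i, d in enumerate(base10_digits))
    value %= 5 ** n   # 5**n divides 10**n, so the base-10 tail determines this
    return [(value // 5 ** i) % 5 for i in range(n)]

# ...93 in base 10: 93 mod 25 = 18 = 33 in base 5
print(base5_tail([3, 9]))  # -> [3, 3], i.e. a base-5 integer ending in ...33
```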
The Chinese remainder theorem tells us that we can go backwards: given a 5-adic in...
I wonder if it might be more effective to fund legal action against OpenAI than to compensate individual ex-employees for refusing to sign an NDA. Trying to take vested equity away from ex-employees who refuse to sign an NDA sounds likely to not hold up in court, and if we can establish a legal precedent that OpenAI cannot do this, that might make other ex-employees much more comfortable speaking out against OpenAI than the possibility that third-p...
Yeah, at the time I didn't know how shady some of the contracts here were. I do think funding a legal defense is a marginally better use of funds (though my guess is funding both is worth it).