I feel like an important lesson to learn from the analogy to air conditioners is that some technologies are bounded by physics and cannot improve quickly (or at all). I doubt anyone has the data, but I would be surprised if average air-conditioning efficiency in BTU/h per watt, plotted over the 20th century, is not a sigmoid.
For seeing through the fog of war, I'm reminded of the German Tank Problem.
https://en.wikipedia.org/wiki/German_tank_problem
Statistical estimates were ~50x more accurate than intelligence estimates in the canonical example. When you include the strong and reasonable incentives for all participants to propagandize, it is nearly impossible to get accurate information about an ongoing conflict.
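The statistical method from the canonical example fits in a few lines. This is a minimal sketch, assuming serial numbers start at 1 and are observed without replacement. It uses the minimum-variance unbiased estimator described in the Wikipedia article, N = m + m/k - 1, where m is the largest observed serial number and k is the sample size. The function name and the sample serials are illustrative.

```python
def estimate_total(serials):
    """Minimum-variance unbiased estimate of the population maximum N,
    assuming serials are drawn without replacement from 1..N."""
    k = len(serials)          # sample size
    m = max(serials)          # largest observed serial number
    return m + m / k - 1      # equivalently m * (1 + 1/k) - 1

# Illustrative captured serial numbers (not historical data).
observed = [19, 40, 42, 60]
print(estimate_total(observed))  # 60 + 15 - 1 = 74.0
```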
I think as rationalists, if we're going to see more clearly than conventional wisdom, we need to find sources of information that have a more fundamental basis. I don't yet know what those would be.
In reality, an AI can use algorithms that find a pretty good solution most of the time.
If you replace "AI" with "ML", I agree with this point. And yep, this is what we can do with the networks we're scaling. But "pretty good most of the time" doesn't get you an x-risk intelligence. It gets you some really cool tools.
If the 3-SAT algorithm is O(n^4), then this algorithm might not be that useful compared to other approaches.
If 3-SAT is solvable in O(n^4), then P = NP, and, back to Aaronson's point, the fundamental structure of reality is much different t...
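For a sense of the exponential baseline a polynomial 3-SAT algorithm would have to beat, here is a minimal brute-force checker that tries all 2^n assignments. It is a sketch, not a practical solver. The clause encoding (signed integers, as in the DIMACS convention) and the function name are assumptions for illustration.

```python
from itertools import product

def brute_force_sat(num_vars, clauses):
    """Try all 2**num_vars assignments. Each clause is a tuple of nonzero ints:
    +i means variable i is true, -i means variable i is false (1-indexed).
    Returns (satisfying assignment or None, number of assignments checked)."""
    checked = 0
    for bits in product([False, True], repeat=num_vars):
        checked += 1
        # A clause is satisfied if any of its literals evaluates to true.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits, checked
    return None, checked

# All 8 sign patterns over x1..x3: every assignment falsifies exactly one clause,
# so the formula is unsatisfiable and the search must check all 2**3 assignments.
unsat = [tuple(s * v for s, v in zip(signs, (1, 2, 3)))
         for signs in product([1, -1], repeat=3)]
print(brute_force_sat(3, unsat))  # (None, 8)
```

Every added variable doubles the worst-case number of assignments checked.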
I think less than human intelligence is sufficient for an x-risk because that is probably what is sufficient for a takeoff.
If less than human intelligence is sufficient, wouldn't humans have already done it? (or are you saying we're doing it right now?)
How intelligent does an agent need to be to send an HTTP request to the URL /ldap://myfirstrootkit.com on a few million domains?
A human could do this or write a bot to do this (and they've tried). But they'd also be detected, as would an AI. I don't see this as an x-risk, so much as a manageable pr...
If less than human intelligence is sufficient, wouldn't humans have already done it?
No. Humans are inherently incapable of countless things that software is capable of. To give an earlier example, humans can do things that evolution never could. And just as evolution can only accomplish things like 'going to the moon' by making agents that operate on the next level of capabilities, humans cannot do things like copy themselves billions of times or directly fuse their minds or be immortal or wave a hand to increase their brain size 100x. All of these are ...
I spent some time reading the Grinblatt paper. Thanks again for the link. I stand corrected on IQ being uncorrelated with stock prediction. One part did catch my eye.
...Our findings relate to three strands of the literature. First, the IQ and trading behavior analysis builds on mounting evidence that individual investors exhibit wealth-reducing behavioral biases. Research, exemplified by Barber and Odean (2000, 2001, 2002), Grinblatt and Keloharju (2001), Rashes (2001), Campbell (2006), and Calvet, Campbell, and Sodini (2007, 2009a, 2009b),
We don't know that; P vs NP is an unproved conjecture. Most real world problems are not giant knapsack problems. And there are algorithms that quickly produce answers that are close to optimal. Actually, most of the real use of intelligence is not a complexity theory problem at all. "Is inventing transistors an O(n) or an O(2^n) problem?"
P vs. NP is unproven. But I disagree that "most real world problems are not giant knapsack problems". The Cook-Levin theorem showed that every problem in NP, which includes many of the most interesting problems, is reducible to Boolean satisfiability (SAT), an NP-complete problem. ...
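Cook-Levin itself encodes an arbitrary polynomial-time nondeterministic Turing machine as a SAT formula. To make "reducible" concrete, here is a minimal sketch of a textbook reduction, graph 3-coloring to CNF. One variable per (vertex, color) pair and three families of clauses are enough. The function names and the variable numbering are hypothetical choices for illustration.

```python
def var(v, c, k=3):
    """Map (vertex v, color c) to a 1-indexed SAT variable."""
    return v * k + c + 1

def coloring_to_cnf(num_vertices, edges, k=3):
    clauses = []
    for v in range(num_vertices):
        # At least one color per vertex.
        clauses.append(tuple(var(v, c, k) for c in range(k)))
        # At most one color per vertex.
        for c1 in range(k):
            for c2 in range(c1 + 1, k):
                clauses.append((-var(v, c1, k), -var(v, c2, k)))
    for u, w in edges:
        # Adjacent vertices never share a color.
        for c in range(k):
            clauses.append((-var(u, c, k), -var(w, c, k)))
    return clauses

def satisfies(clauses, true_vars):
    """Check a CNF formula against the set of variables assigned True."""
    return all(any((lit > 0) == (abs(lit) in true_vars) for lit in clause)
               for clause in clauses)

triangle = [(0, 1), (1, 2), (0, 2)]
cnf = coloring_to_cnf(3, triangle)
good = {var(0, 0), var(1, 1), var(2, 2)}   # three different colors
bad = {var(0, 0), var(1, 0), var(2, 2)}    # vertices 0 and 1 clash
print(satisfies(cnf, good), satisfies(cnf, bad))  # True False
```

Any SAT solver can then answer the coloring question. The reduction itself only takes polynomial time.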
AlphaGo went from mediocre to going toe-to-toe with the top human Go players in a very short span of time. And now AlphaGo Zero has beaten AlphaGo 100-0. AlphaFold has arguably made a similar logistic jump in protein folding.
Do you know how many additional resources this required?
...The cost of compute has been decreasing at an exponential rate for decades; this has meant entire classes of algorithms which scale straightforwardly with compute have also become exponentially more capable, and this has already had a profound impact on our world. At the very lea
Since you bring up selection bias, Grinblatt et al 2012 studies the entire Finnish population with a population registry approach and finds that.
Thanks for the citation. That is the kind of information I was hoping for. Do you think that slightly better than human intelligence is sufficient to present an x-risk, or do you think it needs some sort of takeoff or acceleration to present an x-risk?
I think I can probably explain the "so" in my response to Donald below.
Overshooting by 10x (or 1,000x or 1,000,000x) before hitting 1.5x is probably easier than it looks for someone who does not have background in AI.
Do you have any examples of 10x or 1000x overshoot? Or maybe a reference on the subject?
Hmm, there is a lot here; let me see if I can narrow down some key points.
Once you have the right algorithm, it really is as simple as increasing some parameter or neuron count.
There are some problems that do not scale well (or at all). For example, doubling the computational power applied to solving the knapsack problem exactly (by exhaustive search over all 2^n subsets) will only let you solve a problem that is one element bigger. Why should we presume that intelligence scales like an O(n) problem and not an O(2^n) problem?
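Here is a minimal sketch of the exhaustive approach, assuming an exact 0/1 knapsack solved by enumerating every subset. The function name and the toy weights and values are hypothetical. Each extra item doubles the number of subsets examined, so doubling compute buys one more item. Dynamic programming does better, in time proportional to n times the capacity, but that is still exponential in the number of bits needed to write the capacity down.

```python
from itertools import combinations

def knapsack_brute_force(weights, values, capacity):
    """Exact 0/1 knapsack by exhaustive search over all 2**n subsets.
    Returns (best total value, number of subsets examined)."""
    n = len(weights)
    best, examined = 0, 0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            examined += 1
            if sum(weights[i] for i in subset) <= capacity:
                best = max(best, sum(values[i] for i in subset))
    return best, examined

# Each extra item doubles the work: 2**n subsets for n items.
for n in (10, 11, 12):
    items = list(range(1, n + 1))
    _, work = knapsack_brute_force(items, items, n)
    print(n, work)   # 10 1024, then 11 2048, then 12 4096
```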
...What is happening here? Are both people just looking a
Are we equivocating on 'much better' here?
Not equivocating, but if intelligence is hard to scale and slightly better is not a threat, then there is no reason to be concerned about AI risk (maybe the 1% x-risk suggested by the OP is in fact a 1e-9 x-risk).
there are considerable individual differences in weather forecasting performances (it's one of the more common topics to study in the forecasting literature),
I'd be interested in seeing any papers on individual differences in weather forecasting performance (even if IQ is not mentioned). My understand...
I think I'm convinced that we can have human-capable AI (or greater) in the next century (or sooner). I'm unconvinced on a few aspects of AI alignment. Maybe you could help clarify your thinking.
(1) I don't see how a human-capable or slightly-smarter-than-human AI (say 50% smarter) will be a serious threat. Broadly, humans are smart because of group and social behavior. So a 1.5-human AI might be roughly as smart as two humans? That doesn't seem too concerning.
(2) I don't see how a bit smarter than humans scales to superhuman lev...
For example, a human with 150 IQ isn't going to be much better at predicting the weather than a person with 100 IQ.
Are we equivocating on 'much better' here? Because there are considerable individual differences in weather forecasting performances (it's one of the more common topics to study in the forecasting literature), and while off-hand I don't have any IQ cites, IQ shows up all the time in other forecasting topics as a major predictor of performance (as it is of course in general) and so I would be surprised if weather was much different.
Current AI does stochastic search, but it is still search: essentially the PP complexity class instead of NP/P (with a fair amount of domain-specific heuristics).
Never leave the house without your d20 :-P
But I agree with you. This seems a simple way to do something like satisficing, avoiding the great computational cost of an optimal decision.
In terms of prior art, that is probably the field you want to explore: https://en.m.wikipedia.org/wiki/Satisficing
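Satisficing, in Herbert Simon's sense, means accepting the first option that clears an aspiration level instead of searching for the optimum. This is a minimal sketch, assuming options can be scored and are examined in random order, with the randomness standing in for the d20. The function name, scoring function and threshold are hypothetical.

```python
import random

def satisfice(options, score, aspiration, rng=random):
    """Examine options in random order and accept the first whose score meets
    the aspiration level. Falls back to the best seen if none qualifies.
    Returns (chosen option, number of options evaluated)."""
    order = list(options)
    rng.shuffle(order)
    best = None
    for count, option in enumerate(order, start=1):
        if score(option) >= aspiration:
            return option, count          # good enough: stop searching
        if best is None or score(option) > score(best):
            best = option
    return best, len(order)               # nothing met the bar

rng = random.Random(0)
choice, evaluated = satisfice(range(1000), score=lambda x: x, aspiration=900, rng=rng)
print(choice, evaluated)
```

With 10% of options above the bar, the search typically stops after around ten evaluations instead of scoring all 1000.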
Not sure if this is helpful, but since you analogized to chip design: in chip design, you typically verify using a constrained random method when the state space grows too large to verify every input exhaustively. That is, you construct a distribution over the set of plausible input strings, sample it, and feed the samples to your design. Then you compare the result to a model written in a higher-level language.
Of course, standard techniques like designing for modularity can make the state space more manageable too.
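Here is a minimal sketch of that constrained random loop. It assumes a toy 8-bit adder as the design under test and Python's modular addition as the higher-level reference model. All function names and the stimulus weighting are hypothetical. The input space here (65,536 pairs) is small enough to check exhaustively, so this only shows the shape of the flow.

```python
import random

def dut_add8(a, b):
    """Stand-in 'design under test': an 8-bit ripple-carry adder built from
    bitwise operations, with wraparound on overflow."""
    result, carry = 0, 0
    for i in range(8):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result

def reference_add8(a, b):
    """Higher-level reference model of the same behavior."""
    return (a + b) % 256

def constrained_random_test(trials=10_000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        # Constraint: bias stimuli toward corner cases (0, 1, 255, long carry
        # chains) instead of sampling the input space uniformly.
        a = rng.choice([0, 255, rng.randrange(256)])
        b = rng.choice([0, 1, 255, rng.randrange(256)])
        if dut_add8(a, b) != reference_add8(a, b):
            return (a, b)     # first mismatching stimulus
    return None

print(constrained_random_test())  # None: no mismatch found
```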
First off, Scott’s blog is awesome.
Second, the example of dieting comes to mind when I think of training rationality. While dieters aren’t much connected to the rationality community, they are a large group of people focused on overcoming one particular aspect of our irrationality (but without much success).
What basis is there to assume that the distribution of these variables is log-uniform? Why, in the toy example, limit the variables to the interval [0,0.2]? Why not [0,1]?
These choices drive the result.
The problem is, for many of the probabilities, we don’t even know enough about them to say what distribution they might take. You can’t infer a meaningful distribution over variables where your sample size is 1 or 0.
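Here is a minimal Monte Carlo sketch of how much the interval drives the result. It assumes 5 independent parameters, each log-uniform between 1e-3 and an upper bound of either 0.2 or 1, and asks how often their product falls below 1e-9. A log-uniform needs a strictly positive lower bound, so the 0 endpoint cannot be used literally. The parameter count, lower bound, threshold and function name are illustrative assumptions, not the toy model's actual values.

```python
import math
import random

def fraction_below(threshold, lo, hi, n_params=5, samples=20_000, seed=0):
    """Fraction of Monte Carlo draws where the product of n_params
    log-uniform(lo, hi) factors falls below threshold."""
    rng = random.Random(seed)
    log_lo, log_hi = math.log(lo), math.log(hi)
    hits = 0
    for _ in range(samples):
        # Summing logs gives the log of the product and avoids underflow.
        log_product = sum(rng.uniform(log_lo, log_hi) for _ in range(n_params))
        if log_product < math.log(threshold):
            hits += 1
    return hits / samples

for hi in (0.2, 1.0):
    print(hi, fraction_below(1e-9, lo=1e-3, hi=hi))
```

Only the upper bound changes between the two runs, yet the share of draws below the threshold moves substantially.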
I’m still not seeing a big innovation here. I’m pretty sure most researchers who look at the Drake equation think “huge sensitivity to parameterization.”
If we have a 5-parameter Drake equation, then the number of civilizations scales with X^5, so if X comes in at 0.01, we’ve got a 1e-10 probability of detectable civilization formation. But if we’ve got a 10-parameter Drake equation and X comes in at 0.01, then it implies a 1e-20 probability (extraordinarily smaller).
So yes, it has a huge sensitivity, but it is primarily a constructed sensitivity. All the Drake equation really tells us is that we don’t know very much, and it probably won’t be useful until we can get N above one for more of the parameters.
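Here is the arithmetic behind the "constructed sensitivity" point. If all k factors share the same value x, then N = x^k, so d ln N / d ln x = k and any relative error in x is compounded k times. The sketch below uses x = 0.01 from the example above. The 10% error and the function name are assumed for illustration.

```python
def drake_product(x, n_params):
    """Toy Drake-style product in which every one of n_params factors equals x."""
    return x ** n_params

for k in (5, 10):
    # N itself (roughly 1e-10, then 1e-20), and the factor by which a
    # 10% overestimate of x inflates N (1.1**k).
    print(k, drake_product(0.01, k), drake_product(1.1, k))
```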
I’m not sure I understand why they’re against point estimates. As long as the points match the means of our estimates for the variables, and the variables are independent, then the product of the points should match the expected value of the distribution.
Because people draw incorrect conclusions from the point estimates. You can have a high expected value of the distribution (e.g. "millions of civilizations") while at the same time having a big part of the probability mass on outcomes with just one civilization, or a few civilizations far away.
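Here is a minimal Monte Carlo sketch of this point. It assumes a made-up model: N = 10^11 times three independent factors, each log-uniform on [10^-6, 1]. None of these numbers come from the Drake literature. With independent factors, the product of the factor means matches the simulated mean of N, which lands in the tens of millions. Yet under these assumptions roughly a quarter of the draws give fewer than one civilization.

```python
import math
import random

def log_uniform(rng, lo, hi):
    """One draw from a log-uniform distribution on [lo, hi]."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))

def simulate(samples=100_000, seed=0):
    """Return (mean of N, fraction of draws with N < 1) for the toy model."""
    rng = random.Random(seed)
    draws = [1e11 * log_uniform(rng, 1e-6, 1) * log_uniform(rng, 1e-6, 1)
             * log_uniform(rng, 1e-6, 1) for _ in range(samples)]
    mean = sum(draws) / samples
    p_less_than_one = sum(d < 1 for d in draws) / samples
    return mean, p_less_than_one

# Point estimate: product of the factor means (valid because they are independent).
factor_mean = (1 - 1e-6) / math.log(1e6)   # mean of log-uniform on [1e-6, 1]
point_estimate = 1e11 * factor_mean ** 3
print("point estimate:", point_estimate)
print("simulated mean, P(N < 1):", simulate())
```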
I think this is an interesting concept and want to see where you go with it. But just devil’s advocating, there are some pretty strong counterexamples for micromanagement. For example, many imperative languages can be ridiculously inefficient. And try solving an NP-complete problem with a genetic algorithm and you’ll just get stuck in a local minimum.
Simplicity and emergence are often surprisingly effective but they’re just tools in a large toolbox.
Somewhat ironic that LW is badly in need of better captcha.
I read him; he is just incorrect. “People hate losses more than they like gains” is not explained by diminishing marginal utility (DMU). They dislike losses to an extent far greater than predicted by DMU, and, more importantly, this dislike is largely scale invariant.
If you go read papers like the original K&T, you’ll see that their data set is just a bunch of statements that are predicted to be equally preferable under DMU (because marginal utility doesn’t change much for small changes in wealth). What changes the preference is simply whether K&T phrase the question in terms of a loss or a gain.
So...unsurprisingly, Kahneman is accurately describing the theory that won him the Nobel prize.
The result you got is pretty close to the FFT of f(t) = t, which is roughly what you got from sorting noise.
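The link can be checked numerically. Sorting n uniform samples gives values close to the ramp t/n (the expected order statistics), so the two spectra nearly coincide. This is a minimal sketch using a hand-rolled DFT to stay dependency-free. The length n = 128, the seed and the function name are arbitrary choices.

```python
import cmath
import random

def dft_magnitudes(signal):
    """Naive O(n^2) discrete Fourier transform; returns |X_k| for each k."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal)))
            for k in range(n)]

n = 128
rng = random.Random(0)
sorted_noise = sorted(rng.random() for _ in range(n))   # sorted uniform noise
ramp = [t / n for t in range(n)]                         # f(t) = t, scaled to [0, 1)

a, b = dft_magnitudes(sorted_noise), dft_magnitudes(ramp)
# Compare the spectra, skipping k = 0 (the DC term).
max_gap = max(abs(x - y) for x, y in zip(a[1:], b[1:]))
print(max_gap, max(b[1:]))
```

The largest spectral gap is small next to the ramp's dominant low-frequency peak, and both spectra fall off roughly as 1/k.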
All finite-length sequences appear (with probability 1) in an infinite random sequence. So, in the same way that all the works of Shakespeare exist inside an infinite random sequence, so too does a complete representation of any finite universe.
I suppose one could argue by the anthropic principle that we happen to exist in a well-ordered finite subsequence of an infinite random sequence. But, like multiverse theories, it lacks the explanatory power or verifiability of simpler theories.
Maybe I’m being dense, and missing the mystery, but I think this reference might be helpful.
I mean... he quotes Kahneman, claiming the guy doesn’t know the implications of his own theory.
Losses hurt more than gains even at scales where DMU predicts that they should not (because the utility curve is approximately linear for small losses and gains, so marginal utility barely changes). Loss aversion is the psychological result which explains this effect.
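Here is a minimal sketch contrasting the two accounts. It assumes log utility as the stand-in for diminishing marginal utility, a baseline wealth of $100,000, and a $100 stake. The prospect-theory parameters (loss-aversion coefficient 2.25, curvature 0.88) are the commonly cited Tversky and Kahneman (1992) estimates, used here as assumptions. Under log utility, the loss hurts only about 0.1% more than the gain pleases. The prospect-theory ratio stays at 2.25 whatever the stake.

```python
import math

WEALTH = 100_000          # assumed baseline wealth in dollars
STAKE = 100               # size of the gain or loss

def dmu_ratio(wealth, stake):
    """Pain of losing `stake` relative to pleasure of gaining it, under
    log utility (a standard example of diminishing marginal utility)."""
    loss = math.log(wealth) - math.log(wealth - stake)
    gain = math.log(wealth + stake) - math.log(wealth)
    return loss / gain

def prospect_ratio(stake, lam=2.25, alpha=0.88):
    """Same ratio under a prospect-theory value function:
    v(x) = x**alpha for gains and -lam * (-x)**alpha for losses."""
    return (lam * stake ** alpha) / (stake ** alpha)

print(dmu_ratio(WEALTH, STAKE))     # about 1.001: nearly symmetric
print(prospect_ratio(STAKE))        # about 2.25, at any stake size
```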
This is the author’s conclusion: “So, please, don’t go around claiming that behavioral economists are incorporating some brilliant newfound insight that people hate losses more than they like gains. We’ve known about this in price theory since Alfred Marshall’s 1890 Principles of Economics.”
Sorry, nope. Alfred Marshall’s Principles would have made the wrong prediction.
That makes a lot of sense to me. Aversion to small losses is a sensible blanket rule when the gamble is:
lose: don’t eat today
win: eat double today
don’t play: eat today
Our ancestors probably faced this gamble since long before humans were even humans. Under those stable conditions, a heuristic accounting for scale would have been needlessly expensive.
In short, the author is wrong. Diminishing marginal utility only really applies when the stakes are on the order of the agent’s total wealth, whereas the loss aversion asymmetry holds true for relatively small sums.
See e.g. a nice paper by Matthew Rabin which quantifies the extent to which diminishing marginal utility is too weak an effect to explain actually-observed risk aversion, by proving statements like this: "If you would turn down a 50:50 gamble between gaining $101 and losing $100 on account of diminishing marginal utility, then you would also turn down a 50:50 gamble between gaining all the money in the world and losing $10,000."
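This is not Rabin's proof, which covers any concave utility that rejects the small gamble at all wealth levels. It is a simpler illustration under an assumed constant-absolute-risk-aversion (CARA) utility, u(x) = -exp(-a*x), whose choices do not depend on wealth. It finds the smallest a that turns down +$101/-$100 and checks that the same agent turns down +$1e15/-$10,000. The $1e15 stand-in for "all the money in the world" and the function names are assumptions.

```python
import math

def reject_value(a, gain, loss):
    """E[exp(-a*X)] for a 50:50 gamble X in {+gain, -loss}. Under CARA utility
    u(x) = -exp(-a*x) the gamble is turned down when this is at least 1 = -u(0)."""
    return 0.5 * math.exp(-a * gain) + 0.5 * math.exp(a * loss)

def min_aversion_to_reject(gain, loss, lo=1e-9, hi=1.0, iters=200):
    """Bisect for the smallest positive a at which the gamble is (weakly) rejected.
    The function above is convex in a and crosses 1 once for a > 0 when gain > loss."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if reject_value(mid, gain, loss) >= 1:
            hi = mid
        else:
            lo = mid
    return hi

a = min_aversion_to_reject(gain=101, loss=100)
print(a)                                               # about 9.9e-05
# Same agent, 50:50 between losing $10,000 and gaining $1e15:
print(reject_value(a, gain=1e15, loss=10_000) > 1)     # True: still rejected
```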
A universal Turing machine doesn't make a distinction between program and data: the program it runs is just more symbols on its tape. The distinction between program and data is really a hardware efficiency optimization that came from the Harvard architecture. Since many systems are Turing complete, creating an immutable program seems impossible to me.
For example a system capable of speech could exploit the Turing completeness of formal grammars to execute de novo subroutines.
A second example: hackers were able to exploit the surprising Turing completeness of an image compression standard to embed a virtual machine in a file disguised as a GIF.
https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html
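The Project Zero write-up describes the attackers building logic circuits out of the image format's bitwise compositing operations. The sketch below is a generic illustration of why bitwise primitives are enough for computation. It is not a reconstruction of the exploit: just a minimal ripple-carry adder built only from AND, OR and XOR on single bits. All function names are hypothetical.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder using only AND (&), OR (|) and XOR (^) on bits."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out

def add_bits(x_bits, y_bits):
    """Ripple-carry addition of two little-endian bit lists of equal length."""
    carry, out = 0, []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]

def to_bits(n, width):
    """Little-endian bit list of the low `width` bits of n."""
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))

print(from_bits(add_bits(to_bits(23, 8), to_bits(42, 8))))  # 65
```

Chain enough of these gates with registers and a loop, and the "data" format becomes a general-purpose computer.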