I explicitly asked Anthropic whether they had a policy of not releasing models significantly beyond the state of the art. They said no, and that they believed Claude 3 was noticeably beyond the state of the art at the time of its release.
The situation at Zaporizhzhia (currently) does not seem to be an impending disaster. The fire is/was in an administrative building. Fires at nuclear power plants can be serious, but the reactor buildings are concrete and would not easily catch fire due to nearby shelling or other external factors.
Some click-seekers on Twitter have made comparisons to Chernobyl. That kind of explosion cannot happen accidentally at Zaporizhzhia (it's a safer power plant design with sturdy containment structures surrounding the reactors). If the Russians wanted to cause a mas...
Sounds like something GPT-3 would say...
Alternatively, aging (like most non-discrete phenotypes) may be omnigenic.
Thanks for posting this, it's an interesting idea.
I'm curious about your second-to-last paragraph: if our current evidence already favored SSA or SIA (for instance, if we knew that an event occurred in the past that had a small chance of creating a huge number of copies of each human, but we also knew that we are not copies), wouldn't that already have been enough to update our credence in SSA or SIA? Or did you mean that there's some other category of possible observations, which is not obviously evidence one way or the other, but which under this UDT framework we could still use to make an update?
I'm curious who is the target audience for this scale...
People who have an interest in global risks will find it simplistic--normally I would think of the use of a color scale as aimed at the general public, but in this case it may be too simple even for the curious layman. The second picture you linked, on the other hand, seems like a much more useful way to categorize risks (two dimensions, severity vs urgency).
I think this scale may have some use in trying to communicate to policy makers who are unfamiliar with the landscape of GCRs, and in parti...
Note also that non-alphanumeric symbols are hard to google. I kind of guessed it from context but couldn't confirm until I saw Kaj's comment.
Separately, and more importantly, the way links are currently displayed makes it hard to tell whether a link has already been visited. Also, if you select text, you can't see the links anymore.
Firefox 57 on Windows 10.
I am encountering some kind of error when opening the links here to rationalsphere and single conversational locus. When I open them, a box pops up that says "Complete your profile" and asks me to enter my email address (even though I used my email to log in in the first place). When I type it in and press submit, I get the error: {"id":"app.mutation_not_allowed","value":"\"usersEdit\" on _id \"BSRa9LffXLw4FKvTY\""}
I think this is an excellent approach to jargon and I appreciate the examples you've given. There is too much tendency, I think, for experts in a field to develop whatever terminology makes their lives easiest (or even in some cases makes them "sound smart") without worrying about accessibility to newcomers.
... but maybe ideally hints at a broader ecosystem of ideas
This sounds useful, but very hard to do in practice... do you know of a case where it's successful?
Thanks for posting!
I haven't read your book yet but I find your work pretty interesting. I hope you won't mind a naive question... you've mentioned non-sunlight-dependent foods like mushrooms and leaf tea. Is it actually possible for a human to survive on foods like this? Has anybody self-experimented with it?
By my calculation, a person who needs 1800 kcals/day would have to eat about 5 kg of mushrooms. Tea (the normal kind, anyway) doesn't look any better.
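Roughly, the arithmetic looks like this (assuming about 35 kcal per 100 g of fresh mushrooms, which is in the right ballpark for many edible varieties but is my assumption, not a figure from the book):

```python
# Back-of-the-envelope check (assumes ~35 kcal per 100 g of fresh mushrooms).
daily_need_kcal = 1800
mushroom_kcal_per_kg = 350  # assumed energy density; varies by species

kg_per_day = daily_need_kcal / mushroom_kcal_per_kg
print(f"{kg_per_day:.1f} kg of mushrooms per day")  # ~5.1 kg
```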
Bacteria fed by natural gas seems like a very promising food source--and one that might even be viab...
You are assuming that all rational strategies are identical and deterministic. In fact, you seem to be using "rational" as a stand-in for "identical", which reduces this scenario to the twin PD. But imagine a world where everyone makes use of the type of superrationality you are positing here--basically, everyone assumes people are just like them. Then any one person who switches to a defection strategy would have a huge advantage. Defecting becomes the rational thing to do. Since everybody is rational, everybody switches to defecting--because this is just a standard one-shot PD. You can't get the benefits of knowing the opponent's source code unless you know the opponent's source code.
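To make the defection incentive concrete, here is a minimal sketch with conventional one-shot PD payoffs (the specific numbers are illustrative assumptions):

```python
# One-shot Prisoner's Dilemma with conventional illustrative payoffs (T > R > P > S).
# payoff[my_move][their_move] = my payoff
payoff = {
    "C": {"C": 3, "D": 0},  # cooperate: R = 3 if they cooperate, S = 0 if they defect
    "D": {"C": 5, "D": 1},  # defect:    T = 5 if they cooperate, P = 1 if they defect
}

for their_move in ("C", "D"):
    best = max(("C", "D"), key=lambda my_move: payoff[my_move][their_move])
    print(f"If the opponent plays {their_move}, my best reply is {best}")

# Defect is the better reply in both cases, so once everyone else is merely
# assumed to play "just like me" without verification, a lone defector wins.
```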
The first section is more or less the standard solution to the open source prisoner's dilemma, and the same as what you would derive from a logical decision theory approach, though with different and less clear terminology than what is in the literature.
The second section, on application to human players, seems flawed to me (as does the claim that it applies to superintelligences who cannot see each other's source code). You claim the following conditions are necessary:
1. A and B are rational
2. A and B know each other's preferences
3. They are each aware of 1
I think many of us "rationalists" here would agree that rationality is a tool for assessing and manipulating reality. I would say much the same about morality. There's not really a dichotomy between morality being "grounded on evolved behavioral patterns" and having "a computational basis implemented somewhere in the brain and accessed through the conscious mind as an intuition". Rather, the moral intuitions we have are computed in our brains, and the form of that computation is determined both by the selection pressures of ev...
I think this is an interesting and useful view, if applied judiciously. In particular, it will always tend to be most relevant for crony beliefs--beliefs that affect the belief-holder's life mainly through other people's opinions of them, like much of politics and some of religion. When it comes to close-up stuff that can cause benefit or harm directly, you will find that most people really do have a model of the world. When you ask someone whether so-and-so would make a good president, the answer is often a signal about their cultural affiliations. Ask th...
This doesn't actually seem to match the description. They only talk about having used one laser, with two stakes, whereas your diagram requires using two lasers. Your setup would be quite difficult to achieve, since you would somehow have to get both lasers perfectly horizontal; I'm not sure a standard laser level would give you this kind of precision. In the version they describe, they level the laser by checking the height of the beam on a second stake. This seems relatively easy.
My guess is they just never did the experiment, or they lied about the result. But it would be kind of interesting to repeat it sometime.
Thanks, that's an interesting perspective. I think even high-level self-modification can be relatively safe with sufficient asymmetry in resources--simulated environments give a large advantage to the original, especially if the successor can be started with no memories of anything outside the simulation. Only an extreme difference in intelligence between the two would overcome that.
Of course, the problem of transmitting values to a successor without giving it any information about the world is a tricky one, since most of the values we care about are linked to reality. But maybe some values are basic enough to be grounded purely in math that applies to any circumstances.
If visible precommitment by B requires it to share the source code for its successor AI, then it would also be giving up any hidden information it has. Essentially both sides have to be willing to share all information with each other, creating some sort of neutral arbitration about which side would have won and at what cost to the other. That basically means creating a merged superintelligence is necessary just to start the bargaining process, since they each have to prove to the other that the neutral arbiter will control all relevant resources to preven...
I've read a couple of Lou Keep's essays in this series and I find his writing style very off-putting. It seems like there's a deep idea about society and social-economic structures buried in there, but it's obscured by a hodgepodge of thesis-antithesis and vague self-reference.
As best I can tell, his point is that irrational beliefs like belief in magic (specifically, protection from bullets) can be useful for a community (by encouraging everyone to resist attackers together) even though it is not beneficial to the individual (since it doesn't prevent deat...
Are you talking about a local game in NY or a correspondence thing?
I like the first idea. But can we really guarantee that after changing its source code to give itself maximum utility, it will stop all other actions? If it has access to its own source code, what ensures that its utility is "maximum" when it can change the limit arbitrarily? And if all possible actions have the same expected utility, an optimizer could output any solution--"no action" would be the trivial one but it's not the only one.
An AI that has achieved all of its goals might still be dangerous, since it would presumably lose all ...
It seems like the ideal leisure activities, then, should combine the social games with games against nature. Sports do this to some extent, but the "game against nature" part is mostly physical rather than intellectual.
Maybe we could improve on that. I'm envisioning some sort of combination of programming and lacrosse, where the field reconfigures itself according to the players' instructions with a 10-second delay...
But more realistically, certain sports are more strategic and intellectual than others. I've seen both tennis and fencing mentione...
AI is good at well-defined strategy games, but (so far) bad at understanding and integrating real-world constraints. I suspect that there are already significant efforts to use narrow AI to help humans with strategic planning, but that these remain secret. For an AGI to defeat that sort of human-computer combination would require considerably superhuman capabilities, which means without an intelligence explosion it would take a great deal of time and resources.
More like driving to the store and driving into the brick wall of the store are adjacent in design space.
Yes, many people intuitively feel that a universe of pleasure and a universe of pain add to a net negative. But I suspect that's just a result of experiencing (and avoiding) lots of sources of extreme pain in our lives, while sources of pleasure tend to be diffuse and relatively rare. The human experience of pleasure is conjunctive because in order to survive and reproduce you must fairly reliably avoid all types of extreme pain. But in a pleasure-maximizing environment, removing pain will be a given.
It's also true that our brains tend to adapt to pleasure over time, but that seems simple to modify once physiological constraints are removed.
Human disutility includes more than just pain too. Destruction of humanity (the flat plain you describe) carries a great deal of negative utility for me, even if I disappear without feeling any pain at all. There's more disutility if all life is destroyed, and more if the universe as a whole is destroyed... I don't think there's any fundamental asymmetry. Pain and pleasure are the most immediate ways of affecting value, and probably the ones that can be achieved most efficiently in computronium, so external states probably don't come into play much at all if you take a purely utilitarian view.
I'm not sure what you mean here by risk aversion. If it's not loss aversion, and it's not due to decreasing marginal value, what is left?
Would you rather have $5 than a 50% chance of getting $4 and a 50% chance of getting $7? That, to me, sounds like the kind of risk aversion you're describing, but I can't think of a reason to want that.
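Spelled out, the gamble is worth more in expectation than the sure $5:

$$0.5 \times \$4 + 0.5 \times \$7 = \$5.50 > \$5.00$$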
You will not bet on just one side, you mean. You already said you'll take both bets because of the guaranteed win. But unless your credence is quite precisely 50%, you could increase your expected value over that status quo (guaranteed $1) by choosing NOT to take one of the bets. If you still take both, or if you now decide to take neither, it seems clear that loss aversion is the reason (unless the amounts are so large that decreasing marginal value has a significant effect).
True, you're sure to make money if you take both bets. But if you think the probability is 51% on odd rather than 50%, you make a better expected value by only taking one side.
Let's reverse this and see if it makes more sense. Say I give you a die that looks normal, but you have no evidence about whether it's fair. Then I offer you a two-sided bet: I'll bet $101 to your $100 that it comes up odd. I'll also offer $101 to your $100 that it comes up even. Assuming that transaction costs are small, you would take both bets, right?
If you had even a small reason to believe that the die was weighted towards even numbers, on the other hand, you would take one of those bets but not the other. So if you take both, you are exhibiting a probability estimate of exactly 50%, even though it is "uncertain" in the sense that it would not take much evidence to move that estimate.
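Here's a minimal sketch of that arithmetic, using the stakes above ($101 against $100 on each side) and a 51% credence on odd:

```python
# Expected value of taking both sides vs. only the favorable one,
# at stakes of $101 (theirs) to $100 (yours) on each side of the die bet.
p_odd = 0.51  # your credence that the die comes up odd

ev_win_on_odd = p_odd * 101 - (1 - p_odd) * 100        # +$2.51
ev_win_on_even = (1 - p_odd) * 101 - p_odd * 100       # -$1.51
ev_both = ev_win_on_odd + ev_win_on_even               # +$1.00, and it's guaranteed

print(f"Only the bet that pays on odd: {ev_win_on_odd:+.2f}")
print(f"Both bets (the sure thing):    {ev_both:+.2f}")
# With any credence other than exactly 50%, dropping the unfavorable side
# beats the guaranteed $1.
```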
Gasoline is an excellent example of this behavior. It consists of a mixture of many different non-polar hydrocarbons with varying densities, some of which would be gaseous outside of solution. It stays mixed indefinitely (assuming you don't let the volatile parts escape) because separation would require a reduction in entropy.
It seems like there's also an issue with risk aversion. In regular betting markets there are enough bets that you can win some and lose some, and the risks can average out. But if you bet substantially on x-risks, you will get only one low-probability payout. Even if you assume you'll actually get that one (relatively large) payout, the marginal value will be greatly decreased. To avoid that problem, people will only be willing to bet small amounts on x-risks. The people betting against them, though, would be willing to make a variety of large bets (each with low payoff) and thereby carry almost no risk.
I guess where we disagree is in our view of how a simulation would be imperfect. You're envisioning something much closer to a perfect simulation, where slightly incorrect boundary conditions would cause errors to propagate into the region that is perfectly simulated. I consider it more likely that if a simulation has any interference at all (such as rewinding to fix noticeable problems) it will be filled with approximations everywhere. In that case the boundary condition errors aren't so relevant. Whether we see an error would depend mainly on whether there are any (which, like I said, is equivalent to asking whether we are "in" a simulation) and whether we have any mechanism by which to detect them.
If it is the case that we are in a "perfect" simulation, I would consider that no different than being in a non-simulation. The concept of being "in a simulation" is useful only insofar as it predicts some future observation. Given the various multiverses that are likely to exist, any perfect simulation an agent might run is probably just duplicating a naturally-occurring mathematical object which, depending on your definitions, already "exists" in baseline reality.
The key question, then, is not whether some simulation of us ...
Does anybody think this will actually help with existential risk? I suspect the goal of "keeping up" or preventing irrelevance after the onset of AGI is pretty much a lost cause. But maybe if it makes people smarter it will help us solve the control problem in time.
I just tried this out for a project I'm doing at work, and I'm finding it very useful--it forces me to think about possible failure modes explicitly and then come up with specific solutions for them, which I guess I normally avoid doing.
Encrypting/obscuring it does help a little bit, but doesn't eliminate the problem, so it's not just that.
I agree with that... personally I have tried several times to start a private journal, and every time I basically end up failing to write down any important thoughts because I am inhibited by the mental image of how someone else might interpret what I write--even though in fact no one will read it. Subconsciously it seems much more "defensible" to write nothing at all, and therefore effectively leave my thoughts unexamined, than to commit to having thought something that might be socially unacceptable.
I've been trying to understand the differences between TDT, UDT, and FDT, but they are not clearly laid out in any one place. The blog post that went along with the FDT paper sheds a little bit of light on it--it says that FDT is a generalization of UDT intended to capture the shared aspects of several different versions of UDT while leaving out the philosophical assumptions that typically go along with it.
That post also describes the key difference between TDT and UDT by saying that TDT "makes the mistake of conditioning on observations" which ...
It does seem like a past tendency to overbuild things is the main cause. Why are the pyramids still standing five thousand years later? Because the only way they knew to build a giant building back then was to make it essentially a squat mound of solid stone. If you wanted to build a pyramid the same size today you could probably do it for 1/1000 of the cost but it would be hollow and it wouldn't last even 500 years.
Even when cars were new they couldn't be overbuilt the way buildings were in prehistory because they still had to be able to move themselves ...
Agreed. There are plenty of liberal views that reject certain scientific evidence for ideological reasons--I'll refrain from examples to avoid getting too political, but it's not a one-sided issue.
This may be partially what has happened with "science" but in reverse. Liberals used science to defend some of their policies, conservatives started attacking it, and now it has become an applause light for liberals--for example, the "March for Science" I keep hearing about on Facebook. I am concerned about this trend because the increasing politicization of science will likely result in both reduced quality of science (due to bias) and decreased public acceptance of even those scientific results that are not biased.
Interesting piece. It seems like coming up with a good human-checkable way to evaluate parsing is pretty fundamental to the problem. You may have noticed already, but Ozora is the only one that didn't figure out that "easily" goes with "parse".
The idea that friendly superintelligence would be massively useful is implicit (and often explicit) in nearly every argument in favor of AI safety efforts, certainly including EY and Bostrom. But you seem to be making the much stronger claim that we should therefore altruistically expend effort to accelerate its development. I am not convinced.
Your argument rests on the proposition that current research on AI is so specific that its contribution toward human-level AI is very small, so small that the modest efforts of EAs (compared to all the massive corpor...
I haven't seen any feminists addressing that particular argument (most are concerned with cultural issues rather than genetic ones) but my initial sense is something like this: a successful feminist society would have 1) education and birth control easily available to all women, and 2) a roughly equal division of the burden of child-rearing between men and women. These changes will remove most of the current incentives that seem likely to cause a lower birth rate among feminists than non-feminists. Of course, it could remain true that feminists tend to be ...
I would argue that the closest real-world analogue is computer hacking. It is a rare ability, but it can bestow a large amount of power on an individual who puts in enough effort and skill. Like magic, it requires almost no help from anyone else. The infrastructure has to be there, but since the infrastructure isn't designed to allow hacking, having the infrastructure doesn't make the ability available to everyone who can pay (like, say, airplanes). If you look at the more fantasy-style sci-fi, science is often treated like magic--one smart scientist can do all sorts of cool stuff on their own. But it's never plausible. With hacking, that romanticization isn't nearly as far from reality.
It seems like the key problem described here is that coalitions of rational people, when they form around scientific propositions, cause the group to become non-scientific out of desire to support the coalition. The example that springs to my mind is climate change, where there is social pressure for scientific-minded people (or even those who just approve of science) to back the rather specific policy of reducing greenhouse gas emissions rather than to probe other aspects of the problem or potential solutions and adaptations.
I wonder if we might solve pro...
Hi Jared, your question about vegetarianism is an interesting one, and I'll give a couple of responses because I'm not sure exactly what direction you're coming from.
I think there's a strong rationalist argument in favor of limiting consumption of meat, especially red meat, on both health and environmental grounds. These issues get more mixed when you look at moderate consumption of chicken or fish. Fish especially is the best available source of healthy fats, so leaving it out entirely is a big trade-off, and the environmental impact of fishing varies a g...
The attempt to analytically model the recalcitrance of Bayesian inference is an interesting idea, but I'm afraid it leaves out some of the key points. Reasoning is not just repeated applications of Bayes' theorem. If it were, everyone would be equally smart except for processing speed and data availability. Rather, the key element is in coming up with good approximations for P(D|H) when data and memory are severely limited. This skill relies on much more than a fast processor, including things like simple but accurate models of the rest of the world, or kn...
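For reference, the update step in question is just Bayes' theorem; the hard part in practice is the likelihood term P(D|H):

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}$$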
And to elaborate a little bit (based on my own understanding, not what they told me) their RSP sort of says the opposite. To avoid a "race to the bottom" they base the decision to deploy a model on what harm it can cause, regardless of what models other companies have released. So if someone else releases a model with potentially dangerous capabilities, Anthropic can't/won't use that as cover to release something similar that they wouldn't have released otherwise. I'm not certain whether this is the best approach, but I do think it's coherent.