I know the above statement might have unfortunate implications in the wrong context, but I would like to see it proven wrong instead of just dismissed, if you think you disagree with it. Do you disagree with the factual accuracy of the statement, or are you disagreeing because of the assumptions you made about my intent?
I didn't downvote, but I don't like your statement. I mostly agree with the biological facts, but you state them as if they apply directly and straightforwardly to the post's question about human affairs. If applied in the most obvious way, they lead to the unfortunate implications, but I don't think that application really makes sense. And I can't help suspecting these apparent implications are a result of motivated stopping.
I think we need to provide some kind of prior regarding unknown features of model and reward if we want the given model and reward to mean anything. Otherwise, for all the AI knows, the true reward has a +2-per-step term that reverses the reward-over-time feature. It can still infer the algorithm generating the sample trajectories, but the known reward is no help at all in doing so.
I think what we want is for the stated reward to function as a hint. One interpretation might be to expect that the stated reward should approximate the true reward well over the problem and solution domains humans have thought about. This works in, for instance, the case where you put an AI in charge of the paper clip factory with the stated reward '+1 per paper clip produced'.
The more advanced versions of the fence apply even if the reasons for the fence are unknown or bad (or badly explained).
I'm not sure I've encountered these more advanced versions. Is there a link?
Awesome! Stuart Armstrong be our Dungeon Master! :-) I haven't seen you write up your responses to our DM though. I'd like to see them.
I've made a few shots, e.g. at http://lesswrong.com/r/discussion/lw/mfq/presidents_asteroids_natural_categories_and/cjkr and http://lesswrong.com/lw/m25/high_impact_from_low_impact/cah1. There's no explicit role-playing, but I was very much in the mindset of trying to break the protection scheme.
I haven't been keeping up with these posts as well lately.
Founding the NHS, bringing in clean air and water acts, regulating minimum standards for child workers (and then all workers), extending the franchise. All of these were done in defiance of precedent and against strong accusations of destroying prosperity.
The creation of the NHS is a good example. Nothing like it had been done before, and most of the predictions (both positive and negative) at the time were very wrong (for instance, it was predicted that it would reduce medical costs overall!). This strongly implies that nobody really had any idea what was going to happen. And yet it basically worked out; in fact, most healthcare systems in developed countries (apart from the USA) seem to average out around the same broad range of performance and cost, even if they vary considerably in theory. This is evidence that our current systems push both revolutionary innovations and incremental ones in the vague direction of decent performance.
On the other side, many technological innovations completely destroy Chesterton fences existing in society. The whole idea of centralising and sharing knowledge across all different types of communities is something that many fences were built to block; yet it seems to have kinda worked.
But the proper argument would require many more examples, and a much more careful definition of what a Chesterton fence is.
For the Chesterton's Fence objection to properly have applied to the NHS, it would have had to have been the case that no one could explain the historical lack of NHS. Yet I think it's pretty easily explained by governments' values over time: first kleptocratic, then libertarianish, and only becoming utilitarian roughly around the time of the NHS, to simplify heavily.
Exactly how established is the track record of taking down fences without an understanding of why they were put up? A great many of liberalism's target fences over the years have been readily explained by being in the interests of the powerful (e.g. monarchy/aristocracy, slavery).
I'm not sure where to post an idea for AI control research, so I'll do it here. It spun off from your post, the recent treacherous turn post, and LW slack discussions.
Here is the idea: could we gamify AI safety research? The approach would be to create a setting where the players have to obey the AI safety rules and still achieve an objective in the in-game world. This could be a simulated virtual world in a computer game or a role-playing world. To provide sufficient motivation, the in-game world would e.g. consist of a population of beings that are evil (to a typical human player) and interact, and your most likely purpose is to make them do things you want (as in many other computer games). Try to squeeze out as many resources as you can, while still obeying the rules. The game would progress from simple AI control rules, like Asimov's robot laws, to more advanced AI control rules, and find out whether people can hack them. If people can, an AI probably can too.
That's essentially what these posts are to me, except instead of a video game it's pen-and-paper with Stuart Armstrong as DM :).
It might be worth the extra motivation of writing up a framing with evil AI designers applying the proposed controls. I'll consider doing this on future posts.
The desire to look at calibration rather than prediction-score comes from the fact that calibration at least kind of seems like something you could fairly compare across different prediction sets. Comparing Scott's 2015 vs. 2014 prediction scores might just reflect which year had more predictable events. In theory it's also possible that one year's uncertainties are objectively harder to calibrate, but this seems less likely.
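The calibration comparison described above can be sketched as follows: group predictions by their stated confidence level and compare each level to the empirical frequency of those predictions coming true. (The data here is purely hypothetical, for illustration.)

```python
# Minimal calibration sketch: for each stated confidence level, a
# well-calibrated predictor's empirical hit rate should match that level.
from collections import defaultdict

def calibration_table(predictions):
    """predictions: list of (stated_confidence, came_true) pairs.

    Returns a dict mapping each stated confidence level to the
    fraction of predictions at that level that came true.
    """
    buckets = defaultdict(list)
    for confidence, came_true in predictions:
        buckets[confidence].append(came_true)
    return {
        confidence: sum(outcomes) / len(outcomes)
        for confidence, outcomes in sorted(buckets.items())
    }

# Hypothetical example: four predictions at 80%, three at 95%.
preds = [(0.80, True), (0.80, True), (0.80, False), (0.80, True),
         (0.95, True), (0.95, True), (0.95, True)]
print(calibration_table(preds))  # {0.8: 0.75, 0.95: 1.0}
```

A perfectly calibrated year would show each bucket's hit rate equal to its stated confidence; comparing these gaps across years avoids directly comparing raw scores from years of differing difficulty.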
The best procedure is probably to just make a good-faith effort to choose predictions based on interest and predict as though maximizing prediction score. If one wanted to properly align incentives, one might try the following procedure:

1) Announce a set of things to predict, but not the predictions themselves.

2) Have another party pledge to reward you (with cash or a charity donation, probably) in proportion to your prediction score*, with a multiplier based on how hard they think your prediction topics are.

3) Make your predictions.
- There's a bit of a hurdle in that the domain is negative infinity to zero. One solution would be to set a maximum allowed confidence to make the range finite--for instance, if 99% is the maximum, the worst possible score would be ln(0.01) =~ -4.6, so a reward of (4.6 + score) would produce the right incentives.
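The capped log score and the shifted reward above can be sketched like this (function names are my own, for illustration):

```python
import math

MAX_CONFIDENCE = 0.99  # cap so the worst possible score is finite

def log_score(confidence, came_true):
    """Log score for one binary prediction, with confidence capped at 99%."""
    p = min(max(confidence, 1 - MAX_CONFIDENCE), MAX_CONFIDENCE)
    p = p if came_true else 1 - p
    return math.log(p)

# Worst case: a 99%-confident prediction that turns out false scores
# ln(0.01) ~= -4.6, so shifting by -ln(0.01) makes rewards non-negative.
WORST = math.log(1 - MAX_CONFIDENCE)

def reward(confidence, came_true):
    """Reward of (4.6 + score), i.e. score shifted so the worst case is 0."""
    return log_score(confidence, came_true) - WORST
```

A maximally wrong prediction then earns a reward of 0, and a correct 99%-confident prediction earns close to 4.6, preserving the incentive ordering of the raw log score.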
If we're doing the virtue ethical banning, then as long as we agree that the people in question deserved a ban, the specific reasons given for the ban aren't very important. The moderator may be reacting to a pattern that's clearly ban-worthy, but nonetheless hard to verbalize exactly, and thus misreport their real reason. Verbal reporting is hard.
This. If I read the ban announcement legalistically, I disagree with it. But if I read the offending post, together with multiple users' assurances that AA's posts were basically all like that--I don't want that in my garden.
As someone with no knowledge of NNTP, I'm in favor of this sequence. As far as I'm concerned, much of it looks like on-topic craft/community material.