Rationality is about not simply taking up beliefs unfiltered, but evaluating other people's claims before you believe them. Not doing that would seem to miss the point on a general level.
Accepting conclusions that have been accepted by a sufficient number of marginally trustworthy people is not necessarily a bad heuristic. He might gain more from dogma if he won't persevere through the reading, though a list that people are publicly pointed to could lead to people pointing fingers and saying "cult".
I only said that it would reduce the chance of stupid decisions resulting from not understanding basic human words and values. It would not reduce the chance of a deliberately malicious AI.
There are (at least) two different types of UFAI: real UFAI and failed FAI. A failed FAI wanted to be good but failed; the best example is a smile maximizer, which would cover the whole Solar system with smiles. (A paperclip maximizer is also a form of failed FAI, since its initial goal was positive: produce many paperclips.)
So it is not a full recipe for real FAI, but just one approach to value learning.
You confuse the stupidity of whoever set the goals with the stupidity of the AI afterward. Any AGI is going to understand what we actually want; it just won't care, if the goal it was given wasn't already smart enough.
(memetic hazard) ˙sƃuıɹǝɟɟns lɐuɹǝʇǝ ɯnɯıxɐɯ ǝʇɐǝɹɔ oʇ pǝsıɯıʇdo ɹǝʇʇɐɯ sı ɯnıuoɹʇǝɹnʇɹoʇ
Update: added a full description of the idea on my Facebook: https://www.facebook.com/turchin.alexei/posts/10210360736765739?comment_id=10210360769286552&notif_t=feed_comment&notif_id=1472326132186571
Note that DeepMind's two big successes (Atari and Go) come from scenarios that are perfectly simulable in a computer. That means they can generate an arbitrarily large number of data points to train their massive neural networks. Almost all real-world ML problems have strict limitations on the amount of training data available.
On the other hand, it's simple to generate AI-complete problems where you can generate training data.
Why'd they make this public?
We can't conclude that they would not differ. We could postulate it and then ask: could we measure whether equal copies have equal qualia? We can't measure it. And here we return to the "hard question": we don't know whether different qualia imply different combinations of atoms.
If the copies are different, the question is not interesting. If the copies aren't different, what causes you to label what he sees as red? It can't be the wavelength of the light that actually enters his eye, because his identical brain would treat red's wavelength as red.
Being able to meet many goals is useful. Actually meeting wrong goals is not.
Your hyperbolic discounting example is instructive, as without a model of your goals, you cannot know whether your current or future self is correct. Most people come to the opposite conclusion - a hyperbolic discount massively overweights the short-term in a way that causes regret.
a hyperbolic discount massively overweights the short-term in a way that causes regret.
I meant that - when planning for the future, I want my future self to care about each absolute point in time as much as my current self does, or barring that, to only be able to act as if it did, hence the removal of power.
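The preference reversal being debated above can be made concrete with a toy calculation. This is my own sketch, with made-up reward sizes and delays, using the standard hyperbolic form V = A / (1 + kD); it shows how a hyperbolic discounter's current and future selves genuinely disagree, which is what motivates "removing power" from the future self.

```python
def hyperbolic(value, delay, k=1.0):
    """Hyperbolically discounted value: V = A / (1 + k * D)."""
    return value / (1 + k * delay)

# Two rewards: a smaller-sooner one and a larger-later one, two days apart.
small, large = 60, 100

# Planning ten days in advance (delays of 11 and 13 days):
far_small = hyperbolic(small, 11)   # 5.0
far_large = hyperbolic(large, 13)   # ~7.14 -> the planner prefers larger-later
# When the choice is imminent (delays of 1 and 3 days):
near_small = hyperbolic(small, 1)   # 30.0
near_large = hyperbolic(large, 3)   # 25.0 -> the same agent now prefers smaller-sooner
```

The future self flips its preference relative to the planning self, even though nothing about the rewards changed, which is exactly the regret-generating behavior described above (and why exponential discounting, which is time-consistent, avoids it).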
The correct goal is my current goal, obviously. After all, it's my goal. My future self may disagree, preferring its own current goal. "Correct" is a two-place word.
If I let my current goal be decided by my future self, but I don't know yet what it will decide, then I should accommodate as many of its possible choices as possible.
There's a fair bit on decision theory and on Bayesian thinking, both of which are instrumental rationality. There's not much on heuristics or how to deal with limited capacity. Perhaps intentionally - it's hard to be rigorous on those topics.
Also, I think there's an unstated belief (which should be made explicit and debated) that instrumental rationality without epistemic rationality is either useless or harmful. Certainly that's the FAI argument, and there's no reason to believe it wouldn't apply to humans. As such, a focus on epistemic rationality first is the correct approach.
That is, don't try to improve your ability to meet goals unless you're very confident in those goals.
Why not? If you haven't yet decided what your goals are, being able to meet many goals is useful.
The AGI argument is that its goals might not be aligned with ours. Are you saying that we should make sure our future self's goals are aligned with our current goals?
For example, if I know I am prone to hyperbolic discounting, I should take power from my future self so it will act according to my wishes rather than its own?
This approach works under the assumption that the AI knows everything there is to know about its off switch.
And an AI that would kill everyone in case it had an off switch, is one that desperately needs a (public) off switch on it.
The approach assumes that it knows everything there is to know about off switches in general, or what its creators know about off switches.
If the AI can guess that its creators would install an off switch, it will attempt to work around as many possible classes of off switches as possible, and depending on how much of off-switch space it can outsmart simultaneously, whichever approach the creators chose might be useless.
Such an AI desperately needs more FAI mechanisms behind it; saying it desperately needs an off switch assumes that off switches help.
For a while now, I have been working on a potentially impactful project. The main limiting factor is my own personal productivity: a great deal of the risk is frontloaded in a lengthy development phase. Extrapolating the development duration from progress so far does not yield wonderful results. It appears I should still be able to finish it in a not-absurd timespan; it will just be slower than ideal.
I've always tried to improve my productivity, and I've made great progress compared to ten or even five years ago, but at this point I've picked most of the standard low-hanging fruit. I've already fiddled with some extremely easy and safe kinda-nootropics - melatonin, occasional caffeine pills - but not things like modafinil or amphetamines, or some of the less studied options.
And while thinking about this today, I decided to just run some numbers on amphetamines. Based on my current best estimates of market realities and the potential success and failure cases of the project, assuming amphetamines could improve my productivity by 30% on average, the expected value of taking amphetamines for the duration of development comes out to...
...a few hundred human lives.
And, in the best-reasonable-case scenario, a lot more than that. This wasn't really unexpected, but surprisingly, it's the first time I actually did the math.
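For concreteness, the back-of-the-envelope estimate has roughly this shape. Every number below is a hypothetical placeholder I've invented to illustrate the structure of the calculation, not the actual figures behind the post; the only input taken from the text is the assumed 30% productivity boost.

```python
# All inputs are made-up illustrative numbers, except the 30% boost from the post.
baseline_years = 4.0            # hypothetical projected development time at current pace
boost = 0.30                    # assumed average productivity improvement
p_success = 0.2                 # hypothetical probability the project succeeds at all
lives_per_year_earlier = 2000   # hypothetical value of shipping one year sooner

# A 30% productivity boost shortens a baseline_years schedule to baseline_years / 1.3.
years_saved = baseline_years - baseline_years / (1 + boost)

# Expected lives saved by taking the drug for the duration of development.
expected_lives = p_success * lives_per_year_earlier * years_saved
# With these placeholder inputs this lands in the low hundreds, matching the
# "few hundred human lives" order of magnitude described above.
```

The point of writing it out is that the conclusion is robust to the exact placeholders: any plausible combination where the project matters at all, multiplied by roughly a year of schedule compression, yields a number that dwarfs the expected costs of a low, safe dose.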
So I imagine the God of Dumb Trolley Problems sits me down for a thought experiment and explains: "In a few years, there will be a building full of 250 people. A bomb will go off and kill all of them. You have two choices." The god leans in for dramatic effect. "Either you can do nothing, and let all of them die... or..." It lowers its head just enough for shadows to cast over its features... "You take this low, safe dose of Adderall for a few years, and the bomb magically gets defused."
This is not a difficult ethical problem. Even taking into account potential side effects, even assuming the amphetamines were obtained illegally and so carried legal liability, this is not a difficult ethical problem. When I look at this, I feel like the answer of what I should do is blindingly obvious.
And yet I have a strong visceral response of "okay yeah sure but no." I assume part of this is fairly extreme risk aversion to the idea of getting anything like amphetamines outside of a prescription. Legal trouble would be pretty disastrous, even if unlikely. And part of me is spooked about doing something like this without expert oversight.
But why not just try to get an actual prescription? For this, or some other advantageous semi-nootropic, at least. Once again, I just get a gross feeling about the idea of trying to manipulate the system. How about if I just explain the situation in full, with zero manipulation, to a sympathetic doctor? The response from my gut feels like a blank "... no."
So basically, I feel stuck. Part of me wants to recognize the risk aversion as excessive, and suggests I should at least take whatever steps I can safely. The other part is saying "but that is doing something waaaay out of the ordinary and maybe there's a reason for that that you haven't properly considered."
I am not even sure what I want to ask with this post. I guess if you've got any ideas or insights, I'd like to hear them.
Perhaps you expect to in the future be in a position where your expected impact is significantly larger, and so your gut tells you to be careful with anything whose long-term effects are not clear?